RAND Statistics Seminar Series

Strategies for Model-Based Imputation in High-Dimensional Incomplete Data Sets

Presented by Thomas R. Belin, UCLA Dept. of Biostatistics
Thursday, May 11, 2006
RAND Corporation, Santa Monica, CA
Please contact Stephanie Thomas if you would like to attend this seminar.

Abstract

It is common in applied research to have large numbers of variables measured on a modest number of cases, with a variety of types of data (continuous, binary, ordinal categorical, nominal categorical, etc.). Longitudinal data and other clustered data structures are also common. This talk will present various methods that have emerged as part of an effort to develop broadly applicable and flexible model-based imputation methods for high-dimensional data sets. Key ideas include handling missing continuously scaled items using factor-analysis ideas to reduce the number of covariance parameters to be estimated in a multivariate normal model (Song and Belin 2004); using growth-curve models and factor-analysis ideas together for longitudinal continuously scaled variables (Wang and Belin 2002); using a parameter-extended Metropolis-Hastings algorithm to sample the correlation matrix in a multivariate probit model in a manner that lends itself to extensions to several ordinal variables (Zhang, Boscardin, and Belin 2003; Boscardin, Zhang, and Belin 2004); and applying the parameter-extended Metropolis-Hastings idea to a multinomial probit model in a manner that lends itself to extensions to several nominal categorical variables (Zhang, Boscardin, and Belin 2005). Examples are offered to illustrate the methods, and simulation studies are used to explore statistical properties and to compare procedures with potential alternative approaches.
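
To give a rough sense of why a factor structure helps when variables are numerous and cases are few, the sketch below compares the number of free parameters in an unrestricted covariance matrix with the count under a textbook k-factor structure (loadings plus unique variances). This is an illustrative assumption, not the specific model of Song and Belin (2004); the function names and the example values of p and k are hypothetical.

```python
def unstructured_cov_params(p: int) -> int:
    """Free parameters in an unrestricted p x p covariance matrix."""
    return p * (p + 1) // 2


def factor_cov_params(p: int, k: int) -> int:
    """Free parameters when Sigma = Lambda Lambda' + Psi with k factors.

    Counts p*k loadings plus p unique variances, minus k*(k-1)/2 to
    account for the rotational indeterminacy of the loadings.
    (Textbook count; assumed here for illustration only.)
    """
    return p * k + p - k * (k - 1) // 2


if __name__ == "__main__":
    # Hypothetical dimensions chosen only to show the scale of the reduction.
    for p, k in [(20, 2), (50, 3), (100, 5)]:
        print(f"p={p:3d} k={k}: unstructured={unstructured_cov_params(p):5d} "
              f"factor={factor_cov_params(p, k):4d}")
```

With 100 variables and 5 factors, for example, the count drops from 5050 to 590, which suggests why such a restriction can make estimation feasible with a modest number of cases.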

References

Boscardin WJ, Zhang X, Belin TR. Modeling a mixture of ordinal and continuous repeated measures. Proceedings of the American Statistical Association Section on Bayesian Statistical Science, 2004; 15-22.

Song J, Belin TR. Imputation for incomplete high-dimensional multivariate normal data using a common factor model. Statistics in Medicine, 2004; 23:2827-2843.

Wang J, Belin TR. Handling incomplete high dimensional multivariate longitudinal data by multiple imputation using a longitudinal factor analysis model. Proceedings of the American Statistical Association Section on Statistical Computing, 2002; 3615-3620.

Zhang X, Boscardin WJ, Belin TR. Sampling algorithms for correlation matrices. Proceedings of the American Statistical Association Section on Bayesian Statistical Science, 2003; 4743-4750.

Zhang X, Boscardin WJ, Belin TR. Multivariate extensions to multinomial probit models using parameter-extended Metropolis-Hastings. Proceedings of the American Statistical Association Section on Bayesian Statistical Science, 2005; 169-176.


Note: Washington, D.C. Conf. Rm. 7401; Pittsburgh Conf. Rm. 6202



Attending a Seminar

Visitors to RAND are welcome to attend the statistics seminars but must RSVP at least one day before the seminar. To ensure your attendance, contact Stephanie Thomas at sjthomas@rand.org with your name, company or university affiliation, and national citizenship (for security purposes). Light refreshments will be served.

For directions to RAND, see: http://www.rand.org/about/locations/santa-monica.html
Visitors must enter through the north parking garage of our new office building (1776 Main Street), which is accessible from Main Street. Inform the attendant that you are here for the Statistics Seminar Series, and you will be directed to the appropriate parking area. If no attendant is present, use the intercom and say that you are here for the Statistics Seminar Series. After parking, follow the instructions to the appropriate conference area.

Reminder: the old RAND surface parking lots have been permanently closed.

For further information, or to be added to the mailing list, contact Stephanie Thomas at sjthomas@rand.org.