Sparse Nonparametric Bayesian Learning from Big Data

RAND Statistics Seminar Series

Presented by Dr. David Dunson—Dept. of Statistical Science, Duke University

Date: Thursday, May 3rd, 2012
Time: 10:30 a.m.–12:00 p.m. Pacific / 1:30–3:00 p.m. Eastern
Host Location: Santa Monica, room Forum 1224
Other Locations: Pittsburgh, room 6202; Washington, DC, room 4128

Please contact Fabiola Lopez if you would like to attend this seminar.

Abstract

In modern applications, data sets tend to be big and highly structured, with large p, small n problems commonly encountered. In such settings, sparse representations of the data are crucial and there is a rich frequentist literature focused on inducing sparsity through penalization (typically L1). Motivated by genetic epidemiology and imaging applications, we instead develop nonparametric Bayesian methods that avoid parametric assumptions while favoring low-dimensional representations of complex high-dimensional data.

In this talk, the particular focus is on Bayesian probabilistic tensor factorizations, which generalize low-rank matrix factorizations, such as the SVD, to higher orders. The framework accommodates general joint modeling of object data of different types (images, text, categorical, real, etc.), but for simplicity we focus on two applications: (1) high-dimensional multivariate categorical data analysis (contingency tables) and (2) estimation of lower-dimensional manifolds from point cloud data. In the contingency table case, we propose a collapsed Tucker factorization and develop associated methods for testing associations and interactions in huge sparse tables. In the manifold learning case, we propose a tensor product of basis functions for estimating 3D closed surfaces. In both settings, we provide theoretical results on large support and asymptotic properties, and we develop efficient computational methods that scale to large data sets.
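As a rough illustration of how a Tucker-type factorization can represent a contingency table with few parameters, the sketch below builds the joint probability tensor of three categorical variables from a small core tensor and one factor matrix per variable. This is a generic Tucker form, not the collapsed Tucker factorization proposed in the talk. The function name, the latent dimensions, the category counts, and the Dirichlet draws are all illustrative assumptions.

```python
import numpy as np


def tucker_probability_tensor(core, factors):
    """Combine a core tensor and three factor matrices into a joint pmf.

    core:       shape (k1, k2, k3), nonnegative entries summing to 1.
    factors[j]: shape (k_j, d_j), each row a probability vector over the
                d_j categories of variable j.
    Returns a tensor of shape (d1, d2, d3) whose entries sum to 1.
    """
    a, b, c = factors
    # pi[i, j, k] = sum over (p, q, r) of core[p, q, r] * a[p, i] * b[q, j] * c[r, k]
    return np.einsum("pqr,pi,qj,rk->ijk", core, a, b, c)


# Hypothetical sizes: three variables with 3, 4 and 5 categories,
# each summarized by a latent class with only 2 levels.
rng = np.random.default_rng(0)
latent_dims = (2, 2, 2)
category_counts = (3, 4, 5)

# Random core and factors drawn from Dirichlet distributions (illustration only).
core = rng.dirichlet(np.ones(int(np.prod(latent_dims)))).reshape(latent_dims)
factors = [
    rng.dirichlet(np.ones(d), size=k)
    for k, d in zip(latent_dims, category_counts)
]

pi = tucker_probability_tensor(core, factors)
print(pi.shape)  # one cell probability per combination of categories
```

Because each variable enters only through its two-level latent class, every unfolding of the resulting tensor has rank at most 2, which is the kind of low-dimensional structure such factorizations favor.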

*Joint work with Anirban Bhattacharya and Debdeep Pati

Speaker Bio

Dr. David Dunson is Professor of Statistical Science at Duke University. His research focuses on Bayesian statistical theory and methods motivated by high-dimensional and complex applications. A particular emphasis is on nonparametric probability models and on joint modeling of high-dimensional data of different types, including images, functions, shapes, text and other complex objects. Dr. Dunson is a fellow of the American Statistical Association and of the Institute of Mathematical Statistics. He is the winner of the 2007 Mortimer Spiegelman Award for the top public health statistician, the 2010 Myrto Lefkopoulou Distinguished Lectureship at Harvard University and the 2010 COPSS Presidents’ Award for the top statistician under 41.

Attending the Seminar

RAND visitors are welcome to attend and must RSVP at least one day prior to the seminar. To ensure your attendance, please contact Fabiola Lopez at flopez@rand.org with your name, company (or university) affiliation, and national citizenship (for security purposes).

For parking and directions to RAND's Santa Monica office, please see: http://www.rand.org/about/locations/santa-monica.html.

For parking and directions to RAND's Pittsburgh office, please see: http://www.rand.org/about/locations/pittsburgh.html.

For parking and directions to RAND's Washington, DC office, please see: http://www.rand.org/about/locations/washington.html.

For further information, or to be added to the mailing list, contact Fabiola Lopez at flopez@rand.org.