RAND Statistics Seminar Series

High Dimensional Classification Using Feature Annealed Independence Rules

Presented by Yingying Fan, Ph.D., University of Southern California
Thursday, January 14, 2010
10:30 a.m. – 12:00 p.m. PT / 1:30 p.m. – 3:00 p.m. ET
Conference Room 5312
RAND Corporation, Santa Monica, CA
Please contact Denise Miller if you would like to attend this seminar.

Abstract

Classification using high-dimensional features arises frequently in contemporary statistical studies, such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification remains poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra, and they propose an independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing because of noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is critically important to select a subset of important features for high-dimensional classification, which leads to Feature Annealed Independence Rules (FAIR). We establish the conditions under which all the important features can be selected by the two-sample t-statistic. The optimal number of features, or equivalently the threshold value of the test statistic, is chosen based on an upper bound of the classification error. Simulation studies and real data analysis strongly support our theoretical results and demonstrate the advantage of our new classification procedure.
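
For readers who want a concrete picture of the procedure the abstract describes, the following is a minimal sketch, not the speaker's implementation. It ranks features by the absolute two-sample t-statistic, keeps the top m, and applies an independence (diagonal) rule to the kept features. The function names, the two-class 0/1 label coding, and the user-supplied m are assumptions made for illustration; the abstract instead chooses the number of features from an upper bound on the classification error, which is not reproduced here.

```python
import numpy as np


def fit_fair_sketch(X, y, m):
    """Select m features by |two-sample t-statistic| and fit an independence rule.

    Hypothetical helper for illustration: X is an (n, p) array, y holds 0/1 labels,
    and m is supplied by the caller rather than derived from an error bound.
    """
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    v0, v1 = X0.var(axis=0, ddof=1), X1.var(axis=0, ddof=1)

    # Two-sample t-statistic for each feature (unequal-variance form).
    t = (mu1 - mu0) / np.sqrt(v0 / n0 + v1 / n1)

    # Indices of the m features with the largest |t|.
    keep = np.argsort(-np.abs(t))[:m]

    # Pooled per-feature variance: the diagonal used by the independence rule.
    pooled = ((n0 - 1) * v0 + (n1 - 1) * v1) / (n0 + n1 - 2)
    return keep, mu0[keep], mu1[keep], pooled[keep]


def predict_fair_sketch(model, X):
    """Assign class 1 when the diagonal linear discriminant score is positive."""
    keep, mu0, mu1, var = model
    Z = X[:, keep]
    midpoint = (mu0 + mu1) / 2
    score = ((Z - midpoint) * (mu1 - mu0) / var).sum(axis=1)
    return (score > 0).astype(int)
```

Calling fit_fair_sketch with m equal to the total number of features gives the independence rule on all features, so the two settings can be compared on simulated data to see the effect of noise accumulation.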

Speaker Bio

Yingying Fan's research interests include high-dimensional statistical inference, classification and variable selection, nonparametric statistics, and financial econometrics. Her papers have been published in journals including the Annals of Statistics, Journal of the American Statistical Association, Journal of Econometrics, and Journal of Financial Econometrics. She was a Lecturer in the Department of Statistics at Harvard University from 2007 to 2008. She is the recipient of a National Science Foundation grant (2009-2012, PI). She also serves on the membership committee of the International Chinese Statistical Association (2009-present).



Attending a Seminar

Other Locations/Times:
Washington, D.C. Conf. Rm. 4132: 1:30 p.m. ET
Pittsburgh Conf. Rm. 6202: 1:30 p.m. ET

RAND visitors are welcome to attend and must RSVP at least one day prior to the seminar. To ensure your attendance, please contact Denise Miller at dmiller@rand.org with your name, company (or university) affiliation, and national citizenship (for security purposes).

For parking and directions to RAND's Santa Monica office, please see: http://www.rand.org/about/locations/santa-monica.html.

For parking and directions to RAND's Pittsburgh office, please see: http://www.rand.org/about/locations/pittsburgh.html.

For further information, or to be added to the mailing list, contact Denise Miller at dmiller@rand.org.