RAND Statistics Seminar Series

Valued Ties Tell Fewer Lies: Why Not To Dichotomize Network Edges With Thresholds

Presented by Andrew C. Thomas, Visiting Assistant Professor, Carnegie Mellon University
Thursday, December 16, 2010
1:30 p.m. – 3:00 p.m. ET
Conference Room 6202, RAND Corporation, Pittsburgh, PA

Other Locations/Times:
Washington, D.C., Conf. Rm. 4132 1:30 p.m. ET
Santa Monica, CA, Conf. Rm. 1226/1228 10:30 a.m. PT
Please contact Denise Miller if you would like to attend this seminar.


The vast majority of statistical recipes for modeling the edges in complex networks (and for processes on these networks) have focused on binary ties; as a result, investigators who have valued measurements for edges often rely on ad hoc methods to produce a single binary substitute for use in the analysis to follow. The most common method is to dichotomize the values by choosing a threshold value and declaring an edge to exist if the threshold is exceeded. While the consequences of dichotomization in other settings, such as the dichotomization of predictors in linear regression, are well known and manageable, I demonstrate that even "principled" methods of dichotomization can lead to massive, and asymptotically unbounded, losses of efficiency when examining processes that occur on networks, such as contagious influence; in other cases, the dichotomization procedure can produce vastly warped perceptions of the importance of certain nodes and edges in the network itself. I conclude by proposing alternatives to the thresholding procedure that may preserve the features of interest from the original network, whether the resulting product is valued or binary. This is joint work with Joseph K. Blitzstein.
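
For readers unfamiliar with the thresholding procedure the abstract refers to, the following is a minimal illustrative sketch in Python. It is not the speaker's code or data: the tie values, the node labels, the threshold of 1.0, and the helper name dichotomize are all assumptions made up for this example.

```python
# Hypothetical valued ties: (node, node) -> tie strength.
# All values below are invented for illustration only.
valued_ties = {
    ("a", "b"): 0.9,
    ("a", "c"): 5.0,
    ("b", "c"): 1.1,
    ("c", "d"): 12.0,
}


def dichotomize(ties, threshold):
    """Return the set of node pairs whose value strictly exceeds the threshold."""
    return {pair for pair, value in ties.items() if value > threshold}


# An edge "exists" in the binary network only if its value exceeds the threshold.
binary_ties = dichotomize(valued_ties, threshold=1.0)
```

In this toy case the ties valued 1.1 and 12.0 become indistinguishable after thresholding, while the ties valued 0.9 and 1.1 end up on opposite sides of the cut despite being nearly equal. That is the kind of information loss the abstract is concerned with.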

Speaker Bio

Andrew C. Thomas is a Visiting Assistant Professor in the Department of Statistics, and a visiting faculty member in iLab at the Heinz College, at Carnegie Mellon University. He received his PhD from Harvard University in 2009. His primary research interests are the modeling of complex networks and of processes on them, with particular interest in systems with variety in the strength of ties, as well as related topics in Bayesian hierarchical modeling and computation.

Attending a Seminar

Visitors to RAND's Santa Monica and Pittsburgh locations are welcome to attend but must RSVP at least one day prior to the seminar. To ensure your attendance, please contact Denise Miller at dmiller@rand.org with your name, company (or university) affiliation, and national citizenship (for security purposes).

For parking and directions to RAND's Pittsburgh office, please see: http://www.rand.org/about/locations/pittsburgh.html.

For parking and directions to RAND's Santa Monica office, please see: http://www.rand.org/about/locations/santa-monica.html.

For further information, or to be added to the mailing list, contact Denise Miller at dmiller@rand.org.