2019 Colloquia
Causal Inference in the Presence of Interference
Michael Hudgens, Ph.D.
University of North Carolina at Chapel Hill
A fundamental assumption usually made in causal inference is that of no interference between individuals (or units), i.e., the potential outcomes of one individual are assumed to be unaffected by the treatment assignment of other individuals. However, in many settings, this assumption obviously does not hold. For example, in infectious diseases, whether one person becomes infected may depend on who else in the population is vaccinated. In this talk we will discuss recent approaches to assessing treatment effects in the presence of interference.
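In the standard potential-outcomes notation (a generic formulation; the talk may use different notation), the no-interference assumption and one commonly studied relaxation, partial interference, can be written as follows:

```latex
\[
\textbf{No interference:}\qquad Y_i(a_1,\dots,a_n) \;=\; Y_i(a_i)
\quad \text{for every treatment assignment } (a_1,\dots,a_n).
\]
\[
\textbf{Partial interference:}\qquad Y_{ig}(\mathbf{a}) \;=\; Y_{ig}(\mathbf{a}_g),
\]
```

where units i are nested in groups g (for example, households) and a_g denotes the treatments assigned within group g only, so that a person's infection risk may depend on who in their own household is vaccinated but not on vaccinations in other households.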
Thursday, November 14, 2019
3:30 p.m. - 4:45 p.m.
Helen Wood Hall - Room 1W-501
Classification of Human Activity Based on the Raw Accelerometry Data
Jaroslaw Harezlak, Ph.D.
Indiana University
Wearable accelerometers offer a noninvasive measure of physical activity (PA). However, quantification of PA in a free-living environment is a challenging task. I will summarize our work utilizing data collected from tri-axial, wrist-worn accelerometers to quantify sedentary, upright, and ambulatory behavior. A number of algorithms for extracting features of physical activity, along with the associations of those features with health outcomes, will be presented. First, I will describe our work on classifying walking into level walking, descending stairs, and ascending stairs. Second, I will discuss differentiating between sedentary behavior and upright activities. The methodology developed will be illustrated using data collected by my group (walking activities) and by my collaborators (sedentary vs. upright).
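As a rough illustration of this type of pipeline (a hedged sketch, not the speaker's algorithms), raw tri-axial signals are often cut into short windows, summarized by simple features, and passed to a classifier. The window length, features, and classifier below are illustrative choices only, and the data and labels are simulated.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(acc, fs=100, win_sec=5):
    """Summarize a (n_samples, 3) tri-axial signal in non-overlapping windows.

    Features per window: per-axis mean and standard deviation plus the mean
    vector magnitude. These are generic accelerometry features, not those
    used in the talk.
    """
    win = fs * win_sec
    n_win = acc.shape[0] // win
    feats = []
    for w in range(n_win):
        seg = acc[w * win:(w + 1) * win, :]
        vm = np.sqrt((seg ** 2).sum(axis=1))   # vector magnitude per sample
        feats.append(np.concatenate([seg.mean(axis=0), seg.std(axis=0), [vm.mean()]]))
    return np.vstack(feats)

# Simulated one-minute recording at 100 Hz with made-up activity labels
# (0 = level walking, 1 = ascending stairs, 2 = descending stairs).
rng = np.random.default_rng(0)
acc = rng.normal(size=(100 * 60, 3))
X = window_features(acc)
y = rng.integers(0, 3, size=X.shape[0])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:3]))
```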
Thursday, October 24, 2019
Aligning Data Normalization with Analysis Goals for Reproducible Genomic Studies
Li-Xuan Qin, Ph.D.
Memorial Sloan Kettering Cancer Center
Data normalization is an important preprocessing step for genomic data, which contain unwanted variation due to disparate experimental handling. While methods for data normalization were developed in the context of group comparison with limited differential expression, they are frequently applied to inference goals they were not designed for, such as sample classification, an important quantitative tool that is sorely needed to tailor treatment choices in personalized medicine. To study this critical yet overlooked disconnect between the use of data normalization and the goal of the subsequent analysis, we collected a unique pair of microarray datasets on the same set of tumor samples at Memorial Sloan Kettering Cancer Center and conducted extensive simulation studies based on novel resampling schemes. In this talk, I will report our findings on how data normalization impacts the analysis of sample classification and of group comparison with moderate differential expression, and suggest an alternative approach that deals more effectively with unwanted variation in genomic data.
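For concreteness, one widely used normalization method is quantile normalization, which forces every sample to share the same empirical distribution. The sketch below is a generic textbook implementation on simulated data, not the approach proposed in the talk.

```python
import numpy as np

def quantile_normalize(expr):
    """Quantile-normalize a genes-by-samples expression matrix.

    Each value is replaced by the across-sample mean of the order statistic
    matching its within-sample rank, so all samples end up with the same
    empirical distribution. For illustration only.
    """
    ranks = expr.argsort(axis=0).argsort(axis=0)     # within-sample ranks
    ref = np.sort(expr, axis=0).mean(axis=1)         # mean of each order statistic
    return ref[ranks]

rng = np.random.default_rng(1)
expr = rng.lognormal(size=(1000, 20))                # 1000 genes, 20 samples
norm = quantile_normalize(expr)
# After normalization every sample has exactly the same sorted values.
print(np.allclose(np.sort(norm[:, 0]), np.sort(norm[:, 1])))
```

When the downstream goal is classification rather than group comparison, whole-dataset transformations like this are typically refit on training data only, which is one reason the choice of normalization interacts with the analysis goal.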
Thursday, October 17, 2019
Should We Model X in High-Dimensional Inference?
Lucas Janson, Ph.D.
Harvard University
Many important scientific questions concern the relationship between a response variable Y and a set of explanatory variables X. For instance, Y might be a disease state, the X's might be a person's SNPs, and the question is which of these SNPs are related to the disease. For answering such questions, most statistical methods focus their assumptions on the conditional distribution of Y given X (or Y | X for short). I will describe some benefits of shifting those assumptions from the conditional distribution Y | X to the joint distribution of X, especially for high-dimensional data. First, modeling X can lead to assumptions that are more realistic and verifiable. Second, there are substantial methodological payoffs in terms of much greater flexibility in the tools an analyst can bring to bear on their data, while still being guaranteed exact (non-asymptotic) inference. I will briefly mention some of my recent and ongoing work on methods for high-dimensional inference that model X instead of Y, as well as some challenges and interesting directions for the future.
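A simple example of inference that leans on a model for X rather than for Y | X is the conditional randomization test: if the distribution of one covariate given the others is known or well modeled, that covariate can be resampled repeatedly to build an exact null distribution for any test statistic. The Gaussian model for X_j given the other covariates and the correlation statistic below are purely illustrative assumptions, not a description of the speaker's methods.

```python
import numpy as np

def crt_pvalue(X, y, j, n_draws=500, rng=None):
    """Conditional randomization test of H0: y independent of X[:, j] given the rest.

    Illustrative assumption: X[:, j] given the other columns is Gaussian with a
    linear mean, fit by least squares. The statistic (absolute correlation with y)
    could be replaced by anything, e.g. a lasso coefficient.
    """
    rng = np.random.default_rng() if rng is None else rng
    A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    mu, sigma = A @ beta, np.std(X[:, j] - A @ beta)
    stat = abs(np.corrcoef(y, X[:, j])[0, 1])
    null = np.empty(n_draws)
    for b in range(n_draws):
        xj_tilde = mu + sigma * rng.standard_normal(len(X))   # resample X_j given X_-j
        null[b] = abs(np.corrcoef(y, xj_tilde)[0, 1])
    return (1 + np.sum(null >= stat)) / (1 + n_draws)

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 10))
y = X[:, 3] + rng.standard_normal(200)        # only the 4th covariate matters
print(crt_pvalue(X, y, j=3, rng=rng), crt_pvalue(X, y, j=0, rng=rng))
```

The validity of the p-value rests on the model for X rather than on any model for Y | X, which is the shift in assumptions described in the abstract.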
Thursday, October 3, 2019
Uncovering the Mechanisms of General Anesthesia: Where Neuroscience Meets Statistics
Emery Brown, M.D., Ph.D.
Massachusetts Institute of Technology
Harvard Medical School
2019 Andrei Yakovlev Colloquium
Thursday, September 19, 2019
Some Inferential Tools for Health Policy & Outcomes Research
Sharon-Lise Normand, Ph.D.
Harvard Medical School
2019 Charles L. Odoroff Memorial Lecture
Thursday, May 9, 2019
Discovering Effect Modification in Observational Studies
Dylan Small, Ph.D.
University of Pennsylvania
There is effect modification if the magnitude of a treatment effect varies with the level of an observed covariate. A larger treatment effect is typically less sensitive to bias from unmeasured covariates, so it is important to recognize effect modification when it is present. Additionally, effect modification is of interest for personalizing treatments based on an individual’s covariates. We present a method for conducting a sensitivity analysis in an observational study that empirically discovers effect modification by exploratory methods, but controls the family-wise error rate or false discovery rate in discovered groups. We will discuss an application of the method to an observational study of the effect of superior nursing at a hospital on surgical mortality.
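As a schematic of the multiplicity issue (not the sensitivity-analysis method presented in the talk), suppose treated-minus-control differences from matched pairs are split into exploratory subgroups defined by an observed covariate; per-group p-values can then be adjusted to control the family-wise error rate or the false discovery rate. All data and the grouping rule below are simulated and illustrative.

```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
n_pairs = 300
covariate = rng.uniform(size=n_pairs)            # observed pair-level covariate
effect = np.where(covariate > 0.5, 0.5, 0.0)     # effect present in one subgroup only
diffs = effect + rng.normal(size=n_pairs)        # treated-minus-control differences

# Exploratory subgroups defined from the covariate (here, simple halves).
groups = {"low covariate": diffs[covariate <= 0.5],
          "high covariate": diffs[covariate > 0.5]}
pvals = [wilcoxon(d).pvalue for d in groups.values()]

# Adjust for testing several discovered groups: Holm controls the family-wise
# error rate, Benjamini-Hochberg controls the false discovery rate.
_, p_holm, *_ = multipletests(pvals, alpha=0.05, method="holm")
_, p_bh, *_ = multipletests(pvals, alpha=0.05, method="fdr_bh")
for name, ph, pb in zip(groups, p_holm, p_bh):
    print(f"{name}: Holm-adjusted p = {ph:.3f}, BH-adjusted p = {pb:.3f}")
```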
Thursday, April 25, 2019
Data-Adaptive Regression Modeling in High Dimensions
Ashley Petersen, Ph.D.
University of Minnesota
In recent years, it has become easier and less expensive to collect and store large amounts of data in a number of fields. This has amplified interest in the development of statistical methods to adequately model this data. With high-dimensional data, the traditional plots used in exploratory data analysis can be limiting, given the large number of possible predictors. Thus, it can be helpful to fit sparse regression models, in which variable selection is adaptively performed, to explore the relationships between a large set of predictors and an outcome. For maximal utility, the functional forms of the covariate fits should be flexible enough to adequately reflect the unknown relationships and interpretable enough to be useful as a visualization technique. In this talk, we will provide an overview of recent work in the area of sparse additive modeling that can be used for visualization of relationships in big data. In addition, we will present recent novel work that fuses together the aims of these previous proposals in order to not only adaptively perform variable selection and flexibly fit included covariates, but also adaptively control the complexity of the covariate fits for increased interpretability.
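A crude stand-in for this class of methods (an illustrative sketch under simplifying assumptions, not the approach from the talk) is to expand each covariate in a spline basis and fit a lasso to the stacked expansions, so that most covariates' fits shrink to zero while the retained ones remain flexible. A true sparse additive model would instead penalize each covariate's basis coefficients as a group.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer, StandardScaler
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n, p = 500, 20
X = rng.uniform(-2, 2, size=(n, p))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.5, size=n)   # two signal covariates

# Each column is expanded in a cubic spline basis; the lasso then zeroes out
# most basis coefficients, yielding a sparse, flexible additive-style fit.
model = make_pipeline(
    SplineTransformer(degree=3, n_knots=6),
    StandardScaler(),
    Lasso(alpha=0.01, max_iter=10000),
)
model.fit(X, y)
coefs = model.named_steps["lasso"].coef_.reshape(p, -1)   # one row of basis coefficients per covariate
print("covariates with any nonzero basis coefficient:",
      np.flatnonzero(np.abs(coefs).sum(axis=1) > 0))
```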
Thursday, April 11, 2019
Statistical and Computational Aspects in the Analysis of Genomic Data from Family Based Designs
Ingo Ruczinski, Ph.D.
Johns Hopkins University
Family-based study designs are regaining popularity because large-scale sequencing can help interrogate the relationship between disease and variants that are too rare in the population to be detected through any test of association in a conventional case-control study, but that may nonetheless co-segregate with disease within families. In addition, family-based designs also allow for the assessment of de novo events and parent-of-origin effects. In this presentation, we focus on statistical and computational aspects of the analysis of sequencing data from nuclear families with affected probands and from extended multiplex families, with an emphasis on improvements in scalability and new methods for causal variant detection.
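One of the simplest family-based analyses alluded to above is flagging candidate de novo events in parent-offspring trios: variants carried by the affected child but by neither parent. The toy filter below assumes idealized, error-free genotype calls on simulated data and is not one of the methods developed in this work.

```python
import numpy as np

# Toy genotype matrix: rows are variants, columns are (father, mother, child),
# coded as the number of alternate alleles (0, 1, or 2). Purely simulated.
rng = np.random.default_rng(5)
genotypes = rng.integers(0, 3, size=(1000, 3))
father, mother, child = genotypes[:, 0], genotypes[:, 1], genotypes[:, 2]

# Candidate de novo events: the child carries an alternate allele while both
# parents are homozygous reference. Real pipelines would additionally model
# genotype likelihoods, sequencing depth, and error rates.
de_novo = (child > 0) & (father == 0) & (mother == 0)
print("candidate de novo variants:", np.flatnonzero(de_novo)[:10])
```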
Thursday, April 4, 2019
Parallel Markov Chain Monte Carlo Methods for Bayesian Analysis of Big Data
Erin Conlon, Ph.D.
University of Massachusetts
Recently, new parallel Markov chain Monte Carlo (MCMC) methods have been developed for massive data sets that are too large for traditional statistical analysis. These methods partition big data sets into subsets and implement parallel Bayesian MCMC computation independently on the subsets. The posterior MCMC samples from the subsets are then combined to approximate the full-data posterior distribution. Current strategies for combining the subset samples include averaging, weighted averaging, and kernel smoothing approaches. Here, I will discuss our new method for combining subset MCMC samples, which works directly with the product of the subset densities.
While our method is applicable for both Gaussian and non-Gaussian posteriors, we show in simulation studies that our method outperforms existing methods when the posteriors are non-Gaussian. I will also discuss computational tools we have developed for carrying out parallel MCMC computing in Bayesian analysis of big data.
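The simplest closed-form instance of combining subset posteriors through a product of densities arises when each subset posterior is approximated by a Gaussian, in which case the product is again Gaussian with precisions that add. The sketch below shows that special case on simulated draws; it illustrates the general idea, not the specific estimator proposed in the talk.

```python
import numpy as np

def combine_gaussian_subsets(subset_samples):
    """Combine subset posterior draws by multiplying Gaussian approximations.

    Each subset posterior is approximated by N(m_k, s_k^2); their product is
    normal with precision equal to the sum of the subset precisions. This
    assumes the subset analyses used suitably downweighted priors so that the
    product of subset posteriors targets the full-data posterior.
    """
    means = np.array([s.mean() for s in subset_samples])
    precs = np.array([1.0 / s.var() for s in subset_samples])
    prec = precs.sum()
    return (precs * means).sum() / prec, np.sqrt(1.0 / prec)

# Illustrative usage: pretend these arrays are MCMC draws of one parameter
# from four independently analyzed data subsets.
rng = np.random.default_rng(6)
subset_samples = [rng.normal(loc=2.0 + 0.05 * k, scale=0.2, size=5000) for k in range(4)]
print(combine_gaussian_subsets(subset_samples))
```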
Thursday, February 14, 2019