Practical Semiparametric Inference with Bayesian Nonparametric Ensembles
General Material Designation
[Thesis]
First Statement of Responsibility
Liu, Jeremiah Zhe
Subsequent Statement of Responsibility
Coull, Brent A.
PUBLICATION, DISTRIBUTION, ETC.
Name of Publisher, Distributor, etc.
Harvard University
Date of Publication, Distribution, etc.
2019
GENERAL NOTES
Text of Note
132 p.
DISSERTATION (THESIS) NOTE
Dissertation or thesis details and type of degree
Ph.D.
Body granting the degree
Harvard University
Text preceding or following the note
2019
SUMMARY OR ABSTRACT
Text of Note
Set in the practical situation where the data-generating process is unknown and multiple imperfect candidate models are available, this thesis studies how to construct an approximation model that optimally captures the relevant aspect of the data for the purpose of conducting sound inference. We consider three types of inference objectives: hypothesis testing (Chapter 2), spatiotemporal prediction (i.e., estimating the conditional mean) (Chapter 3), and uncertainty quantification (i.e., estimating the distribution function) (Chapter 4). We focus on regression models for continuous outcomes. Specifically, we propose the Bayesian Nonparametric Ensemble (BNE), a general modeling approach that combines the a priori information encoded in the candidate models using ensemble methods and then addresses the systematic bias in those models using Bayesian nonparametric machinery. As a result, BNE specifies a large model space that is centered around the ensemble of candidate models. Through both theoretical investigation and extensive numerical studies, we show that the proposed approach achieves a valid and powerful test for nonlinear effects (Chapter 2), improves predictive performance (Chapter 3), and provides calibrated quantification of its varying degree of model uncertainty over the feature space (Chapter 4). The proposed method is applied to the detection of nutrition-environment interaction effects on early-stage neurodevelopment in Bangladeshi children and to the integration of multiple spatial prediction models for PM2.5 levels in Eastern Massachusetts, USA.
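The two-stage idea in the abstract (combine the candidate models, then correct their shared systematic bias nonparametrically) can be illustrated with a rough sketch. This is not the thesis's BNE implementation, which this record does not describe in detail. It assumes least-squares ensemble weights, a Gaussian process posterior mean with a fixed squared-exponential kernel as the bias correction, and synthetic data. All function names, parameters, and hyperparameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def fit_ensemble_with_gp_residual(x, y, candidate_preds, noise_var=0.1, length_scale=1.0):
    """Stage 1: least-squares ensemble weights (an assumption, not the thesis's prior).
    Stage 2: GP regression on the ensemble residuals with fixed hyperparameters."""
    weights, *_ = np.linalg.lstsq(candidate_preds, y, rcond=None)
    residual = y - candidate_preds @ weights
    gram = rbf_kernel(x, x, length_scale) + noise_var * np.eye(len(x))
    alpha = np.linalg.solve(gram, residual)  # GP posterior-mean coefficients
    return weights, alpha

def predict(x_new, x_train, candidate_preds_new, weights, alpha, length_scale=1.0):
    """Ensemble prediction plus the GP posterior-mean correction for systematic bias."""
    ensemble_mean = candidate_preds_new @ weights
    correction = rbf_kernel(x_new, x_train, length_scale) @ alpha
    return ensemble_mean + correction

# Synthetic illustration: both candidate models (intercept, linear trend) are misspecified.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 40)
y = np.sin(2.0 * x) + 0.1 * rng.normal(size=x.size)
candidate_preds = np.column_stack([np.ones_like(x), x])

weights, alpha = fit_ensemble_with_gp_residual(x, y, candidate_preds)
ensemble_only = candidate_preds @ weights
corrected = predict(x, x, candidate_preds, weights, alpha)

mse_ens = float(np.mean((y - ensemble_only) ** 2))
mse_bne = float(np.mean((y - corrected) ** 2))
print(f"in-sample MSE, ensemble only: {mse_ens:.4f}")
print(f"in-sample MSE, with residual correction: {mse_bne:.4f}")
```

In this sketch the residual term can only reduce the in-sample error relative to the plain ensemble. A fully Bayesian treatment would additionally place priors on the weights and the residual process in order to quantify uncertainty, which the point estimates above do not attempt.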