Analysis of large data sets: Bayesian methods and applications in energy and health economics
The availability of large data sets is increasing dramatically, reshaping decision-making in many domains, such as energy, education and health. Data sets may be large in two dimensions: in the number of observations and in the number of variables. This thesis mainly deals with the first case. Often, large data sets arise as a byproduct of emerging technologies, possibly allowing very detailed measurements in space or time. For instance, current continuous glucose monitoring systems measure blood sugar levels every five minutes, while smartphone data may provide precise information on the location of persons. Although existing statistical methods were not developed for small data sets per se, their direct application to large data sets is often problematic, even though many methods are justified by large-sample asymptotics. These problems may be inferential: for instance, common testing procedures may break down in practice. This thesis is mainly concerned with computational problems, as common estimation algorithms are often too time- or memory-consuming to use with large data sets. The analysis of large data sets using the appropriate methodology allows researchers to ask new kinds of research questions or to recast old ones, benefiting from the resultant statistical precision. Furthermore, large data methods offer useful tools to solve applied problems.
This thesis aims to contribute to the statistical analysis of large data sets. These contributions are twofold: First, this thesis lightens the computational demands of existing methods, improving applicability to large-scale problems. Second, this thesis uses large data methods to solve problems with important policy implications in energy and health economics. Consequently, this thesis is divided into a methodological and an econometric part, where each part consists of two essays. The first part consists of two single-authored essays developing statistical methods for Bayesian analysis of large data sets. The second part of this thesis consists of two co-authored essays in health and energy economics.
Rights
Use and reproduction:
This work may be used under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 License (CC BY-NC-ND 4.0).