The Problem of Data Adequacy in Applied Statistics
This thesis addresses important methodological issues in applied statistics; more precisely, it discusses the adequacy of data for empirical research. Three stages of the process of producing and using official statistics are investigated and their degree of uncertainty is quantified: data processing, validation and analysis. At each of these three stages one of the following research questions is answered. First, the producer view is taken by asking "how should price indices be calculated from micro data in official statistics?" Second, the views of both users and producers are considered by posing the question "how reliable are timely published official statistics that are partially based on estimates?" Last, the problem "how should one estimate econometric models with micro data from official statistics?" is treated from the user perspective. In answering these questions, the three main chapters make the following contributions to the literature.

The first main chapter analyses the processing of raw data in terms of calculating elementary price indices in foreign trade statistics. Most of the literature on elementary price indices discusses the choice of a particular index formula on the basis of the axiomatic approach (cf. Eichhorn, 1978, and Diewert, 1995). This approach states desirable properties that an index formula should fulfil and checks which of these axioms a given formula actually satisfies. The importance of an axiom generally depends heavily on the purpose of the index in question and, to some extent, on personal preferences. In any case, the axiomatic approach offers little guidance in choosing the elementary index (for which weights are not available) that corresponds to the characteristics of the index at the second stage of aggregation (where weights are available). It deals exclusively with the mathematical properties of an index formula, so the decision for or against a formula is an all-or-nothing one.
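The notion of an unweighted elementary index corresponding to a weighted aggregate index can be made concrete with a stylised numerical sketch. The data below are simulated, not the thesis's derivation or the German foreign trade data; the sketch illustrates a standard special case: when quantities react to price changes with unit price elasticity and base-period expenditures are equal across items, the unweighted Jevons index coincides numerically with the weighted Törnqvist index.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Base-period prices; quantities chosen so that every item has the same
# base expenditure (share 1/n each) -- a stylised assumption.
p0 = rng.uniform(1.0, 10.0, n)
q0 = 1.0 / p0                      # p0 * q0 = 1 for every item

# Price relatives, with quantities reacting with unit price elasticity:
# q1/q0 = (p1/p0)**(-1), so expenditures and shares stay constant.
rel = rng.uniform(0.8, 1.3, n)
p1 = p0 * rel
q1 = q0 / rel

# Unweighted Jevons elementary index: geometric mean of price relatives.
jevons = np.exp(np.mean(np.log(p1 / p0)))

# Weighted Toernqvist index: geometric mean of price relatives weighted
# by the average expenditure shares of the two periods.
s0 = p0 * q0 / np.sum(p0 * q0)
s1 = p1 * q1 / np.sum(p1 * q1)
toernqvist = np.exp(np.sum(0.5 * (s0 + s1) * np.log(p1 / p0)))

print(jevons, toernqvist)          # numerically identical in this setting
```

With a different price elasticity (for example zero, i.e. fixed quantities), the equality breaks down, which is precisely why the choice of elementary formula has to be matched to the empirical price–quantity correlation.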
The axiomatic approach consequently neglects the question that is most relevant in practice: to what extent is a condition violated? The statistical approach newly developed in this chapter fills this void. It contributes to the literature by examining how numerical equivalence between an unweighted elementary index and a weighted aggregate index can be achieved, independently of the axiomatic properties. It is shown that the solution to the problem of finding an elementary index that corresponds to a desired aggregate index depends on the empirical correlation between prices and quantities, in particular on the price elasticity. On this basis, consistency between price and volume measurement is achieved. In addition to the analytical derivation, this is demonstrated empirically in an application using data from German foreign trade statistics.

In the second main chapter the validity of the data is checked by decomposing revisions of real-time data in a seasonal adjustment context. Revisions occur when preliminary data are updated and estimates are replaced with actual figures. Moreover, new data lead to revisions of the seasonally adjusted time series, even if old preliminary data remain unchanged. The limited literature on the decomposition of revisions deals almost exclusively with the question of whether revisions are "news or noise" (cf. Mankiw and Shapiro, 1986). This strand of the literature discusses the informational content of revisions. Revisions are considered "news" if they are orthogonal to the full information set available at the time the first estimate was published, i.e. if they are unpredictable. Conversely, if revisions are predictable, the first estimate is a "noisy" measure of the most recent one and hence an inefficient forecast, as the revisions correlate with a subset of the data available at the time of the first estimate. In contrast, this chapter is concerned with the decomposition of revisions into their sources.
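The news-versus-noise distinction can be illustrated with a minimal simulation (hypothetical series, not the real-time data analysed in the thesis). Under the news hypothesis the revision is independent of the first release, so a regression of the revision on the first release yields a slope near zero; under the noise hypothesis the revision is predictable from the first release and the slope is clearly non-zero.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500

# "News" case: the first release is an efficient forecast, and the
# revision is pure new information, independent of the first release.
first_news = rng.normal(0.0, 1.0, T)
final_news = first_news + rng.normal(0.0, 0.5, T)

# "Noise" case: the first release is the final value plus measurement
# error, so the revision (= -error) correlates with the first release.
final_noise = rng.normal(0.0, 1.0, T)
first_noise = final_noise + rng.normal(0.0, 0.5, T)

def revision_slope(first, final):
    """OLS slope of the revision (final - first) on the first release."""
    rev = final - first
    x = first - first.mean()
    return float(x @ (rev - rev.mean()) / (x @ x))

print(revision_slope(first_news, final_news))    # near zero: "news"
print(revision_slope(first_noise, final_noise))  # clearly negative: "noise"
```

The decomposition pursued in the chapter asks a different question: not whether revisions were predictable, but which parts of the production process, unadjusted data versus the seasonal adjustment itself, generated them.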
To this end, a new procedure is developed and implemented within the framework of the seasonal adjustment method X-12-ARIMA; this procedure constitutes the chapter's contribution to the literature. It is relevant because a seasonal adjustment method's ability to produce small revisions can be regarded as one of its quality characteristics. In an empirical application to five important German business cycle indicators, revisions of unadjusted real-time data are found to play a larger role than those stemming from the seasonal adjustment method. This result is not self-evident: for European time series, for example, it is frequently argued that the seasonal adjustment method is the main source of revisions.

The focus of the third main chapter is on data analysis; here, dynamic panel data models are estimated by means of GMM with more robust, in this case factorised, instruments. Almost all of the empirical literature and the major part of the theoretical literature focus exclusively on the exogeneity assumption. Exogeneity means that the instruments are uncorrelated with the error term. If exogeneity were all that mattered, white noise processes would be ideal instruments, as by definition they correlate with nothing. But this is only half of the story: white noise processes used as instruments are consistent with virtually any estimate of the parameters of interest. The other half is the relevance assumption (cf. Staiger and Stock, 1997), which states that the instruments correlate with the endogenous regressor(s). The weaker the instrument set, the worse the small-sample properties of any instrumental variables estimator, such as GMM. The contribution of this chapter to the literature is to propose a methodology with improved finite-sample properties. In particular, the instrument set is factorised so that its informational content is condensed into a much smaller number of instruments employed in the estimation.
The factorisation results in lower biases and lower root mean squared errors, as shown by Monte Carlo simulations.
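The idea of condensing a large instrument set can be sketched under strong simplifying assumptions: a single common factor behind the instruments, a cross-sectional IV regression instead of a dynamic panel, and a principal component as the factorisation. This is an illustrative stand-in, not the estimator developed in the chapter.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, beta = 2000, 30, 1.5

# Many individually weak instruments sharing one common factor f --
# a stylised stand-in for a large dynamic-panel instrument set.
f = rng.normal(0.0, 1.0, n)
Z = f[:, None] + rng.normal(0.0, 3.0, (n, k))

u = rng.normal(0.0, 1.0, n)                 # structural error
x = f + 0.5 * u + rng.normal(0.0, 1.0, n)   # endogenous regressor
y = beta * x + u

# Condense the instrument set into its first principal component.
Zc = Z - Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Zc, full_matrices=False)
pc1 = Zc @ Vt[0]                            # factorised instrument

# Simple IV estimate using the single condensed instrument.
beta_iv = float(pc1 @ y) / float(pc1 @ x)
print(beta_iv)                              # close to the true beta = 1.5
```

The component remains relevant (it loads on the common factor driving the regressor) and exogenous (the instruments are independent of the error), while replacing thirty instruments with one, which is the sense in which factorisation mitigates the finite-sample problems of instrument proliferation.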