Statistical and Information Theoretic Approaches to Model Selection and Averaging

Antti Liski

    Research output: Book/Report, Doctoral thesis, Collection of Articles


    In this thesis we consider model selection (MS) and its alternative, model averaging (MA), in seven research articles and in an introductory summary of the articles. The minimum description length (MDL) principle is a common theme in five of the articles. In three articles we approach MA by estimating model weights using MDL and by making use of the idea of shrinkage estimation, with special emphasis on weighted average least squares (WALS) and penalized least squares (PenLS) estimation. We also apply MS and MA techniques to data on hip fracture treatment costs in seven hospital districts in Finland.

    We implement the MDL principle for MS by using the normalized maximum likelihood (NML). However, the straightforward use of the NML technique in Gaussian linear regression fails because the normalization coefficient is not finite. Rissanen has proposed an elegant solution to the problem by constraining the data space properly. We demonstrate the effect of data constraints on the MS criterion, present a general convex constraint in data space, and discuss two particular cases: the rhomboidal and ellipsoidal constraints. From these findings we derive four new NML-based criteria. One particular constraint is related to the case when collinearity is present in the data.

    We study WALS estimation, which has the potential for a good risk profile. WALS is attractive in regression especially when the number of explanatory variables is large, because its computational burden is light. We also apply WALS to the estimation and comparison of hip fracture treatment costs between hospital districts in Finland. We present the WALS estimators as a special case of shrinkage estimators, and we characterize a class of shrinkage estimators for which we derive the efficiency bound. We demonstrate how shrinkage estimators are obtained by using the PenLS technique, and we prove sufficient conditions for the PenLS estimator to belong to the class of shrinkage estimators. Through this connection we may derive new MA estimators and effectively utilize certain previously known estimators in MA. We also study the performance of the estimators by using simulation experiments based on hip fracture treatment cost data.

    We propose an MA estimator with weights selected by the NML criterion. The resulting mixture estimator usually performs better than the corresponding MS estimator. We report on simulation experiments where the performance potential of MDL weight selection is compared with the corresponding potential of the AIC, BIC and Mallows' MA estimators.

    We also exploit the finding that a smoothing spline estimator may be rewritten as a linear mixed model (LMM). We present the NML criterion for LMMs and propose an automatic data-based smoothing method based on this criterion. The performance of the MDL criterion is compared to the AIC, BIC and generalized cross-validation criteria in simulation experiments.

    Finally, we consider the sequential NML (sNML) criterion in logistic regression. We show that while the NML criterion quickly becomes computationally infeasible as the number of covariates and the amount of data increase, the sNML criterion can still be exploited in MS. We also develop a risk adjustment model for hip fracture mortality in Finland by choosing comorbidities that have an effect on mortality after hip fracture.
    Translated title of the contribution: Statistical and Information Theoretic Approaches to Model Selection and Averaging
    Original language: English
    Publisher: Tampere University of Technology
    Number of pages: 52
    ISBN (Electronic): 978-952-15-3054-8
    ISBN (Print): 978-952-15-3041-8
    Publication status: Published - 12 Apr 2013
    Publication type: G5 Doctoral dissertation (articles)

    Publication series

    Name: Tampere University of Technology. Publication
    Publisher: Tampere University of Technology
    ISSN (Print): 1459-2045

