Shrinking and Regularizing Finite Mixture Models
Disciplines
Computer Sciences (5%); Mathematics (85%); Economics (10%)
Keywords
Finite mixture model, Unobserved heterogeneity, Bayesian estimation, Conjugate prior, Regularization, Shrinkage
Data are often suspected to contain groups of observations with different characteristics, yet the group memberships are either unavailable or unobservable. Analyzing such data requires a statistical method that explicitly accounts for these latent groups and determines both the group sizes and the group characteristics. The standard model-based tool for this problem is the finite mixture model. Finite mixture models have been used for more than 100 years and are a flexible, generally applicable statistical tool for which many extensions and variations have been proposed. However, some problems remain unresolved, such as selecting the variables that drive the group structure and choosing a model that avoids overfitting the heterogeneity, so that the results are easy to interpret and the parameters are estimated precisely. In this research project we aim to improve the application of finite mixture models by providing tools based on shrinkage and regularization. These tools allow a suitable model to be selected in which relevant and irrelevant variables are automatically distinguished and the parameters are chosen so as to avoid overfitting heterogeneity. Theoretical results will be complemented by applications and by software implementations as add-on packages for the open-source software R, an environment for statistical computing and graphics (http://www.R-project.org). The availability of improved statistical methods in combination with software implementations allows for a better analysis and increased understanding of data in empirical quantitative research.
Because finite mixture models are widely applicable, for example in astronomy, biology, economics, marketing, medicine and psychology, the results of this research project are expected to have an impact on other areas of research as well, by allowing improved insights into latent group structures present in the data.
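For readers who prefer the model in symbols, a finite mixture with K components is commonly written as shown below. The notation (weights \eta_k, component density f, component parameters \theta_k) is an assumption introduced here for illustration and is not taken from the project description.

```latex
p(y_i \mid \eta, \theta) = \sum_{k=1}^{K} \eta_k \, f(y_i \mid \theta_k),
\qquad \eta_k \ge 0, \quad \sum_{k=1}^{K} \eta_k = 1 .
```

In this notation the weights \eta_k play the role of the group sizes and the \theta_k the role of the group characteristics, while the latent group membership of observation i is the unobserved index k of the component that generated y_i.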
Cluster analysis is a statistical method that identifies structure in data by grouping observations together, and it is applied in many different areas where data are analyzed. The model-based approach to cluster analysis embeds the clustering problem within a statistical inference framework and uses mixture models as the underlying data-generating process. This approach is appealing because statistical inference methods can be used to resolve crucial questions that arise in cluster analysis applications. In addition, extensions are readily available by considering different statistical models for the components of the mixture model. Pursuing a Bayesian approach to inference facilitates the inclusion of prior information on the cluster shapes or their number. This reduces the ambiguity in the clustering problem and helps to find sensible clustering solutions. In this project we advanced Bayesian mixture modeling with a focus on the model-based clustering context. We pursued a holistic approach covering model and prior specification as well as model estimation and general considerations for applications. We aimed at bridging finite and infinite mixture models and highlighted important aspects that reveal similarities but also crucial differences between the two model classes. We investigated general aspects such as placing a prior on the number of components to account for model uncertainty and to obtain a fully Bayesian model specification, which resulted in the generalized mixture of finite mixtures (MFM) model. We shed light on how explicit prior specifications affect the implicitly induced priors, which are often of more interest in clustering applications. We advanced estimation of the Bayesian mixture model by proposing the telescoping sampler for the generalized MFM model and an efficient Markov chain Monte Carlo sampling scheme for the mixture-of-experts model.
Overall, the project results help to apply Bayesian mixture modeling techniques successfully and enhance their use for model-based clustering, and thus broaden and improve the statistical toolbox available for data analysis. To disseminate the project results suitably, we not only published several research articles in peer-reviewed journals targeted at experts in the field, but also provided more accessible contributions, in particular by co-editing and contributing chapters to the "CRC Handbook of Mixture Analysis" and contributing an entry to "Wiley StatsRef: Statistics Reference Online". R packages implementing some of the computational methods developed during this project are available as open source from the Comprehensive R Archive Network.
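The remark above about implicitly induced priors can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions, not the project's method or the API of its R packages: it assumes a fixed number of components K with a symmetric Dirichlet prior on the mixture weights, and it estimates by Monte Carlo the prior this induces on the number of non-empty components (data clusters) among N observations. All function names and parameter values are hypothetical.

```python
import random
from collections import Counter


def sample_dirichlet(alpha, k, rng):
    """Draw weights from a symmetric Dirichlet(alpha, ..., alpha)
    by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def induced_cluster_prior(n_obs, k, alpha, n_sims=2000, seed=1):
    """Monte Carlo estimate of the prior on the number of non-empty
    components when n_obs observations are allocated to k components."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        weights = sample_dirichlet(alpha, k, rng)
        # Allocate each observation to a component according to the weights.
        labels = rng.choices(range(k), weights=weights, k=n_obs)
        counts[len(set(labels))] += 1
    return {m: counts[m] / n_sims for m in sorted(counts)}


def expected_clusters(prior):
    """Prior mean of the number of non-empty components."""
    return sum(m * p for m, p in prior.items())


if __name__ == "__main__":
    # Hypothetical settings: a small and a large Dirichlet parameter.
    for alpha in (0.1, 4.0):
        prior = induced_cluster_prior(n_obs=100, k=10, alpha=alpha)
        print(alpha, round(expected_clusters(prior), 2), prior)
```

Under these assumptions a small Dirichlet parameter tends to leave many components empty, so the same explicit choice of K implies quite different priors on the number of data clusters depending on the prior placed on the weights.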
- Wirtschaftsuniversität Wien - 100%
- Sara Dolnicar, University of Queensland - Australia
Research Output
- 518 Citations
- 25 Publications
- 2 Software
- 3 Disseminations
- 4 Scientific Awards
- 1 Fundings
Publications
2023
Title Clusterwise multivariate regression of mixed-type panel data DOI 10.1007/s11222-023-10304-5 Type Journal Article Author Vávra J Journal Statistics and Computing Pages 46 -
2021
Title How many data clusters are in the Galaxy data set? DOI 10.1007/s11634-021-00461-8 Type Journal Article Author Grün B Journal Advances in Data Analysis and Classification Pages 325-349 Link Publication -
2021
Title How many data clusters are in the Galaxy data set? Bayesian cluster analysis in action DOI 10.48550/arxiv.2101.12686 Type Preprint Author Grün B -
2022
Title Spying on the prior of the number of data clusters and the partition distribution in Bayesian cluster analysis DOI 10.1111/anzs.12350 Type Journal Article Author Greve J Journal Australian & New Zealand Journal of Statistics Pages 205-229 Link Publication -
2022
Title Advances in Bayesian Mixture Modelling: Contributions to Cluster and Regression Analysis Type Other Author Malsiner-Walli G -
2022
Title Efficient Bayesian Modeling of Binary and Categorical Data in R: The UPG Package Type Other Author Frühwirth-Schnatter S Link Publication -
2022
Title Ultimate Pólya Gamma Samplers - Efficient MCMC for Possibly Imbalanced Binary and Categorical Data Type Other Author Frühwirth-Schnatter S Link Publication -
2022
Title Bayesian Finite Mixture Models; In: Wiley StatsRef: Statistics Reference Online Type Book Chapter Author Grün B Publisher John Wiley & Sons -
2020
Title Generalized mixtures of finite mixtures and telescoping sampling DOI 10.48550/arxiv.2005.09918 Type Preprint Author Frühwirth-Schnatter S -
2021
Title Generalized Mixtures of Finite Mixtures and Telescoping Sampling DOI 10.1214/21-ba1294 Type Journal Article Author Frühwirth-Schnatter S Journal Bayesian Analysis Pages 1279-1307 Link Publication -
2022
Title Advances in Bayesian mixture modelling: Contributions to cluster and regression analysis Type Postdoctoral Thesis Author Gertraud Malsiner-Walli -
2022
Title Clusterwise multivariate regression of mixed-type panel data DOI 10.21203/rs.3.rs-1882841/v1 Type Preprint Author Vávra J Link Publication -
2019
Title Handbook of Mixture Analysis DOI 10.1201/9780429055911 Type Book editors Frühwirth-Schnatter S, Celeux G, Robert C Publisher Taylor & Francis Link Publication -
2019
Title Mixture of Experts Models DOI 10.1201/9780429055911-12 Type Book Chapter Author Gormley I Publisher Taylor & Francis Pages 271-307 -
2019
Title Model-Based Clustering DOI 10.1201/9780429055911-8 Type Book Chapter Author Grün B Publisher Taylor & Francis Pages 157-192 -
2019
Title Model Selection for Mixture Models – Perspectives and Strategies DOI 10.1201/9780429055911-7 Type Book Chapter Author Celeux G Publisher Taylor & Francis Pages 117-154 Link Publication -
2019
Title Computational Solutions for Bayesian Inference in Mixture Models DOI 10.1201/9780429055911-5 Type Book Chapter Author Celeux G Publisher Taylor & Francis Pages 73-96 Link Publication -
2019
Title Special issue on “Advances on model-based clustering and classification” DOI 10.1007/s11634-019-00355-w Type Journal Article Author Frühwirth-Schnatter S Journal Advances in Data Analysis and Classification Pages 1-5 Link Publication -
2019
Title Semi-parametric Regression under Model Uncertainty: Economic Applications DOI 10.1111/obes.12294 Type Journal Article Author Malsiner-Walli G Journal Oxford Bulletin of Economics and Statistics Pages 1117-1143 Link Publication -
2019
Title Keeping the balance—Bridge sampling for marginal likelihood estimation in finite mixture, mixture of experts and Markov mixture models DOI 10.1214/19-bjps446 Type Journal Article Author Frühwirth-Schnatter S Journal Brazilian Journal of Probability and Statistics Pages 706-733 Link Publication -
2017
Title From here to infinity - sparse finite versus Dirichlet process mixtures in model-based clustering DOI 10.48550/arxiv.1706.07194 Type Preprint Author Frühwirth-Schnatter S -
2017
Title Effect fusion using model-based clustering DOI 10.48550/arxiv.1703.07603 Type Preprint Author Malsiner-Walli G -
2018
Title From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering DOI 10.1007/s11634-018-0329-y Type Journal Article Author Frühwirth-Schnatter S Journal Advances in Data Analysis and Classification Pages 33-64 Link Publication -
2018
Title Effect fusion using model-based clustering DOI 10.1177/1471082x17739058 Type Journal Article Author Malsiner-Walli G Journal Statistical Modelling Pages 175-196 Link Publication -
2017
Title The resurrection of the PIDDosome – emerging roles in the DNA-damage response and centrosome surveillance DOI 10.1242/jcs.203448 Type Journal Article Author Sladky V Journal Journal of Cell Science Pages 3779-3787 Link Publication
Disseminations
2019
Title Organizing the workshop "26th Summer Working Group on Model-Based Clustering" Type Participation in an activity, workshop or similar -
2022
Title Session Organizer at the "Austrian and Slovenian Statistical Days 2022" Type Participation in an activity, workshop or similar -
2020
Title Organizing the workshop "Bayes@Austria" Type Participation in an activity, workshop or similar
Scientific Awards
2022
Title Austrian and Slovenian Statistical Days Type Personally invited as a keynote speaker to a conference Level of Recognition Continental/International -
2021
Title 22nd European Young Statisticians Meeting Type Personally invited as a keynote speaker to a conference Level of Recognition Continental/International -
2021
Title 3rd Insurance Data Science Conference Type Personally invited as a keynote speaker to a conference Level of Recognition Continental/International -
2019
Title CLADAG Type Personally invited as a keynote speaker to a conference Level of Recognition Continental/International
Fundings
2020
Title WU Projects Type Research grant (including intramural programme) Start of Funding 2020