Flexible Detection of Groups in Data
Disciplines
Other Social Sciences (40%); Computer Sciences (10%); Mathematics (40%); Economics (10%)
Keywords
Mixture Models, Regularization, Unobserved Heterogeneity, R, EM Algorithm, Market Segmentation
Observations often come from a heterogeneous population that consists of different groups, but the group to which each observation belongs is not observed. This occurs either because the group indicator is difficult to measure or because no single characteristic could be identified that captures the grouping. In statistical modeling, finite mixtures have been used for more than 100 years as a flexible model class to describe this kind of data and to determine the group memberships of the observations, the group sizes and a group-specific statistical model. Areas of application include astronomy, biology, economics, marketing and medicine. The usefulness of finite mixture models often suffers when a-priori knowledge about certain characteristics of the grouping is available but cannot easily be included in the model. This project aims to overcome this drawback by offering a suitable approach for fitting a finite mixture model that also takes this additional information into account. In particular, the possibility of including information on which observations are likely to be in the same group, or should rather end up in different groups, will be considered. A possible area of application for this newly developed approach is market segmentation, where the aim is to partition the market into sub-markets. Segments are often defined to consist of consumers with similar behavior. However, a marketing strategy can only be implemented successfully if these segments differ not only in their behavior but also in their socio-demographic characteristics. A combined approach that takes all requirements on the segments directly into account will ease the statistical analysis and improve the final solution.
In addition, the rigorous application of advanced mixtures of regression models will be investigated for two different problems: the validation of credit rating systems using a latent variable approach, and the simultaneous accounting for response style heterogeneity among respondents in a segmentation study when survey data are available.
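For orientation, the finite mixture model referred to above can be written in a generic form. The notation is an assumption made here for illustration and is not taken from the project itself: K denotes the number of groups, π_k the group sizes, f_k the group-specific model with parameters θ_k, and τ_ik the posterior probability that observation i belongs to group k.

```latex
% Mixture density: a weighted sum of K group-specific models
f(y \mid x, \Theta) = \sum_{k=1}^{K} \pi_k \, f_k(y \mid x, \theta_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1 .

% Posterior probability that observation i belongs to group k
\tau_{ik} = \frac{\pi_k \, f_k(y_i \mid x_i, \theta_k)}
                 {\sum_{j=1}^{K} \pi_j \, f_j(y_i \mid x_i, \theta_j)} .
```

Under this notation, group memberships are typically obtained by assigning each observation to the group with the largest τ_ik.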
Latent groups are often suspected to be present in data, but the group memberships are unobserved. In this case, statistical methods are required to unravel the latent structure and learn about the group-specific characteristics. Finite mixture models constitute the state-of-the-art technique for performing this task with a statistical model-based approach. In this project, several extensions of the general model class of finite mixtures were considered. These extensions make it possible to suitably model different types of data in a range of applications and enlarge the toolbox of statistical methods in order to better capture the information available in data. The theoretical statistical properties and estimation methods of these models were analyzed, and the algorithms were implemented in the freely available open-source add-on package flexmix for the statistical software environment R. Within the Bayesian framework, prior choices that lead to sparse solutions were investigated and developed. Applications included the modeling of HIV RNA levels over time using mixtures of linear mixed-effects models for censored data, of time-course gene expression levels using mixtures of linear additive models, of reading skill evaluations in children using mixtures of beta regression models, and of text corpora using topic models based on the latent Dirichlet allocation model and mixtures of von Mises-Fisher distributions. In addition, sample size recommendations were developed for market segmentation applications in tourism.
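To make the idea of recovering unobserved group memberships concrete, the following is a minimal sketch of the EM algorithm for a two-component univariate Gaussian mixture. It is written in plain Python for illustration only and is not the flexmix implementation or any of the project's extended models; the function names, the crude initialisation and the fixed number of iterations are all assumptions of this sketch.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(data, n_iter=200):
    """Fit a two-component Gaussian mixture by EM (hypothetical helper)."""
    # Crude initialisation (an assumption): start the means at the extremes.
    means = [min(data), max(data)]
    spread = (max(data) - min(data)) / 4.0 or 1.0
    sds = [spread, spread]
    weights = [0.5, 0.5]
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability of each group for each observation.
        post = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            post.append([d / total for d in dens])
        # M-step: update group sizes, means and standard deviations.
        for k in range(2):
            nk = max(sum(p[k] for p in post), 1e-12)
            weights[k] = nk / len(data)
            means[k] = sum(p[k] * x for p, x in zip(post, data)) / nk
            var = sum(p[k] * (x - means[k]) ** 2 for p, x in zip(post, data)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)  # keep the variance away from zero
    # Assign each observation to the group with the larger posterior
    # probability (taken from the final E-step).
    labels = [0 if p[0] >= p[1] else 1 for p in post]
    return weights, means, sds, labels
```

Called on a list of numbers, the function returns the estimated group sizes, the group-specific means and standard deviations, and a hard group label per observation. In practice a package such as flexmix would handle initialisation, convergence checks and more general group-specific models.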
- Universität Linz - 100%
- Sara Dolnicar, University of Queensland - Australia
Research Output
- 2465 Citations
- 28 Publications
- 1 Datasets & models
Publications
2013
Title Dynamic, Interactive Survey Questions Can Increase Survey Data Quality DOI 10.1080/10548408.2013.827546 Type Journal Article Author Dolnicar S Journal Journal of Travel & Tourism Marketing Pages 690-699
2012
Title Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned Type Journal Article Author Grün B Journal Journal of Statistical Software Pages 1-25
2017
Title Effect fusion using model-based clustering DOI 10.48550/arxiv.1703.07603 Type Preprint Author Malsiner-Walli G
2017
Title Identifying Mixtures of Mixtures Using Bayesian Estimation DOI 10.1080/10618600.2016.1200472 Type Journal Article Author Malsiner-Walli G Journal Journal of Computational and Graphical Statistics Pages 285-295
2016
Title Increasing sample size compensates for data problems in segmentation studies DOI 10.1016/j.jbusres.2015.09.004 Type Journal Article Author Dolnicar S Journal Journal of Business Research Pages 992-999
2018
Title Market Segmentation Analysis, Understanding It, Doing It, and Making It Useful DOI 10.1007/978-981-10-8818-6 Type Book Author Dolnicar S Publisher Springer Nature
2014
Title On standard conjugate families for natural exponential families with bounded natural parameter space DOI 10.1016/j.jmva.2014.01.003 Type Journal Article Author Hornik K Journal Journal of Multivariate Analysis Pages 14-24
2012
Title Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned DOI 10.18637/jss.v048.i11 Type Journal Article Author Grün B Journal Journal of Statistical Software
2012
Title ‘Pick Any’ Measures Contaminate Brand Image Studies DOI 10.2501/ijmr-54-6-821-834 Type Journal Article Author Dolnicar S Journal International Journal of Market Research Pages 821-834
2012
Title Modelling Human Immunodeficiency Virus Ribonucleic Acid Levels with Finite Mixtures for Censored Longitudinal Data DOI 10.1111/j.1467-9876.2011.01007.x Type Journal Article Author Grün B Journal Journal of the Royal Statistical Society Series C: Applied Statistics Pages 201-218
2012
Title Validly Measuring Destination Image in Survey Studies DOI 10.1177/0047287512457267 Type Journal Article Author Dolnicar S Journal Journal of Travel Research Pages 3-14
2014
Title Gingival Tissue Transcriptomes Identify Distinct Periodontitis Phenotypes DOI 10.1177/0022034514527288 Type Journal Article Author Kebschull M Journal Journal of Dental Research Pages 459-468
2014
Title Including Don't know answer options in brand image surveys improves data quality DOI 10.2501/ijmr-2013-043 Type Journal Article Author Dolnicar S Journal International Journal of Market Research Pages 33-50
2014
Title Branding water DOI 10.1016/j.watres.2014.03.056 Type Journal Article Author Dolnicar S Journal Water Research Pages 325-338
2014
Title Model-based clustering based on sparse finite Gaussian mixtures DOI 10.1007/s11222-014-9500-2 Type Journal Article Author Malsiner-Walli G Journal Statistics and Computing Pages 303-324
2014
Title movMF: An R Package for Fitting Mixtures of von Mises-Fisher Distributions Type Journal Article Author Grün B Journal Journal of Statistical Software Pages 1-31
2012
Title Water conservation behavior in Australia DOI 10.1016/j.jenvman.2012.03.042 Type Journal Article Author Dolnicar S Journal Journal of Environmental Management Pages 44-52
2011
Title topicmodels: An R Package for Fitting Topic Models DOI 10.18637/jss.v040.i13 Type Journal Article Author Grün B Journal Journal of Statistical Software
2016
Title Model-based clustering based on sparse finite Gaussian mixtures DOI 10.48550/arxiv.1606.06828 Type Preprint Author Malsiner-Walli G
2015
Title Response style corrected market segmentation for ordinal data DOI 10.1007/s11002-015-9375-9 Type Journal Article Author Grün B Journal Marketing Letters Pages 729-741
2015
Title Identifying Mixtures of Mixtures Using Bayesian Estimation DOI 10.48550/arxiv.1502.06449 Type Preprint Author Malsiner-Walli G
2013
Title “Translating” between survey answer formats DOI 10.1016/j.jbusres.2012.02.029 Type Journal Article Author Dolnicar S Journal Journal of Business Research Pages 1298-1306
2013
Title Required Sample Sizes for Data-Driven Market Segmentation Analyses in Tourism DOI 10.1177/0047287513496475 Type Journal Article Author Dolnicar S Journal Journal of Travel Research Pages 296-306
2013
Title On conjugate families and Jeffreys priors for von Mises–Fisher distributions DOI 10.1016/j.jspi.2012.11.003 Type Journal Article Author Hornik K Journal Journal of Statistical Planning and Inference Pages 992-999
2013
Title Amos-type bounds for modified Bessel function ratios DOI 10.1016/j.jmaa.2013.05.070 Type Journal Article Author Hornik K Journal Journal of Mathematical Analysis and Applications Pages 91-101
2013
Title On maximum likelihood estimation of the concentration parameter of von Mises–Fisher distributions DOI 10.1007/s00180-013-0471-0 Type Journal Article Author Hornik K Journal Computational Statistics Pages 945-957
2014
Title movMF: An R Package for Fitting Mixtures of von Mises-Fisher Distributions DOI 10.18637/jss.v058.i10 Type Journal Article Author Hornik K Journal Journal of Statistical Software
2011
Title Modelling time course gene expression data with finite mixtures of linear additive models DOI 10.1093/bioinformatics/btr653 Type Journal Article Author Grün B Journal Bioinformatics Pages 222-228
Datasets & models
2016
Title Identifying Mixtures of Mixtures Using Bayesian Estimation DOI 10.6084/m9.figshare.3439301 Type Database/Collection of data Public Access