Predicting Rare Events More Accurately (PREMA)
Bilateral Call: Slovenia
Disciplines
Other Human Medicine, Health Sciences (50%); Biology (50%)
Keywords
Class Imbalance, Cross-Validation, Logistic Regression, Penalized Likelihood, Prognostic Model, Rare Events
Logistic regression is one of the most commonly used statistical methods for estimating prognostic models that relate a binary outcome (with levels event and non-event) to a number of binary, categorical or continuous explanatory variables. A low prevalence of events, frequently encountered in clinical and epidemiological studies but also in other fields of empirical research, causes underestimation and instability of the estimated event probabilities for subjects who are likely to experience the rare event. This happens because the analysis is disproportionately influenced by the subjects without events. The effect is even more pronounced when the number of explanatory variables approaches or exceeds the number of events.
Recently, penalized likelihood regression (PLR) methods have become popular for analyses with high-dimensional explanatory variable spaces. PLR methods shrink the estimated regression coefficients towards zero in order to decrease their mean squared error. While this also decreases the overall mean squared error of the predicted event probabilities, predictions for subjects at high risk of an event remain poor in the rare events situation (RES).
The main objective of this project is to further develop PLR for the high-dimensional RES by elaborating and evaluating novel approaches to estimation, tuning and validation. At the estimation stage, there are several possibilities for improving PLR. Besides modifications of weighting methods, various types of penalties will be considered, e.g., combinations of classical penalties (Firth and LASSO, or Firth and ridge) and generalizations of the Firth-type penalty. Moreover, the tuning criteria commonly used in PLR to control the amount of penalization, such as optimization of the cross-validated deviance or of the overall misclassification error, are not expected to behave well in the RES.
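For reference, the classical penalties named above can be written in standard notation. The notation is assumed here and not defined in the original text: outcomes $y_i \in \{0,1\}$, covariate vectors $x_i$ whose first entry is 1, coefficients $\beta$, and a tuning parameter $\lambda \ge 0$. Scaling conventions for $\lambda$ differ between software packages, and the project text does not specify how penalties are combined, so only the separate classical forms are shown.

```latex
% Logistic model and log-likelihood
\pi_i(\beta) = \frac{1}{1 + \exp(-x_i^\top \beta)}, \qquad
\ell(\beta) = \sum_{i=1}^{n} \bigl[ y_i \log \pi_i + (1 - y_i) \log(1 - \pi_i) \bigr]

% Firth-type penalty (Jeffreys prior), with Fisher information
% I(\beta) = X^\top W X and W = \operatorname{diag}\{\pi_i (1 - \pi_i)\}
\ell_F(\beta) = \ell(\beta) + \tfrac{1}{2} \log \det I(\beta)

% Ridge and LASSO penalties (intercept \beta_0 left unpenalized)
\ell_R(\beta) = \ell(\beta) - \lambda \sum_{j=1}^{p} \beta_j^2, \qquad
\ell_L(\beta) = \ell(\beta) - \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

% Maximizing \ell_F leads to Firth's modified score equations, where h_i are the
% diagonal elements of W^{1/2} X (X^\top W X)^{-1} X^\top W^{1/2}
U_r^*(\beta) = \sum_{i=1}^{n} \bigl\{ y_i - \pi_i + h_i \bigl( \tfrac{1}{2} - \pi_i \bigr) \bigr\} x_{ir} = 0,
\qquad r = 0, \dots, p
```

The term $h_i(\tfrac{1}{2} - \pi_i)$ pulls every fitted probability towards one half, which is why Firth estimates stay finite under separation but tend to overestimate rare event probabilities.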
Here, we will derive tuning criteria that put more weight on the observations with events, in order to obtain more accurate predictions for subjects with higher underlying event probabilities at the cost of less accurate estimation for subjects not susceptible to events. Similarly, measures of predictive accuracy used for model validation will be adapted to the RES. The performance of the proposed methods will be evaluated on real-life data sets and in comprehensive simulation studies. Once implemented in statistical software packages, the results of this project will be of practical value whenever predictions for strongly imbalanced binary outcomes have to be derived from high-dimensional data.
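The text does not specify the new tuning criteria. As one simple illustration of what "more weight on the observations with events" could mean, the sketch below assumes a binomial deviance in which each event's contribution is multiplied by a factor `event_weight`, picks the tuning value that minimizes it over out-of-fold predictions, and reports the Brier score separately for events and non-events as an example of a class-aware accuracy measure. All function names, the weighting scheme, the penalty values and the predicted probabilities are hypothetical toy choices, not results or methods from the project.

```python
import math


def weighted_deviance(y, p, event_weight=1.0):
    """Binomial deviance with each event's contribution multiplied by event_weight.

    event_weight = 1 gives the ordinary deviance, -2 times the log-likelihood.
    """
    total = 0.0
    for yi, pi in zip(y, p):
        if yi == 1:
            total -= 2.0 * event_weight * math.log(pi)
        else:
            total -= 2.0 * math.log(1.0 - pi)
    return total


def brier_by_class(y, p):
    """Brier score computed separately among events and among non-events."""
    ev = [(1.0 - pi) ** 2 for yi, pi in zip(y, p) if yi == 1]
    ne = [pi ** 2 for yi, pi in zip(y, p) if yi == 0]
    return sum(ev) / len(ev), sum(ne) / len(ne)


def select_tuning(cv_predictions, y, event_weight):
    """Return the tuning value whose out-of-fold predictions minimize the
    event-weighted deviance; cv_predictions maps tuning value -> predictions."""
    return min(cv_predictions,
               key=lambda lam: weighted_deviance(y, cv_predictions[lam], event_weight))


# Toy out-of-fold predictions for two hypothetical penalty strengths (not real results)
y = [1] + [0] * 9
cv_predictions = {
    10.0: [0.10] * 10,          # strong shrinkage: everyone near the event rate
    0.1: [0.30] + [0.22] * 9,   # weak shrinkage: higher predicted risk for the event
}
print(select_tuning(cv_predictions, y, event_weight=1.0))  # 10.0
print(select_tuning(cv_predictions, y, event_weight=2.0))  # 0.1
print(brier_by_class(y, cv_predictions[10.0]))
```

With equal weights the strongly shrunk model wins; doubling the weight of the single event flips the choice towards the model that assigns the event a higher risk, at the price of a larger Brier score among the non-events (0.0484 versus 0.01).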
In this project, novel statistical methods were developed for predicting rare events from characteristics available at the time of prognosis. The new methods were investigated with regard to their statistical properties and compared with existing methods to gain new insight into which statistical analysis is most efficient for obtaining precise predictions. The main contribution of the project lies in modifications of logistic regression analysis. Two new extensions were developed, abbreviated as FLIC (Firth's logistic regression with intercept correction) and FLAC (Firth's logistic regression with added covariate), which open new routes to precise prediction of rare events such as side effects of drugs. To make this new methodology accessible to the international research community, freely available computer programs were developed.
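The idea behind FLIC can be illustrated with a minimal pure-Python sketch: keep the slope estimates from Firth's logistic regression and re-estimate only the intercept by maximum likelihood, so that the average predicted probability equals the observed event rate. This is not the project's software. The Firth fit here uses plain Fisher scoring on the modified score with a capped step length (an assumed, simplified scheme), the intercept is found by bisection, the function names are hypothetical, and FLAC is not shown.

```python
import math


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def predict(X, beta):
    """Predicted event probabilities for the rows of X."""
    return [sigmoid(sum(a * b for a, b in zip(row, beta))) for row in X]


def firth_logistic(X, y, max_iter=200, tol=1e-10, max_step=5.0):
    """Firth-penalized logistic regression by Fisher scoring on the modified score.

    X: list of rows whose first entry is 1 (intercept); y: list of 0/1 outcomes.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        pi = predict(X, beta)
        w = [q * (1.0 - q) for q in pi]
        # Fisher information I(beta) = X^T W X
        info = [[sum(w[i] * X[i][r] * X[i][s] for i in range(n)) for s in range(p)]
                for r in range(p)]
        # I(beta)^{-1}, solved column by column (it is symmetric)
        inv = [solve(info, [1.0 if k == j else 0.0 for k in range(p)]) for j in range(p)]
        # hat-matrix diagonal h_i = w_i * x_i^T I(beta)^{-1} x_i
        h = [w[i] * sum(X[i][r] * inv[r][s] * X[i][s] for r in range(p) for s in range(p))
             for i in range(n)]
        # Firth-modified score: sum_i (y_i - pi_i + h_i * (1/2 - pi_i)) * x_i
        score = [sum((y[i] - pi[i] + h[i] * (0.5 - pi[i])) * X[i][r] for i in range(n))
                 for r in range(p)]
        step = solve(info, score)
        biggest = max(abs(d) for d in step)
        if biggest > max_step:  # cap the step length for numerical stability
            step = [d * max_step / biggest for d in step]
        beta = [b + d for b, d in zip(beta, step)]
        if biggest < tol:
            break
    return beta


def flic(X, y):
    """FLIC sketch: keep the Firth slopes, re-estimate the intercept by maximum likelihood."""
    beta = firth_logistic(X, y)
    offset = [sum(a * b for a, b in zip(row[1:], beta[1:])) for row in X]

    def excess(b0):
        # observed minus expected number of events; decreasing in b0
        return sum(yi - sigmoid(b0 + o) for yi, o in zip(y, offset))

    # Bisection for the ML intercept (requires at least one event and one non-event)
    hi = 50.0 + max(abs(o) for o in offset)
    lo = -hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return [0.5 * (lo + hi)] + beta[1:]


# Toy data with quasi-complete separation: no events when x = 0
X = [[1.0, float(x)] for x in [0] * 15 + [1] * 5]
y = [0] * 15 + [1, 1, 0, 0, 0]
firth_mean = sum(predict(X, firth_logistic(X, y))) / len(y)
flic_mean = sum(predict(X, flic(X, y))) / len(y)
print(round(firth_mean, 4), round(flic_mean, 4))  # 0.1276 0.1
```

In this toy example maximum likelihood estimates would not exist because of the separation, while the Firth fit is finite but its average predicted probability (about 0.128) exceeds the observed event rate of 0.1. The FLIC intercept correction restores agreement with the observed rate without changing the Firth slope.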
Research Output
- 679 Citations
- 29 Publications
- 1 Policy
- 2 Software
- 1 Dissemination
- 5 Scientific Awards
Publications
- 2021. Šinkovec H. To tune or not to tune, a case study of ridge logistic regression in small or sparse datasets. Preprint. DOI: 10.48550/arxiv.2101.11230
- 2021. Geroldinger A. On resampling methods for model assessment in penalized and unpenalized logistic regression. Preprint. DOI: 10.48550/arxiv.2101.07640
- 2021. Puhr R. Firth's logistic regression with rare events: accurate effect estimates AND predictions? Preprint. DOI: 10.48550/arxiv.2101.07620
- 2021. Joshi A. Solutions to problems of nonexistence of parameter estimates and sparse data bias in Poisson regression. Journal article, Statistical Methods in Medical Research, 253-266. DOI: 10.1177/09622802211065405
- 2021. Wallisch C. The roles of predictors in cardiovascular risk models - a question of modeling culture? Journal article, BMC Medical Research Methodology, 284. DOI: 10.1186/s12874-021-01487-4
- 2022. Geroldinger A. An investigation of penalization and data augmentation to improve convergence of generalized estimating equations for clustered binary outcomes. Journal article, BMC Medical Research Methodology, 168. DOI: 10.1186/s12874-022-01641-6
- 2022. Blagus R. Additional file 1 of An investigation of penalization and data augmentation to improve convergence of generalized estimating equations for clustered binary outcomes. Supplementary material. DOI: 10.6084/m9.figshare.20046960.v1
- 2022. Blagus R. Additional file 1 of An investigation of penalization and data augmentation to improve convergence of generalized estimating equations for clustered binary outcomes. Supplementary material. DOI: 10.6084/m9.figshare.20046960
- 2019. Šinkovec H. "Bring More Data!" – A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size. Preprint. DOI: 10.20944/preprints201910.0321.v1
- 2019. Meshcheryakova A. Interrelations of Sphingolipid and Lysophosphatidate Signaling with Immune System in Ovarian Cancer. Journal article, Computational and Structural Biotechnology Journal, 537-560. DOI: 10.1016/j.csbj.2019.04.004
- 2019. Šinkovec H. Bring More Data!—A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size. Journal article, International Journal of Environmental Research and Public Health, 4658. DOI: 10.3390/ijerph16234658
- 2021. Šinkovec H. To tune or not to tune, a case study of ridge logistic regression in small or sparse datasets. Journal article, BMC Medical Research Methodology, 199. DOI: 10.1186/s12874-021-01374-y
- 2021. Heinze G. Additional file 1 of To tune or not to tune, a case study of ridge logistic regression in small or sparse datasets. Supplementary material. DOI: 10.6084/m9.figshare.16714206.v1
- 2021. Heinze G. Additional file 1 of To tune or not to tune, a case study of ridge logistic regression in small or sparse datasets. Supplementary material. DOI: 10.6084/m9.figshare.16714206
- 2021. Agibetov A. Additional file 2 of The roles of predictors in cardiovascular risk models - a question of modeling culture? Supplementary material. DOI: 10.6084/m9.figshare.17284181.v1
- 2021. Agibetov A. Additional file 2 of The roles of predictors in cardiovascular risk models - a question of modeling culture? Supplementary material. DOI: 10.6084/m9.figshare.17284181
- 2021. Agibetov A. Additional file 1 of The roles of predictors in cardiovascular risk models - a question of modeling culture? Supplementary material. DOI: 10.6084/m9.figshare.17284178.v1
- 2021. Agibetov A. Additional file 3 of The roles of predictors in cardiovascular risk models - a question of modeling culture? Supplementary material. DOI: 10.6084/m9.figshare.17284184.v1
- 2021. Agibetov A. Additional file 1 of The roles of predictors in cardiovascular risk models - a question of modeling culture? Supplementary material. DOI: 10.6084/m9.figshare.17284178
- 2021. Agibetov A. Additional file 3 of The roles of predictors in cardiovascular risk models - a question of modeling culture? Supplementary material. DOI: 10.6084/m9.figshare.17284184
- 2019. Noroozi J. Hotspots of vascular plant endemism in a global biodiversity hotspot in Southwest Asia suffer from significant conservation gaps. Journal article, Biological Conservation, 299-307. DOI: 10.1016/j.biocon.2019.07.005
- 2022. Geroldinger A. An Investigation of Penalization and Data Augmentation to Improve Convergence of Generalized Estimating Equations for Clustered Binary Outcomes. Preprint. DOI: 10.21203/rs.3.rs-1369776/v1
- 2023. Geroldinger A. Leave-one-out cross-validation, penalization, and differential bias of some prediction model performance measures—a simulation study. Journal article, Diagnostic and Prognostic Research, 9. DOI: 10.1186/s41512-023-00146-0
- 2021. Geroldinger A. sj-docx-1-smm-10.1177_09622802211065405 - Supplemental material for Solutions to problems of nonexistence of parameter estimates and sparse data bias in Poisson regression. Supplementary material. DOI: 10.25384/sage.17697919
- 2020. Šinkovec H. Tuning in ridge logistic regression to solve separation. Preprint. DOI: 10.48550/arxiv.2011.14865
- 2020. Wallisch C. Selection of variables for multivariable models: Opportunities and limitations in quantifying model stability by resampling. Journal article, Statistics in Medicine, 369-381. DOI: 10.1002/sim.8779
- 2017. Mansournia M. Separation in Logistic Regression: Causes, Consequences, and Control. Journal article, American Journal of Epidemiology, 864-870. DOI: 10.1093/aje/kwx299
- 2017. Puhr R. Firth's logistic regression with rare events: accurate effect estimates and predictions? Journal article, Statistics in Medicine, 2302-2317. DOI: 10.1002/sim.7273
- 2022. Luijken K. A comparison of full model specification and backward elimination of potential confounders when estimating marginal and conditional causal effects on binary outcomes from observational data. Journal article, Biometrical Journal, 2100237. DOI: 10.1002/bimj.202100237
Policies
- 2020. Systematic Review of COVID-19 prediction models. Membership of a guideline committee. DOI: 10.1136/bmj.m1328
Disseminations
- 2017. Keynote at Young Statisticians Meeting. A talk or presentation.
Scientific Awards
- 2021. Guest Editor of a Special Issue of the International Journal of Environmental Research and Public Health. Appointed as editor/advisor to a journal or book series. Level of recognition: Continental/International.
- 2019. Poster prize of the Austro-Swiss region of the International Biometric Society. Poster/abstract prize. Level of recognition: Regional (any country).
- 2019. Associate Editor of Statistics in Medicine. Appointed as editor/advisor to a journal or book series. Level of recognition: Continental/International.
- 2018. Associate Editor of Diagnostic and Prognostic Research. Appointed as editor/advisor to a journal or book series. Level of recognition: Continental/International.
- 2018. Keynote speaker at the BMS-ANed (Dutch Biometric Society) 2018 Spring Meeting. Personally asked as a keynote speaker to a conference. Level of recognition: Regional (any country).