Long-Term Risk Prediction from Short-Term Big Data
Disciplines
Other Human Medicine, Health Sciences (100%)
Keywords
Big Data, Long-Term Prediction, Simulation, Microsimulation, Time-To-Event, Competing Risk
Long-term predictions must be based on long observation periods, so long-term prediction models may already be outdated when they are first used. In medicine, recommended long-term risk prediction models are based on data collected from individuals many years ago. As a result, changes over time in, e.g., average weight, prevalence of smoking, treatment modalities, or outcome frequencies are not adequately reflected. The era of Big Data has made contemporary data sets with hundreds of thousands of subjects available. However, these big data do not have the required long observation time. For example, we developed a formula predicting an individual's five-year cardiovascular risk using anonymized data from 2.4M individuals who participated in a standardized health-screening examination between 2009 and 2015. However, especially for younger individuals, a lifetime prediction would be of much more interest. Therefore, in this research project, we will investigate two methodological approaches to enable long-term risk prediction from short-term big data. In the first approach, we will link repeated measurements from annual health-screening examinations of several similar individuals, covering different periods of life, into super-records, i.e., we try to merge many short observations into fewer long observations. This will sacrifice some of the size of the data to obtain data with a long observation period, which is better suited to our needs. The challenge here is to determine how best to link the records in an automated way and whether the linkage might introduce systematic biases. Using these linked super-records, we will jointly model the development of risk factors over age and the time until a cardiovascular event occurs. So-called joint models or machine learning approaches can be used for this task. In the second approach, we will investigate methods that simulate long-term outcomes by repeatedly applying or mathematically integrating short-term prediction models. 
These synthetic data covering long observation periods can be analysed in the same way as the above-mentioned super-records. To develop and validate our methodology, we can make use of an anonymized longitudinal data set on health-screening examinations from more than 185,000 individuals with a median observation period of almost 30 years from the Vorarlberg Health Monitoring and Promotion Programme. Our vision is that the successfully completed project may provide the necessary methodology to establish a population-based health model for the Austrian population, covering cardiovascular and other chronic diseases. Moreover, the methodology could be transferred to other fields of research, within or outside medicine, where long-term prognosis based on contemporary big data is of interest.
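The second approach described above, simulating long-term outcomes by repeatedly applying a short-term prediction model, can be illustrated with a minimal microsimulation sketch. Everything in it is an assumption made for illustration: the function names, the one-year risk model with made-up logistic coefficients, the single risk factor (systolic blood pressure) and its yearly drift are hypothetical and are not the project's actual models or data. The sketch treats non-cardiovascular death as a competing risk and estimates cumulative incidence over a chosen horizon.

```python
import math
import random


def one_year_risks(age, sbp):
    """Hypothetical short-term model: (p_cvd, p_other_death) for the next year.

    The logistic coefficients are invented for illustration and are not
    fitted to any data.
    """
    p_cvd = 1.0 / (1.0 + math.exp(-(-9.0 + 0.07 * age + 0.015 * sbp)))
    p_other = 1.0 / (1.0 + math.exp(-(-10.0 + 0.09 * age)))
    return p_cvd, p_other


def simulate_person(age, sbp, horizon, rng):
    """Apply the one-year model repeatedly; return (outcome, years_elapsed)."""
    for year in range(1, horizon + 1):
        p_cvd, p_other = one_year_risks(age, sbp)
        u = rng.random()
        # One uniform draw decides between the two competing events
        # (assumes p_cvd + p_other <= 1, which holds for the ages used here).
        if u < p_cvd:
            return "cvd", year
        if u < p_cvd + p_other:
            return "other_death", year
        # No event this year: age the person and let the risk factor drift.
        age += 1
        sbp += 0.5 + rng.gauss(0.0, 2.0)
    return "censored", horizon


def cumulative_incidence(age, sbp, horizon, n, seed=1):
    """Estimate the share of each outcome among n simulated individuals."""
    rng = random.Random(seed)
    counts = {"cvd": 0, "other_death": 0, "censored": 0}
    for _ in range(n):
        outcome, _ = simulate_person(age, sbp, horizon, rng)
        counts[outcome] += 1
    return {k: v / n for k, v in counts.items()}


if __name__ == "__main__":
    for horizon in (5, 30):
        print(horizon, cumulative_incidence(40, 130.0, horizon, n=5000))
```

The simulated event times and outcomes form synthetic long observations that could then be analysed with standard time-to-event methods. In a real application the hypothetical `one_year_risks` function would be replaced by a short-term model fitted to contemporary data, and the risk-factor update by a model of how risk factors develop with age.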
- Hanno Ulmer, Medizinische Universität Innsbruck, national collaboration partner
- Georg Dorffner, Medizinische Universität Wien, national collaboration partner
- Georg Lukas Heinze, Medizinische Universität Wien, national collaboration partner