Probabilistic Machine Learning
Disciplines
Mathematics (70%); Economics (30%)
Keywords
Machine learning, Probabilistic, Big Data, Malnutrition
Assoz. Prof. Dr. Nikolaus Umlauf, Principal Investigator
Mag. Johannes Seiler, PhD, Co-Investigator
Univ.-Prof. Dr. Stefan Lang, Co-Investigator
in collaboration with Dr. Kenneth Harttgen

This project aims to better explain childhood malnutrition in low- and middle-income countries through probabilistic machine learning, and to contribute to monitoring the Sustainable Development Goals (SDGs) proposed at the United Nations Conference on Sustainable Development in Rio de Janeiro in 2012. Recent literature emphasizes high heterogeneity at both the national and sub-national level and focuses on identifying drivers of malnutrition with flexible regression models. Although these models are complex, they cannot account for all important interactions; that is, certain factors that could contribute substantially to the overall situation remain undetected. We aim to significantly improve monitoring through (a) an improved database and (b) new algorithms for non-standard interactions, embedded in the framework of fully probabilistic distributional regression models. The novel algorithms will build on ideas from machine learning, such as decision trees (and random decision forests) and stochastic gradient descent type algorithms, which are suitable for very large data sets.

The presented methods can be used for a variety of applications. The modeling approach focuses on decomposing effects into main effects and (possibly) complex but interpretable interactions. The new algorithms are highly memory efficient (including variable selection) and can be applied to virtually any number of observations on a conventional computer; with previously available methods, probabilistic models of this size cannot be computed. The methods are therefore also useful for other applications, e.g., in meteorology, real estate modeling, ecology, and medicine.
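The combination of distributional regression with stochastic gradient descent can be illustrated with a minimal sketch. It assumes a Gaussian response whose mean and log standard deviation are both linear in the covariates, and it updates both coefficient vectors with gradient steps on the negative log-likelihood of one mini-batch at a time. The function names (`nll_grads`, `fit_sgd`), the simulated data, and the tuning values (`batch_size`, `epochs`, `lr`) are hypothetical; this is not the project's algorithm, and it omits the smooth effects, tree-based interactions, and variable selection described above.

```python
import numpy as np


def nll_grads(X, y, beta, gamma):
    """Gradients of the average Gaussian negative log-likelihood,
    assuming mu = X @ beta and log(sigma) = X @ gamma."""
    mu = X @ beta
    sigma = np.exp(X @ gamma)
    z = (y - mu) / sigma                    # standardized residuals
    n = len(y)
    g_beta = -X.T @ (z / sigma) / n         # d NLL / d beta
    g_gamma = X.T @ (1.0 - z**2) / n        # d NLL / d gamma
    return g_beta, g_gamma


def fit_sgd(X, y, batch_size=500, epochs=10, lr=0.05, seed=0):
    """Mini-batch SGD for both distribution parameters (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    gamma = np.zeros(p)
    # Start the intercepts at the mean and log-sd of one batch so that the
    # initial standardized residuals are of order one.
    first = rng.choice(n, size=batch_size, replace=False)
    beta[0] = y[first].mean()
    gamma[0] = np.log(y[first].std())
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            # In a real big-data setting each batch would be read from disk
            # instead of being indexed from an in-memory array.
            idx = order[start:start + batch_size]
            g_beta, g_gamma = nll_grads(X[idx], y[idx], beta, gamma)
            beta -= lr * g_beta
            gamma -= lr * g_gamma
    return beta, gamma


# Simulated example: mean 10 + 2x, standard deviation exp(0.5x).
rng = np.random.default_rng(1)
n = 50_000
x = rng.uniform(-1.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
y = rng.normal(10.0 + 2.0 * x, np.exp(0.5 * x))
beta, gamma = fit_sgd(X, y)
print(np.round(beta, 2), np.round(gamma, 2))  # should be close to [10, 2] and [0, 0.5]
```

Because each update touches only one mini-batch, memory use is governed by the batch size rather than by the number of observations; the price of this simple version is that the fixed step size `lr` has to be tuned.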
Worldwide, millions of children are affected by malnutrition, especially in countries with limited access to healthcare, clean water, or sufficient food. A common consequence is anemia, a low concentration of hemoglobin in the blood. To understand the scale of this problem and take effective action, reliable and meaningful data are essential.

In our research project funded by the Austrian Science Fund (FWF), we developed a new statistical algorithm that allows complex health data to be analyzed in a fundamentally different way. Our method can simultaneously process millions of data points from health surveys, satellite imagery, climate and environmental indicators, and socioeconomic variables, and it automatically identifies which factors are most relevant. What makes our approach unique is that our models predict not just average values (such as the average risk of anemia) but the entire probability distribution. This means that for every region we can estimate not only how likely anemia is, but also how uncertain that prediction is and, for example, how high the risk of particularly severe cases is. This allows us to identify areas where the health situation is especially unstable or uncertain.

One of the central outcomes of our project is a set of high-resolution maps for over 50 countries. These maps show, often down to the village level, how severely children are affected by anemia. One striking finding is that differences within countries are often greater than differences between countries. These insights help design aid efforts that are more targeted and equitable. Our new algorithm also makes it easier to explain complex relationships, such as how poverty, climate extremes, and inadequate healthcare systems reinforce each other. Our methods have been published in leading international journals and are openly available to other research groups.
Our project demonstrates that modern statistical methods can uncover hidden patterns in large datasets, creating a solid foundation for better, evidence-based decision-making.
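Predicting the entire distribution is what makes statements about severe cases possible. The sketch below assumes, purely for illustration, that a model returns a Gaussian distribution for a child's hemoglobin concentration (g/dL) in each region, and applies the WHO cut-offs for children aged 6 to 59 months (below 11.0 g/dL for anemia, below 7.0 g/dL for severe anemia). The helper `anemia_risk` and the two regions with their predicted means and standard deviations are hypothetical.

```python
from statistics import NormalDist

# WHO hemoglobin cut-offs for children aged 6-59 months, in g/dL.
ANEMIA_CUTOFF = 11.0
SEVERE_CUTOFF = 7.0


def anemia_risk(mu, sigma):
    """Summaries of a predicted Gaussian hemoglobin distribution (hypothetical helper)."""
    dist = NormalDist(mu, sigma)
    return {
        "p_anemia": dist.cdf(ANEMIA_CUTOFF),   # P(Hb < 11.0)
        "p_severe": dist.cdf(SEVERE_CUTOFF),   # P(Hb < 7.0)
        "q10": dist.inv_cdf(0.10),             # 10% quantile of Hb
    }


# Two hypothetical regions with the same predicted mean but different spread.
stable = anemia_risk(11.5, 0.8)
unstable = anemia_risk(11.5, 1.8)
for name, risk in [("stable", stable), ("unstable", unstable)]:
    print(name, {k: round(v, 4) for k, v in risk.items()})
```

Both regions have the same predicted mean, yet the probability of severe anemia is about 0.6% in the more dispersed region and practically zero in the other; a model that predicts only the mean could not tell them apart.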
- Universität Innsbruck - 100%
- Kenneth Harttgen, ETH Zürich - Switzerland
Research Output
- 37 Citations
- 16 Publications
- 2024. Mayr G. Cholesky-based multivariate Gaussian regression. Econometrics and Statistics. Journal Article. DOI 10.1016/j.ecosta.2022.03.001
- 2025. Wetscher M. Distributional Regression for High-Dimensional and Big Data: Methods and Applications. PhD Thesis.
- 2025. Arndt A. Leveraging remote observations for calibrating surface energy- and mass balance models: a case study on Hintereisferner. Other. DOI 10.5194/egusphere-egu25-10390
- 2025. Seiler J. High-resolution spatial prediction of anemia risk among children aged 6 to 59 months in low- and middle-income countries. Communications Medicine. Journal Article. DOI 10.1038/s43856-025-00765-2
- 2023. Ingrisch J. Functional thresholds alter the relationship of plant resistance and recovery to drought. Ecology. Journal Article. DOI 10.1002/ecy.3907
- 2023. Brunauer W. A multilevel analysis of real estate valuation using distributional and quantile regression. Statistical Modelling. Journal Article. DOI 10.1177/1471082x231157205
- 2021. Ingrisch J. Functional thresholds of plant resistance and recovery to drought. Other. DOI 10.5194/egusphere-egu21-8333
- 2022. Simon T. Amplification of annual and diurnal cycles of alpine lightning over the past four decades. Journal Article. DOI 10.5194/egusphere-egu22-1314
- 2022. Günther I. An index of access to essential infrastructure to identify where physical distancing is impossible. Nature Communications, 3355. Journal Article. DOI 10.1038/s41467-022-30812-8
- 2022. Alas HD. Pedestrian exposure to black carbon and PM2.5 emissions in urban hot spots: new findings using mobile measurement techniques and flexible Bayesian regression models. Journal of Exposure Science & Environmental Epidemiology, pp. 604-614. Journal Article. DOI 10.1038/s41370-021-00379-5
- 2023. Mayr GJ. Amplification of annual and diurnal cycles of alpine lightning. Climate Dynamics, pp. 4125-4137. Journal Article. DOI 10.1007/s00382-023-06786-8
- 2023. Seiler J. Scalable Estimation for Structured Additive Distributional Regression. Preprint. DOI 10.48550/arxiv.2301.05593
- 2022. Van Passel J. Climatic legacy effects on the drought response of the Amazon rainforest. Global Change Biology, pp. 5808-5819. Journal Article. DOI 10.1111/gcb.16336
- 2022. Klein N. Distributional Adaptive Soft Regression Trees. Preprint. DOI 10.48550/arxiv.2210.10389
- 2021. Kneib T. Bayesian Gaussian distributional regression models for more efficient norm estimation. British Journal of Mathematical and Statistical Psychology, pp. 99-117. Journal Article. DOI 10.1111/bmsp.12206
- 2021. Umlauf N. bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond). Journal of Statistical Software, pp. 1-53. Journal Article. DOI 10.18637/jss.v100.i04