Generalized relative data and robustness in Bayes spaces
Weave: Austria - Belgium - Germany - Luxembourg - Poland - Switzerland - Slovenia - Czechia
Disciplines
Geosciences (10%); Mathematics (90%)
Keywords
Compositional Data, Compositional Tables, Probability Density Functions, Functional Data, Robust Statistics
Compositional data are relative data: the essential information is contained in the (log-)ratios between the variables, also called compositional parts. This type of information requires appropriate processing, which is the subject of compositional data analysis (CoDa). CoDa has provided deeper insight in numerous applications, such as the analysis of concentration data (e.g. element concentrations in geochemical or archaeological measurements), infection rates in virology, expenditures across different sectors, and the identification of relevant biomarkers. The theory of CoDa offers a variety of approaches for the appropriate treatment of compositional data in classification and regression problems, and for statistical tasks in general. The goal of this project is to extend CoDa methods to more general data structures that frequently appear in practice today. Measurements can be continuous functions whose overall sum or integral is constrained, such as probability density functions. Alternatively, they can take the form of compositional tables or objects of even higher order, because the underlying information is grouped according to factors such as gender or age. In the continuous case, these generalize to compositional bivariate or higher-order densities. Compositional data of such higher complexity are here called generalized relative data. Within this project, the Bayes space framework will be used as the mathematical basis, since it allows weighting schemes for variables, observations, and single cells of the data array. Particularly in the high-dimensional case, variable weighting can be very useful for adjusting the relative influence that one variable has on the others. Observation and cell weights, on the other hand, control how strongly individual observations or cells influence the resulting estimator.
This is particularly useful when observations or single cells are outlying, so weighting opens the door to developing robust versions of CoDa methods.
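The role of log-ratios can be illustrated with the centred log-ratio (clr) transform, a standard CoDa tool, and its Bayes-space counterpart for densities, clr(f)(t) = ln f(t) - (1/|I|) ∫_I ln f(s) ds. The following is a minimal NumPy sketch, assuming strictly positive parts and a density sampled on a grid over its support I; the function names are hypothetical, and the weighting schemes studied in the project are not shown.

```python
import numpy as np


def clr(x):
    """Centred log-ratio (clr) transform of a composition with strictly positive parts."""
    logx = np.log(np.asarray(x, dtype=float))
    # Subtracting the mean log-part is the same as dividing by the geometric mean,
    # so only ratios between parts survive and the total (scale) drops out.
    return logx - logx.mean(axis=-1, keepdims=True)


def clr_density(f, t):
    """Discretised Bayes-space clr of a positive density sampled at grid points t.

    clr(f)(t) = ln f(t) - (1/|I|) * integral_I ln f(s) ds, with I = [t[0], t[-1]];
    the integral is approximated by the trapezoidal rule.
    """
    t = np.asarray(t, dtype=float)
    logf = np.log(np.asarray(f, dtype=float))
    integral = np.sum(0.5 * (logf[1:] + logf[:-1]) * np.diff(t))
    return logf - integral / (t[-1] - t[0])


# A composition and a rescaled copy (e.g. counts vs. proportions) give the same clr
x = np.array([1.0, 2.0, 4.0])
print(clr(x), clr(10 * x))

# A density and an unnormalised multiple of it are the same element of the Bayes space
t = np.linspace(0.0, 1.0, 101)
print(np.allclose(clr_density(np.exp(t), t), clr_density(3 * np.exp(t), t)))
```

Both checks express scale invariance: multiplying a composition or a density by a positive constant leaves its clr unchanged, which is why relative data are analysed through log-ratios rather than raw values.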
This project was devoted to developing methodology for the robust analysis of relative functional data, that is, random functions residing in the Bayes space, such as density functions. An important task in robust statistics is outlier detection: identifying functions that deviate from the main pattern, for example in their shape or because of spikes. A common tool for outlier detection in multivariate statistics is the Mahalanobis distance, which must be computed from robust estimates of the mean and covariance. A particular challenge in the functional setting was the infinite-dimensional nature of the data and the resulting need for appropriate regularization. Within this project, we developed a unifying framework based on an existing regularized functional Mahalanobis distance, extending and adapting it to several important settings in functional data analysis (FDA). A key contribution was the development of a robust covariance estimator for outlier detection in univariate functional data. The resulting method, the "Minimum regularized covariance trace estimator", was published in Technometrics, a highly ranked statistics journal (DOI: 10.1080/00401706.2024.2336542). This distance was then extended to the Bayes space, accounting for the constraints inherent to density functions. Building on this extension, we developed robust functional principal component analysis (FPCA) for relative data (RDPCA), a novel approach for accurately estimating principal components in the presence of outliers. RDPCA was assessed in simulation studies and real-data examples, which show that it improves covariance estimation and PCA compared with traditional methods. This was joint work with our partner from the Czech Republic (Karel Hron) and another international collaborator (Alessandra Menafoglio, Politecnico di Milano); the paper is currently under review in Technometrics. Finally, our regularized functional Mahalanobis distance was generalized to multivariate functional processes.
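The regularized Mahalanobis distance can be sketched for curves observed on a common grid. Here we assume one common Tikhonov-type form, d²(x) = Σ_j λ_j/(λ_j + α)² ⟨x - μ, e_j⟩², where (λ_j, e_j) are the eigenpairs of the covariance and α > 0 damps the small eigenvalues that make the classical distance unstable for functional data. Quadrature weights are ignored, and the robust step is a simple hypothetical trimming scheme (concentration steps in the style of FAST-MCD), not the MRCT estimator from the Technometrics paper; all function names and tuning values (alpha, h_frac) are illustrative.

```python
import numpy as np


def reg_mahalanobis_sq(X, mu, S, alpha):
    """Squared regularized Mahalanobis distances of the rows of X (curves on a grid).

    d^2(x) = sum_j lam_j / (lam_j + alpha)^2 * <x - mu, e_j>^2 with (lam_j, e_j) the
    eigenpairs of S. For alpha = 0 and invertible S this is the classical distance.
    """
    lam, E = np.linalg.eigh(S)
    lam = np.clip(lam, 0.0, None)                 # remove tiny negative round-off
    scores = (X - mu) @ E                          # coordinates in the eigenbasis of S
    return (scores**2 * lam / (lam + alpha) ** 2).sum(axis=1)


def trimmed_reg_mahalanobis(X, alpha, h_frac=0.75, max_iter=50):
    """Hypothetical trimming scheme (C-steps as in FAST-MCD); NOT the MRCT estimator."""
    h = int(h_frac * X.shape[0])
    # Initial subset: the h curves closest (in L2) to the pointwise median curve
    d0 = ((X - np.median(X, axis=0)) ** 2).sum(axis=1)
    subset = np.sort(np.argsort(d0)[:h])
    for _ in range(max_iter):
        mu = X[subset].mean(axis=0)
        S = np.cov(X[subset], rowvar=False)
        d2 = reg_mahalanobis_sq(X, mu, S, alpha)
        new_subset = np.sort(np.argsort(d2)[:h])   # keep the h most central curves
        if np.array_equal(new_subset, subset):
            break
        subset = new_subset
    return d2, subset


# Simulated curves: random combinations of sin/cos plus noise, with three planted outliers
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)
n = 100
a, b = rng.normal(size=(2, n, 1))
X = a * np.sin(2 * np.pi * t) + b * np.cos(2 * np.pi * t)
X += 0.05 * rng.normal(size=(n, t.size))
X[0] += 3.0                               # shifted curve
X[1] += 2.0 * np.sin(4 * np.pi * t)       # shape outlier
X[2, 10] += 5.0                           # isolated spike

d2, subset = trimmed_reg_mahalanobis(X, alpha=0.01)
print(np.argsort(d2)[::-1][:5])           # curves ranked by outlyingness
```

For density data, the same computation would be applied to clr-transformed curves, in the spirit of the Bayes-space extension described above; this is an illustration only, not the RDPCA procedure.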
Multivariate functional processes often pose challenges due to the high dimensionality of their covariance structures, particularly in spatio-temporal models, where the covariance matrix scales quadratically with the number of spatial and temporal observations and thus requires substantial data and computational resources. To address this, our research focused on robust parameter estimation under the assumption of a separable covariance structure. In collaboration with Tomas Masak (WU Wien), we were able to connect Mahalanobis distances for multivariate processes to those for univariate processes. An adaptation of this concept to clustering has shown excellent performance compared with competing methods; this work will be submitted in the near future. We were also able to publish additional papers related to our core innovations. Most importantly, the work resulted in the PhD thesis of Jeremy Oguamalam, which was successfully defended in October 2025. Several presentations were given at international workshops and conferences, including invited presentations by Una Radojičić.
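Why separability helps can be seen in a small sketch: if a p × q observation X (e.g. p locations by q time points) has covariance Cov(vec X) = B ⊗ A, its squared Mahalanobis distance reduces to tr(X' A^{-1} X B^{-1}), so only the p × p and q × q factors need to be stored and inverted instead of a pq × pq matrix. The code below assumes A and B are known and X is already centred; the robust estimation of the factors and the clustering adaptation from the project are not shown, and the function names are hypothetical.

```python
import numpy as np


def mahalanobis_sq_full(X, A, B):
    """Squared Mahalanobis distance of a p x q observation using the full pq x pq covariance."""
    v = X.reshape(-1, order="F")         # vec(X): stack the columns of X
    Sigma = np.kron(B, A)                 # separable model: Cov(vec X) = B kron A
    return float(v @ np.linalg.solve(Sigma, v))


def mahalanobis_sq_separable(X, A, B):
    """Same quantity via vec(X)' (B kron A)^{-1} vec(X) = tr(X' A^{-1} X B^{-1})."""
    A_inv_X = np.linalg.solve(A, X)       # A^{-1} X, shape p x q
    X_B_inv = np.linalg.solve(B, X.T).T   # X B^{-1}, shape p x q (B is symmetric)
    return float(np.sum(A_inv_X * X_B_inv))


rng = np.random.default_rng(1)
p, q = 6, 4                                # e.g. 6 locations observed at 4 time points


def random_spd(k):
    """Random symmetric positive definite k x k matrix (for illustration only)."""
    M = rng.normal(size=(k, k))
    return M @ M.T + k * np.eye(k)


A, B = random_spd(p), random_spd(q)       # A: covariance across locations, B: across time
X = rng.normal(size=(p, q))                # a centred matrix-valued observation
print(mahalanobis_sq_full(X, A, B), mahalanobis_sq_separable(X, A, B))
print("stored covariance entries:", p * p + q * q, "vs", (p * q) ** 2)  # 52 vs 576
```

The two distances agree up to rounding, while the separable version never forms the Kronecker product; the gap in stored entries grows quickly as the spatial and temporal grids are refined.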
- Technische Universität Wien - 100%
- Johanna Neslehova, McGill University, Montreal - Canada
- Tomas Matys Grygar, Academy of Sciences of the Czech Republic - Czechia
- Karel Hron, Palacky University Olomouc - Czechia, international project partner
- Alessandra Menafoglio, Polytechnic University of Milan - Italy
- Matthias Templ, Zürcher Hochschule für Angewandte Wissenschaften - Switzerland
Research Output
- 4 Citations
- 14 Publications
- 1 Datasets & models
- (2025) Fačevicová K. Correspondence Analysis From the Viewpoint of Compositional Tables. Statistical Analysis and Data Mining: An ASA Data Science Journal. Journal article. DOI 10.1002/sam.70023
- (2023) Nesrstová V. Principal balances of compositional data for regression and classification using partial least squares. Journal of Chemometrics. Journal article. DOI 10.1002/cem.3518
- (2023) Oguamalam J. Minimum regularized covariance trace estimator and outlier detection for functional data. Other. DOI 10.48550/arxiv.2307.13509
- (2025) Oguamalam J. Regularized Mahalanobis Distance for Functional Data. PhD thesis.
- (2025) Mayrhofer M. Robust Covariance Estimation and Explainable Outlier Detection for Matrix-Valued Data. Technometrics. Journal article. DOI 10.1080/00401706.2025.2475781
- (2025) Nesrstová V. Identifying Important Pairwise Logratios in Compositional Data with Sparse Principal Component Analysis. Mathematical Geosciences, pp. 333-358. Journal article. DOI 10.1007/s11004-024-10159-0
- (2024) Grygar T. Exploratory functional data analysis of multivariate densities for the identification of agricultural soil contamination by risk elements. Journal of Geochemical Exploration. Journal article. DOI 10.1016/j.gexplo.2024.107416
- (2023) Nesrstová V. Identifying Important Pairwise Logratios in Compositional Data with Sparse Principal Component Analysis. Preprint. DOI 10.48550/arxiv.2311.13911
- (2024) Oguamalam J. Minimum Regularized Covariance Trace Estimator and Outlier Detection for Functional Data. Technometrics. Journal article. DOI 10.1080/00401706.2024.2336542
- (2024) Filzmoser P. Robust functional PCA for density data. Other. DOI 10.34726/8739
- (2022) Nesrstová V. Principal Balances of Compositional Data for Regression and Classification using Partial Least Squares. Preprint. DOI 10.48550/arxiv.2211.01686
- (2023) Grygar T. Exploratory functional data analysis of multivariate densities for the identification of agricultural soil contamination by risk elements. Preprint. DOI 10.48550/arxiv.2310.13761
- (2022) Fačevicová K. Compositional cubes: a new concept for multi-factorial compositions. Statistical Papers, pp. 955-985. Journal article. DOI 10.1007/s00362-022-01350-8
- (2022) Fačevicová K. Compositional Cubes: A New Concept for Multi-factorial Compositions. Preprint. DOI 10.48550/arxiv.2201.10321
- (2024) Minimum Regularized Covariance Trace Estimator and Outlier Detection for Functional Data. Database/collection of data (public access). DOI 10.6084/m9.figshare.25766304