New Methodological Developments for GAMs and VGAMs
Disciplines
Computer Sciences (10%); Mathematics (90%)
Keywords
Additive models, Outliers, Robust statistics, VGAM, GAM
Additive models and generalized additive models (GAMs) are frequently used in nonparametric regression analysis because smoothing spline penalties protect them against overfitting. The underlying assumption is that the ground truth is smooth overall. In practice, however, this assumption is often unrealistic: the response may contain outliers, the curvature may vary strongly in space, or the signal may contain jumps. Violations of the model assumptions can lead to inaccurate results, so the assumptions need to be weakened. In this project we address these issues by robustifying the definition of the smoothing splines in different ways, while keeping the computations feasible. The resulting nonparametric estimates will allow for quick local changes while still preserving an overall smoothness property. The newly developed methodology will therefore yield a flexible and powerful framework in which data that are multivariate in both the response and the covariates can be analyzed appropriately. The developed methods will be implemented in the widely used VGAM package for the software environment R.
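To make the idea of robustifying a smoothing penalty concrete, here is a minimal Python sketch under stated assumptions. It does not reproduce the project's methods or the VGAM implementation. Instead it uses a Whittaker-type smoother (a discrete analogue of a smoothing spline with a second-order difference penalty) and replaces the squared-error loss by the Huber loss, fitted by iteratively reweighted least squares (IRLS). The function names, the Huber constant `c=1.345`, and the penalty weight `lam=10.0` are illustrative choices, not values taken from the project.

```python
from statistics import median


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (dense, small n)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def second_difference_penalty(n):
    """Return P = D^T D, where D is the (n-2) x n second-order difference matrix."""
    p = [[0.0] * n for _ in range(n)]
    for k in range(n - 2):
        row = {k: 1.0, k + 1: -2.0, k + 2: 1.0}  # nonzero entries of row k of D
        for i, di in row.items():
            for j, dj in row.items():
                p[i][j] += di * dj
    return p


def robust_smooth(y, lam=10.0, c=1.345, iters=20):
    """Huber-type penalized smoother fitted by IRLS (illustrative sketch).

    Each step solves min_z sum_i w_i (y_i - z_i)^2 + lam * sum_k (z_k - 2 z_{k+1} + z_{k+2})^2
    and then recomputes Huber weights w_i from the residuals. With iters=1 the
    result is the ordinary (non-robust) penalized least-squares fit, since all
    weights start at 1.
    """
    n = len(y)
    p = second_difference_penalty(n)
    w = [1.0] * n
    z = list(y)
    for _ in range(iters):
        a = [[lam * p[i][j] + (w[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
        z = solve(a, [w[i] * y[i] for i in range(n)])
        r = [y[i] - z[i] for i in range(n)]
        # Robust scale: median absolute residual rescaled to the normal standard
        # deviation, floored to avoid a zero scale.
        s = max(median([abs(v) for v in r]) / 0.6745, 1e-12)
        # Huber weights: 1 for small residuals, c*s/|r| for large ones.
        w = [1.0 if abs(v) <= c * s else c * s / abs(v) for v in r]
    return z


y = [0.1 * i for i in range(30)]
y[15] += 10.0  # a single spike on a linear trend
plain = robust_smooth(y, iters=1)
robust = robust_smooth(y)
print(round(plain[15], 2), round(robust[15], 2))
```

In `plain` the spike at index 15 pulls the curve upward, whereas the Huber weights in `robust` down-weight the spike and keep the fit close to the linear trend. A linear trend is not penalized by second differences, so clean linear data is reproduced up to rounding.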
Many phenomena observed in nature follow an underlying nonlinear relationship. One example is the concentration of chemical elements measured in the soil. At a location that is promising for mineral exploration, we would expect the concentrations of some elements to increase around the target, and this increase might follow a nonlinear trend. Different sources of error may also add noise to the signal, which makes the underlying signal harder to estimate. One objective of this project was to develop new methods for nonlinear trend smoothing that are less sensitive to noise in the measurements but still sensitive enough to identify relevant peaks. Because these methods are less sensitive to spikes and other artifacts, they are called robust against outliers. Robust methods have also been considered for models that are more general than smoothing.
Even after robust smoothing, the signals of some observations can still be atypical. For example, Covid infection data from different countries over time can be very heterogeneous because of the countries' differing policies, and these differences remain visible after smoothing. When several signals are of interest, e.g. Covid infections, hospitalizations, and deaths, it is no longer straightforward to check visually whether some countries show atypical joint behavior. Moreover, the degree of heterogeneity should not be assessed on the absolute numbers, since these are mainly driven by population size. Compositional data analysis is the statistical framework for analyzing such relative information. We developed a new approach to identify outlying observations (countries) among smoothed multivariate signals, which are treated as functions over time, with each functional observation considered as a composition.
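The point about population size can be illustrated with the centered log-ratio (clr) transformation, a standard tool in compositional data analysis: it expresses each part relative to the geometric mean of all parts, so multiplying a composition by a common factor (for instance, because of a larger population) leaves it unchanged. The Python sketch below applies the clr to each time point and then scores each country by its distance to the pointwise median curve. This scoring rule, the function names, and the toy numbers are hypothetical illustrations, not the outlier detection method developed in the project.

```python
import math
from statistics import median


def clr(parts):
    """Centered log-ratio transform: log of each part minus the mean log (log geometric mean)."""
    logs = [math.log(v) for v in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def outlyingness(curves):
    """Hypothetical score: distance of each clr-transformed curve to the pointwise median curve.

    `curves` maps a label to a list of compositions over time, each composition
    being a list of positive parts (e.g. infections, hospitalizations, deaths).
    """
    z = {label: [clr(comp) for comp in curve] for label, curve in curves.items()}
    labels = list(z)
    n_times = len(z[labels[0]])
    n_parts = len(z[labels[0]][0])
    # Median across observations, separately for every time point and clr coordinate.
    med = [[median([z[k][t][j] for k in labels]) for j in range(n_parts)]
           for t in range(n_times)]
    return {
        k: math.sqrt(sum((z[k][t][j] - med[t][j]) ** 2
                         for t in range(n_times) for j in range(n_parts)))
        for k in labels
    }


# Scaling all parts by a common factor does not change the clr coordinates.
small = clr([1000, 50, 5])
large = clr([100000, 5000, 500])

# Toy data: (infections, hospitalizations, deaths) at three time points per country.
curves = {
    "A": [[1000, 50, 5], [1200, 60, 6], [900, 45, 4]],
    "B": [[10000, 520, 49], [12500, 600, 61], [8800, 460, 41]],
    "C": [[500, 24, 3], [610, 30, 3], [450, 23, 2]],
    "D": [[800, 120, 40], [900, 140, 50], [700, 110, 35]],
}
scores = outlyingness(curves)
print(max(scores, key=scores.get))
```

Country B has roughly ten times the counts of country A but a similar relative structure, so it is not flagged; country D stands out because its share of hospitalizations and deaths is much larger.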
Further work was devoted to identifying groups of variables in multivariate data that interact. For metabolomic data in bioinformatics, for example, it is known that some metabolites act jointly as a group, and that several such groups can explain an underlying phenomenon, such as the type of a disease. At the same time, many other measured metabolites are completely irrelevant for identifying the disease. Several methods for this problem have been proposed in the literature, but they do not treat the data as compositions, where only relative information is relevant. Metabolomic data are a case in point: the measured values depend on external settings, and changing these settings can increase or decrease the measured values by a certain factor. We linked compositional data analysis with the identification of a network (graph) structure among the variables, which yields a considerably simpler and more interpretable model.
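As a simple illustration of how a graph on the variables can be built from relative information only, the Python sketch below uses the variation matrix of compositional data analysis, whose entries are the log-ratio variances tau_ij = var(log(x_i / x_j)). Two variables are connected when their ratio is nearly constant across samples. Because only log-ratios enter, multiplying a whole sample by a factor (as a change of measurement settings would do) leaves the graph unchanged. The thresholding rule, the function names, and the toy data are assumptions made for this example; the project's approach, which draws on graph signal processing, is not reproduced here.

```python
import math
from statistics import pvariance


def variation_matrix(samples):
    """Variation matrix: tau[i][j] = variance of log(x_i / x_j) across samples."""
    d = len(samples[0])
    logs = [[math.log(v) for v in row] for row in samples]
    return [[pvariance([row[i] - row[j] for row in logs]) for j in range(d)]
            for i in range(d)]


def proportionality_graph(samples, threshold):
    """Hypothetical rule: connect variables whose log-ratio variance is below `threshold`."""
    tau = variation_matrix(samples)
    d = len(tau)
    return {(i, j) for i in range(d) for j in range(i + 1, d) if tau[i][j] < threshold}


# Toy data (rows = samples, columns = variables): variable 1 is twice variable 0,
# and variable 3 is three times variable 2.
samples = [
    [1.0, 2.0, 5.0, 15.0],
    [2.0, 4.0, 1.0, 3.0],
    [4.0, 8.0, 3.0, 9.0],
    [3.0, 6.0, 2.0, 6.0],
]
edges = proportionality_graph(samples, threshold=0.01)

# Rescale each sample by an arbitrary factor, mimicking a change of measurement settings.
factors = [10.0, 0.5, 3.0, 7.0]
rescaled = [[f * v for v in row] for f, row in zip(factors, samples)]
edges_rescaled = proportionality_graph(rescaled, threshold=0.01)
print(sorted(edges))
```

The design choice here is that edges depend only on log-ratios, so the sample-wise scaling factors cancel and both calls return the same graph.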
- Technische Universität Wien - 100%
Research Output
- 5 Citations
- 10 Publications
- 2022. Rieser C. Spatial dependence, trends, functional outliers and sparsity in Compositional Data Analysis. Other.
- 2023. Rieser C. Edgewise Outliers of Network Indexed Signals. Journal article, IEEE Transactions on Signal Processing, pp. 762-773. DOI 10.1109/tsp.2023.3347646.
- 2023. Rieser C. Extending compositional data analysis from a graph signal processing perspective. Journal article, Journal of Multivariate Analysis, 105209. DOI 10.1016/j.jmva.2023.105209.
- 2023. Rieser C. Edgewise outliers of network indexed signals. Preprint. DOI 10.48550/arxiv.2307.11239.
- 2022. Rieser C. Extending compositional data analysis from a graph signal processing perspective. Preprint. DOI 10.48550/arxiv.2201.10610.
- 2020. Mikšová D. A Method to Identify Geochemical Mineralization on Linear Transects. Journal article, Austrian Journal of Statistics, pp. 89-98. DOI 10.17713/ajs.v49i4.1133.
- 2021. Mikšová D. Identification of Mineralization in Geochemistry Along a Transect Based on the Spatial Curvature of Log-Ratios. Journal article, Mathematical Geosciences, pp. 1513-1533. DOI 10.1007/s11004-021-09930-4.
- 2021. Rieser C. Compositional trend filtering. Journal article, Annales Mathematicae et Informaticae, pp. 257-270. DOI 10.33039/ami.2021.02.004.
- 2021. Rieser C. Outlier Detection for Pandemic-Related Data Using Compositional Functional Data Analysis. Book chapter, Springer Nature, pp. 251-266. DOI 10.1007/978-3-030-78334-1_12.
- 2021. Mikšová D. Identification of Mineralization in Geochemistry for Grid Sampling Using Generalized Additive Models. Journal article, Mathematical Geosciences, pp. 1861-1880. DOI 10.1007/s11004-021-09929-x.