Algorithms for topological data analysis
Algorithms for topological data analysis
Disciplines
Computer Sciences (45%); Mathematics (55%)
Keywords
-
Topological Data Analysis,
Persistent Homology,
Computational Topology,
Computational Geometry,
Algorithm Engineering,
Mathematical Software
We are living in a time in which huge amounts of data are constantly generated. For instance around 300 hours of video material are uploaded to YouTube every minute. This mass of data calls for intelligent methods to extract the relevant information from a data collection; this task is called data analysis. Application areas include recommendation systems for users of audio or video streaming services or the automatic detection of anomalies on surveillance cameras. In the past 15 years, a novel and perhaps surprising approach of data analysis has been developed: the qualitative properties of a data set are studied with the mathematical theory of algebraic topology. The rough idea is that the data is transformed into a geometric shape and it is studied in what way this shape is connected. As an example, a ball and a doughnut are connected differently because the latter has a hole in its center. Such a topological analysis often reveals valuable information about the data set, as demonstrated by a multitude of applications to realistic scenarios, for instance, in biology, physics, or computer graphics. However, the computation of topological properties on a computer requires a lot of time for large data sets, restricting current uses of the technique to rather small data. The goal of project is to change that: we want to develop computer programs that are able to process data sizes which are out-of-reach with current technology. In this way, we significantly expand the range of applications for which topological data analysis can provide new insights. Such an advancement, however, requires novel ways to compute the relevant topological information from data. To prove the advantages of our novel methods, we will analyze them in the framework of theoretical computer science, but also compare them with existing approaches on realistic data sets. While the latter seems to be the natural measure of quality, the former is equally important because it allows the comparison of computational methods over all possible future inputs, while experiments can only measure the performance of currently available inputs. Designing solutions efficient in both way yields to long-lasting contributions to the research field. This project has a remarkably interdisciplinary character, bringing together algebraic topology, data analysis, geometry, theoretical computer science, and software engineering. It establishes efficient computations as the bridge between abstract mathematics and real-world applications and paves the way of using topological methods as a standard tool in the context of data analytics.
In this project, we have set the goal to examine algorithmic questions in the area of topological data analysis and extend the borders of computational feasibility. For that, we both used methods of theoretical computer science as well as algorithm engineering in order to tackle questions from different points of view. We introduce some of our main results: (1) A main problem of topological data analysis is the computation of persistent barcodes. In there, we face a discrepancy between the theoretical complexity of the problem (the algorithm requires cubic time in the size of the input in the worst case) and the practically observed complexity (linear and therefore, fortunately, very fast). We could show that the complexity of barcode computation for realistic models is better than cubic in expectation. That means, we were able to formalize and prove the folklore statement that worst-case instances are ``rare'', using methods from probabilistic topology. (2) Multi-parameter persistence is a sub-area of topological data analysis with increasing importance. An unsolved problem in that context was the efficient computation of the interleaving distance between two instances. The significance of the question stems from the fact that the interleaving distance is, in a mathematically provable sense, the ``best'' distance measure. We could show that the computation of that distance is NP-hard, as well as the approximation of the distance up to a factor of 3. This implies that the existence of an efficient algorithm is very unlikely. A positive effect of this result is that researchers can now focus on other distance measures. Our project also has proposed several efficient algorithms for the computation of the alternative matching distance of multi-parameter persistence. (3) An important ingredient for the fast computation with data is compression: how can we describe the essence of a data set as efficient as possible? For multi-parameterized data sets, we have developed an algorithm for that: it compresses data in the size of gigabytes within a few seconds without changing the topological properties. This significantly facilitates the analysis of a dataset. Our software is publicly available. The discovery of many significant results confirms our comprehensive approach to the problem. We expect that both our theoretical and practical results will further advance the research field in the next future and will render further applications of persistent homology possible. Because topological data analysis is used in many different fields (in practically all natural sciences and beyond), our results have indirect impact far beyond the field of computational topology.
- Technische Universität Graz - 100%
- Jean-Daniel Boissonnat, INRIA Sophia Antipolis - France
- Pawel Dlotko, Polish Academy of Sciences - Poland
- Dmitriy Morozov, Lawrence Berkeley National Laboratory - USA
- Peter Bubenik, University of Florida - USA
- Sharath Raghvendra, Virginia Polytechnic Institute and State University - USA
Research Output
- 2334 Citations
- 61 Publications
- 1 Software
- 2 Scientific Awards
- 1 Fundings
-
2020
Title Exact computation of the matching distance on 2-parameter persistence modules DOI 10.20382/jocg.v11i2a2 Type Other Author Kerber M Link Publication -
2020
Title Metric Spaces with Expensive Distances DOI 10.1142/s0218195920500077 Type Journal Article Author Kerber M Journal International Journal of Computational Geometry & Applications Pages 141-165 Link Publication -
2020
Title Efficient Approximation of the Matching Distance for 2-Parameter Persistence DOI 10.4230/lipics.socg.2020.53 Type Conference Proceeding Abstract Author Kerber M Conference LIPIcs, Volume 164, SoCG 2020 Pages 53:1 - 53:16 Link Publication -
2022
Title Keeping it sparse: Computing Persistent Homology revisited DOI 10.48550/arxiv.2211.09075 Type Preprint Author Bauer U -
2022
Title Average Complexity of Matrix Reduction for Clique Filtrations DOI 10.1145/3476446.3535474 Type Conference Proceeding Abstract Author Giunti B Pages 187-196 Link Publication -
2022
Title Average complexity of matrix reduction for clique filtrations Type Conference Proceeding Abstract Author Giunti B Conference 47th International Symposium on Symbolic and Algebraic Computation -
2022
Title A Unified View on the Functorial Nerve Theorem and its Variations Type Other Author Bauer U Link Publication -
2021
Title Computing the Multicover Bifiltration DOI 10.48550/arxiv.2103.07823 Type Preprint Author Corbet R -
2021
Title Improved approximate rips filtrations with shifted integer lattices and cubical complexes DOI 10.17169/refubium-30499 Type Other Author Choudhary A Link Publication -
2021
Title Fast Minimal Presentations of Bi-graded Persistence Modules; In: 2021 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX) DOI 10.1137/1.9781611976472.16 Type Book Chapter Publisher Society for Industrial and Applied Mathematics -
2021
Title Computing the multicover bifiltration Type Conference Proceeding Abstract Author Corbet R Conference 37th International Symposium on Computational Geometry Pages 27:1-27:17 Link Publication -
2021
Title Notes on Pivot Pairing Type Conference Proceeding Abstract Author Giunti B Conference 37th European Workshop on Computational Geometry -
2021
Title Computing the Multicover Bifiltration DOI 10.4230/lipics.socg.2021.27 Type Conference Proceeding Abstract Author Corbet R Conference LIPIcs, Volume 189, SoCG 2021 Pages 27:1 - 27:17 Link Publication -
2024
Title Amplitudes in persistence theory DOI 10.1016/j.jpaa.2024.107770 Type Journal Article Author Giunti B Journal Journal of Pure and Applied Algebra Pages 107770 Link Publication -
2019
Title Exact computation of the matching distance on 2-parameter persistence modules Type Conference Proceeding Abstract Author Kerber M Conference 35th International Symposium on Computational Geometry Pages 46:1-46:15 Link Publication -
2019
Title Chunk Reduction for Multi-Parameter Persistent Homology Type Conference Proceeding Abstract Author Fugacci U Conference 35th International Symposium on Computational Geometry Pages 37:1-37:14 Link Publication -
2019
Title Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms DOI 10.1137/1.9781611975482 Type Book editors Chan T Publisher Society for Industrial & Applied Mathematics (SIAM) Link Publication -
2019
Title Improved Topological Approximations by Digitization; In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms DOI 10.1137/1.9781611975482.166 Type Book Chapter Publisher Society for Industrial and Applied Mathematics -
2019
Title Exact Computation of the Matching Distance on 2-Parameter Persistence Modules DOI 10.4230/lipics.socg.2019.46 Type Conference Proceeding Abstract Author Kerber M Conference LIPIcs, Volume 129, SoCG 2019 Pages 46:1 - 46:15 Link Publication -
2019
Title Chunk Reduction for Multi-Parameter Persistent Homology DOI 10.4230/lipics.socg.2019.37 Type Conference Proceeding Abstract Author Fugacci U Conference LIPIcs, Volume 129, SoCG 2019 Pages 37:1 - 37:14 Link Publication -
2021
Title Improved Approximate Rips Filtrations with Shifted Integer Lattices and Cubical Complexes DOI 10.48550/arxiv.2105.05151 Type Preprint Author Choudhary A -
2021
Title Improved approximate rips filtrations with shifted integer lattices and cubical complexes DOI 10.1007/s41468-021-00072-4 Type Journal Article Author Choudhary A Journal Journal of Applied and Computational Topology Pages 425-458 Link Publication -
2021
Title Compression for 2-Parameter Persistent Homology DOI 10.48550/arxiv.2107.10924 Type Preprint Author Fugacci U -
2021
Title Amplitudes in persistence theory DOI 10.48550/arxiv.2107.09036 Type Preprint Author Giunti B -
2021
Title Notes on pivot pairings DOI 10.48550/arxiv.2101.00451 Type Preprint Author Giunti B -
2023
Title A unified view on the functorial nerve theorem and its variations DOI 10.1016/j.exmath.2023.04.005 Type Journal Article Author Bauer U Journal Expositiones Mathematicae Pages 125503 -
2019
Title A kernel for multi-parameter persistent homology DOI 10.1016/j.cagx.2019.100005 Type Journal Article Author Corbet R Journal Computers & Graphics: X Pages 100005 Link Publication -
2022
Title A Unified View on the Functorial Nerve Theorem and its Variations DOI 10.48550/arxiv.2203.03571 Type Preprint Author Bauer U -
2020
Title Efficient Approximation of the Matching Distance for 2-parameter persistence Type Conference Proceeding Abstract Author Kerber M Conference 36th International Symposium on Computational Geometry Pages 53:1-53:16 Link Publication -
2020
Title Stable and consistent density-based clustering Type Other Author Rolle A Link Publication -
2025
Title Expected Complexity of Barcode Reduction DOI 10.1007/s41468-025-00218-8 Type Journal Article Author Giunti B Journal Journal of Applied and Computational Topology Pages 29 Link Publication -
2025
Title Stable and consistent density-based clustering via multiparameter persistence DOI 10.48550/arxiv.2005.09048 Type Preprint Author Rolle A -
2025
Title Expected Complexity of Persistence Barcode Computation via Matrix Reduction DOI 10.48550/arxiv.2111.02125 Type Preprint Author Giunti B -
2017
Title The Representation Theorem of Persistent Homology Revisited and Generalized DOI 10.48550/arxiv.1707.08864 Type Preprint Author Corbet R -
2017
Title Improved Approximate Rips Filtrations with Shifted Integer Lattices DOI 10.48550/arxiv.1706.07399 Type Preprint Author Choudhary A -
2017
Title Barcodes of Towers and a Streaming Algorithm for Persistent Homology DOI 10.48550/arxiv.1701.02208 Type Preprint Author Kerber M -
2018
Title A Note on Planar Monohedral tilings Type Conference Proceeding Abstract Author Aichholzer O Conference 34th European Workshop on Computational Geometry Pages 31:1-31:6 Link Publication -
2017
Title Polynomial-Sized Topological Approximations Using the Permutahedron DOI 10.1007/s00454-017-9951-2 Type Journal Article Author Choudhary A Journal Discrete & Computational Geometry Pages 42-80 Link Publication -
2019
Title Discrete Morse Theory for Computing Zigzag Persistence DOI 10.1007/978-3-030-24766-9_39 Type Book Chapter Author Maria C Publisher Springer Nature Pages 538-552 -
2019
Title Topology-Preserving Terrain Simplification DOI 10.48550/arxiv.1912.03032 Type Preprint Author Fugacci U -
2019
Title Computing the Interleaving Distance is NP-Hard DOI 10.1007/s10208-019-09442-y Type Journal Article Author Bjerkevik H Journal Foundations of Computational Mathematics Pages 1237-1271 Link Publication -
2019
Title Local-Global Merge Tree Computation with Local Exchanges DOI 10.1145/3295500.3356188 Type Conference Proceeding Abstract Author Nigmetov A Pages 1-13 Link Publication -
2018
Title The representation theorem of persistence revisited and generalized DOI 10.1007/s41468-018-0015-3 Type Journal Article Author Corbet R Journal Journal of Applied and Computational Topology Pages 1-31 Link Publication -
2018
Title Barcodes of Towers and a Streaming Algorithm for Persistent Homology DOI 10.1007/s00454-018-0030-0 Type Journal Article Author Kerber M Journal Discrete & Computational Geometry Pages 852-879 Link Publication -
2018
Title Chunk Reduction for Multi-Parameter Persistent Homology DOI 10.48550/arxiv.1812.08580 Type Preprint Author Fugacci U -
2018
Title Discrete Morse Theory for Computing Zigzag Persistence DOI 10.48550/arxiv.1807.05172 Type Preprint Author Maria C -
2018
Title Improved Topological Approximations by Digitization DOI 10.48550/arxiv.1812.04966 Type Preprint Author Choudhary A -
2018
Title Computing the interleaving distance is NP-hard DOI 10.48550/arxiv.1811.09165 Type Preprint Author Bjerkevik H -
2018
Title A Kernel for Multi-Parameter Persistent Homology DOI 10.48550/arxiv.1809.10231 Type Preprint Author Corbet R -
2018
Title Exact computation of the matching distance on 2-parameter persistence modules DOI 10.48550/arxiv.1812.09085 Type Preprint Author Kerber M -
2017
Title Improved Approximate Rips Filtrations with Shifted Integer Lattices DOI 10.4230/lipics.esa.2017.28 Type Conference Proceeding Abstract Author Choudhary A Conference LIPIcs, Volume 87, ESA 2017 Pages 28:1 - 28:13 Link Publication -
2017
Title Barcodes of Towers and a Streaming Algorithm for Persistent Homology DOI 10.4230/lipics.socg.2017.57 Type Conference Proceeding Abstract Author Kerber M Conference LIPIcs, Volume 77, SoCG 2017 Pages 57:1 - 57:16 Link Publication -
2020
Title Decomposing filtered chain complexes: geometry behind barcoding algorithms DOI 10.48550/arxiv.2012.01033 Type Preprint Author Chachólski W -
2020
Title Topology-Preserving Terrain Simplification DOI 10.1145/3397536.3422237 Type Conference Proceeding Abstract Author Fugacci U Pages 36-47 Link Publication -
2023
Title Pruning vineyards: updating barcodes and representative cycles by removing simplices DOI 10.48550/arxiv.2312.03925 Type Preprint Author Giunti B -
2023
Title Discrete Morse Theory for Computing Zigzag Persistence DOI 10.1007/s00454-023-00594-x Type Journal Article Author Maria C Journal Discrete & Computational Geometry Pages 708-737 Link Publication -
2023
Title Computing the Multicover Bifiltration DOI 10.1007/s00454-022-00476-8 Type Journal Article Author Corbet R Journal Discrete & Computational Geometry Pages 376-405 Link Publication -
2023
Title Decomposing filtered chain complexes: Geometry behind barcoding algorithms DOI 10.1016/j.comgeo.2022.101938 Type Journal Article Author Chachólski W Journal Computational Geometry Pages 101938 Link Publication -
2023
Title Compression for 2-parameter persistent homology DOI 10.1016/j.comgeo.2022.101940 Type Journal Article Author Fugacci U Journal Computational Geometry Pages 101940 Link Publication -
0
DOI 10.1145/3476446 Type Other -
0
DOI 10.1145/3397536 Type Other
-
2020
Link
Title Reeber v2.0 DOI 10.11578/dc.20210301.26 Link Link
-
2022
Title Distinguished Student Author Award of ISSAC 2022 Type Research prize Level of Recognition Continental/International -
2019
Title Best paper award at SMI 2019 Type Research prize Level of Recognition Continental/International
-
2021
Title Multi-parameter Persistent Homology Type Research grant (including intramural programme) Start of Funding 2021