• Skip to content (access key 1)
  • Skip to search (access key 7)
FWF — Austrian Science Fund
  • Go to overview page Discover

    • Research Radar
      • Research Radar Archives 1974–1994
    • Discoveries
      • Emmanuelle Charpentier
      • Adrian Constantin
      • Monika Henzinger
      • Ferenc Krausz
      • Wolfgang Lutz
      • Walter Pohl
      • Christa Schleper
      • Elly Tanaka
      • Anton Zeilinger
    • Impact Stories
      • Verena Gassner
      • Wolfgang Lechner
      • Georg Winter
    • scilog Magazine
    • Austrian Science Awards
      • FWF Wittgenstein Awards
      • FWF ASTRA Awards
      • FWF START Awards
      • Award Ceremony
    • excellent=austria
      • Clusters of Excellence
      • Emerging Fields
    • In the Spotlight
      • 40 Years of Erwin Schrödinger Fellowships
      • Quantum Austria
    • Dialogs and Talks
      • think.beyond Summit
    • Knowledge Transfer Events
    • E-Book Library
  • Go to overview page Funding

    • Portfolio
      • excellent=austria
        • Clusters of Excellence
        • Emerging Fields
      • Projects
        • Principal Investigator Projects
        • Principal Investigator Projects International
        • Clinical Research
        • 1000 Ideas
        • Arts-Based Research
        • FWF Wittgenstein Award
      • Careers
        • ESPRIT
        • FWF ASTRA Awards
        • Erwin Schrödinger
        • doc.funds
        • doc.funds.connect
      • Collaborations
        • Specialized Research Groups
        • Special Research Areas
        • Research Groups
        • International – Multilateral Initiatives
        • #ConnectingMinds
      • Communication
        • Top Citizen Science
        • Science Communication
        • Book Publications
        • Digital Publications
        • Open-Access Block Grant
      • Subject-Specific Funding
        • AI Mission Austria
        • Belmont Forum
        • ERA-NET HERA
        • ERA-NET NORFACE
        • ERA-NET QuantERA
        • ERA-NET TRANSCAN
        • Alternative Methods to Animal Testing
        • European Partnership Biodiversa+
        • European Partnership ERA4Health
        • European Partnership ERDERA
        • European Partnership EUPAHW
        • European Partnership FutureFoodS
        • European Partnership OHAMR
        • European Partnership PerMed
        • European Partnership Water4All
        • Gottfried and Vera Weiss Award
        • netidee SCIENCE
        • Herzfelder Foundation Projects
        • Quantum Austria
        • Rückenwind Funding Bonus
        • WE&ME Award
        • Zero Emissions Award
      • International Collaborations
        • Belgium/Flanders
        • Germany
        • France
        • Italy/South Tyrol
        • Japan
        • Luxembourg
        • Poland
        • Switzerland
        • Slovenia
        • Taiwan
        • Tyrol–South Tyrol–Trentino
        • Czech Republic
        • Hungary
    • Step by Step
      • Find Funding
      • Submitting Your Application
      • International Peer Review
      • Funding Decisions
      • Carrying out Your Project
      • Closing Your Project
      • Further Information
        • Integrity and Ethics
        • Inclusion
        • Applying from Abroad
        • Personnel Costs
        • PROFI
        • Final Project Reports
        • Final Project Report Survey
    • FAQ
      • Project Phase PROFI
      • Project Phase Ad Personam
      • Expiring Programs
        • Elise Richter and Elise Richter PEEK
        • FWF START Awards
  • Go to overview page About Us

    • Mission Statement
    • FWF Video
    • Values
    • Facts and Figures
    • Annual Report
    • What We Do
      • Research Funding
        • Matching Funds Initiative
      • International Collaborations
      • Studies and Publications
      • Equal Opportunities and Diversity
        • Objectives and Principles
        • Measures
        • Creating Awareness of Bias in the Review Process
        • Terms and Definitions
        • Your Career in Cutting-Edge Research
      • Open Science
        • Open-Access Policy
          • Open-Access Policy for Peer-Reviewed Publications
          • Open-Access Policy for Peer-Reviewed Book Publications
          • Open-Access Policy for Research Data
        • Research Data Management
        • Citizen Science
        • Open Science Infrastructures
        • Open Science Funding
      • Evaluations and Quality Assurance
      • Academic Integrity
      • Science Communication
      • Philanthropy
      • Sustainability
    • History
    • Legal Basis
    • Organization
      • Executive Bodies
        • Executive Board
        • Supervisory Board
        • Assembly of Delegates
        • Scientific Board
        • Juries
      • FWF Office
    • Jobs at FWF
  • Go to overview page News

    • News
    • Press
      • Logos
    • Calendar
      • Post an Event
      • FWF Informational Events
    • Job Openings
      • Enter Job Opening
    • Newsletter
  • Discovering
    what
    matters.

    FWF-Newsletter Press-Newsletter Calendar-Newsletter Job-Newsletter scilog-Newsletter

    SOCIAL MEDIA

    • LinkedIn, external URL, opens in a new window
    • , external URL, opens in a new window
    • Facebook, external URL, opens in a new window
    • Instagram, external URL, opens in a new window
    • YouTube, external URL, opens in a new window

    SCILOG

    • Scilog — The science magazine of the Austrian Science Fund (FWF)
  • elane login, external URL, opens in a new window
  • Scilog external URL, opens in a new window
  • de Wechsle zu Deutsch

  

Classification – Preprocessed and high-dimensional data sets

Classification – Preprocessed and high-dimensional data sets

Lukas Steinberger (ORCID: 0000-0002-2376-114X)
  • Grant DOI 10.55776/I5484
  • Funding program International - Multilateral Initiatives
  • Status ended
  • Start September 1, 2021
  • End August 31, 2025
  • Funding amount € 167,076

DFG-Forschungsgruppen

Disciplines

Computer Sciences (50%); Mathematics (50%)

Keywords

    Classification, Differential Privacy, High-Dimensional Data, Imbalanced Data

Abstract Final report

In this project, so-called "classification methods" are to be developed and mathematically investigated, which address different new challenges in modern data processing. A "classification method" is a precisely defined algorithm that assigns an object to one of several possible classes based on the objects properties. These objects can, for example, be people who are assigned either to the class "healthy" or to the class "sick" based on the results of medical examinations. Another example would be the automatic assignment of pieces of music to genres such as `Rock`, `Country`, `Classic`, etc. Nowadays there are of course a large number of statistical methods for such classification problems available, which in their respective area of application also deliver very good or even in a certain sense optimal results. However, many modern classification problems, such as those mentioned above, pose particular challenges for data analysis, which are often no longer met by traditional methods. The project specifically addresses two of these new challenges. On the one hand, classification procedures are to be developed for which the original data are not available, due to considerations of data privacy protection, but only suitably anonymized data are provided. The original data for example sensitive patient data in medical research must first be preprocessed in order to ensure the anonymity of the patient. The diagnostic classification method that is subsequently used must take this anonymization step into account in a suitable manner in order to minimize the expected number of possible misclassifications. Another challenge with modern classification problems is the large amount of available data. Even though the basic principle "the more information there is about the object to be classified, the better" still holds true, it is by no means always clear how the relevant information, that ensures reliable assignment to one of the possible classes, is to be extracted from the existing flood of data. This becomes clear, for instance, from the above example of music classification, where, from the computer`s point of view, the available information is given as millions of zeros and ones. Classification methods for such large amounts of data are also confronted with the additional problem that large scale data processing is very time consuming. Conventional methods would therefore require too much time to generate a classification, which is why computationally more efficient approximate methods are to be developed and investigated in this project.

The goal of this project was to develop and mathematically analyze so-called "classification methods" that address various new challenges arising in modern data processing. A "classification method" is a precisely defined algorithm that assigns an object to one of several possible classes based on its properties. Such objects might, for example, be individuals who-based on the results of medical examinations-are assigned to either the "healthy" or "ill" class. Another example would be the automatic categorization of musical pieces into classes such as "rock," "country," "classical," etc. Two of the challenges in classification that are of great interest today concern, on the one hand, the protection of data owners' privacy (as in the example of patient data) and, on the other hand, the enormous computational effort required to make classification decisions due to the complexity and high dimensionality of the data (as in the example of musical recordings). However, as the project progressed, it soon became clear that in order to develop suitable classification methods, the mathematically simpler case of regression methods needed to be addressed first, since there was no sufficient preliminary work in this area for our purposes. Therefore, the main project work focused on the development of regression methods that account for the issues of privacy protection and computational feasibility in high-dimensional settings. For the problem of data processing under privacy protection, we succeeded-using the increasingly popular concept of differential privacy-in developing a comprehensive statistical theory of optimal estimation methods, along with numerical techniques for their practical computation. The idea of differential privacy is to modify potentially sensitive raw data using an appropriate random mechanism before sharing or publishing them, so that the data no longer reveal information about any particular individual but still, in aggregate, allow valid conclusions about the population. Here, we managed to identify the random mechanism that is optimal under the strong criterion of statistical efficiency, and we developed an algorithm for its practical implementation. For the problem of time-consuming computation of regression methods in high-dimensional data, we succeeded in developing an algorithm based on a slight modification of the popular gradient method. It can be shown mathematically that this algorithm is not only faster to execute than the standard statistical methods currently in use, but also yields, in a strong sense, the best possible predictions. In a follow-up project, the insights gained and the methods developed will be extended to the classification setting.

Research institution(s)
  • Universität Wien - 100%
International project participants
  • Cristina Butucea, ENSAE - France
  • Angelika Rohde, Universität Freiburg - Germany

Research Output

  • 11 Citations
  • 8 Publications
Publications
  • 2025
    Title Comparing regularisation paths of (conjugate) gradient estimators in ridge regression
    DOI 10.48550/arxiv.2503.05542
    Type Preprint
    Author Hucker L
    Link Publication
  • 2025
    Title Implicit vs. explicit regularization for high-dimensional gradient descent
    DOI 10.48550/arxiv.2502.10578
    Type Preprint
    Author Stark T
    Link Publication
  • 2025
    Title Robust signal recovery in Hadamard spaces
    DOI 10.1016/j.jmva.2025.105469
    Type Journal Article
    Author Köstenberger G
    Journal Journal of Multivariate Analysis
    Pages 105469
    Link Publication
  • 2023
    Title Uncertainty quantification via cross-validation and its variants under algorithmic stability
    DOI 10.48550/arxiv.2312.14596
    Type Preprint
    Author Amann N
    Link Publication
  • 2023
    Title Interactive versus noninteractive locally differentially private estimation: Two elbows for the quadratic functional
    DOI 10.1214/22-aos2254
    Type Journal Article
    Author Butucea C
    Journal The Annals of Statistics
  • 2024
    Title Efficiency in local differential privacy
    DOI 10.1214/24-aos2425
    Type Journal Article
    Author Steinberger L
    Journal The Annals of Statistics
  • 2024
    Title Efficient Estimation of a Gaussian Mean with Local Differential Privacy
    DOI 10.48550/arxiv.2402.04840
    Type Preprint
    Author Kalinin N
    Link Publication
  • 2023
    Title Conditional predictive inference for stable algorithms
    DOI 10.1214/22-aos2250
    Type Journal Article
    Author Steinberger L
    Journal The Annals of Statistics

Discovering
what
matters.

Newsletter

FWF-Newsletter Press-Newsletter Calendar-Newsletter Job-Newsletter scilog-Newsletter

Contact

Austrian Science Fund (FWF)
Georg-Coch-Platz 2
(Entrance Wiesingerstraße 4)
1010 Vienna

office(at)fwf.ac.at
+43 1 505 67 40

General information

  • Job Openings
  • Jobs at FWF
  • Press
  • Philanthropy
  • scilog
  • FWF Office
  • Social Media Directory
  • LinkedIn, external URL, opens in a new window
  • , external URL, opens in a new window
  • Facebook, external URL, opens in a new window
  • Instagram, external URL, opens in a new window
  • YouTube, external URL, opens in a new window
  • Cookies
  • Whistleblowing/Complaints Management
  • Accessibility Statement
  • Data Protection
  • Acknowledgements
  • IFG-Form
  • Social Media Directory
  • © Österreichischer Wissenschaftsfonds FWF
© Österreichischer Wissenschaftsfonds FWF