FWF — Austrian Science Fund
On Valid and Reliable Experiments in Music IR


Arthur Flexer (ORCID: 0000-0002-1691-737X)
  • Grant DOI 10.55776/P31988
  • Funding program Principal Investigator Projects
  • Status ended
  • Start May 1, 2019
  • End April 30, 2023
  • Funding amount € 347,476
  • Project website

Disciplines

Other Humanities (15%); Computer Sciences (70%); Arts (15%)

Keywords

    Algorithmic Fairness, Annotation, Music Information Retrieval, Evaluation, Machine Learning, Hubness

Abstract

Every experimental science is based on the notion of valid and reliable experiments, i.e. experiments that really measure what one wants to examine and that yield repeatable results. Music Information Retrieval (MIR), as the interdisciplinary science of retrieving information from music, conducts experiments with a multitude of methods from machine learning, statistics, signal processing, artificial intelligence, etc. It relies on the proper evaluation of all these methods to measure the success of new algorithms or, in more general terms, to chart the progress of the whole field of MIR. The principal role of computer experiments and their statistical evaluation within MIR is now widely accepted and understood, but the more fundamental notions of validity and reliability in MIR experiments are still rarely discussed within the field. This lack of awareness of valid and reliable MIR experimentation is at the heart of a number of seemingly puzzling phenomena in recent MIR research. Marginally and imperceptibly altered data, so-called adversarial examples, can drastically reduce the performance of state-of-the-art MIR systems. It has even been claimed that such easily fooled MIR systems therefore do not use musical knowledge at all. Other authors have pointed out that, due to a lack of inter-rater agreement when annotating ground truth data, performance in many MIR tasks can never exceed a certain glass ceiling, since it is not meaningful for an algorithm to model specific raters. One problem of algorithmic bias is the difficulty of learning in high-dimensional spaces, where some data objects act as "hubs": they are abnormally close to many other data objects and thereby disturb music recommendation, since hub songs are recommended over and over again.
Although a small but growing body of literature on these MIR problems exists, what is still lacking is an understanding of their true nature: they are problems of validity and reliability in MIR experimentation. Since a failure to comprehend this fundamental issue at the heart of MIR is severely impeding progress in the field, our main goals in this project are: (i) to provide a framework for valid and reliable experimentation in MIR; (ii) to advance the state of the art concerning adversarial examples, inter-rater agreement and algorithmic bias by conducting exemplary valid and reliable MIR experiments. The main focus of this project is on MIR, where the above-mentioned phenomena are especially apparent, but the very same problems also have ramifications in general machine learning, so our research has the potential to advance progress in MIR and far beyond.
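The hub phenomenon described in the abstract can be made concrete with a small sketch. It is not the project's code: the function names, the random Gaussian data, and the choice of skewness of the k-occurrence counts as a summary statistic are assumptions made for illustration only. The k-occurrence of an object is the number of times it appears in the k-nearest-neighbor lists of all other objects; a few objects with very high counts would correspond to hub songs that get recommended repeatedly.

```python
import math
import random


def k_occurrence(points, k):
    """Count how often each point appears among the k nearest neighbors of the others."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank all other points by Euclidean distance to point i.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in neighbors[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Sample skewness (third standardized moment); 0.0 for constant input."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def random_points(n, dim, seed):
    """Hypothetical stand-in for song feature vectors: i.i.d. Gaussian data."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    # Compare a low-dimensional and a higher-dimensional toy data set.
    for dim in (2, 50):
        counts = k_occurrence(random_points(150, dim, seed=0), k=5)
        print(f"dim={dim} max k-occurrence={max(counts)} "
              f"skewness={skewness(counts):.2f}")
```

Every point contributes exactly k neighbor slots, so the counts always average to k; what changes between data sets is how unevenly those slots are distributed, which is what the maximum and the skewness are meant to expose.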

Final report

On Valid and Reliable Experiments in Music Information Retrieval (MIR)

Every experimental science is based on the notion of valid and reliable experiments. Validity is the truth of an inference made from evidence, such as data collected in an experiment, while reliable experiments are those that yield repeatable results. MIR, as the interdisciplinary science of retrieving information from music, conducts experiments with a multitude of methods from machine learning, statistics, signal processing, artificial intelligence, etc. It relies on the proper evaluation of all these methods to measure the success of new algorithms or, in more general terms, to chart the progress of the whole field of MIR. At the outset of this project, the principal role of computer experiments within MIR was already widely accepted and understood, but the more fundamental notions of validity and reliability in MIR experiments were still in need of thorough discussion and clarification. This became apparent when we investigated a number of seemingly puzzling phenomena in MIR research and understood their true nature as problems of validity and reliability: (i) marginally and imperceptibly altered data, so-called adversarial examples, can drastically reduce the performance of state-of-the-art MIR systems (lack of construct validity and reliability); (ii) due to low inter-rater agreement when annotating ground truth training data for MIR systems, performance in many MIR tasks can never exceed a certain glass ceiling, since perfect performance can only be achieved for individual annotators, never for a group of users who disagree (lack of external validity and reliability); (iii) a prominent problem of algorithmic bias is the difficulty of learning in high-dimensional spaces, where some data objects act as "hubs": they are abnormally close to many other data objects and thereby cause unfair music recommendation, since hub songs are recommended over and over again (lack of internal validity).

In our project we were able to advance the state of the art concerning adversarial examples, inter-rater agreement and algorithmic bias by conducting exemplary valid and reliable MIR experiments. Our main result is a report and theoretical framework discussing what a valid and reliable experiment in MIR is. To achieve this, we illustrated four major types of validity and discussed the threats to each type that arise during experiments. Our discussion was grounded in a prototypical MIR experiment on music classification. We also provided concrete guidance to MIR practitioners on how to make valid inferences from the data collected in their experiments. Together, this aims to establish within MIR an understanding of what validity means, why it is important, and how it can be threatened.
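The glass ceiling caused by annotator disagreement can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the measure used in the project: the labels, the function names, and the use of mean per-rater accuracy as the performance measure are all hypothetical. When several raters label the same items differently, a system that outputs one label per item can agree with at most the largest group of raters on each item, so its accuracy averaged over all raters is bounded by the average share of the majority label.

```python
from collections import Counter


def mean_rater_accuracy(predictions, annotations):
    """Accuracy of one prediction per item, averaged over all raters.

    annotations[i] holds the labels that the raters gave to item i.
    """
    total = 0.0
    for predicted, labels in zip(predictions, annotations):
        total += sum(label == predicted for label in labels) / len(labels)
    return total / len(annotations)


def accuracy_ceiling(annotations):
    """Best achievable mean rater accuracy: always predict the majority label."""
    total = 0.0
    for labels in annotations:
        majority_count = Counter(labels).most_common(1)[0][1]
        total += majority_count / len(labels)
    return total / len(annotations)


if __name__ == "__main__":
    # Hypothetical genre labels from three raters for three songs.
    annotations = [
        ["rock", "rock", "pop"],   # two of three raters agree
        ["jazz", "jazz", "jazz"],  # full agreement
        ["pop", "rock", "jazz"],   # no agreement at all
    ]
    print(f"ceiling: {accuracy_ceiling(annotations):.3f}")
    print(f"majority predictor: "
          f"{mean_rater_accuracy(['rock', 'jazz', 'pop'], annotations):.3f}")
```

A system tuned to a single rater can still reach perfect accuracy for that rater, which matches the observation that perfect performance is attainable only for individual annotators.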

Research institution(s)
  • Universität Linz - 100%
International project participants
  • Julián Urbano, Delft University of Technology - Netherlands
  • Bob L. Sturm, KTH Royal Institute of Technology - Sweden

Research Output

  • 66 Citations
  • 31 Publications
  • 3 Disseminations
  • 1 Fundings
Publications
  • 2023
    Title Validity in Music Information Research Experiments
    Type Other
    Author Flexer A.
    Link Publication
  • 2023
    Title A Review of Validity and its Relationship to Music Information Research
    Type Conference Proceeding Abstract
    Author Flexer A
    Conference 24th International Society for Music Information Retrieval Conference
    Link Publication
  • 2023
    Title Validity in Music Information Research Experiments
    DOI 10.48550/arxiv.2301.01578
    Type Preprint
    Author Flexer A
    Link Publication
  • 2023
    Title A Review of Validity and Its Relationship to Music Information Research
    DOI 10.5281/zenodo.10265218
    Type Conference Proceeding Abstract
    Author Arthur Flexer
    Link Publication
  • 2023
    Title A Review of Validity and Its Relationship to Music Information Research
    DOI 10.5281/zenodo.10265219
    Type Conference Proceeding Abstract
    Author Arthur Flexer
    Link Publication
  • 2022
    Title Concept-Based Techniques for "Musicologist-friendly" Explanations in a Deep Music Classifier
    DOI 10.48550/arxiv.2208.12485
    Type Preprint
    Author Foscarin F
  • 2022
    Title Constructing adversarial examples to investigate the plausibility of explanations in deep audio and image classifiers
    DOI 10.1007/s00521-022-07918-7
    Type Journal Article
    Author Hoedt K
    Journal Neural Computing and Applications
    Pages 10011-10029
    Link Publication
  • 2021
    Title On Evaluation of Inter- and Intra-Rater Agreement in Music Recommendation
    DOI 10.5334/tismir.107
    Type Journal Article
    Author Flexer A
    Journal Transactions of the International Society for Music Information Retrieval
    Pages 182
    Link Publication
  • 2022
    Title Defending a Music Recommender Against Hubness-Based Adversarial Attacks
    Type Conference Proceeding Abstract
    Author Flexer A.
    Conference Proceedings of the 19th Sound and Music Computing Conference
    Link Publication
  • 2022
    Title Concept-Based Techniques for "Musicologist-friendly" Explanations in a Deep Music Classifier
    Type Conference Proceeding Abstract
    Author Foscarin F.
    Conference Proceedings of the 23rd International Society for Music Information Retrieval Conference
    Link Publication
  • 2021
    Title On End-to-End White-Box Adversarial Attacks in Music Information Retrieval
    DOI 10.5334/tismir.85
    Type Journal Article
    Author Prinz K
    Journal Transactions of the International Society for Music Information Retrieval
    Pages 93
    Link Publication
  • 2021
    Title On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples
    DOI 10.48550/arxiv.2107.09045
    Type Preprint
    Author Praher V
  • 2020
    Title End-to-End Adversarial White Box Attacks on Music Instrument Classification
    Type Other
    Author Flexer A.
    Link Publication
  • 2020
    Title The Impact of Label Noise on a Music Tagger
    DOI 10.48550/arxiv.2008.06273
    Type Preprint
    Author Prinz K
  • 2020
    Title End-to-End Adversarial White Box Attacks on Music Instrument Classification
    DOI 10.48550/arxiv.2007.14714
    Type Preprint
    Author Prinz K
  • 2020
    Title DeepNOG: fast and accurate protein orthologous group assignment
    DOI 10.1093/bioinformatics/btaa1051
    Type Journal Article
    Author Feldbauer R
    Journal Bioinformatics
    Pages 5304-5312
    Link Publication
  • 2019
    Title scikit-hubness: Hubness Reduction and Approximate Neighbor Search
    DOI 10.48550/arxiv.1912.00706
    Type Preprint
    Author Feldbauer R
  • 2022
    Title Defending a Music Recommender Against Hubness-Based Adversarial Attacks
    DOI 10.48550/arxiv.2205.12032
    Type Preprint
    Author Hoedt K
  • 2022
    Title Concept-Based Techniques for "Musicologist-Friendly" Explanations in Deep Music Classifiers
    DOI 10.5281/zenodo.7316804
    Type Conference Proceeding Abstract
    Author Foscarin F
    Link Publication
  • 2022
    Title Defending a Music Recommender Against Hubness-Based Adversarial Attacks
    DOI 10.5281/zenodo.6573391
    Type Conference Proceeding Abstract
    Author Flexer A
    Link Publication
  • 2022
    Title Defending a Music Recommender Against Hubness-Based Adversarial Attacks
    DOI 10.5281/zenodo.6573390
    Type Conference Proceeding Abstract
    Author Flexer A
    Link Publication
  • 2022
    Title Concept-Based Techniques for "Musicologist-Friendly" Explanations in Deep Music Classifiers
    DOI 10.5281/zenodo.7316803
    Type Conference Proceeding Abstract
    Author Foscarin F
    Link Publication
  • 2022
    Title Defending a Music Recommender Against Hubness-Based Adversarial Attacks
    DOI 10.5281/zenodo.6798200
    Type Conference Proceeding Abstract
    Author Flexer A
    Link Publication
  • 2021
    Title On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples
    Type Conference Proceeding Abstract
    Author Praher V.
    Conference Proceedings of the 22nd International Society for Music Information Retrieval Conference
    Link Publication
  • 2021
    Title On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples
    DOI 10.5281/zenodo.5624470
    Type Conference Proceeding Abstract
    Author Praher V
    Link Publication
  • 2021
    Title On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples
    DOI 10.5281/zenodo.5624471
    Type Conference Proceeding Abstract
    Author Praher V
    Link Publication
  • 2020
    Title scikit-hubness: Hubness Reduction and Approximate Neighbor Search
    DOI 10.21105/joss.01957
    Type Journal Article
    Author Feldbauer R
    Journal Journal of Open Source Software
    Pages 1957
    Link Publication
  • 2020
    Title The Impact of Label Noise on a Music Tagger
    Type Conference Proceeding Abstract
    Author Flexer A.
    Conference Proceedings of the 13th International Workshop on Machine Learning and Music
    Link Publication
  • 2019
    Title Weak Multi-Label Audio-Tagging with Class Noise
    Type Other
    Author Flexer A.
    Link Publication
  • 2019
    Title Audio Tagging With Convolutional Neural Networks Trained With Noisy Data
    Type Other
    Author Paischer F.
    Link Publication
  • 2019
    Title Can We Increase Inter- and Intra-Rater Agreement in Modeling General Music Similarity?
    Type Conference Proceeding Abstract
    Author Flexer A.
    Conference Proceedings of 20th International Society for Music Information Retrieval Conference
    Link Publication
Disseminations
  • 2020
    Title Research visit and public talk Bob Sturm
    Type A talk or presentation
  • 2020
    Title Special session on validity of MIR research
    Type A formal working group, expert panel or dialogue
  • 2023
    Title Interview with Austrian radio station
    Type A press release, press conference or response to a media enquiry/interview
Fundings
  • 2023
    Title A Music Information Retrieval Approach to Pop Music Culture
    Type Research grant (including intramural programme)
    Start of Funding 2023
    Funder Austrian Science Fund (FWF)
