Project detail

Grant DOI 10.55776/P31988
Funding program Principal Investigator Projects
Status ended
Start May 1, 2019
End April 30, 2023
Funding amount € 347,476

Disciplines

Other Humanities (15%); Computer Sciences (70%); Arts (15%)

Keywords

Algorithmic Fairness, Annotation, Music Information Retrieval, Evaluation, Machine Learning, Hubness

Abstract

Final report

Every experimental science is based on the notion of valid and reliable experiments, i.e. experiments that really measure what one wants to examine and that yield repeatable results. Music Information Retrieval (MIR), as the interdisciplinary science of retrieving information from music, conducts experiments with a multitude of methods from machine learning, statistics, signal processing, artificial intelligence, etc. It relies on the proper evaluation of all these methods to measure the success of new algorithms or, in more general terms, to chart the progress of the whole field of MIR. The principal role of computer experiments and their statistical evaluation within MIR is now widely accepted and understood, but the more fundamental notions of validity and reliability in MIR experiments are still rarely discussed within the field. This lack of awareness of valid and reliable MIR experimentation is at the heart of a number of seemingly puzzling phenomena in recent MIR research. Marginally and imperceptibly altered data, so-called adversarial examples, can drastically reduce the performance of state-of-the-art MIR systems. It has even been claimed that such easily fooled MIR systems therefore do not use musical knowledge at all. Other authors have pointed out that, due to a lack of inter-rater agreement when annotating ground truth data, performance in many MIR tasks can never exceed a certain glass ceiling, since it is not meaningful for an algorithm to model specific raters. One problem of algorithmic bias is the difficulty of learning in high-dimensional spaces, where some data objects act as "hubs": they are abnormally close to many other data objects and thereby disturb music recommendation, since hub songs are recommended over and over again.
Although a small but growing body of work and literature concerning these MIR problems exists, what is still lacking is an understanding of their true nature: they are problems of validity and reliability in MIR experimentation. Since a failure to comprehend this fundamental issue at the heart of MIR is severely impeding progress in the field, our main goals in this project are: (i) to provide a framework for valid and reliable experimentation in MIR; (ii) to advance the state of the art concerning adversarial examples, inter-rater agreement and algorithmic bias by conducting exemplary valid and reliable MIR experiments. The main focus of this project is on MIR, where the above-mentioned phenomena are especially apparent, but the very same problems also have ramifications for machine learning in general, so our research has the potential to advance progress in MIR and far beyond.
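The hub phenomenon mentioned in the abstract can be illustrated with a minimal sketch. It is not the project's own code or data: it assumes synthetic Gaussian points, plain Euclidean distance, and a neighborhood size of 10, and the helper names `k_occurrence` and `skewness` are hypothetical. It counts how often each point appears in the nearest-neighbor lists of the others; a strongly right-skewed distribution of these counts indicates that a few points act as hubs.

```python
import numpy as np


def k_occurrence(points, k):
    """Count how often each point appears in the k-nearest-neighbor lists of the others."""
    sq_norms = np.sum(points ** 2, axis=1)
    # Squared Euclidean distances between all pairs of points.
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbor
    # Indices of the k closest points for every row (order within the k is irrelevant).
    neighbors = np.argpartition(dists, k, axis=1)[:, :k]
    return np.bincount(neighbors.ravel(), minlength=len(points))


def skewness(values):
    """Third standardized moment; larger positive values mean a heavier right tail."""
    centered = values - values.mean()
    return float(np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5)


rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
n_points, k = 500, 10           # assumed sizes for illustration
results = {}
for dim in (3, 100):
    data = rng.standard_normal((n_points, dim))  # synthetic data, not music features
    counts = k_occurrence(data, k)
    results[dim] = (skewness(counts), int(counts.max()))
    print(f"dim={dim:3d}  skewness={results[dim][0]:.2f}  max k-occurrence={results[dim][1]}")
```

Every point has exactly 10 neighbors, so the average count is always 10; what changes with dimensionality is how unevenly the counts are spread. In a recommender, the points with very high counts correspond to the songs that would be recommended over and over again.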

On Valid and Reliable Experiments in Music Information Retrieval (MIR)

Every experimental science is based on the notion of valid and reliable experiments. Validity is the truth of an inference made from evidence, such as data collected in an experiment, while reliable experiments are experiments that yield repeatable results. MIR, as the interdisciplinary science of retrieving information from music, conducts experiments with a multitude of methods from machine learning, statistics, signal processing, artificial intelligence, etc. It relies on the proper evaluation of all these methods to measure the success of new algorithms or, in more general terms, to chart the progress of the whole field of MIR. At the outset of this project, the principal role of computer experiments within MIR was already widely accepted and understood, but the more fundamental notions of validity and reliability in MIR experiments were still in need of thorough discussion and clarification. This became clearly apparent when we researched a number of seemingly puzzling phenomena in MIR research and understood their true nature: they are problems of validity and reliability. (i) Marginally and imperceptibly altered data, so-called adversarial examples, can drastically reduce the performance of state-of-the-art MIR systems (lack of construct validity and reliability). (ii) Due to low inter-rater agreement when annotating ground truth training data for MIR systems, performance in many MIR tasks can never exceed a certain glass ceiling, since perfect performance can only be achieved for individual annotators, never for a group of users who disagree with each other (lack of external validity and reliability). (iii) A prominent problem of algorithmic bias is the difficulty of learning in high-dimensional spaces, where some data objects act as "hubs": they are abnormally close to many other data objects and thereby cause unfair music recommendation, since hub songs are recommended over and over again (lack of internal validity).

In our project we were able to advance the state of the art concerning adversarial examples, inter-rater agreement and algorithmic bias by conducting exemplary valid and reliable MIR experiments. Our main result is a report and theoretical framework discussing what a valid and reliable experiment in MIR is. To achieve this, we illustrated four major types of validity and discussed threats to each type that arise during experiments. We grounded our discussion in a prototypical MIR experiment on music classification. We also provided concrete guidance to MIR practitioners on how to make valid inferences from the data collected in their experiments. Taken together, this work aims to establish within MIR what validity means, why it is important, and how it can be threatened.
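The glass ceiling caused by rater disagreement can be made concrete with a small sketch. It is a simplified illustration rather than the evaluation procedure used in the project: the binary labels from three raters are invented toy data, and the function `accuracy_against_raters` is a hypothetical helper. It compares a system that reproduces one rater perfectly with the best score any single output per item can reach when it is judged against all raters.

```python
import numpy as np

# Hypothetical toy annotations: one row per item, one column per rater, binary labels.
annotations = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
])


def accuracy_against_raters(predictions, annotations):
    """Accuracy of one prediction per item, averaged over all raters and items."""
    return float(np.mean(predictions[:, None] == annotations))


# With an odd number of raters, the majority label is the best single answer per item,
# so its score is an upper bound for any system evaluated against the whole group.
majority = (annotations.mean(axis=1) > 0.5).astype(int)
ceiling = accuracy_against_raters(majority, annotations)

# A system that copies one rater (column index 1) is perfect for that rater only.
copy_rater = annotations[:, 1]
acc_own = float(np.mean(copy_rater == annotations[:, 1]))
acc_all = accuracy_against_raters(copy_rater, annotations)

print(f"ceiling against all raters:      {ceiling:.3f}")
print(f"copying rater 1, vs. rater 1:    {acc_own:.3f}")
print(f"copying rater 1, vs. all raters: {acc_all:.3f}")
```

In this toy setting the ceiling stays below 1.0 because the raters disagree on four of the six items, while the system that copies a single rater scores perfectly for that rater and worse than the ceiling for the group.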

Research institution(s)

Universität Linz - 100%

International project participants

Julián Urbano, Delft University of Technology - Netherlands
Bob L. Sturm, KTH Royal Institute of Technology - Sweden

Research Output

66 Citations
31 Publications
3 Disseminations
1 Fundings

Publications

Title	Validity in Music Information Research Experiments
Type	Other
Author	Flexer A.
Link	Publication

Title	A Review of Validity and its Relationship to Music Information Research
Type	Conference Proceeding Abstract
Author	Flexer A
Conference	24th International Society for Music Information Retrieval Conference
Link	Publication

Title	Validity in Music Information Research Experiments
DOI	10.48550/arxiv.2301.01578
Type	Preprint
Author	Flexer A
Link	Publication

Title	A Review of Validity and Its Relationship to Music Information Research
DOI	10.5281/zenodo.10265218
Type	Conference Proceeding Abstract
Author	Arthur Flexer
Link	Publication

Title	A Review of Validity and Its Relationship to Music Information Research
DOI	10.5281/zenodo.10265219
Type	Conference Proceeding Abstract
Author	Arthur Flexer
Link	Publication

Title	Concept-Based Techniques for "Musicologist-friendly" Explanations in a Deep Music Classifier
DOI	10.48550/arxiv.2208.12485
Type	Preprint
Author	Foscarin F

Title	Constructing adversarial examples to investigate the plausibility of explanations in deep audio and image classifiers
DOI	10.1007/s00521-022-07918-7
Type	Journal Article
Author	Hoedt K
Journal	Neural Computing and Applications
Pages	10011-10029
Link	Publication

Title	On Evaluation of Inter- and Intra-Rater Agreement in Music Recommendation
DOI	10.5334/tismir.107
Type	Journal Article
Author	Flexer A
Journal	Transactions of the International Society for Music Information Retrieval
Pages	182
Link	Publication

Title	Defending a Music Recommender Against Hubness-Based Adversarial Attacks
Type	Conference Proceeding Abstract
Author	Flexer A.
Conference	Proceedings of the 19th Sound and Music Computing Conference
Link	Publication

Title	Concept-Based Techniques for "Musicologist-friendly" Explanations in a Deep Music Classifier
Type	Conference Proceeding Abstract
Author	Foscarin F.
Conference	Proceedings of the 23rd International Society for Music Information Retrieval Conference
Link	Publication

Title	On End-to-End White-Box Adversarial Attacks in Music Information Retrieval
DOI	10.5334/tismir.85
Type	Journal Article
Author	Prinz K
Journal	Transactions of the International Society for Music Information Retrieval
Pages	93
Link	Publication

Title	On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples
DOI	10.48550/arxiv.2107.09045
Type	Preprint
Author	Praher V

Title	End-to-End Adversarial White Box Attacks on Music Instrument Classification
Type	Other
Author	Flexer A.
Link	Publication

Title	The Impact of Label Noise on a Music Tagger
DOI	10.48550/arxiv.2008.06273
Type	Preprint
Author	Prinz K

Title	End-to-End Adversarial White Box Attacks on Music Instrument Classification
DOI	10.48550/arxiv.2007.14714
Type	Preprint
Author	Prinz K

Title	DeepNOG: fast and accurate protein orthologous group assignment
DOI	10.1093/bioinformatics/btaa1051
Type	Journal Article
Author	Feldbauer R
Journal	Bioinformatics
Pages	5304-5312
Link	Publication

Title	scikit-hubness: Hubness Reduction and Approximate Neighbor Search
DOI	10.48550/arxiv.1912.00706
Type	Preprint
Author	Feldbauer R

Title	Defending a Music Recommender Against Hubness-Based Adversarial Attacks
DOI	10.48550/arxiv.2205.12032
Type	Preprint
Author	Hoedt K

Title	Concept-Based Techniques for "Musicologist-Friendly" Explanations in Deep Music Classifiers
DOI	10.5281/zenodo.7316804
Type	Conference Proceeding Abstract
Author	Foscarin F
Link	Publication

Title	Defending a Music Recommender Against Hubness-Based Adversarial Attacks
DOI	10.5281/zenodo.6573391
Type	Conference Proceeding Abstract
Author	Flexer A
Link	Publication

Title	Defending a Music Recommender Against Hubness-Based Adversarial Attacks
DOI	10.5281/zenodo.6573390
Type	Conference Proceeding Abstract
Author	Flexer A
Link	Publication

Title	Concept-Based Techniques for "Musicologist-Friendly" Explanations in Deep Music Classifiers
DOI	10.5281/zenodo.7316803
Type	Conference Proceeding Abstract
Author	Foscarin F
Link	Publication

Title	Defending a Music Recommender Against Hubness-Based Adversarial Attacks
DOI	10.5281/zenodo.6798200
Type	Conference Proceeding Abstract
Author	Flexer A
Link	Publication

Title	On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples
Type	Conference Proceeding Abstract
Author	Praher V.
Conference	Proceedings of the 22nd International Society for Music Information Retrieval Conference
Link	Publication

Title	On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples
DOI	10.5281/zenodo.5624470
Type	Conference Proceeding Abstract
Author	Praher V
Link	Publication

Title	On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples
DOI	10.5281/zenodo.5624471
Type	Conference Proceeding Abstract
Author	Praher V
Link	Publication

Title	scikit-hubness: Hubness Reduction and Approximate Neighbor Search
DOI	10.21105/joss.01957
Type	Journal Article
Author	Feldbauer R
Journal	Journal of Open Source Software
Pages	1957
Link	Publication

Title	The Impact of Label Noise on a Music Tagger
Type	Conference Proceeding Abstract
Author	Flexer A.
Conference	Proceedings of the 13th International Workshop on Machine Learning and Music
Link	Publication

Title	Weak Multi-Label Audio-Tagging with Class Noise
Type	Other
Author	Flexer A.
Link	Publication

Title	Audio Tagging With Convolutional Neural Networks Trained With Noisy Data
Type	Other
Author	Paischer F.
Link	Publication

Title	Can We Increase Inter- and Intra-Rater Agreement in Modeling General Music Similarity?
Type	Conference Proceeding Abstract
Author	Flexer A.
Conference	Proceedings of 20th International Society for Music Information Retrieval Conference
Link	Publication

Disseminations

Title	Research visit and public talk Bob Sturm
Type	A talk or presentation
Link	Link

Title	Special session on validity of MIR research
Type	A formal working group, expert panel or dialogue
Link	Link

Title	Interview with Austrian radio station
Type	A press release, press conference or response to a media enquiry/interview

Fundings

Title	A Music Information Retrieval Approach to Pop Music Culture
Type	Research grant (including intramural programme)
Start of Funding	2023
Funder	Austrian Science Fund (FWF)


On Valid and Reliable Experiments in Music IR
