On High Dimensional Data Analysis in Music Information Retrieval
Disciplines
Computer Sciences (85%); Arts (15%)
Keywords
Music Information Retrieval,
Artificial Intelligence,
Machine Learning,
Multimedia,
Hubness,
High Dimensional Data Analysis
Learning in high dimensional spaces poses a number of challenges that are collectively referred to as the curse of dimensionality. Music Information Retrieval (MIR), the interdisciplinary science of retrieving information from music, very often relies on high dimensional feature representations and models. A new aspect of the curse of dimensionality, so-called hubness, was first documented and established in MIR as a problem of computing music similarity. Hub songs are, according to the music similarity function, similar to very many other songs and consequently appear in very many recommendation lists, preventing other songs from being recommended at all. The hubness phenomenon has since been identified as a general problem of machine learning in high dimensional spaces. It is due to distance concentration, which causes all points in a high dimensional data space to be at almost the same distance from each other. Our previous research has focused on the impact of distance concentration and hubness on nearest-neighbor-based music recommendation and genre classification. As a result, we have developed a general unsupervised method to pre-process and rescale distance spaces, which decisively diminishes hubness and its adverse effects in music databases as well as in general machine learning datasets. Research by our group and others has also made it clear that concentration and hubness affect many more distance-based algorithms used in high dimensional data analysis. The proposed project will explore existing approaches and develop new ones to deal with these problems by studying their effects on a wide range of methods in MIR, as well as in multimedia and machine learning.
In particular, we plan to (i) study and unify rescaling methods to avoid distance concentration and (ii) explore the role of hubness in unsupervised (clustering, visualization) and supervised learning (classification) in high dimensional spaces. The main focus of this project is on MIR, since this is where the majority of results on hubness and concentration exist. Evaluating our results in the broader fields of multimedia and machine learning will ensure that our research has the potential to solve an important problem in MIR and, at the same time, a general problem of learning in high dimensional spaces.
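The two effects described above can be made concrete with a small experiment. The following is a minimal sketch, not the project's own code: it assumes Gaussian random data and Euclidean distance, and the helper names (`pairwise_euclidean`, `k_occurrence`, `skewness`, `relative_contrast`) are hypothetical. It counts how often each point appears in the k-nearest-neighbor lists of the other points (its k-occurrence) and reports the skewness of that count, a commonly used indicator of hubness, alongside a simple measure of distance concentration.

```python
import math
import random


def pairwise_euclidean(points):
    """Return the full n x n matrix of Euclidean distances."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def k_occurrence(dist, k):
    """Count how often each point appears in other points' k-NN lists."""
    n = len(dist)
    counts = [0] * n
    for i in range(n):
        # Rank all other points by their distance to point i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist[i][j])
        for j in others[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment; 0.0 if all values are identical."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


def relative_contrast(dist):
    """(max - min) / min over all pairwise distances.

    Small values indicate that distances have concentrated.
    Assumes no two points coincide (min distance > 0).
    """
    n = len(dist)
    off_diag = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    return (max(off_diag) - min(off_diag)) / min(off_diag)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    for dim in (3, 100):
        pts = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]
        dist = pairwise_euclidean(pts)
        counts = k_occurrence(dist, k=5)
        print(f"dim={dim} contrast={relative_contrast(dist):.2f} "
              f"skewness={skewness(counts):.2f} max_occurrence={max(counts)}")
```

On data of this kind one would typically expect the relative contrast to shrink and the k-occurrence skewness to grow as the dimension increases, with a few points (the hubs) collecting a disproportionate share of the nearest-neighbor slots. The exact numbers depend on the data and on the choice of k.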
Learning in high dimensional spaces poses a number of challenges that are collectively referred to as the curse of dimensionality. Music Information Retrieval (MIR), the interdisciplinary science of retrieving information from music, very often relies on high dimensional feature representations and models. A new aspect of the curse of dimensionality, so-called hubness, was first documented and established in MIR as a problem of computing music similarity. Hub songs are, according to the music similarity function, similar to very many other songs and consequently appear in very many recommendation lists, preventing other songs from being recommended at all. The hubness phenomenon has since been identified as a general problem of machine learning in high dimensional spaces. It is due to distance concentration, which causes all points in a high dimensional data space to be at almost the same distance from each other. In this project we developed, studied and unified methods that reduce hubness by rescaling distance spaces, centering the data, or using alternative distance norms. We conducted a large-scale empirical evaluation of all twelve available hubness reduction methods on fifty data sets. We also developed a hubness analysis workflow which, based on a few simple criteria, helps practitioners decide which hubness reduction method to use for the problem at hand. In addition, we explored the negative impact of hubness on unsupervised (clustering, visualization, outlier detection) and supervised machine learning (classification) in high dimensional spaces. All of these distance-based machine learning algorithms suffer from a range of hubness-related problems, which can be alleviated by hubness reduction.
In summary, over the course of the project we developed new methods of hubness reduction, clarified which hubness reduction methods work best under which conditions, and documented the influence of hubness and its reduction across a broad range of machine learning tasks. This allowed us to solve an important problem in MIR and, at the same time, a general problem of learning in high dimensional spaces.
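To illustrate what rescaling a distance space can look like, here is a minimal sketch of the empirical form of mutual proximity, a rescaling method from the hubness literature that is also named in one of the publication titles listed under Research Output. It is an illustration under stated assumptions, not the project's implementation: the function name `mutual_proximity` is hypothetical, the input is assumed to be a symmetric distance matrix over at least three points, and normalizing by n - 2 (excluding the pair itself) is a choice made here for simplicity. Two points end up close only if each one is among the nearer neighbors of the other.

```python
def mutual_proximity(dist):
    """Rescale a symmetric distance matrix with empirical mutual proximity.

    For each pair (x, y), count the other points j that are farther
    from x than y is AND farther from y than x is. The returned
    secondary distance is 1 minus that fraction, so it lies in [0, 1].
    """
    n = len(dist)
    if n < 3:
        raise ValueError("need at least three points")
    rescaled = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x + 1, n):
            d_xy = dist[x][y]
            shared = sum(
                1
                for j in range(n)
                if j != x and j != y
                and dist[x][j] > d_xy
                and dist[y][j] > d_xy
            )
            rescaled[x][y] = rescaled[y][x] = 1.0 - shared / (n - 2)
    return rescaled


if __name__ == "__main__":
    # Four points on a line at positions 0, 1, 2 and 10 (toy data).
    positions = [0.0, 1.0, 2.0, 10.0]
    dist = [[abs(a - b) for b in positions] for a in positions]
    for row in mutual_proximity(dist):
        print([round(v, 2) for v in row])
```

The rescaled matrix can be passed to any distance-based method in place of the original distances. This version runs in cubic time in the number of points, so it is only suitable for small collections.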
- Emmanuel Vincent, INRIA Rennes - France
- Nenad Tomasev, Jozef Stefan Institute - Slovenia
Research Output
- 146 Citations
- 9 Publications
- 2018: Flexer A. Hubness as a case of technical algorithmic bias in music recommendation. Conference Proceeding Abstract, pp. 1062-1069. DOI: 10.1109/icdmw.2018.00154
- 2018: Feldbauer R. A comprehensive empirical comparison of hubness reduction in high-dimensional spaces. Journal Article, Knowledge and Information Systems, pp. 137-166. DOI: 10.1007/s10115-018-1205-y
- 2015: Flexer A. Choosing lp norms in high-dimensional spaces based on hub analysis. Journal Article, Neurocomputing, pp. 281-287. DOI: 10.1016/j.neucom.2014.11.084
- 2017: Flexer A. Mutual proximity graphs for improved reachability in music recommendation. Journal Article, Journal of New Music Research, pp. 17-28. DOI: 10.1080/09298215.2017.1354891
- 2018: Feldbauer* R. Fast Approximate Hubness Reduction for Large High-Dimensional Data. Conference Proceeding Abstract, pp. 358-367. DOI: 10.1109/icbk.2018.00055
- 2016: Flexer A. The Problem of Limited Inter-rater Agreement in Modelling Music Similarity. Journal Article, Journal of New Music Research, pp. 239-251. DOI: 10.1080/09298215.2016.1200631
- 2016: Flexer A. An Empirical Analysis of Hubness in Unsupervised Distance-Based Outlier Detection. Conference Proceeding Abstract, pp. 716-723. DOI: 10.1109/icdmw.2016.0106
- 2016: Feldbauer R. Centering Versus Scaling for Hubness Reduction. Book Chapter, Springer Nature, pp. 175-183. DOI: 10.1007/978-3-319-44778-0_21
- 2015: Schnitzer D. The Unbalancing Effect of Hubs on K-Medoids Clustering in High-Dimensional Spaces. Conference Proceeding Abstract, pp. 1-8. DOI: 10.1109/ijcnn.2015.7280303