Preventing Hubness in Music Information Retrieval
Disciplines
Computer Sciences (85%); Arts (15%)
Keywords
Music Information Retrieval,
Artificial Intelligence,
Machine Learning,
Music,
Audio Signal Processing,
Information Retrieval
In a number of recent publications, the so-called "hubness" phenomenon has been described and explored as a general problem of machine learning in high-dimensional data spaces. Hubs are data points that appear unusually often in the nearest neighbor lists of many other data points. This effect is particularly problematic in algorithms for similarity search, as the same "similar" objects are found over and over again. It also adversely affects the many machine learning algorithms that make use of distance information. The effect has been shown to be a natural consequence of high dimensionality and as such is yet another aspect of the curse of dimensionality. The hub problem has gained particular attention in the field of Music Information Retrieval (MIR), the interdisciplinary science of extracting information from music. In MIR, the hub problem has been studied primarily in the context of music recommendation based on models of audio similarity. Songs that act as hubs are reported as similar to very many other songs and hence keep a significant proportion of the audio collection from being recommended at all. Since proper modeling of audio similarity is the central challenge in MIR, a problem like hubness that interferes with this endeavor is of major concern for MIR in general. Similar effects exist for other forms of multimedia retrieval and recommendation. The main goal of this project is to conduct an in-depth study of the hubness problem in the context of MIR, with the aim of finding ways to avoid or at least attenuate its adverse effects.
Our research will focus on three possible solutions:
- finding parameterizations of audio similarity that are less prone to hubness
- transforming audio similarity spaces to avoid the asymmetries that lead to hubness
- treating audio similarity spaces as nearest neighbor graphs and using graph-theoretic results to avoid hub nodes
Although the emphasis of this project is on MIR, results on preventing hubs will also be relevant to the broader field of machine learning. We will explore such wider implications where possible, so that our research has the potential to solve an important problem not only in MIR but also in general multimedia retrieval and machine learning.
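To make the notion of a hub concrete, the following minimal sketch counts how often each point appears in the k-nearest-neighbor lists of the other points (its k-occurrence) and summarizes the counts by their skewness, a measure commonly used in the hubness literature. The function names, the value of k, and the Gaussian toy data are assumptions chosen for illustration; none of them are taken from the project itself.

```python
import math
import random


def pairwise_euclidean(points):
    """Return the full symmetric matrix of Euclidean distances."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def k_occurrence(dist, k):
    """Count how often each point occurs in the k-NN lists of all other points."""
    n = len(dist)
    counts = [0] * n
    for i in range(n):
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: dist[i][j])
        for j in neighbors[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment; large positive values suggest strong hubness."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5


def hubness_of_random_data(n_points, n_dims, k=5, seed=0):
    """Draw Gaussian toy data (an assumption) and return (k-occurrences, skewness)."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(n_dims)]
              for _ in range(n_points)]
    counts = k_occurrence(pairwise_euclidean(points), k)
    return counts, skewness(counts)
```

Calling `hubness_of_random_data` with the same number of points but different values of `n_dims` gives a rough way to compare how skewed the k-occurrence distribution is in low-dimensional and high-dimensional data.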
The so-called "hubness" phenomenon is a general problem of machine learning in high-dimensional data spaces. Hubs are data points that appear unusually often in the nearest neighbor lists of many other data points. This effect is particularly problematic in algorithms for similarity search, as the same "similar" objects are found over and over again. It also adversely affects the many machine learning algorithms that make use of distance information. The effect has been shown to be a natural consequence of high dimensionality and as such is yet another aspect of the curse of dimensionality. The hub problem has gained particular attention in the field of Music Information Retrieval (MIR), the interdisciplinary science of extracting information from music. In MIR, the hub problem has been studied primarily in the context of music recommendation based on models of audio similarity. Songs that act as hubs are reported as similar to very many other songs and hence keep a significant proportion of the audio collection from being recommended at all. The main goal of this project was to conduct an in-depth study of the hubness problem in the context of MIR. We developed three different approaches that decisively reduce the negative effects of hubness. Two methods re-scale the problematic high-dimensional distance spaces, either locally or globally, yielding a transformed distance space that does not show the problematic hub effects. The third method uses hubness analysis to choose a distance function other than the ubiquitous Euclidean norm. In all these new distance spaces, songs that previously acted as hubs no longer crowd the recommendation lists, and the full audio collections are accessible again. These methods have also been evaluated on standard machine learning data sets and in the context of image and text retrieval, collaborative filtering, speaker verification, and speech recognition.
In all these application scenarios, hubness is greatly reduced and performance measures such as accuracy, precision, and recall are improved. We believe that the work in this project solved not only an important problem in MIR but also a general problem of learning in high-dimensional spaces.
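The idea of locally re-scaling a distance space can be sketched as follows. The summary above does not give the exact formula, so this example assumes one local scaling formulation found in the hubness literature: each distance d(x, y) is mapped to 1 - exp(-d(x, y)^2 / (sigma_x * sigma_y)), where sigma_x is the distance from x to its k-th nearest neighbor. The function name `local_scaling` and the default k are hypothetical.

```python
import math


def local_scaling(dist, k=1):
    """Re-scale a symmetric distance matrix with per-point neighborhood sizes.

    Assumed formulation: 1 - exp(-d^2 / (sigma_i * sigma_j)), where sigma_i
    is the distance from point i to its k-th nearest neighbor.
    """
    n = len(dist)
    sigma = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        sigma.append(others[k - 1])

    scaled = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            denom = sigma[i] * sigma[j]
            if denom == 0.0:
                # Degenerate case (duplicate points): keep 0 for identical
                # points and use the maximum value 1 otherwise.
                scaled[i][j] = 0.0 if dist[i][j] == 0.0 else 1.0
            else:
                scaled[i][j] = 1.0 - math.exp(-dist[i][j] ** 2 / denom)
    return scaled


# Toy example: three points on a line at positions 0, 1 and 3.
toy = [[0.0, 1.0, 3.0],
       [1.0, 0.0, 2.0],
       [3.0, 2.0, 0.0]]
toy_scaled = local_scaling(toy, k=1)
```

Because each distance is normalized by the neighborhood sizes of both points, the re-scaled matrix stays symmetric and a point in a dense region no longer looks close to everything else merely because of its position. Whether this reduces hubness on a given data set has to be checked empirically, for example by comparing nearest neighbor counts before and after the transformation.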
Research Output
- 187 Citations
- 14 Publications
- 2013; Title: The neglected user in music information retrieval research; DOI: 10.1007/s10844-013-0247-6; Type: Journal Article; Author: Schedl M; Journal: Journal of Intelligent Information Systems; Pages: 523-539
- 2013; Title: Hybrid retrieval approaches to geospatial music recommendation; DOI: 10.1145/2484028.2484146; Type: Conference Proceeding Abstract; Author: Schedl M; Pages: 793-796
- 2012; Title: A MIREX meta-analysis of hubness in audio music similarity; Type: Conference Proceeding Abstract; Author: Flexer A; Conference: Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR'12), Porto, Portugal, October 8th-12th
- 2014; Title: Location-Aware Music Artist Recommendation; DOI: 10.1007/978-3-319-04117-9_19; Type: Book Chapter; Author: Schedl M; Publisher: Springer Nature; Pages: 205-213
- 2014; Title: Choosing the Metric in High-Dimensional Spaces Based on Hub Analysis; Type: Conference Proceeding Abstract; Author: Flexer A; Conference: Proceedings of the 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2014
- 2014; Title: A Case for Hubness Removal in High–Dimensional Multimedia Retrieval; DOI: 10.1007/978-3-319-06028-6_77; Type: Book Chapter; Author: Schnitzer D; Publisher: Springer Nature; Pages: 687-692
- 2014; Title: Improving Neighborhood-Based Collaborative Filtering by Reducing Hubness; DOI: 10.1145/2578726.2578747; Type: Conference Proceeding Abstract; Author: Knees P; Pages: 161-168
- 2014; Title: An investigation of likelihood normalization for robust ASR; Type: Conference Proceeding Abstract; Author: Flexer A et al.
- 2014; Title: An investigation of likelihood normalization for robust ASR; DOI: 10.21437/interspeech.2014-149; Type: Conference Proceeding Abstract; Author: Vincent E; Pages: 621-625
- 2012; Title: Putting the User in the Center of Music Information Retrieval; Type: Conference Proceeding Abstract; Author: Flexer A; Conference: Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR'12), Porto, Portugal, October 8th-12th
- 2013; Title: Can Shared Nearest Neighbors Reduce Hubness in High-Dimensional Spaces?; Type: Conference Proceeding Abstract; Author: Flexer A
- 2013; Title: Using mutual proximity for novelty detection in audio music similarity; Type: Conference Proceeding Abstract; Author: Flexer A
- 2013; Title: The Relation of Hubs to the Doddington Zoo in Speaker Verification; Type: Conference Proceeding Abstract; Author: Schlüter J et al.; Conference: Proceedings of the 21st European Signal Processing Conference (EUSIPCO'2013), September 9-13, Marrakech, Morocco, 2013
- 2013; Title: Can Shared Nearest Neighbors Reduce Hubness in High-Dimensional Spaces?; DOI: 10.1109/icdmw.2013.101; Type: Conference Proceeding Abstract; Author: Flexer A; Pages: 460-467