FWF — Austrian Science Fund

Automatic Segmentation, Labelling, and Characterisation of Audio Streams

Gerhard Widmer (ORCID: 0000-0003-3531-1282)
  • Grant DOI 10.55776/TRP307
  • Funding program Translational Research
  • Status ended
  • Start February 1, 2013
  • End June 30, 2017
  • Funding amount € 447,716

Disciplines

Electrical Engineering, Electronics, Information Engineering (10%); Computer Sciences (85%); Arts (5%)

Keywords

    Music Information Retrieval (MIR), Machine Learning, Audio and Music Classification

Abstract

The goal of this project is to develop technologies for the automatic segmentation and interpretation of audio files and audio streams deriving from different media worlds: music repositories, (Web and terrestrial) radio streams, TV broadcasts, etc. A specific focus is on streams in which music plays an important role. Specifically, the technologies to be developed should address the following tasks: (1) automatic segmentation (with or without meta-information) of audio streams into coherent or otherwise meaningful units or segments (based on general sound or rhythm similarity or homogeneity, on specific types of content and characteristics, on repeated occurrences of subsections, etc.); (2) the automatic categorisation of such audio segments into classes, and the association of segments and classes with meta-data derived from various sources (including the Web); (3) the automatic characterisation of audio segments and sound objects in terms of concepts intuitively understandable to humans. To this end, we plan to develop and/or improve and optimise computational methods that analyse audio streams, identify specific kinds of audio content (e.g., music, singing, speech, applause, commercials, ...), detect boundaries and transitions between songs, and classify musical and other segments into appropriate categories; that combine information from various sources (the audio signal itself, databases, the Internet) in order to refine the segmentation and gain meta-information; that automatically discover and optimise audio features that improve segmentation and classification; and that learn to derive comprehensible descriptions of audio contents from such audio features (via machine learning). The research is motivated by a large class of challenging applications in the media world that require efficient and robust audio segmentation and classification. 
Application scenarios include audio streaming services and Web stream analysis, automatic media monitoring, content- and descriptor-based search in large multimedia (audio) databases, and artistic applications. That there is a strong and very concrete demand for such methods is documented, among other things, by the fact that several companies from the media world have pledged to support this project with large amounts of real-world data and valuable meta-information.
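
Task (1), segmentation based on homogeneity, can be illustrated with a very small sketch. It is not the project's method: it assumes each audio frame has already been reduced to a short feature vector (the feature values, `half_window`, and `threshold` below are hypothetical), and it places a boundary wherever the average features before and after a frame differ most.

```python
from math import sqrt


def mean_vector(frames):
    """Average a non-empty list of equal-length feature vectors."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def novelty_curve(features, half_window):
    """Euclidean distance between the mean features before and after each frame.

    Frames too close to either end keep a novelty of 0.0.
    """
    curve = [0.0] * len(features)
    for t in range(half_window, len(features) - half_window + 1):
        before = mean_vector(features[t - half_window:t])
        after = mean_vector(features[t:t + half_window])
        curve[t] = sqrt(sum((a - b) ** 2 for a, b in zip(before, after)))
    return curve


def segment_boundaries(features, half_window=4, threshold=0.5):
    """Return frame indices where the novelty curve has a local maximum
    that reaches the (hypothetical) threshold."""
    curve = novelty_curve(features, half_window)
    return [
        t
        for t in range(1, len(curve) - 1)
        if curve[t] >= threshold
        and curve[t] > curve[t - 1]
        and curve[t] >= curve[t + 1]
    ]


# Toy stream: ten frames of one sound texture followed by ten of another.
toy_features = [[0.0, 1.0]] * 10 + [[1.0, 0.0]] * 10
print(segment_boundaries(toy_features))
```

In the toy stream the only reported boundary is the frame where the second texture starts. Real streams would need richer features and a tuned threshold.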

Final report

The goal of this project was to develop technologies for the automatic segmentation and interpretation of audio files and audio streams deriving from different media worlds: music repositories, radio streams, TV broadcasts, etc. A specific focus was placed on streams in which music plays an important role. For these domains, we conducted fundamental research and developed commercial applications side by side. The key technology used in this project was the convolutional neural network (CNN), a relatively new and powerful machine learning tool, which we were among the first to apply to music recordings. Specifically, we addressed the tasks of onset detection (detecting the starting points of musical notes), music segmentation (detecting the boundaries between parts of a music piece), singing voice detection (detecting where in a music piece there are vocalizations), and beat annotation (detecting the metrical structure of a music piece). Our work both served as a pioneering example for other researchers and demonstrated the versatility of training CNNs on spectral input, questioning the need for hand-designed features. While we improved the state of the art in all tasks we considered, we obtained the most marked improvements for music segmentation, a key concern of this project. We could also show that for music segmentation and music similarity estimation, current state-of-the-art results have nearly reached an upper bound stemming from the ambiguity of the tasks or the subjectivity of human judgements. In the quest to learn to categorize audio segments in the face of scarce ground-truth data, we investigated data augmentation schemes for music recordings and learning from imprecise annotations ('weak labels').
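
Detection tasks such as onset or boundary detection are often framed as predicting one score per audio frame and then turning those scores into event times. The report does not describe this post-processing, so the following is only a generic sketch: the function name `pick_peaks`, the frame rate, the threshold, and the minimum gap are all assumptions, and the scores stand in for the output of a trained network.

```python
def pick_peaks(activations, frame_rate=100.0, threshold=0.5, min_gap=0.05):
    """Convert frame-wise scores into event times in seconds.

    A frame counts as an event if its score reaches `threshold`, is a local
    maximum, and lies at least `min_gap` seconds after the previous event.
    All parameter values are illustrative, not taken from the project.
    """
    times = []
    for i, value in enumerate(activations):
        left = activations[i - 1] if i > 0 else float("-inf")
        right = activations[i + 1] if i + 1 < len(activations) else float("-inf")
        is_peak = value >= threshold and value > left and value >= right
        if not is_peak:
            continue
        t = i / frame_rate
        if times and t - times[-1] < min_gap:
            continue  # too close to the previous event
        times.append(t)
    return times


# Hypothetical scores at 10 frames per second.
scores = [0.1, 0.2, 0.9, 0.3, 0.1, 0.6, 0.7, 0.2, 0.4, 0.1]
print(pick_peaks(scores, frame_rate=10.0))
```

With these scores the frames at indices 2 and 6 are reported, while the small bump at index 8 stays below the threshold.
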
We pursued additional directions of highly application-driven research suitable for a translational project. We used deep learning to accelerate an existing music similarity measure so that it becomes applicable to commercial-scale collections; we improved music similarity estimation using a technique borrowed from speech processing; we developed a novel audio identification method that is robust to pitch and tempo changes; and we developed methods for real-time singing voice detection as well as for real-time music, speech, and applause detection. Several of our methods are already being employed or tested by commercial parties. We also participated in an international challenge on detecting bird calls in audio recordings. Our approach achieved the best results, showing that the methodology we used for music analysis also applies to more generic audio analysis.
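
The report does not say how the similarity measure was accelerated. One generic way to make a slow measure usable on large collections, in the spirit of the binary-code publication listed below, is to pre-filter with short binary codes and re-rank only a few candidates with the slow measure. The sketch below assumes the codes already exist; the track names, code values, and the `exact_distance` callback are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


def search(query_code, codes, n_candidates, exact_distance=None):
    """Return up to `n_candidates` track ids for a query.

    Tracks are first ranked by Hamming distance between binary codes (cheap).
    If `exact_distance` is given, the shortlist is re-ranked with it, so the
    expensive measure is only evaluated on `n_candidates` tracks.
    """
    ranked = sorted(codes, key=lambda track: hamming(query_code, codes[track]))
    candidates = ranked[:n_candidates]
    if exact_distance is not None:
        candidates.sort(key=exact_distance)
    return candidates


# Hypothetical 8-bit codes for four tracks.
codes = {"a": 0b10110010, "b": 0b10110011, "c": 0b01001101, "d": 0b10100010}
print(search(0b10110010, codes, n_candidates=3))
```

The cost of the slow measure then depends on the shortlist size rather than on the size of the collection, at the price of possibly missing tracks whose codes differ strongly from the query.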

Research institution(s)
  • ÖFAI - Österreichisches Forschungsinstitut für Artificial Intelligence - 100%

Research Output

  • 394 Citations
  • 18 Publications
Publications
  • 2016
    Title The Problem of Limited Inter-rater Agreement in Modelling Music Similarity
    DOI 10.1080/09298215.2016.1200631
    Type Journal Article
    Author Flexer A
    Journal Journal of New Music Research
    Pages 239-251
    Link Publication
  • 2015
    Title Exploring Data Augmentation for Improved Singing Voice Detection with Neural Networks.
    Type Conference Proceeding Abstract
    Author Grill T
    Conference Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Malaga, Spain
  • 2015
    Title Music Boundary Detection Using Neural Networks on Combined Features and Two-Level Annotations.
    Type Conference Proceeding Abstract
    Author Grill T
    Conference Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Malaga, Spain
  • 2015
    Title A Low-Latency, Real-Time-Capable Singing Voice Detection Method with LSTM Recurrent Neural Networks
    DOI 10.1109/eusipco.2015.7362337
    Type Conference Proceeding Abstract
    Author Lehner B
    Pages 21-25
    Link Publication
  • 2015
    Title Music Boundary Detection Using Neural Networks on Spectrograms and Self-Similarity Lag Matrices
    DOI 10.1109/eusipco.2015.7362593
    Type Conference Proceeding Abstract
    Author Grill T
    Pages 1296-1300
    Link Publication
  • 2017
    Title Two Convolutional Neural Networks for Bird Detection in Audio Signals
    DOI 10.23919/eusipco.2017.8081512
    Type Conference Proceeding Abstract
    Author Grill T
    Pages 1764-1768
    Link Publication
  • 2016
    Title Learning To Pinpoint Singing Voice From Weakly Labeled Examples.
    DOI 10.5281/zenodo.1417650
    Type Other
    Author Schlüter J
    Link Publication
  • 2016
    Title Learning To Pinpoint Singing Voice From Weakly Labeled Examples.
    DOI 10.5281/zenodo.1417651
    Type Other
    Author Schlüter J
    Link Publication
  • 2016
    Title Learning to Pinpoint Singing Voice from Weakly Labeled Examples.
    Type Conference Proceeding Abstract
    Author Schlüter J
    Conference Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), New York, USA
  • 2014
    Title On the Reduction of False Positives in Singing Voice Detection
    DOI 10.1109/icassp.2014.6855054
    Type Conference Proceeding Abstract
    Author Lehner B
    Pages 7480-7484
  • 2014
    Title Improved Musical Onset Detection with Convolutional Neural Networks
    DOI 10.1109/icassp.2014.6854953
    Type Conference Proceeding Abstract
    Author Schlüter J
    Pages 6979-6983
  • 2014
    Title On World Construction, Variation: Duoddaris.
    Type Conference Proceeding Abstract
    Author Grill T
    Conference Proceedings of the Second conference on Computation, Communication, Aesthetics and X (xCoax), Porto, Portugal
  • 2015
    Title A Low-Latency, Real-Time-Capable Singing Voice Detection Method with LSTM Recurrent Neural Networks
    DOI 10.5281/zenodo.38849
    Type Other
    Author Böck S
    Link Publication
  • 2015
    Title Robust Quad-Based Audio Fingerprinting
    DOI 10.1109/taslp.2015.2509248
    Type Journal Article
    Author Sonnleitner R
    Journal IEEE/ACM Transactions on Audio, Speech, and Language Processing
    Pages 409-421
  • 2013
    Title Musical Onset Detection with Convolutional Neural Networks.
    Type Conference Proceeding Abstract
    Author Böck S
    Conference 6th International Workshop on Machine Learning and Music (MML) in conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), Prague, Czech Republic
  • 2013
    Title Learning Binary Codes for Efficient Large-Scale Music Similarity Search.
    Type Conference Proceeding Abstract
    Author Schlüter J
    Conference Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR), Curitiba, Brazil
  • 2015
    Title Improving Voice Activity Detection in Movies.
    Type Conference Proceeding Abstract
    Author Lehner B
    Conference Proceedings of the 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Dresden, Germany.
  • 2014
    Title Boundary Detection in Music Structure Analysis using Convolutional Neural Networks.
    Type Conference Proceeding Abstract
    Author Grill T et al.
    Conference Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), Taipei, Taiwan
