FWF — Austrian Science Fund

Perceptual Optimization of Time-Frequency Audio Representations and Coding


Piotr Majdak (ORCID: 0000-0003-1511-6164)
  • Grant DOI 10.55776/I1362
  • Funding program Principal Investigator Projects International
  • Status ended
  • Start March 1, 2014
  • End October 31, 2017
  • Funding amount € 237,174

Bilateral call: France

Disciplines

Electrical Engineering, Electronics, Information Engineering (50%); Mathematics (20%); Psychology (30%)

Keywords

    Auditory Masking, Efficiency, Time-Frequency Representations, Gabor, Audio Coding

Abstract

One of the greatest challenges in signal processing is to develop efficient signal representations. Such a representation extracts relevant information and describes it with a minimal amount of data. In the specific context of sound processing, and especially in audio coding, where the goal is to minimize the size of the binary data required for storage or transmission, it is desirable that the representation take into account human auditory perception and allow reconstruction with a controlled amount of perceived distortion.

Over the last decades, many psychoacoustical studies have investigated auditory masking, an important property of auditory perception. Masking refers to the degradation of the detection threshold of a sound in the presence of another sound. The results were used to develop models of either spectral or temporal masking. Attempts were made to simply combine these models to account for time-frequency (t-f) masking in perceptual audio codecs. We recently conducted psychoacoustical studies on t-f masking which revealed the inaccuracy of such simple models. These new data represent a crucial basis for accounting for masking effects in t-f representations of sounds. Although t-f representations are standard tools in audio processing, developing a representation that is mathematically founded, perception-based, perfectly invertible, and possibly minimally redundant remains a challenge.

POTION addresses the following main questions:

To what extent is it possible to obtain a perception-based (i.e., as close as possible to "what we get is what we hear"), perfectly invertible, and possibly minimally redundant t-f representation of sound signals? Such a representation is essential for modeling complex masking interactions in the t-f domain and is expected to improve our understanding of auditory sound processing. Moreover, it is of fundamental interest for many audio applications involving sound analysis-synthesis.

Is it possible to improve current perceptual audio codecs by considering a joint t-f approach? To reduce the size of digital audio files, perceptual audio codecs like MP3 apply a frequency transform and use spectral masking models to control the sub-quantization of transform coefficients. Thus, current codecs follow a mainly spectral approach, although temporal masking effects are taken into account in some implementations. By combining an efficient perception-based t-f transform with a joint t-f masking model in an audio codec, we expect to achieve significant performance improvements.

To investigate these issues, a multidisciplinary approach is required. Accordingly, POTION is based on a consortium involving the Laboratory for Mechanics and Acoustics (LMA, France) and the Acoustics Research Institute (ARI, Austria). The LMA features international experts in signal processing methods for analysis-synthesis of non-stationary audio signals and audio coding. The ARI features international experts in mathematics, t-f analysis, and psychoacoustics. By establishing strong interactions between the two institutions and disciplines, the members of POTION form an optimal consortium to achieve these goals.
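The idea that a masking model controls the quantization of transform coefficients can be illustrated with a minimal sketch. It is not taken from the project or from any real codec: it assumes a plain uniform quantizer, the standard high-resolution noise model (noise power of about step² / 12), and made-up coefficients and thresholds. The function names `step_from_threshold`, `quantize`, and `dequantize` are hypothetical.

```python
import math


def step_from_threshold(threshold_power: float) -> float:
    """Step of a uniform quantizer whose modeled noise power, step**2 / 12,
    equals the given masking threshold (assumed high-resolution noise model)."""
    return math.sqrt(12.0 * threshold_power)


def quantize(coefficients, thresholds):
    """Quantize each coefficient with its own threshold-derived step size."""
    steps = [step_from_threshold(t) for t in thresholds]
    indices = [round(c / s) for c, s in zip(coefficients, steps)]
    return indices, steps


def dequantize(indices, steps):
    """Map integer indices back to coefficient values."""
    return [i * s for i, s in zip(indices, steps)]


# Hypothetical transform coefficients and masking thresholds (tolerated noise
# power per coefficient). The numbers are invented for illustration only.
coefficients = [0.90, -0.42, 0.07, 0.015]
thresholds = [1e-4, 1e-4, 1e-3, 1e-2]

indices, steps = quantize(coefficients, thresholds)
reconstructed = dequantize(indices, steps)

for c, r, s in zip(coefficients, reconstructed, steps):
    print(f"coefficient {c:+.3f} -> {r:+.3f} (step {s:.4f})")
```

In this toy setting, a higher masking threshold yields a coarser step, so strongly masked coefficients need fewer quantization levels; the last coefficient falls below half its step and is quantized to zero. A real codec would derive the thresholds from a psychoacoustic model rather than from fixed numbers.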

Final report

The fundamental research in POTION aimed at developing new methods for the representation and interpretation of audio signals. An efficient representation extracts relevant information and describes it with a minimal amount of data. In the specific context of sound processing, and especially in audio coding, where the goal is to minimize the size of the binary data required for storage or transmission, it is desirable that the representation take into account human auditory perception and allow reconstruction with a controlled amount of perceived distortion. The main goal was to obtain a perceptually optimized representation, i.e., one that displays only the significantly audible components of sound signals. To achieve this goal, the research in POTION focused on both time-frequency (TF) analysis methods and psychoacoustics.

TF representations are standard tools in audio processing. They display the temporal evolution (x-coordinates) of each spectral component (y-coordinates) of a signal as an image. The temporal and spectral resolution of the image depends on the mathematical properties and implementation of the representation. Before the project, no available TF representation both mimicked the auditory TF resolution and allowed perfect reconstruction. In POTION, such a representation was developed: the Audlet framework provides a versatile and efficient filter bank design for the analysis and synthesis of audio signals using auditory frequency scales. It is highly suitable for audio applications requiring stability, perfect reconstruction, and a flexible choice of redundancy.

To obtain a perceptually optimized TF representation, it was necessary to investigate auditory masking. Masking refers to the degradation of the detection threshold of a sound in the presence of another sound. Over the last decades, many psychoacoustical studies have investigated masking. Their results were used to develop models of either spectral or temporal masking. Attempts were made to simply combine these models to account for TF masking. However, preliminary TF masking data collected before the project began revealed the inaccuracy of such simple models. To propose an accurate model of TF masking, additional masking data were collected in POTION. These data were implemented in a filter that displays only the audible components of the Audlet representation. Moreover, psychoacoustic experiments conducted in POTION contributed new methods and data on the measurement of cochlear compression in humans.

Another research question in POTION was: Is it possible to improve the performance of lossy coding algorithms? To reduce the size of digital audio files, perceptual audio codecs like MP3 decompose sounds into variable-length time segments, apply a frequency transform, and use masking models to control the sub-quantization of transform coefficients within each segment. Thus, current codecs follow a mainly spectral approach. By combining an efficient perception-based TF transform with a joint TF masking model in an audio codec, significant performance improvements were expected. A variant of the Audlet was developed and adapted to audio coding: the ERB-MDCT. A lossy coder/decoder was then implemented. It combines the ERB-MDCT with a sparse decomposition algorithm that uses the TF masking model developed in the project. This coder was optimized for low bitrates (24-48 kbps) and challenges state-of-the-art codecs (HE-AAC v2).
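To give a feel for what an auditory frequency scale means in a filter bank design, the sketch below places center frequencies at equal distances on the ERB scale. It is only an illustration, not the Audlet implementation: it assumes the widely used ERB formulas of Glasberg and Moore (1990), which the report itself does not spell out, and the function names and the `per_erb` parameter are hypothetical.

```python
import math


def erb_bandwidth(freq_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz at freq_hz
    (assumed Glasberg and Moore formula)."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)


def hz_to_erb_rate(freq_hz: float) -> float:
    """Map a frequency in Hz to its position on the ERB-number scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * freq_hz)


def erb_rate_to_hz(erb_rate: float) -> float:
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437


def erb_spaced_center_frequencies(f_min: float, f_max: float, per_erb: float = 1.0):
    """Center frequencies from f_min up to at most f_max, spaced 1 / per_erb
    apart on the ERB-number scale (per_erb is a hypothetical density knob)."""
    low, high = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    count = int(math.floor((high - low) * per_erb)) + 1
    return [erb_rate_to_hz(low + k / per_erb) for k in range(count)]


centers = erb_spaced_center_frequencies(100.0, 8000.0, per_erb=1.0)
for fc in centers[:5]:
    print(f"center {fc:8.1f} Hz, ERB {erb_bandwidth(fc):6.1f} Hz")
```

The spacing between neighboring center frequencies grows with frequency, so resolution is fine at low frequencies and coarse at high ones. Raising `per_erb` adds more channels per ERB, which loosely corresponds to the flexible choice of redundancy mentioned in the report.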

Research institution(s)
  • Österreichische Akademie der Wissenschaften - 86%
  • Universität Wien - 14%
Project participants
  • Martin Ehler, Universität Wien, associated research partner
International project participants
  • Olivier Derrien, Centre National de Recherche Scientifique (CNRS) - France

Research Output

  • 123 Citations
  • 7 Publications
Publications
  • 2015
    Title A Quasi-Orthogonal, Invertible, and Perceptually Relevant Time-Frequency Transform for Audio Coding
    DOI 10.1109/eusipco.2015.7362493
    Type Conference Proceeding Abstract
    Author Derrien O
    Pages 799-803
  • 2018
    Title Audlet Filter Banks: A Versatile Analysis/Synthesis Framework Using Auditory Frequency Scales
    DOI 10.3390/app8010096
    Type Journal Article
    Author Necciari T
    Journal Applied Sciences
    Pages 96
  • 2017
    Title Frame Theory for Signal Processing in Psychoacoustics
    DOI 10.1007/978-3-319-54711-4_10
    Type Book Chapter
    Author Balazs P
    Publisher Springer Nature
    Pages 225-268
  • 2016
    Title Auditory Time-Frequency Masking for Spectrally and Temporally Maximally-Compact Stimuli
    DOI 10.1371/journal.pone.0166937
    Type Journal Article
    Author Necciari T
    Journal PLOS ONE
  • 2016
    Title The role of compression in the simultaneous masker phase effect
    DOI 10.1121/1.4964328
    Type Journal Article
    Author Tabuchi H
    Journal The Journal of the Acoustical Society of America
    Pages 2680-2694
  • 2013
    Title The ERBlet Transform: An Auditory-Based Time-Frequency Representation with Perfect Reconstruction
    DOI 10.1109/icassp.2013.6637697
    Type Conference Proceeding Abstract
    Author Necciari T
    Pages 498-502
  • 2014
    Title Perceptual Matching Pursuit with Gabor Dictionaries and Time-Frequency Masking
    DOI 10.1109/icassp.2014.6854171
    Type Conference Proceeding Abstract
    Author Chardon G
    Pages 3102-3106
