Restoration of lost information in digital signals
Bilateral call: Czechia
Disciplines
Electrical Engineering, Electronics, Information Engineering (30%); Mathematics (55%); Psychology (15%)
Keywords
Inpainting, Signal Restoration, Adaptive Representations, Optimization, Signal Models, Auditory Perception
From historic speech and music recordings to the transmission of data, e.g. over a wireless connection, we frequently encounter the loss of significant segments of a signal. The causes of such data loss are as varied as the settings in which it occurs, e.g. material defects or connectivity problems. As a consequence, we often have to work with data that is damaged and distorted, sometimes to an unacceptable degree. By combining know-how from the areas of applied mathematics, optimization, signal modeling, signal and image processing and human auditory perception, MERLIN will provide new, innovative methods for the automatic recovery of lost signal segments and the concealment of damaged signal content. In particular, MERLIN will devise methods specifically focused on audio restoration. In contrast to previous work in this field, the schemes designed in MERLIN will, whenever possible, harness existing signal-dependent information to assist the restoration process. This prior information can range from clues obtained directly from the signal to metadata on the signal source, e.g. instrumentation, speaker information and musical score. Audio restoration is a challenging and unique sub-field of signal restoration, not only because of the tremendous variety of audio signals and their everyday relevance, but also because their perception depends on the inner workings of the human auditory system. To improve and simplify the restoration process, MERLIN will take into account state-of-the-art knowledge on the perception of sound. Together, these aspects will lead to significant advances in restoration quality. The algorithms designed throughout MERLIN will be supplied to the scientific community in a software toolbox, freely available for research purposes in conjunction with an extensive database of real and synthetic test signals.
Audio signals, such as speech and music recordings, are ubiquitous: we interact with audio signals in phone calls with our loved ones, in online meetings with colleagues and customers, in entertainment media, but also, often unsolicited, in public transportation, stores and many other daily situations. The recording, storage and transmission of audio data are prone to errors, which can, in the worst case, lead to the effective loss of significant segments of important data. The correction of such faulty segments is often referred to as audio inpainting, a term adapted from similar techniques in image processing, or more generally as error concealment. Both expressions correctly suggest that the exact recovery of the lost data is not always necessary, or even possible. Instead, the primary goal of audio inpainting is the production of a signal that is perceived as error-free. Apart from the mere correction of signal defects, audio inpainting offers notable opportunities for creative and artistic applications. The project partners in MERLIN at the Acoustics Research Institute of the Austrian Academy of Sciences and the Signal Processing Laboratory of Brno University of Technology were able to provide the theoretical and technical foundation, and implementations, of pioneering audio inpainting schemes. This success is based on fundamental research in the mathematical subjects of time-frequency analysis and harmonic analysis, and on advances in the development of novel signal processing methods that consider human auditory perception and exploit cutting-edge machine learning and optimization schemes. Time-frequency representations are important tools for audio analysis and processing. Likewise, they are an essential component of successful audio inpainting methods, including those developed in MERLIN.
This project provided the necessary foundation to draw on the full potential of time-frequency representations adapted, e.g., to human auditory perception, through the study of their fundamental mathematical structure and properties. The use of such representations in matched, advanced optimization and machine learning schemes enabled the development of signal models for audio that describe the structure of such data more accurately than prior models, which were often rather simplistic. The potential of these models was demonstrated by their use in MERLIN for the audio inpainting task, where they provided notable qualitative improvements over the state of the art.
- Pavel Rajmic, Brno University of Technology - Czechia
- Richard Kronland-Martinet, CNRS - France
- Rémi Gribonval, Ecole normale supérieure de Lyon - France
- Matthieu Kowalski, Universite de Paris-Sud 3 - France
- Nathanael Perraudin, ETH Zürich - Switzerland
- Ilker Bayram, Istanbul Technical University - Turkey
Research Output
- 254 Citations
- 38 Publications
- 1 Patent
- 4 Datasets & models
- 1 Software
- 2 Disseminations
- 4 Scientific Awards
-
2018
Title Non-Iterative Filter Bank Phase (Re)Construction DOI 10.5281/zenodo.1159690 Type Other Author Holighaus N Link Publication -
2018
Title Phase Vocoder Done Right DOI 10.5281/zenodo.1159429 Type Other Author Holighaus N Link Publication -
2018
Title Phase Vocoder Done Right DOI 10.5281/zenodo.1159430 Type Other Author Holighaus N Link Publication -
2018
Title Non-Iterative Filter Bank Phase (Re)Construction DOI 10.5281/zenodo.1159689 Type Other Author Holighaus N Link Publication -
2017
Title Non-Iterative Filter Bank Phase (Re)Construction DOI 10.23919/eusipco.2017.8081342 Type Conference Proceeding Abstract Author Průša Z Pages 922-926 Link Publication -
2017
Title Phase Vocoder Done Right DOI 10.23919/eusipco.2017.8081353 Type Conference Proceeding Abstract Author Průša Z Pages 976-980 -
2018
Title Inpainting of Long Audio Segments With Similarity Graphs DOI 10.1109/taslp.2018.2809864 Type Journal Article Author Perraudin N Journal IEEE/ACM Transactions on Audio, Speech, and Language Processing Pages 1083-1094 Link Publication -
2018
Title Designing Gabor windows using convex optimization DOI 10.1016/j.amc.2018.01.035 Type Journal Article Author Perraudin N Journal Applied Mathematics and Computation Pages 266-287 Link Publication -
2018
Title Audlet Filter Banks: A Versatile Analysis/Synthesis Framework Using Auditory Frequency Scales DOI 10.3390/app8010096 Type Journal Article Author Necciari T Journal Applied Sciences Pages 96 Link Publication -
2022
Title Coorbit theory of warped time-frequency systems in $\mathbb{R}^d$ DOI 10.48550/arxiv.2208.01342 Type Preprint Author Holighaus N -
2024
Title Coorbit Theory of Warped Time-Frequency Systems in $\mathbb{R}^d$ DOI 10.1007/s00041-024-10098-8 Type Journal Article Author Holighaus N Journal Journal of Fourier Analysis and Applications Pages 62 Link Publication -
2019
Title Characterization of Analytic Wavelet Transforms and a New Phaseless Reconstruction Algorithm DOI 10.1109/tsp.2019.2920611 Type Journal Article Author Holighaus N Journal IEEE Transactions on Signal Processing Pages 3894-3908 Link Publication -
2019
Title Audio Inpainting of Music by Means of Neural Networks Type Conference Proceeding Abstract Author Holighaus N Conference 146th Audio Engineering Society Convention Link Publication -
2019
Title Adversarial Generation of Time-Frequency Features with application in audio synthesis Type Conference Proceeding Abstract Author Marafioti A Conference 36th International Conference on Machine Learning (ICML) Pages 4352-4362 Link Publication -
2019
Title Characterization of Analytic Wavelet Transforms and a New Phaseless Reconstruction Algorithm DOI 10.48550/arxiv.1906.00738 Type Preprint Author Holighaus N -
2019
Title A Context Encoder For Audio Inpainting DOI 10.1109/taslp.2019.2947232 Type Journal Article Author Marafioti A Journal IEEE/ACM Transactions on Audio, Speech, and Language Processing Pages 2362-2372 -
2023
Title Grid-Based Decimation for Wavelet Transforms with Stably Invertible Implementation DOI 10.48550/arxiv.2301.01640 Type Preprint Author Holighaus N -
2023
Title Grid-Based Decimation for Wavelet Transforms With Stably Invertible Implementation DOI 10.1109/taslp.2023.3235197 Type Journal Article Author Holighaus N Journal IEEE/ACM Transactions on Audio, Speech, and Language Processing Pages 789-801 -
2020
Title Exemplar-based audio inpainting in musical signals Type Other Author Marafioti A Link Publication -
2020
Title Accelerating Matching Pursuit for Multiple Gabor Dictionaries Type Conference Proceeding Abstract Author Holighaus N Conference 23rd International Conference on Digital Audio Effects (DAFx20) Pages 181-186 Link Publication -
2020
Title Sparse and Cosparse Audio Dequantization Using Convex Optimization DOI 10.48550/arxiv.2003.04222 Type Preprint Author Záviška P -
2020
Title A Class of Warped Filter Bank Frames Tailored to Non-linear Frequency Scales DOI 10.1007/s00041-020-09726-w Type Journal Article Author Holighaus N Journal Journal of Fourier Analysis and Applications Pages 22 -
2020
Title GACELA: A Generative Adversarial Context Encoder for Long Audio Inpainting of Music DOI 10.1109/jstsp.2020.3037506 Type Journal Article Author Marafioti A Journal IEEE Journal of Selected Topics in Signal Processing Pages 120-131 Link Publication -
2020
Title Schur-type Banach modules of integral kernels acting on mixed-norm Lebesgue spaces DOI 10.48550/arxiv.2006.01083 Type Preprint Author Holighaus N -
2021
Title Editorial: Reconstruction of Audio From Incomplete or Highly Degraded Observations DOI 10.1109/jstsp.2021.3052087 Type Journal Article Author Rajmic P Journal IEEE Journal of Selected Topics in Signal Processing Pages 2-4 Link Publication -
2021
Title Time-Frequency Phase Retrieval for Audio: The Effect of Transform Parameters DOI 10.1109/tsp.2021.3088581 Type Journal Article Author Marafioti A Journal IEEE Transactions on Signal Processing Pages 3585-3596 Link Publication -
2021
Title SEDENOSS: SEparating and DENOising Seismic Signals with dual-path recurrent neural network architecture DOI 10.1002/essoar.10504944.2 Type Preprint Author Novoselov A Link Publication -
2021
Title Fast Matching Pursuit with Multi-Gabor Dictionaries DOI 10.1145/3447958 Type Journal Article Author Průša Z Journal ACM Transactions on Mathematical Software (TOMS) Pages 1-20 Link Publication -
2021
Title Schur-type Banach modules of integral kernels acting on mixed-norm Lebesgue spaces DOI 10.1016/j.jfa.2021.109197 Type Journal Article Author Holighaus N Journal Journal of Functional Analysis Pages 109197 Link Publication -
2020
Title Sparse and Cosparse Audio Dequantization Using Convex Optimization DOI 10.1109/tsp49548.2020.9163566 Type Conference Proceeding Abstract Author Záviška P Pages 216-220 Link Publication -
2022
Title SEDENOSS: SEparating and DENOising Seismic Signals With Dual-Path Recurrent Neural Network Architecture DOI 10.1029/2021jb023183 Type Journal Article Author Novoselov A Journal Journal of Geophysical Research: Solid Earth Link Publication -
2022
Title Fast Matching Pursuit with Multi-Gabor Dictionaries DOI 10.48550/arxiv.2202.12380 Type Preprint Author Průša Z -
2022
Title Non-iterative Filter Bank Phase (Re)Construction DOI 10.48550/arxiv.2202.07498 Type Preprint Author Průša Z -
2022
Title Phase-Based Signal Representations for Scattering DOI 10.48550/arxiv.2202.07484 Type Preprint Author Haider D -
2022
Title Audio Inpainting via $\ell_1$-Minimization and Dictionary Learning DOI 10.48550/arxiv.2202.07479 Type Preprint Author Rajbamshi S -
2022
Title Phase Vocoder Done Right DOI 10.48550/arxiv.2202.07382 Type Preprint Author Průša Z -
2021
Title Phase-Based Signal Representations for Scattering DOI 10.23919/eusipco54536.2021.9616285 Type Conference Proceeding Abstract Author Haider D Pages 6-10 Link Publication -
2021
Title Audio Inpainting via $\ell_{1}$-Minimization and Dictionary Learning DOI 10.23919/eusipco54536.2021.9616132 Type Conference Proceeding Abstract Author Rajbamshi S Pages 2149-2153 Link Publication
-
2019
Patent Id:
WO2019038275
Title METHOD FOR PHASE CORRECTION IN A PHASE VOCODER AND DEVICE Type Patent application published patentId WO2019038275 Website Link
-
2021
Title GACELA - Generative adversarial context encoder for audio inpainting Type Computer model/algorithm Public Access Link Link -
2019
Title (Contributions to) The Large Time-Frequency Analysis Toolbox Type Computer model/algorithm Public Access Link Link -
2019
Title Audio inpainting with a context encoder Type Computer model/algorithm Public Access Link Link -
2019
Title TiFGAN: Time Frequency Generative Adversarial Networks Type Computer model/algorithm Public Access Link Link
-
2020
Title 2020 Best Paper Award (Jubiläumsfonds der Stadt Wien für die ÖAW) of the Austrian Academy of Sciences Type Research prize Level of Recognition National (any country) -
2019
Title Guest editor - IEEE Journal of Selected Topics in Signal Processing, Special Issue 'Reconstruction of audio from incomplete or highly degraded observations' Type Appointed as the editor/advisor to a journal or book series Level of Recognition Continental/International -
2019
Title Best Paper Award (22nd International Conference on Digital Audio Effects) Type Research prize Level of Recognition Continental/International -
2018
Title Axiom Poster Award (3rd prize) Type Poster/abstract prize Level of Recognition Regional (any country)