Probabilistic Graphical Models for Time-Series Signal Mixtures
Disciplines
Computer Sciences (100%)
Keywords
Bayesian Networks,
Discriminative Learning,
Factorial Hidden Markov Models,
Single Channel Source Separation,
Multipitch Tracking
Robustness against reverberation, noise, and interfering audio signals is one of the grand challenges in speech recognition, speech understanding, and audio analysis technology. One avenue for approaching this challenge is single-channel audio separation. Recently, factorial hidden Markov models have won the single-channel speech separation and recognition challenge. These models are capable of modeling acoustic scenes with multiple sources interacting over time. While they reach super-human performance on specific tasks, serious limitations still restrict their applicability in many areas. We aim to generalize these models and enhance their applicability in several respects: (i) Introduction of discriminative large-margin learning techniques. This allows the model specification to focus on the most salient differences, i.e. the discriminating information, between interfering sources. (ii) Development of efficient inference approaches. Efficient inference is needed because the computational demands of exact inference in factorial hidden Markov models scale exponentially with the number of sources, i.e. inference is intractable in tasks with many interacting sources. (iii) Adaptation of the model parameters during separation to the specific situation (e.g. actual speakers, gain, etc.) using only speech mixture data. To this end, an expectation-maximization-like iterative adaptation framework initialized with universal models, e.g. speaker-independent models, is proposed. Currently, source-specific monaural data is required to learn the model, so this adaptation greatly increases the utility of the model. The models and methods derived are applied to single-channel speech separation, tracking of the fundamental frequency of concurrent speakers, and benchmark classification scenarios. The overall goal is to devise methods for next-generation time-series models well suited for monaural audio data generated from multiple interacting sources.
These models are also appealing to related fields requiring signal separation. Examples are resolving interactions in brain-scan images or seismic data.
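The exponential scaling mentioned in point (ii) can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the toy transition matrices, and the choice of a dense joint transition matrix are assumptions made purely for illustration. The sketch treats a factorial hidden Markov model with independent source chains as one ordinary HMM over the joint state space, so that K sources with S states each give S^K joint states and a dense joint transition matrix with S^(2K) entries.

```python
from functools import reduce

import numpy as np


def joint_state_count(states_per_source, num_sources):
    """Number of joint states when every source chain has the same state count."""
    return states_per_source ** num_sources


def joint_transition(chain_transitions):
    """Joint transition matrix of independent chains (Kronecker product).

    Each input is a row-stochastic (S_k, S_k) matrix; the result is a
    row-stochastic matrix over the product state space.
    """
    return reduce(np.kron, chain_transitions)


def forward_log_likelihood(initial, transition, likelihoods):
    """Scaled forward algorithm over the joint state space.

    initial:     (N,) joint initial distribution
    transition:  (N, N) joint transition matrix
    likelihoods: (T, N) observation likelihoods p(y_t | joint state)
    Returns log p(y_1, ..., y_T).
    """
    alpha = initial * likelihoods[0]
    log_lik = 0.0
    for t in range(len(likelihoods)):
        if t > 0:
            # Propagate one step, then weight by the observation likelihood.
            alpha = (alpha @ transition) * likelihoods[t]
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return log_lik


if __name__ == "__main__":
    # Hypothetical two-state chain reused for every source.
    chain = np.array([[0.9, 0.1], [0.2, 0.8]])
    for num_sources in (1, 2, 4, 8):
        matrix = joint_transition([chain] * num_sources)
        print(num_sources, joint_state_count(2, num_sources), matrix.size)
```

Even with only two states per source, the dense joint matrix for eight sources already has 65,536 entries, and realistic per-source state counts are far larger. This is why exact inference becomes impractical as the number of interacting sources grows and approximate or structured inference is needed.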
Robustness against reverberation, noise, and interfering audio signals is one of the grand challenges in speech recognition, speech understanding, and audio analysis technology. One avenue for approaching this challenge is single-channel audio separation. Recently, factorial hidden Markov models have won the single-channel speech separation and recognition challenge. These models are capable of modeling acoustic scenes with multiple sources interacting over time. While they reach super-human performance on specific tasks, serious limitations still restrict their applicability in many areas. In this project we developed self-adaptation methods for separating single-channel signal mixtures. Furthermore, we developed discriminative learning methods to increase the performance of signal separation. To increase resource efficiency, we investigated reduced-precision analysis for classification. These methods for speech enhancement are important in many telecommunication applications. Improving speech intelligibility and quality has been an active field of research for many decades.
- Technische Universität Graz - 100%
Research Output
- 211 Citations
- 24 Publications
2013
Title Greedy Part-Wise Learning of Sum-Product Networks DOI 10.1007/978-3-642-40991-2_39 Type Book Chapter Author Peharz R Publisher Springer Nature Pages 612-627 -
2013
Title Model-Based Multiple Pitch Tracking Using Factorial HMMs: Model Adaptation and Inference DOI 10.1109/tasl.2013.2260744 Type Journal Article Author Wohlmayr M Journal IEEE Transactions on Audio, Speech, and Language Processing Pages 1742-1754 -
2016
Title On the Latent Variable Interpretation in Sum-Product Networks DOI 10.1109/tpami.2016.2618381 Type Journal Article Author Peharz R Journal IEEE Transactions on Pattern Analysis and Machine Intelligence Pages 2030-2044 Link Publication -
2016
Title On the Latent Variable Interpretation in Sum-Product Networks DOI 10.48550/arxiv.1601.06180 Type Preprint Author Peharz R -
2015
Title Generatively Optimized Bayesian Network Classifiers Under Computational Constraints. Type Conference Proceeding Abstract Author Pernkopf F Conference International Conference on Machine Learning (ICML), Workshop on Resource-Efficient Machine Learning, 2015 -
2015
Title Message Scheduling Methods for Belief Propagation DOI 10.1007/978-3-319-23525-7_18 Type Book Chapter Author Knoll C Publisher Springer Nature Pages 295-310 -
2015
Title Representation Models in Single Channel Source Separation DOI 10.1109/icassp.2015.7178062 Type Conference Proceeding Abstract Author Zöhrer M Pages 713-717 -
2016
Title Maximum margin hidden Markov models for sequence classification DOI 10.1016/j.patrec.2016.03.017 Type Journal Article Author Mutsam N Journal Pattern Recognition Letters Pages 14-20 -
2018
Title Sum-Product Networks for Sequence Labeling DOI 10.48550/arxiv.1807.02324 Type Preprint Author Ratajczak M -
2014
Title Context-Specific Deep Conditional Random Fields for Structured Prediction. Type Conference Proceeding Abstract Author Pernkopf F Et Al Conference International Conference on Machine Learning (ICML), Workshop on Learning Tractable Probabilistic Models, 2014 -
2014
Title Integer Bayesian Network Classifiers DOI 10.1007/978-3-662-44845-8_14 Type Book Chapter Author Tschiatschek S Publisher Springer Nature Pages 209-224 -
2014
Title General Stochastic Networks for Classification. Type Conference Proceeding Abstract Author Pernkopf F Conference Neural Information Processing Systems (NIPS) -
2014
Title Single-Channel Source Separation with General Stochastic Networks. Type Conference Proceeding Abstract Author Pernkopf F Conference Interspeech, 2014 -
2014
Title Modeling Speech with Sum-Product Networks: Application to Bandwidth Extension DOI 10.1109/icassp.2014.6854292 Type Conference Proceeding Abstract Author Peharz R Pages 3699-3703 -
2015
Title On Representation Learning for Artificial Bandwidth Extension. Type Conference Proceeding Abstract Author Pernkopf F Et Al Conference Interspeech 2015 -
2015
Title Learning of Bayesian Network Classifiers Under Computational Constraints. Type Journal Article Author Pernkopf F -
2015
Title Structured Regularizer for Neural Higher-Order Sequence Models DOI 10.1007/978-3-319-23528-8_11 Type Book Chapter Author Ratajczak M Publisher Springer Nature Pages 168-183 -
2015
Title Representation Learning for Single-Channel Source Separation and Bandwidth Extension DOI 10.1109/taslp.2015.2470560 Type Journal Article Author Zöhrer M Journal IEEE/ACM Transactions on Audio, Speech, and Language Processing Pages 2398-2409 -
2015
Title On Bayesian Network Classifiers with Reduced Precision Parameters DOI 10.1109/tpami.2014.2353620 Type Journal Article Author Tschiatschek S Journal IEEE Transactions on Pattern Analysis and Machine Intelligence Pages 774-785 -
2013
Title Model Adaptation of Factorial HMMs for Multipitch Tracking DOI 10.1109/icassp.2013.6638977 Type Conference Proceeding Abstract Author Wohlmayr M Pages 6792-6796 -
2015
Title Parameter Learning of Bayesian Network Classifiers Under Computational Constraints DOI 10.1007/978-3-319-23528-8_6 Type Book Chapter Author Tschiatschek S Publisher Springer Nature Pages 86-101 Link Publication -
2015
Title On Theoretical Properties of Sum-Product Networks. Type Conference Proceeding Abstract Author Domingos P Et Al Conference Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS) -
2015
Title Neural Higher-Order Factors in Conditional Random Fields for Phoneme Classification. Type Conference Proceeding Abstract Author Pernkopf F Et Al Conference Interspeech, 2015 -
2014
Title On Self-Adaptation in Single-Channel Source Separation. Type Conference Proceeding Abstract Author Pernkopf F Conference Interspeech, 2014