Acoustic modeling and transformation of varieties for speech synthesis
Disciplines
Computer Sciences (95%); Linguistics and Literature (5%)
Keywords
Speech Synthesis,
Hidden Markov Model,
Dialect,
Machine Learning,
Adaptation
Our main goal in this research project is to advance variety modeling for speech synthesis by making optimal use of the available data resources. Different social varieties (sociolects) and regional varieties (dialects) contain phonetically similar data, and statistical parametric synthesis can adapt background models using a relatively small amount of data. Combining these two properties will yield synthesis methods that can model a variety from only a few minutes of adaptation speech. To reach this overall goal, we focus on three topics that are highly relevant for variety modeling and that represent new scientific challenges: average voice models for varieties, modeling of variety transformation, and modeling of varieties with incomplete training data. For average voice modeling, we will investigate variety- and speaker-adaptive training as a new training method for average voice models. For variety transformation, we will develop techniques to create a speaker's voice in a certain variety when speech data of that speaker are only available in a similar variety. Furthermore, we will investigate methods to create a speaker's voice from incomplete speech data sets, which can be used to synthesize historical dialect states.

Speech synthesis is becoming increasingly important as an output interface in cognitive user interfaces. While today's technology can produce natural-sounding synthesis of neutral-style speech, quickly adapting synthesis systems to different contexts and situations, something both speakers and listeners do routinely, is still a problem. Emotional speech and natural intonation are areas of active research, but relatively little research is devoted to language varieties. Within this project we will develop the methods needed to build speech synthesis systems that can easily be adapted to social and regional language varieties.
To achieve this we search for optimal ways to use the available training data by exploiting similarities within social and regional varieties using different layers of abstraction.
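Variety transformation, as described above, relies on relating the HMM states of a model for one variety to the states of a model for another. As a minimal sketch, assuming every state is represented by a single Gaussian with diagonal covariance, the code below maps each state of a source-variety model to the target-variety state with the smallest symmetric Kullback-Leibler divergence (KLD). The function names and toy values are hypothetical; this is a generic KLD-based state mapping, not the project's implementation, which would also have to deal with context clustering, multiple feature streams and durations.

```python
import math
from typing import List, Sequence


def kld_diag(mu_p: Sequence[float], var_p: Sequence[float],
             mu_q: Sequence[float], var_q: Sequence[float]) -> float:
    """KL(p || q) between two Gaussians with diagonal covariance."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kld(mu_p, var_p, mu_q, var_q) -> float:
    """Symmetrised divergence KL(p || q) + KL(q || p)."""
    return kld_diag(mu_p, var_p, mu_q, var_q) + kld_diag(mu_q, var_q, mu_p, var_p)


def map_states(src_means, src_vars, tgt_means, tgt_vars) -> List[int]:
    """For each source state, return the index of the closest target state."""
    mapping = []
    for mu_s, var_s in zip(src_means, src_vars):
        dists = [symmetric_kld(mu_s, var_s, mu_t, var_t)
                 for mu_t, var_t in zip(tgt_means, tgt_vars)]
        # Index of the target state with the smallest divergence.
        mapping.append(min(range(len(dists)), key=dists.__getitem__))
    return mapping


# Hypothetical toy data: two source states, three target states, 2-D features.
src_means = [[0.0, 0.0], [5.0, 5.0]]
src_vars = [[1.0, 1.0], [1.0, 1.0]]
tgt_means = [[4.8, 5.1], [0.2, -0.1], [10.0, 10.0]]
tgt_vars = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
print(map_states(src_means, src_vars, tgt_means, tgt_vars))  # [1, 0]
```

The symmetric form is used so that the distance does not depend on which model is treated as the reference; a one-directional KLD would be an equally plausible assumption.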
Our main goal in this research project was the advancement of variety modeling for speech synthesis. To reach this overall goal, we focused on three topics that are highly relevant for variety modeling and that represented new scientific challenges: modeling of variety transformation, average voice models for varieties, and modeling of varieties with incomplete training data.

In modeling of variety transformation, we developed a method for unsupervised interpolation of language varieties that automatically creates in-between varieties by generating gradual transitions between two varieties, whether two dialects or sociolects, or a dialect and the standard. Furthermore, we developed a cross-variety speaker transformation method that can create a speaker's voice in a certain variety even if speech data of the speaker are only available in another variety.

In average voice modeling, we investigated adaptation methods such as dialect-adaptive training and dialect clustering, which exploit the phone set that the dialects share with the standard. We also applied to Albanian dialects an adaptive modeling method that uses one variety as the background variety and another as the adaptation variety.

For modeling of varieties with incomplete training data, we evaluated the perception of foreign-accented natural and synthetic speech in comparison to automatically accent-reduced synthetic speech. The applied method does not use an average voice model, only the phonetically incomplete accented speech data.

Speech synthesis is becoming increasingly important as an output interface in cognitive user interfaces. While emotional speech and natural intonation are areas of active research, less attention has been paid to language varieties in the context of speech synthesis. Within this project we developed methods for speech synthesis systems that can easily be adapted to social and regional language varieties.
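The in-between varieties mentioned above can be illustrated with the simplest form of model interpolation. The sketch below assumes that the states of the two variety models are already aligned by a given `mapping` and that each state is a single diagonal Gaussian; it linearly interpolates means and variances with a weight `alpha` between 0 (variety A) and 1 (variety B) and returns evenly spaced intermediate models. All names and values are hypothetical. The unsupervised, phonologically controlled method in the Speech Communication article listed under Research Output also has to establish the alignment and handle phonological differences between the varieties, which this sketch does not attempt.

```python
from typing import List, Sequence, Tuple

# One state-output distribution: (per-dimension means, per-dimension variances).
Gaussian = Tuple[List[float], List[float]]


def interpolate_state(a: Gaussian, b: Gaussian, alpha: float) -> Gaussian:
    """Linearly interpolate means and variances; alpha=0 gives a, alpha=1 gives b."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mean = [(1.0 - alpha) * ma + alpha * mb for ma, mb in zip(a[0], b[0])]
    var = [(1.0 - alpha) * va + alpha * vb for va, vb in zip(a[1], b[1])]
    return mean, var


def interpolate_model(model_a: Sequence[Gaussian], model_b: Sequence[Gaussian],
                      mapping: Sequence[int], alpha: float) -> List[Gaussian]:
    """mapping[i] is the index of the model_b state aligned with state i of model_a."""
    return [interpolate_state(state, model_b[mapping[i]], alpha)
            for i, state in enumerate(model_a)]


def gradual_transition(model_a, model_b, mapping, steps: int) -> List[List[Gaussian]]:
    """Return `steps` models evenly spaced from variety A (first) to variety B (last)."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [interpolate_model(model_a, model_b, mapping, k / (steps - 1))
            for k in range(steps)]


# Hypothetical one-state, one-dimensional models of two varieties.
variety_a = [([0.0], [1.0])]
variety_b = [([4.0], [3.0])]
for model in gradual_transition(variety_a, variety_b, mapping=[0], steps=5):
    print(model)
```

Interpolating variances linearly is only one option; a real system would also need to interpolate duration models and would typically work on full context-dependent model sets.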
Research Output
- 19 Citations
- 17 Publications
- Pucher M (2017). Influence of speaker familiarity on blind and visually impaired children's and young adults' perception of synthetic voices. Journal article, Computer Speech & Language, pp. 179-195. DOI: 10.1016/j.csl.2017.05.010
- Schabus D (2013). Cross-variety speaker transformation in HSMM-based speech synthesis. Conference proceeding abstract, 8th ISCA Speech Synthesis Workshop (SSW8).
- Toman M (2013). Structural KLD for Cross-Variety Speaker Adaptation in HMM-based Speech Synthesis. Conference proceeding abstract. DOI: 10.2316/p.2013.798-069
- Pucher M et al. (2015). Efficient Pitch Estimation on Natural Opera-Singing by a Spectral Correlation based Strategy. Journal article, IPSJ SIG Technical Report.
- Pucher M (2015). Visio-articulatory to acoustic conversion of speech. Conference proceeding abstract, pp. 1-2. DOI: 10.1145/2813852.2813858
- Pucher M (2015). Comparison of dialect models and phone mappings in HSMM-based visual dialect speech synthesis. Conference proceeding abstract, 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing (FAAVSP).
- Pucher M (2016). Development of a statistical parametric synthesis system for operatic singing in German. Conference proceeding abstract, pp. 64-69. DOI: 10.21437/ssw.2016-11
- Schabus D et al. (2013). Multi-variety adaptive acoustic modeling in HSMM-based speech synthesis. Conference proceeding abstract, 8th ISCA Speech Synthesis Workshop (SSW8).
- Pucher M (2016). Aufnahme von hochwertigen authentischen Dialektdaten im Feld [Recording high-quality authentic dialect data in the field]. Conference proceeding abstract, 13. Bayerisch-österreichische Dialektologentagung.
- Pucher M (2015). Influence of speaker familiarity on blind and visually impaired children's perception of synthetic voices in audio games. Conference proceeding abstract, 16th Annual Conference of the International Speech Communication Association.
- Pucher M (2015). Adaptive Speech Synthesis of Albanian Dialects. Book chapter, Springer Nature, pp. 158-164. DOI: 10.1007/978-3-319-24033-6_18
- Pucher M (2015). Evaluation of state mapping based foreign accent conversion. Conference proceeding abstract, 16th Annual Conference of the International Speech Communication Association.
- Toman M (2015). An Open Source Speech Synthesis Frontend for HTS. Book chapter, Springer Nature, pp. 291-298. DOI: 10.1007/978-3-319-24033-6_33
- Toman M (2015). Unsupervised and phonologically controlled interpolation of Austrian German language varieties for speech synthesis. Journal article, Speech Communication, pp. 176-193. DOI: 10.1016/j.specom.2015.06.005
- Pucher M. MMASCS multi-modal annotated synchronous corpus of audio, video, facial motion and tongue motion data of normal, fast and slow speech. Other.
- Pucher M. GIDS Bad Goisern and Innervillgraten Audio-Visual Dialect Speech Corpus, a collection of audiovisual speech recordings for research purposes. Other.
- Davis C et al. FAAVSP - The 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing. Other.