Project detail

Grant DOI 10.55776/P23821
Funding program Principal Investigator Projects
Status Ended
Start February 1, 2012
End June 30, 2016
Funding amount € 296,510

Disciplines

Computer Sciences (95%); Linguistics and Literature (5%)

Keywords

Speech Synthesis,
Hidden Markov Model,
Dialect,
Machine Learning,
Adaptation

Abstract

Our main goal in this research project is to advance variety modeling for speech synthesis through the optimal use of available data resources. Social varieties (sociolects) and regional varieties (dialects) contain phonetically similar data, and statistical parametric synthesis can adapt background models with a relatively small amount of data. Together, these will yield synthesis methods that can model varieties using only a few minutes of speech adaptation data.

To reach this overall goal, we focus on three topics that are highly relevant for variety modeling and that represent new scientific challenges: average voice models for varieties, modeling of variety transformation, and modeling of varieties with incomplete training data. In average voice modeling for varieties, we will investigate variety- and speaker-adaptive training as a new training method for average voice models. In variety transformation, we will develop techniques to create a speaker's voice in a certain variety when only speech data of the speaker in a similar variety are available. Furthermore, we will investigate methods to create a speaker's voice from incomplete speech data sets, which can be used to synthesize historic dialect states.

Speech synthesis is becoming more and more important as an output interface in cognitive user interfaces. While today's technology can achieve natural-sounding synthesis for neutral-style speech, the fast adaptation of speech synthesis systems to different contexts and situations, something both speakers and listeners are used to, is still a problem. While emotional speech and natural intonation are areas of active research, relatively little research is devoted to language varieties. Within this project we will develop the methods needed to build speech synthesis systems that can be easily adapted to social and regional language varieties. To achieve this, we search for optimal ways to use the available training data by exploiting similarities within social and regional varieties using different layers of abstraction.

Final report

Our main goal in this research project was the advancement of variety modeling for speech synthesis. To reach this overall goal, we focused on three topics that are highly relevant for variety modeling and that represented new scientific challenges: modeling of variety transformation, average voice models for varieties, and modeling of varieties with incomplete training data.

In modeling of variety transformation, we developed a method for unsupervised interpolation of language varieties that automatically creates in-between varieties by generating gradual transitions between two varieties, whether two dialects or sociolects, or a dialect and a standard. Furthermore, we developed a cross-variety speaker transformation method that can create a speaker's voice in a certain variety even if only speech data of the speaker in another variety are available.

In average voice modeling, we investigated different adaptation methods, such as dialect-adaptive training and dialect clustering, that exploit the phone sets shared by the dialects and the standard. We also applied to Albanian dialects an adaptive modeling method that uses one variety as the background and another as the adaptation variety.

In modeling of varieties with incomplete training data, we evaluated the perception of foreign-accented natural and synthetic speech in comparison to automatically accent-reduced synthetic speech. The applied method does not use an average voice model but only the phonetically incomplete accented speech data.

Speech synthesis is becoming increasingly important as an output interface in cognitive user interfaces. While emotional speech and natural intonation are areas of active research, less attention has been paid to language varieties in the context of speech synthesis. Within this project we developed methods for speech synthesis systems that can be easily adapted to social and regional language varieties.

Research institution(s)

Österreichische Akademie der Wissenschaften - 100%

International project participants

Sebastian Möller, Technische Universität Berlin - Germany
Junichi Yamagishi, National Institute of Informatics - Japan

Research Output

19 Citations
17 Publications

Publications

Title	Evaluation of state mapping based foreign accent conversion.
Type	Conference Proceeding Abstract
Author	Toman M
Conference	16th Annual Conference of the International Speech Communication Association

Title	Development of a statistical parametric synthesis system for operatic singing in German
DOI	10.21437/ssw.2016-11
Type	Conference Proceeding Abstract
Author	Pucher M
Pages	64-69

Title	Influence of speaker familiarity on blind and visually impaired children’s and young adults’ perception of synthetic voices
DOI	10.1016/j.csl.2017.05.010
Type	Journal Article
Author	Pucher M
Journal	Computer Speech & Language
Pages	179-195

Title	Multi-variety adaptive acoustic modeling in HSMM-based speech synthesis.
Type	Conference Proceeding Abstract
Author	Toman M
Conference	8th ISCA Speech Synthesis Workshop (SSW8).

Title	An Open Source Speech Synthesis Frontend for HTS
DOI	10.1007/978-3-319-24033-6_33
Type	Book Chapter
Author	Toman M
Publisher	Springer Nature
Pages	291-298

Title	Efficient Pitch Estimation on Natural Opera-Singing by a Spectral Correlation based Strategy.
Type	Journal Article
Author	Villavicencio F
Journal	IPSJ SIG Technical Report.

Title	Influence of speaker familiarity on blind and visually impaired children's perception of synthetic voices in audio games.
Type	Conference Proceeding Abstract
Author	Pucher M
Conference	16th Annual Conference of the International Speech Communication Association.

Title	Unsupervised and phonologically controlled interpolation of Austrian German language varieties for speech synthesis
DOI	10.1016/j.specom.2015.06.005
Type	Journal Article
Author	Toman M
Journal	Speech Communication
Pages	176-193

Title	Adaptive Speech Synthesis of Albanian Dialects
DOI	10.1007/978-3-319-24033-6_18
Type	Book Chapter
Author	Pucher M
Publisher	Springer Nature
Pages	158-164

Title	Comparison of dialect models and phone mappings in HSMM-based visual dialect speech synthesis.
Type	Conference Proceeding Abstract
Author	Schabus D
Conference	1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing (FAAVSP).

Title	Visio-articulatory to acoustic conversion of speech
DOI	10.1145/2813852.2813858
Type	Conference Proceeding Abstract
Author	Pucher M
Pages	1-2

Title	Cross-variety speaker transformation in HSMM-based speech synthesis.
Type	Conference Proceeding Abstract
Author	Toman M
Conference	8th ISCA Speech Synthesis Workshop (SSW8).

Title	Structural KLD for Cross-Variety Speaker Adaptation in HMM-based Speech Synthesis
DOI	10.2316/p.2013.798-069
Type	Conference Proceeding Abstract
Author	Toman M

Title	Aufnahme von hochwertigen authentischen Dialektdaten im Feld.
Type	Conference Proceeding Abstract
Author	Pucher M
Conference	13. Bayerisch-österreichische Dialektologentagung.

Title	MMASCS multi-modal annotated synchronous corpus of audio, video, facial motion and tongue motion data of normal, fast and slow speech.
Type	Other
Author	Pucher M

Title	GIDS Bad Goisern and Innervillgraten Audio-Visual Dialect Speech Corpus, a collection of audiovisual speech recordings for research purposes.
Type	Other
Author	Pucher M

Title	FAAVSP - The 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing.
Type	Other
Author	Pucher M

Acoustic modeling and transformation of varieties for speech synthesis
