FWF — Austrian Science Fund

Adaptive Audio-Visual Dialect Speech Synthesis

Michael Pucher (ORCID: 0000-0002-5374-1342)
  • Grant DOI 10.55776/P22890
  • Funding program Principal Investigator Projects
  • Status ended
  • Start January 1, 2011
  • End September 30, 2014
  • Funding amount € 299,526

Disciplines

Computer Sciences (85%); Linguistics and Literature (15%)

Keywords

    Speech Synthesis, Visual Synthesis, Dialect

Abstract

The goal of this project is to investigate multimodal adaptation for audio-visual speech synthesis. Human speech is multimodal, and we therefore aim to model the audio and visual signals jointly. In speech behavior we are confronted with intra-speaker variability (e.g., variability depending on the speech situation, speaking task, or emotional state of the speaker) and inter-speaker variability (e.g., variability across sociolects and/or dialects). The second type of variation can be modeled by adapting average models of speakers with different dialects to a speaker of a specific dialect. Dialect is chosen as the source of variation between speakers in order to extend our previous work on Viennese sociolects to other Austrian dialects and to conduct basic research on the audio-visual synthesis of dialects. Generally, audio-visual speech synthesis is the attempt to generate both the speech and the visual signals of a person speaking. In most previous approaches the acoustic and visual signals were modeled separately, although both signals are the result of the same underlying articulation process and should be treated as one. Moreover, adding visual information to the synthesis models might lead to better overall acoustic synthesis. We therefore propose a joint audio-visual modeling framework that is able to generate both acoustic and visual speech for different Austrian dialects. By employing hidden Markov models (HMMs) for both audio and visual speech synthesis, the two feature streams can be combined into a single model. A major aspect of this project will thus be the multimodal adaptation of audio-visual synthesis models. The joint adaptation of audio and visual models from multimodal audio-visual models has not yet been investigated and leads to several important research questions that we want to address in this project.
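The abstract describes combining the acoustic and visual feature streams in a single HMM-based model. As a rough illustration only, not the project's actual implementation (which, according to the publications listed below, was based on hidden semi-Markov models with speaker adaptation), the following Python sketch scores one joint audio-visual observation with a single multi-stream state; the stream names, feature dimensions, and stream weights are assumptions chosen for the example.

# Illustrative sketch only: a multi-stream HMM state emission that scores
# acoustic and visual feature vectors jointly, in the spirit of combining
# both streams in a single model. All names and dimensions are assumptions.
import numpy as np


class MultiStreamState:
    """One HMM state with a separate diagonal Gaussian per stream and stream weights."""

    def __init__(self, means, variances, weights):
        # means/variances: dict stream name -> 1-D numpy arrays (diagonal covariance)
        self.means = means
        self.variances = variances
        self.weights = weights  # e.g. {"acoustic": 1.0, "visual": 1.0}

    def log_likelihood(self, observation):
        """Weighted sum of per-stream diagonal-Gaussian log-densities."""
        total = 0.0
        for stream, x in observation.items():
            mu = self.means[stream]
            var = self.variances[stream]
            ll = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
            total += self.weights[stream] * ll
        return total


# Example: a 25-dim acoustic stream (e.g. mel-cepstra) and a 9-dim visual
# stream (e.g. facial-marker PCA coefficients) scored by a single state.
rng = np.random.default_rng(0)
state = MultiStreamState(
    means={"acoustic": np.zeros(25), "visual": np.zeros(9)},
    variances={"acoustic": np.ones(25), "visual": np.ones(9)},
    weights={"acoustic": 1.0, "visual": 1.0},
)
obs = {"acoustic": rng.normal(size=25), "visual": rng.normal(size=9)}
print(state.log_likelihood(obs))

In a full synthesis system, such per-stream weights would let the acoustic and visual streams contribute differently during training and generation; the sketch only shows how one state can score both modalities at once.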

Final report

Generally, audio-visual speech synthesis is the attempt to generate both the speech and the visual signals of a person speaking. Audio-visual synthesis can be used in communication technologies and computer games. In this project we investigated multimodal modeling for audio-visual dialect speech synthesis. Human speech is multimodal, and we therefore modeled the audio and visual signals jointly. In most previous approaches the acoustic and visual signals were modeled separately, although both signals are the result of the same underlying articulation process and should be treated as one.

In this project we were able to show that joint modeling of visual and acoustic signals can lead to better visual synthesis without changing the quality of the acoustic synthesis. Because the models are flexible and can be adapted through parameters, they can easily be reused and transformed.

Furthermore, we were able to show that adapting visual average models with new data can improve modeling compared to models that do not use any background data. With this method it is possible to train a visual model of a person from a small amount of adaptation data.

For controlling acoustic models with a large number of parameters, we developed a method that allows acoustic parameters to be changed via visual parameters. Opening the mouth in the visual model can thus lead to the corresponding acoustic changes in the acoustic model.

For the modeling of dialects, we made extensive recordings of two Austrian dialects with eight speakers: one Middle Bavarian dialect from Upper Austria (Bad Goisern) and one South Bavarian dialect from Tyrol (Innervillgraten). For these audio-visual dialect recordings for speech synthesis, we developed a method for phonetic data collection and audio-visual recording. We also developed methods for the optimal use of dialect data. The recorded data is already being used in other ongoing projects and will also lead to new findings in the future.
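The adaptation of visual average models mentioned in the report can be pictured as estimating a transform from an average ("background") model to a target speaker from little data. The sketch below is a minimal, self-contained illustration in that spirit, assuming a single global mean-only linear-regression transform (in the style of MLLR) and purely synthetic data; it is not the project's code, and all dimensions and variable names are invented for the example.

# Minimal sketch, assuming synthetic data: mean-only, global linear-regression
# adaptation of a visual "average voice" model to a target speaker using only
# a small amount of adaptation data.
import numpy as np

rng = np.random.default_rng(1)
dim, n_states = 9, 40                      # e.g. 9 visual PCA coefficients

# Average-model state means (the background model trained on many speakers).
avg_means = rng.normal(size=(n_states, dim))

# Pretend the target speaker differs from the average by an affine transform,
# and we observe one noisy adaptation frame per state (a "small" data set).
true_A = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
true_b = 0.5 * rng.normal(size=dim)
target_frames = avg_means @ true_A.T + true_b + 0.05 * rng.normal(size=(n_states, dim))

# Estimate a single global affine transform [A | b] by least squares over the
# (average mean, adaptation frame) pairs.
X = np.hstack([avg_means, np.ones((n_states, 1))])      # extended means
W, *_ = np.linalg.lstsq(X, target_frames, rcond=None)   # shape (dim + 1, dim)
adapted_means = X @ W                                    # A @ mu + b for every state

print("mean abs error before adaptation:",
      np.abs(avg_means - target_frames).mean().round(3))
print("mean abs error after adaptation: ",
      np.abs(adapted_means - target_frames).mean().round(3))

With more adaptation data, such a single global transform could be replaced by several regression-class-specific transforms; the point of the sketch is only that a small amount of target-speaker data can pull an average model toward that speaker.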

Research institution(s)
  • FTW Forschungszentrum Telekommunikation - 89%
  • Österreichische Akademie der Wissenschaften - 11%
Project participants
  • Sylvia Moosmüller, Österreichische Akademie der Wissenschaften, associated research partner

Research Output

  • 233 Citations
  • 14 Publications
Publications
  • 2011
    Title Phone set selection for HMM-based dialect speech synthesis.
    Type Conference Proceeding Abstract
    Author Pucher M
  • 2014
    Title The MMASCS multi-modal annotated synchronous corpus of audio, video, facial motion and tongue motion data of normal, fast and slow speech.
    Type Conference Proceeding Abstract
    Author Hoole P et al.
    Conference LREC 2014
  • 2013
    Title Joint Audiovisual Hidden Semi-Markov Model-Based Speech Synthesis
    DOI 10.1109/jstsp.2013.2281036
    Type Journal Article
    Author Schabus D
    Journal IEEE Journal of Selected Topics in Signal Processing
    Pages 336-347
    Link Publication
  • 2013
    Title Objective and Subjective Feature Evaluation for Speaker-Adaptive Visual Speech Synthesis.
    Type Conference Proceeding Abstract
    Author Hofer G et al.
    Conference AVSP 2013
  • 2013
    Title Visual Control of Hidden-Semi-Markov-Model based Acoustic Speech Synthesis.
    Type Conference Proceeding Abstract
    Author Hollenstein J
    Conference AVSP 2013
  • 2012
    Title From Viennese to Austrian German and back again: An algorithm for the realization of a variety-slider.
    Type Conference Proceeding Abstract
    Author Hofer G et al.
    Conference SIDG 2012
  • 2012
    Title Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audiovisual speech synthesis.
    Type Conference Proceeding Abstract
    Author Hofer G et al.
    Conference LREC 2012
  • 2012
    Title Sprachressourcen für adaptive Sprachsynthesen von Dialekten [Speech resources for adaptive speech synthesis of dialects].
    Type Conference Proceeding Abstract
    Author Hofer G et al.
    Conference SIDG 2012
  • 2012
    Title Evaluation of Speaker Verification Security and Detection of HMM-Based Synthetic Speech
    DOI 10.1109/tasl.2012.2201472
    Type Journal Article
    Author De Leon P
    Journal IEEE Transactions on Audio, Speech, and Language Processing
    Pages 2280-2290
    Link Publication
  • 2011
    Title Detection of Synthetic Speech for the Problem of Imposture
    DOI 10.1109/icassp.2011.5947440
    Type Conference Proceeding Abstract
    Author De Leon P
    Pages 4844-4847
    Link Publication
  • 2012
    Title Speaker-adaptive visual speech synthesis in the HMM-Framework.
    Type Conference Proceeding Abstract
    Author Hofer G et al.
  • 2012
    Title Regionalizing Virtual Avatars - Towards Adaptive Audio-Visual Dialect Speech Synthesis.
    Type Conference Proceeding Abstract
    Author Moosmüller S et al.
    Conference 5th International Conference on Cognitive Systems, Vienna, Austria, 2012
  • 2015
    Title Unsupervised and phonologically controlled interpolation of Austrian German language varieties for speech synthesis
    DOI 10.1016/j.specom.2015.06.005
    Type Journal Article
    Author Toman M
    Journal Speech Communication
    Pages 176-193
    Link Publication
  • 0
    Title Proceedings Abstract Book.
    Type Other
    Author Pucher M
