FWF — Austrian Science Fund

A Corpus Based Investigation into Segmental Duration in German Speech

Harald Trost
  • Grant DOI 10.55776/P13224
  • Funding program Principal Investigator Projects
  • Status ended
  • Start November 1, 1998
  • End September 30, 2002
  • Funding amount € 148,979

Disciplines

Computer Sciences (85%); Linguistics and Literature (15%)

Keywords

    SPEECH SYNTHESIS, PROSODY, COMPUTATIONAL LINGUISTICS, LANGUAGE ENGINEERING, ARTIFICIAL INTELLIGENCE

Abstract Final report

Automatic speech synthesis is a highly promising field of growing economic importance. Natural-sounding speech is a key factor for the acceptability of practical voice output systems, and the main factors contributing to naturalness are segmental quality and prosody. Recent improvements in the segmental quality of synthesized speech have made it clear that truly high-quality speech synthesis now depends crucially on adequate and natural-sounding prosody as well. In addition, new application areas that go beyond text-to-speech, such as spoken dialogue and concept-to-speech systems, call for the production of utterances with non-neutral prosody. We propose a project for the investigation of segmental duration in German speech. At the moment, most research in prosody is directed at intonation and its realization through fundamental frequency (f0) as the single most important factor in prosody, while duration (and amplitude) are regarded as secondary (dependent) parameters. A more thorough investigation of duration and its interaction with f0 is necessary before such a conclusion can be drawn, if it can be drawn at all. The goal of the project is a better model of segmental duration and of its interaction with other parameters such as f0, an area where the currently prevalent methods do not yet deliver satisfactory naturalness for users. Moreover, we want to gain a clearer understanding of the relation between discourse structure and prosodic parameters. New application areas make rich linguistic information available to speech production for the first time. To take advantage of this information, the realization of these features by prosodic parameters has to be understood. Central to our approach will be the investigation of the interdependencies between intonation and duration, i.e., fundamental frequency will be taken into account explicitly by means of tone labeling.
Another novelty is the explicit incorporation of discourse-related information, such as the division into topic, focus and background, in a dedicated part of the corpus. When investigating prosody, we have to take into account a large number of (potential) parameters without being able to resort to an agreed-upon linguistic theory covering the whole range of phenomena. Moreover, we cannot a priori exclude non-linear dependencies between these parameters. In such a situation, a data-driven, statistical approach seems appropriate. To investigate segmental duration in that paradigm, we need speech corpora of adequate size with prosodic labeling. Because of the number of influencing factors to be considered, we will need a corpus of considerable size (50,000+ phonemes). The corpus used in our study will be the first corpus of that kind for Austrian German. The construction of such a speech database is not only an indispensable prerequisite for our study on duration but will also be of interest to other researchers who want to perform phonetic investigations on that variety of German. A demand for such a corpus exists in both academic and industrial research. The statistical methods selected must be able to cope with the inherently uneven distribution of feature values (data sparsity). Some forms of neural networks have proved to be suitable for the task of predicting segmental duration. Their disadvantage for our purpose is that the results they achieve are inherently difficult to interpret. Therefore, we have opted for the use of machine learning methods, in particular Structural Regression Trees (SRT). SRT integrates the statistical method of regression trees with the inductive logic programming paradigm. It is a flexible machine learning paradigm that allows for the use of relational constraints and is well suited for numerical problems. It also fulfills the requirement of producing inspectable results.
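To make the idea of an inspectable tree model concrete, the following is a minimal sketch of a plain regression tree for duration prediction. It is not SRT itself: the relational, inductive-logic-programming part is left out, and only the regression-tree half is shown. The feature names (phoneme, stressed, phrase_final), the dur_ms target, and all duration values are invented toy data for illustration and are not taken from the project corpus.

```python
from statistics import mean

TARGET = "dur_ms"  # hypothetical name for the duration column (milliseconds)

# Invented toy rows, NOT corpus data: one record per phonetic segment.
TOY_ROWS = [
    {"phoneme": "a", "stressed": True,  "phrase_final": False, "dur_ms": 110},
    {"phoneme": "a", "stressed": True,  "phrase_final": True,  "dur_ms": 150},
    {"phoneme": "a", "stressed": False, "phrase_final": False, "dur_ms": 70},
    {"phoneme": "a", "stressed": False, "phrase_final": True,  "dur_ms": 90},
    {"phoneme": "t", "stressed": True,  "phrase_final": False, "dur_ms": 60},
    {"phoneme": "t", "stressed": False, "phrase_final": False, "dur_ms": 50},
    {"phoneme": "t", "stressed": True,  "phrase_final": True,  "dur_ms": 80},
    {"phoneme": "t", "stressed": False, "phrase_final": True,  "dur_ms": 60},
]


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, min_leaf):
    """Return (score, feature, value) of the equality test with lowest SSE."""
    best = None
    for feat in (k for k in rows[0] if k != TARGET):
        for val in sorted({r[feat] for r in rows}, key=str):
            yes = [r[TARGET] for r in rows if r[feat] == val]
            no = [r[TARGET] for r in rows if r[feat] != val]
            if len(yes) < min_leaf or len(no) < min_leaf:
                continue  # both branches must keep at least min_leaf rows
            score = sse(yes) + sse(no)
            if best is None or score < best[0]:
                best = (score, feat, val)
    return best


def grow(rows, max_depth=3, min_leaf=1, depth=0):
    """Grow a tree; a split is kept only if it strictly reduces the SSE."""
    durations = [r[TARGET] for r in rows]
    split = best_split(rows, min_leaf) if depth < max_depth else None
    if split is None or split[0] >= sse(durations):
        return {"leaf": mean(durations), "n": len(rows)}
    _, feat, val = split
    yes_rows = [r for r in rows if r[feat] == val]
    no_rows = [r for r in rows if r[feat] != val]
    return {
        "feat": feat,
        "val": val,
        "yes": grow(yes_rows, max_depth, min_leaf, depth + 1),
        "no": grow(no_rows, max_depth, min_leaf, depth + 1),
    }


def predict(tree, row):
    """Follow the tests from the root down to a leaf and return its mean."""
    while "leaf" not in tree:
        tree = tree["yes"] if row[tree["feat"]] == tree["val"] else tree["no"]
    return tree["leaf"]


def describe(tree, indent=""):
    """Render the tree as readable if/else rules (the 'inspectable' part)."""
    if "leaf" in tree:
        return [f"{indent}predict {tree['leaf']:.1f} ms (n={tree['n']})"]
    lines = [f"{indent}if {tree['feat']} == {tree['val']!r}:"]
    lines += describe(tree["yes"], indent + "  ")
    lines.append(f"{indent}else:")
    lines += describe(tree["no"], indent + "  ")
    return lines


if __name__ == "__main__":
    print("\n".join(describe(grow(TOY_ROWS))))
```

Unlike a neural network's weights, the rules printed by describe can be read directly, which is the property the text asks for. A real model would need far more features and data than this toy setup.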
The results of our study shall be integrated into our existing speech synthesis component. This will provide us with the necessary tool to test our hypotheses experimentally in the evaluation phase. It shall also demonstrate how the project's results improve the quality of synthesized speech in practice. The proposed project shall also form the Austrian contribution to COST action 258 "Naturalness of Synthetic Speech". This action comprises research laboratories from 14 European countries. The aim of the action is to develop methods to increase the naturalness of synthetic speech, which is a prerequisite for its broad use in commercial applications.

Naturalness is the key factor for the acceptance and comprehensibility of automatically synthesised speech. One of the most important parameters to control is the duration of speech segments. To predict the duration of speech sounds, data from actual speech must be analysed with statistical methods. For this purpose, an adequately large corpus of Austrian German was established. We recorded one speaker, then segmented and annotated the speech signal. Using machine learning techniques, it was possible to obtain suitable duration models. Their quality was checked against methods in the literature. For the first time, we now have a model for the automatic synthesis of the Austrian variant of German. The following factors have to be controlled in order to synthesise unrestricted, natural speech: intensity, pitch, and, perhaps most importantly, the duration of particular acoustic events. This holds regardless of the method used to generate the speech signal, be it the simulation of the characteristics of the sound (formant synthesis), the derivation from production models (articulatory synthesis), or the concatenation of pre-recorded parts of speech (concatenative synthesis). The core problem in modelling duration is that the speech signal functions as a carrier for several kinds of information that can only be communicated together. The speaker must combine this information, and the hearer must extract the individual components from the complex signal. Non-linguistic information includes, for example, the emotional state of the speaker. Further influencing factors are speaker characteristics and speaking style. In addition, the division of the utterance into phrases is encoded, as is accentuation. Language-specific factors arise from linguistic structure, from the sentence level (syntax) down to the level of phonemes (syllable structure).
Besides that, there are also genuinely phonetic factors that affect the duration of single segments, such as the mutual influence of neighbouring phonemes. How can we approach this complex task? Either one postulates a set of rules that yield a duration value for each phoneme, or one tries to simulate natural speech using statistical methods. In this project the second approach was favoured. To do so, it was necessary to establish a speech corpus large enough and combinatorially rich enough for machine learning techniques to provide valid results. Potential influencing factors either had to be held constant (we recorded read speech from a single speaker of Austrian Standard German from Vienna) or, if they were assumed to influence duration, had to be annotated (for example phrasing, accent, syllable structure, neighbouring segments). On top of that, it was necessary to segment the signal into individual sounds in order to obtain reference values for duration. The corpus contains approximately 50,000 phonetic segments, all of which have at least been corrected manually. In a last step, the data were used to generate various models using statistical machine learning techniques. These models predict a duration value for each phoneme in every potential context. For optimisation, we experimented with various factors and also tested several techniques. The quality of the results is as good as that of the best methods reported in the literature. Perhaps the most significant result of this project is that, for the first time, a model is available for the synthesis of Austrian German.
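The data preparation and evaluation steps described above can be sketched roughly as follows. The sketch turns a labelled segment sequence into per-segment records with context features and reference durations, then scores two trivial baseline predictors. Everything here is assumed for illustration: the tuple layout, the field names, the "#" boundary symbol, the toy timings, and the choice of root-mean-square error as the metric. The report does not state which label format or evaluation measure the project used.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical label format: (phoneme, start_s, end_s, accented, phrase_final).
# The timings are invented toy values, not measurements from the corpus.
SEGMENTS = [
    ("g",  0.00, 0.06, False, False),
    ("u:", 0.06, 0.18, True,  False),
    ("t",  0.18, 0.25, False, False),
    ("n",  0.25, 0.31, False, False),
    ("a",  0.31, 0.40, False, False),
    ("x",  0.40, 0.48, False, False),
    ("t",  0.48, 0.60, False, True),
]


def to_records(segments):
    """Attach neighbouring phonemes and a reference duration to each segment."""
    records = []
    for i, (ph, start, end, accented, final) in enumerate(segments):
        records.append({
            "phoneme": ph,
            "prev": segments[i - 1][0] if i > 0 else "#",  # "#" = boundary
            "next": segments[i + 1][0] if i + 1 < len(segments) else "#",
            "accented": accented,
            "phrase_final": final,
            "dur_ms": (end - start) * 1000.0,
        })
    return records


def global_mean_model(records):
    """Baseline 1: always predict the overall mean duration."""
    overall = sum(r["dur_ms"] for r in records) / len(records)
    return lambda rec: overall


def phoneme_mean_model(records):
    """Baseline 2: predict the mean duration observed for each phoneme."""
    by_phoneme = defaultdict(list)
    for r in records:
        by_phoneme[r["phoneme"]].append(r["dur_ms"])
    means = {ph: sum(v) / len(v) for ph, v in by_phoneme.items()}
    fallback = sum(r["dur_ms"] for r in records) / len(records)
    return lambda rec: means.get(rec["phoneme"], fallback)


def rmse(records, model):
    """Root-mean-square error of a model against the reference durations."""
    errors = [(model(r) - r["dur_ms"]) ** 2 for r in records]
    return sqrt(sum(errors) / len(errors))


if __name__ == "__main__":
    recs = to_records(SEGMENTS)
    print("global mean RMSE :", round(rmse(recs, global_mean_model(recs)), 1))
    print("phoneme mean RMSE:", round(rmse(recs, phoneme_mean_model(recs)), 1))
```

Both baselines are scored here on the same data they were fitted to, which flatters them. A comparison between learned duration models would use held-out data.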

Research institution(s)
  • ÖFAI - Österreichisches Forschungsinstitut für Artificial Intelligence - 100%
Project participants
  • Gernot Kubin, Technische Universität Graz, associated research partner
International project participants
  • Grzegorz Dogil, Universität Stuttgart-Hohenheim - Germany
