Cross-layer prosodic models for conversational speech
Disciplines
Computer Sciences (20%); Linguistics and Literature (80%)
Keywords
Conversational Speech, Prosodic Models, Automatic Speech Recognition, Austrian German, Pronunciation Variation, Machine Learning
Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. Most real-world applications of ASR systems, however, require the recognition of spontaneous, conversational speech (e.g., dialogue systems, voice input aids for people with physical disabilities, medical dictation systems). Compared to prepared speech, conversational speech contains utterances that might be considered 'ungrammatical' and that contain disfluencies such as "...oh, well, I think ahm exactly". Moreover, in spontaneous conversation, a word like "yesterday" may sound like "yeshay", and the German word "haben" ("to have") may sound like "ham". The pronunciation of words depends on well-known factors, for instance the regional background of the speakers and the formality of the situation. Highly influential but less well-studied factors are those reflecting the prosodic characteristics of the word in the utterance. These prosodic characteristics describe the rhythm and melody of a sentence, for instance whether a word is accented or not. The proposed project aims to investigate the role prosody plays in pronunciation variation from a linguistic point of view and to incorporate the knowledge gained into an ASR system. In our investigations, we will use speech material from German and Austrian speakers. In contrast to most research in the field of prosody, which has used read sentences or prepared speech, we will annotate and analyze speech from free conversations between speakers who know each other well. Such speech material is not only more naturalistic but also richer in pronunciation variation. In sum, our project will deliver the first prosodically annotated database for conversational Austrian German, automatic tools for the creation of prosodic annotations, and a prosody-dependent ASR system for conversational speech from German and Austrian German speakers.
This Elise Richter project investigated the role prosody plays in pronunciation variation from a linguistic point of view and investigated statistical methods for its quantitative analysis. In our investigations, we used speech material from German and Austrian speakers. In contrast to most research in the field of prosody, which has used read sentences or prepared speech, we annotated and analyzed speech from casual, free conversations between speakers who know each other well. Such speech material is not only more naturalistic but also richer in pronunciation variation; this greater variability, however, requires more complex statistical techniques.
One of our main findings was that, in terms of speech melody and rhythm, German and Austrian German conversations show more similar patterns than Austrian read speech and Austrian conversational speech do. This led us to conclude that, with respect to prosodic phrasing, speaking style is more relevant than the regional background of the speakers. One main deliverable of the project was the first prosodically annotated database for conversational Austrian German, along with automatic tools for the creation of prosodic annotations. These will continue to be used to develop an automatic speech recognition system for conversational speech from German and Austrian German speakers. Moreover, the speech database is already in use by linguists and speech technologists at national and international academic research institutions.
- Technische Universität Graz - 100%
- Margaret Zellers, University of Stockholm - Sweden
- Philip Garner, Idiap Research Institute - Switzerland
Research Output
- 7 Citations
- 10 Publications
- 1 Policy
- 2 Methods & Materials
- 1 Dissemination
- 3 Scientific Awards
Publications
2024
Title The prosody of theme, rheme and focus in Egyptian Arabic: A quantitative investigation of tunes, configurations and speaker variability DOI 10.1016/j.specom.2024.103082 Type Journal Article Author El Zarka D Journal Speech Communication
2024
Title An introduction to pluricentric languages in speech science and technology DOI 10.1016/j.specom.2023.103007 Type Journal Article Author Adda-Decker M Journal Speech Communication
2020
Title Towards building a cross-lingual speech recognition system for Slovenian and Austrian German Type Journal Article Author Žgank A Journal The Phonetician
2020
Title An analysis of prosodic boundary detection in German and Austrian German read speech Type Conference Proceeding Abstract Author Ludusan B Conference Speech Prosody Pages 990-994
2019
Title Prosodic Effects on Plosive Duration in German and Austrian German DOI 10.21437/interspeech.2019-2197 Type Conference Proceeding Abstract Author Schuppler B Pages 1736-1740
2019
Title Acoustic Cues to Topic and Narrow Focus in Egyptian Arabic DOI 10.21437/interspeech.2019-1189 Type Conference Proceeding Abstract Author El Zarka D Pages 1771-1775
2019
Title Automatic detection of prosodic boundaries in two varieties of German Type Conference Proceeding Abstract Author Ludusan B Conference Interspeech 2019 Satellite Workshop on 'Pluricentric Languages in Speech Technology'
2020
Title An analysis of prosodic boundary detection in German and Austrian German read speech DOI 10.21437/speechprosody.2020-202 Type Conference Proceeding Abstract Author Ludusan B Pages 990-994
2020
Title Towards automatic annotation of prosodic prominence levels in Austrian German DOI 10.21437/speechprosody.2020-204 Type Conference Proceeding Abstract Author Linke J Pages 1000-1004
2020
Title Microprosodic Variability in Plosives in German and Austrian German DOI 10.21437/interspeech.2020-2353 Type Conference Proceeding Abstract Author Zellers M Pages 656-660
Policies
2021
Title ELRC Type Membership of a guideline committee
Methods & Materials
2019
Title GRASS corpus Type Improvements to research infrastructure Public Access
Title Prosodic Boundary Annotation Tool Type Improvements to research infrastructure Public Access
Disseminations
2019
Title Radio interview Type A press release, press conference or response to a media enquiry/interview
Scientific Awards
2019
Title Speech Communication Editor Type Appointed as the editor/advisor to a journal or book series Level of Recognition Continental/International
2018
Title Keynote speech Type Personally asked as a key note speaker to a conference Level of Recognition National (any country)
2020
Title Guest Professor Type Attracted visiting staff or user to your research group Level of Recognition National (any country)