Project detail

Disciplines

Computer Sciences (20%); Linguistics and Literature (80%)

Keywords

Conversational Speech, Prosodic Models, Automatic Speech Recognition, Austrian German, Pronunciation Variation, Machine Learning

Abstract

Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. Most real-world applications of ASR systems, however, require the recognition of spontaneous, conversational speech (e.g., dialogue systems, voice input aids for people with physical disabilities, medical dictation systems). Compared to prepared speech, conversational speech contains utterances that might be considered "ungrammatical" and disfluencies such as "...oh, well, I think ahm exactly". Moreover, in spontaneous conversation, a word like "yesterday" may sound like "yeshay", and the German word "haben" ("to have") may sound like "ham". The pronunciation of words depends on well-known factors, for instance the regional background of the speakers and the formality of the situation. Highly influential but less well-studied factors are those reflecting the prosodic characteristics of the word in the utterance. These prosodic characteristics describe the rhythm and melody of a sentence and, for instance, whether a word is accented or not. The proposed project aims to investigate what role prosody plays in pronunciation variation from a linguistic point of view and to incorporate the knowledge gained into an ASR system. In our investigations, we will use speech material from German and Austrian speakers. In contrast to most research in the field of prosody, which has used read sentences or prepared speech, we will annotate and analyze speech from free conversations between speakers who know each other well. Such speech material is not only more naturalistic but also richer in pronunciation variation. In sum, our project will deliver the first prosodically annotated database for conversational Austrian German, automatic tools for the creation of prosodic annotations, and a prosody-dependent ASR system for conversational speech from German and Austrian German speakers.

Final report

Cross-layer prosodic models for conversational speech

Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. Most real-world applications of ASR systems, however, require the recognition of spontaneous, conversational speech (e.g., dialogue systems, voice input aids for people with physical disabilities, medical dictation systems). Compared to prepared speech, conversational speech contains utterances that might be considered "ungrammatical" and disfluencies such as "...oh, well, I think ahm exactly". Moreover, in spontaneous conversation, a word like "yesterday" may sound like "yeshay", and the German word "haben" ("to have") may sound like "ham". The pronunciation of words depends on well-known factors, for instance the regional background of the speakers and the formality of the situation. Highly influential but less well-studied factors are those reflecting the prosodic characteristics of the word in the utterance. These prosodic characteristics describe the rhythm and melody of a sentence and, for instance, whether a word is accented or not.

This Elise Richter project investigated what role prosody plays in pronunciation variation from a linguistic point of view and examined statistical methods for its quantitative analysis. In our investigations, we used speech material from German and Austrian speakers. In contrast to most research in the field of prosody, which has used read sentences or prepared speech, we annotated and analyzed speech from casual, free conversations between speakers who know each other well. Such speech material is not only more naturalistic but also richer in pronunciation variation; this greater variation poses a challenge and requires more complex statistical techniques.

One of our main findings was that, in terms of speech melody and rhythm, German and Austrian German conversations show more similar patterns to each other than Austrian read speech and Austrian conversational speech do. This led us to conclude that, with respect to prosodic phrasing, speaking style is more relevant than the regional background of the speakers. One main deliverable of the project was the first prosodically annotated database for conversational Austrian German, along with automatic tools for the creation of prosodic annotations. These will continue to be used to develop an automatic speech recognition system for conversational speech from German and Austrian German speakers. Moreover, the speech database is already in use by linguists and speech technologists at national and international academic research institutions.

Research institution(s)

Technische Universität Graz - 100%

International project participants

Margaret Zellers, University of Stockholm - Sweden
Philip Garner, Idiap Research Institute - Switzerland

Research Output

8 Citations
11 Publications
1 Policies
2 Methods & Materials
1 Disseminations
3 Scientific Awards

Publications

Title	Automatic detection of prosodic boundaries in two varieties of German
Type	Conference Proceeding Abstract
Author	Ludusan B.
Conference	Interspeech 2019 Satellite Workshop on 'Pluricentric Languages in Speech Technology'
Link	Publication

Title	Towards automatic annotation of prosodic prominence levels in Austrian German
DOI	10.21437/speechprosody.2020-204
Type	Conference Proceeding Abstract
Author	Linke J
Pages	1000-1004

Title	Microprosodic Variability in Plosives in German and Austrian German
DOI	10.21437/interspeech.2020-2353
Type	Conference Proceeding Abstract
Author	Zellers M
Pages	656-660

Title	Developing an Annotation System for Communicative Functions for a Cross-Layer ASR System
Type	Conference Proceeding Abstract
Author	Kelterer A.
Conference	ESSLLI Workshop "Integrating Perspectives on Discourse Annotation" (DiscAnn)
Link	Publication

Title	The prosody of theme, rheme and focus in Egyptian Arabic: A quantitative investigation of tunes, configurations and speaker variability
DOI	10.1016/j.specom.2024.103082
Type	Journal Article
Author	Zarka D
Journal	Speech Communication
Pages	103082
Link	Publication

Title	An introduction to pluricentric languages in speech science and technology
DOI	10.1016/j.specom.2023.103007
Type	Journal Article
Author	Adda-Decker M
Journal	Speech Communication

Title	Acoustic Cues to Topic and Narrow Focus in Egyptian Arabic
DOI	10.21437/interspeech.2019-1189
Type	Conference Proceeding Abstract
Author	Zarka D
Pages	1771-1775

Title	Prosodic Effects on Plosive Duration in German and Austrian German
DOI	10.21437/interspeech.2019-2197
Type	Conference Proceeding Abstract
Author	Schuppler B
Pages	1736-1740

Title	An analysis of prosodic boundary detection in German and Austrian German read speech
DOI	10.21437/speechprosody.2020-202
Type	Conference Proceeding Abstract
Author	Ludusan B
Pages	990-994

Title	Towards building a cross-lingual speech recognition system for Slovenian and Austrian German
Type	Journal Article
Author	A. Žgank
Journal	The Phonetician
Link	Publication

Title	An analysis of prosodic boundary detection in German and Austrian German read speech
Type	Conference Proceeding Abstract
Author	Ludusan B.
Conference	Speech Prosody
Pages	990-994
Link	Publication

Policies

Title	ELRC
Type	Membership of a guideline committee

Methods & Materials

Public Access
Title	GRASS corpus
Type	Improvements to research infrastructure

Public Access
Title	Prosodic Boundary Annotation Tool
Type	Improvements to research infrastructure

Disseminations

Title	Radio interview
Type	A press release, press conference or response to a media enquiry/interview

Scientific Awards

Title	Guest Professor
Type	Attracted visiting staff or user to your research group
Level of Recognition	National (any country)

Title	Speech Communication Editor
Type	Appointed as the editor/advisor to a journal or book series
Level of Recognition	Continental/International

Title	Keynote speech
Type	Personally asked as a key note speaker to a conference
Level of Recognition	National (any country)
