Project details

Disciplines

Computer Sciences (40%); Linguistics and Literature (60%)

Keywords

Automatic Speech Recognition, Spontaneous Speech, Pronunciation Variation, Austrian German, Linguistic Models, Dutch

Abstract

Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. As a consequence, they do not deal well with spontaneous, conversational speech, which differs from read speech in many respects. On the linguistic level, conversational speech contains disfluencies and many utterances that might be considered 'ungrammatical'. On the phonetic level, spontaneous speech shows a much higher degree of pronunciation variation than read speech: words are more often acoustically reduced relative to their full pronunciations, so that a word like "yesterday" may sound like "yeshay", or a German word like "haben" may sound like "ham". Since most real-world applications of ASR systems require the recognition of spontaneous speech (e.g., dialogue systems, voice input aids for physically disabled people, medical dictation systems), the investigation of new methods to model everyday speech has received a lot of attention among speech technologists. In linguistics and psycholinguistics, too, casual conversations are studied in search of an answer to how everyday speech production and comprehension work. These studies have indicated that certain higher-level linguistic functions and structures of utterances condition the details of their pronunciation. It is likely that the kind of analysis that is becoming feasible with the growing availability of large speech corpora will bring to light as yet unknown factors that affect pronunciation variation. The research envisioned in this proposal is designed to increase our knowledge about spontaneous, conversational speech and to use this knowledge to improve ASR systems. The first objective is to identify, by means of quantitative phonetic analyses, which higher-level linguistic structures and functions condition pronunciation variation.
Studies will be carried out on Dutch and on Austrian German material, which will allow us to draw conclusions about which findings are language specific and which are characteristic of conversational speech in general. The second objective is to improve ASR technology by incorporating the gained knowledge about the conditions for pronunciation variation. Most ASR systems still treat acoustic and linguistic information independently of each other. In contrast, I propose a cross-layer pronunciation modeling technique, which (1) makes use of the gained knowledge about how several layers of linguistic structures and functions affect pronunciation variation, and (2) makes use of lexicons in more than one layer of the recognizer's architecture. Additional deliverables of this project are the collected speech material and the tools created for its automatic annotation, both of which would be of great value for future studies by linguists and engineers.

Final report

The Problem

Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. Most real-world applications of ASR systems, however, require the recognition of spontaneous, conversational speech (e.g., dialogue systems, voice input aids for physically disabled people, medical dictation systems). Compared to prepared or read speech, conversational speech contains utterances that might be considered 'ungrammatical', as well as disfluencies such as "...oh, well, I think ahhm exactly". The pronunciation of words may depend, for instance, on the regional background of the speakers, the formality of the situation, or the frequency of the word. A highly frequent word like "yesterday" may sound like "yeshay", and the German word "haben" (to have) may sound like "ham". This project investigated interdisciplinary methods (drawing on linguistics, phonetics, and speech technology) to model the factors on which pronunciation variation in everyday speech depends.

The Methods

In this project, we collected and annotated the first large-scale speech database of Austrian German. It is a rich resource on pronunciation variation in Austrian German, containing approximately 1900 minutes of speech from 38 speakers from 5 provinces in 3 different speaking styles (read speech, spontaneous commands, and conversational speech). Moreover, it is one of the largest German speech databases with completely unconstrained, casual conversations, and is thus also relevant to speech scientists outside Austria. We have also developed transcription tools for the corpus and have made both the speech material and the tools available to other researchers.

The Findings

Based on Dutch, German, and the collected Austrian German speech material, we found that pronunciation variation depends not only on well-known factors such as the regional background of the speaker and the speaking style, but also on, for example, the grammatical and morphological properties of words. For instance, in spontaneous speech the German word "der" is pronounced differently depending on whether it is an article, a demonstrative pronoun, or a relative pronoun, whereas in read speech it is always pronounced the same way. These linguistic findings on pronunciation variation were used to develop methods to improve ASR systems. Most importantly, our work not only demonstrates novel methods for ASR but also introduces a new perspective: whereas the high degree of pronunciation variation in spontaneous speech was previously seen primarily as a problem for ASR, we view it as an additional resource that is not present in read speech. This change in perspective will guide our future research plans.

Research institution(s)

Technische Universität Graz - 100%

International project participants

Mirjam Ernestus, Radboud University - Netherlands

Research Output

40 Citations
13 Publications

Publications

Title	Rethinking classification results based on read speech, or: why improvements do not always transfer to other speaking styles
DOI	10.1007/s10772-017-9436-y
Type	Journal Article
Author	Schuppler B
Journal	International Journal of Speech Technology
Pages	699-713

Title	A corpus of read and conversational Austrian German
DOI	10.1016/j.specom.2017.09.003
Type	Journal Article
Author	Schuppler B
Journal	Speech Communication
Pages	62-74

Title	Acoustic correlates of stress and accent in Standard Austrian German
Type	Book Chapter
Author	El Zarka D

Title	Informal speech processes can be categorical in nature, even if they affect many different words
DOI	10.1121/1.4790352
Type	Journal Article
Author	Hanique I
Journal	The Journal of the Acoustical Society of America
Pages	1644-1655

Title	On the use of acoustic features for automatic disambiguation of homophones in spontaneous German
DOI	10.1016/j.csl.2017.12.011
Type	Journal Article
Author	Schuppler B
Journal	Computer Speech & Language
Pages	209-224

Title	Pronunciation Variation in Read and Conversational Austrian German
Type	Conference Proceeding Abstract
Author	Morales-Cordovilla JA et al.
Conference	Proceedings of Interspeech

Title	How extra-linguistic factors affect pronunciation variation in different speaking styles
Type	Conference Proceeding Abstract
Author	Schuppler B
Conference	22nd Czech-German Workshop on Speech Communication

Title	GRASS: The Graz Corpus of Read and Spontaneous Speech
Type	Conference Proceeding Abstract
Author	Pessentheiner H et al.
Conference	Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Title	The challenge of manner classification in conversational speech
Type	Conference Proceeding Abstract
Author	Boves L et al.
Conference	Proceedings of the Workshop on Speech Production in Automatic Speech Recognition, Satellite Workshop of Interspeech

Title	Automatic detection of uncertainty in spontaneous German dialogue
Type	Conference Proceeding Abstract
Author	Schrank T
Conference	Proceedings of Interspeech

Title	Statistical Language and Speech Processing, Second International Conference, SLSP 2014, Grenoble, France, October 14-16, 2014, Proceedings
DOI	10.1007/978-3-319-11397-5
Type	Book
Publisher	Springer Nature

Title	Automatic Phonetic Transcription in Two Steps: Forced Alignment and Burst Detection
DOI	10.1007/978-3-319-11397-5_10
Type	Book Chapter
Author	Schuppler B
Publisher	Springer Nature
Pages	132-143

Title	Where /aR/ the /R/s in Standard Austrian German?
Type	Conference Proceeding Abstract
Author	Jackschina A
Conference	Proceedings of Interspeech

Cross-layer pronunciation modeling for conversational speech
