Project detail

Grant DOI 10.55776/P32700
Funding program Principal Investigator Projects
Status ended
Start November 1, 2019
End October 31, 2024
Funding amount € 593,189

Disciplines

Electrical Engineering, Electronics, Information Engineering (40%); Linguistics and Literature (60%)

Keywords

Conversational Speech,
Automatic Speech Recognition,
Language Modeling,
Speech Perception,
Prosody,
Communicative Functions

Abstract

Whereas speech scientists long focused on carefully pronounced speech, interest has increasingly shifted to language as it occurs in natural conversations. There are two reasons for this. First, from a technological point of view, there is an increasing demand for social robots, which need to use language naturally in order to become more interactional and social. Second, linguists have become more interested in natural conversations, as they reveal insights beyond those gained from controlled experiments into how speech is processed in the brain. In this project, we aim at improving the automatic recognition of conversational speech, at increasing our knowledge about the human production and perception of conversational speech, and at expanding our knowledge of and resources for conversational Austrian German. On the basis of conversational speech and chat corpora from German and Austrian speakers, we develop cross-layered language models which include acoustic and semantic contextual information the way humans do. These models will be informed by quantitative phonetic corpus studies and tested in ASR and speech perception experiments. For the linguistic studies, speech technology will be used for creating automatic annotations, acoustic feature extraction and data analysis. The linguistic knowledge gained will in turn be incorporated into the language models. This approach requires an interdisciplinary team of engineers and linguists working closely together. The PI, Dr. Barbara Schuppler (Graz University of Technology), is a young interdisciplinary speech scientist who has shown in two previous FWF projects that her cross-layer principle achieves good results for pronunciation and prosody modelling. The proposed project gives her the opportunity to expand the cross-layer concept to language models and to establish a research group on conversational speech in Austria. The national partners, Prof. Dina El Zarka (Department of Linguistics, University of Graz) and Dr. Roman Kern (Know-Center GmbH), bring long-standing experience to the project. Together, they cover the disciplines of speech technology, linguistics, phonetics and natural language processing.

Final report

In the last decade, conversational speech has received a lot of attention among speech scientists. On the one hand, accurate automatic speech recognition (ASR) systems are essential for conversational systems and social robots, as these are expected to become more interactional and social rather than solely transactional. On the other hand, linguists study natural conversations, as these reveal insights beyond those gained from controlled experiments into how speech processing works. This project investigates conversational speech to enhance our linguistic knowledge of conversational Austrian German and to use this knowledge to improve ASR systems. For this purpose, the GRASS corpus, a large-scale database of Austrian German conversations, has been annotated for communicative functions, with annotations suitable for a newly introduced method for the quantitative analysis of conversational dynamics. Our work demonstrates that prosodic variation in conversational speech is systematic and linked to semantic and pragmatic context. But how sensitive are ASR systems to prosodic cues and conversational context? Our work suggests that integrating both data-driven and theory-driven components, including linguistic knowledge, can improve ASR, particularly for short utterances. When comparing how ASR systems transcribe conversational speech with how humans transcribe the same utterances, we find that both struggle with the same characteristics of conversational speech (e.g., disfluent sentences, dialectal pronunciation, fast speech rate), only to a different degree. Finally, the project delivers valuable methods for speech technologists working with low-resource languages and dialects and with small datasets that show a high degree of variation (e.g., pathological speech, child speech).

Research institution(s)

Project participants

Roman Kern, Technische Universität Graz, associated research partner
Dina El Zarka, Universität Graz, associated research partner

International project participants

Benno Maria Stein, Bauhaus-Universität Weimar - Germany
Bogdan Ludusan, Universität Bielefeld - Germany
Margaret Zellers, University of Stockholm - Sweden
Dimitra Vergyri, SRI International - USA

Research Output

30 Citations
45 Publications
1 Methods & Materials
3 Software
12 Disseminations
6 Scientific Awards
6 Fundings

Publications

Title	What’s so complex about conversational speech? A comparison of HMM-based and transformer-based ASR architectures
DOI	10.1016/j.csl.2024.101738
Type	Journal Article
Author	Linke J
Journal	Computer Speech & Language
Pages	101738
Link	Publication

Title	What's so complex about conversational speech? Prosodic Prominence and Speech Recognition Challenges
Type	PhD Thesis
Author	Julian Linke

Title	Cross-layer models for conversational speech
Type	Postdoctoral Thesis
Author	Barbara Schuppler

Title	Uncertainty prediction for prominence classification with chroma features
Type	Conference Proceeding Abstract
Author	Linke J.
Conference	ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Pages	1 - 5
Link	Publication

Title	Turn-taking annotation for quantitative and qualitative analyses of conversation
Type	Other
Author	Kelterer A.
Pages	1 - 41
Link	Publication

Title	Slicer - A Tool for Efficient Stimuli Extraction from Large Speech Corpora
Type	Conference Proceeding Abstract
Author	Eckert L
Conference	Forum Acusticum Euronoise 2025

Title	On the Role of Priors in Bayesian Causal Learning
DOI	10.1109/tai.2024.3522867
Type	Journal Article
Author	Geiger B
Journal	IEEE Transactions on Artificial Intelligence
Pages	1439-1445
Link	Publication

Title	Exploring Graph Theory Methods For the Analysis of Pronunciation Variation in Spontaneous Speech
DOI	10.21437/interspeech.2023-1398
Type	Conference Proceeding Abstract
Author	Geiger B
Pages	596-600

Title	(Dis)agreement and Preference Structure are Reflected in Matching Along Distinct Acoustic-prosodic Features
DOI	10.21437/interspeech.2023-1538
Type	Conference Proceeding Abstract
Author	Kelterer A
Pages	4768-4772

Title	What do self-supervised speech representations encode? An analysis of languages, varieties, speaking styles and speakers
DOI	10.21437/interspeech.2023-951
Type	Conference Proceeding Abstract
Author	Linke J
Pages	5371-5375

Title	Breath sounds and their relationship to turn-taking in conversational speech
Type	Other
Author	Menrath A.
Link	Publication

Title	Modelling Backchannels for Human-Robot Interaction
Type	Other
Author	Paierl M.
Link	Publication

Title	Towards Improving ASR Outputs of Spontaneous Speech with LLMs
Type	Conference Proceeding Abstract
Author	Karner M.
Conference	20th Conference on Natural Language Processing (KONVENS 2024)
Pages	339 - 348
Link	Publication

Title	Version Control for Speech Corpora
Type	Conference Proceeding Abstract
Author	Boehm M.
Conference	20th Conference on Natural Language Processing (KONVENS 2024)
Pages	303 - 308
Link	Publication

Title	Towards causal data science for non-independent data
Type	Postdoctoral Thesis
Author	Roman Kern

Title	Using Kaldi for Automatic Speech Recognition of Conversational Austrian German
DOI	10.48550/arxiv.2301.06475
Type	Preprint
Author	Linke J

Title	Reconsidering Read and Spontaneous Speech: Causal Perspectives on the Generation of Training Data for Automatic Speech Recognition
DOI	10.3390/info14020137
Type	Journal Article
Author	Gabler P
Journal	Information
Pages	137
Link	Publication

Title	On Disfluency and Non-lexical Sound Labeling for End-to-end Automatic Speech Recognition
DOI	10.21437/interspeech.2024-2157
Type	Conference Proceeding Abstract
Author	Meng Y
Pages	1270-1274

Title	Uncertainty prediction for prominence classification with chroma features
DOI	10.1109/icassp49660.2025.10887992
Type	Conference Proceeding Abstract
Author	Linke J
Pages	1-5

Title	Speaker interpolation based data augmentation for automatic speech recognition
Type	Conference Proceeding Abstract
Author	Kerle L.
Conference	Proceedings of the 20th International Congress of Phonetic Sciences - ICPhS 2023
Pages	3126 - 3130
Link	Publication

Title	creapy: A Python-based tool for the detection of creak in conversational speech
Type	Conference Proceeding Abstract
Author	Paierl M
Conference	20th International Congress on Phonetic Sciences (ICPhS)
Pages	1716-1720
Link	Publication

Title	Points of maximum grammatical control - The prosody of a turn-holding practice
Type	Conference Proceeding Abstract
Author	Kelterer A
Conference	20th International Congress on Phonetic Sciences (ICPhS)
Pages	3467-3471
Link	Publication

Title	Single Channel Source Separation in the Wild -- Conversational Speech in Realistic Environments
Type	Conference Proceeding Abstract
Author	Berger E.
Conference	ITG-Fachbericht 312: Speech Communication
Pages	96 - 100
Link	Publication

Title	Using word-level features for prosodic prominence detection in conversational speech
Type	Conference Proceeding Abstract
Author	Kubin G.
Conference	Proceedings of the 20th International Congress of Phonetic Sciences - ICPhS 2023
Pages	3101 - 3105
Link	Publication

Title	10 Years of GRASS development: Experiences from annotating a large corpus of conversational Austrian German
Type	Conference Proceeding Abstract
Author	Kelterer A.
Conference	Österreichische Linguistiktagung : Austrian Meeting on Digital Linguistics: Recent Developments in Austria - Institut fuer Linguistik, Graz, Austria
Link	Publication

Title	Speechcake: Version control for speech corpora
Type	Other
Author	Dumitru V.A.
Link	Publication

Title	Prosodic cues to agreement and disagreement prefaces in Austrian German conversations
DOI	10.21437/tai.2021-22
Type	Conference Proceeding Abstract
Author	Kelterer A
Pages	107-111

Title	An Analysis of Prosodic Prominence Cues to Information Structure in Egyptian Arabic
DOI	10.21437/interspeech.2020-2322
Type	Conference Proceeding Abstract
Author	Kelterer A
Pages	1883-1887

Title	Automatic Speech Segmentation using KALDI
Type	Other
Author	Wasserfall S.
Link	Publication

Title	Information-theoretic approaches in model reduction and machine learning
Type	Postdoctoral Thesis
Author	Bernhard Geiger

Title	An analysis of prosodic boundaries across speaking styles in two varieties of German
DOI	10.1016/j.specom.2022.05.002
Type	Journal Article
Author	Ludusan B
Journal	Speech Communication
Pages	93-106

Title	To laugh or not to laugh? The use of laughter to mark discourse structure
DOI	10.18653/v1/2022.sigdial-1.8
Type	Conference Proceeding Abstract
Author	Ludusan B
Pages	76-82

Title	How prosody affects ASR performance in conversational Austrian German
DOI	10.21437/speechprosody.2022-40
Type	Conference Proceeding Abstract
Author	Schuppler B
Pages	195-199

Title	Analyzing the different meanings of laughter in conversational speech
Type	Other
Author	Schmallegger E.
Link	Publication

Title	Speaker interpolation based data augmentation for Automatic Speech Recognition
Type	Other
Author	Kerle L.
Link	Publication

Title	Text Complexity in the Digital Humanities - A Case Study on 18th Century Periodicals
Type	Other
Author	Geiger B
Link	Publication

Title	Context is all you need? Low-resource conversational ASR profits from context, coming from the same or from the other speaker
Type	Conference Proceeding Abstract
Author	Linke J.
Conference	Interspeech 2024

Title	What the Filler? Both ASR Systems and Humans Struggle More With Other Kinds of Disfluencies Than With Filler Particles
Type	Conference Proceeding Abstract
Author	Eckert L
Conference	Interspeech 2025

Title	Prominence-aware automatic speech recognition for conversational speech
Type	Conference Proceeding Abstract
Author	Kubin G.
Conference	Interspeech 2024

Title	Continuous prediction of backchannel timing for human-robot interaction
Type	Conference Proceeding Abstract
Author	Hagmueller M.
Conference	Interspeech 2024

Title	(When) Does it harm to be incomplete? Human and automatic speech recognition of syntactically disfluent structures
Type	Journal Article
Author	Lennkh S
Journal	Speech Communication

Title	Developing an Annotation System for Communicative Functions for a Cross-Layer ASR System
Type	Conference Proceeding Abstract
Author	Kelterer A.
Conference	ESSLLI Workshop "Integrating Perspectives on Discourse Annotation" (DiscAnn)
Link	Publication

Title	Towards automatic annotation of prosodic prominence levels in Austrian German
DOI	10.21437/speechprosody.2020-204
Type	Conference Proceeding Abstract
Author	Linke J
Pages	1000-1004

Methods & Materials

Public Access
Title	Tool for Analysis of Self-supervised Speech Representations
DOI	10.21437/interspeech.2023-951
Type	Improvements to research infrastructure
Link	Link

Software

Title	pvlex
Link	Link

Title	speechcake
Link	Link

Title	creapy
Link	Link

Disseminations

Title	MINKT Labor a super science space for children
Type	Participation in an activity, workshop or similar
Link	Link

Title	Newspaper Article on AI for Dialect: Klipp Das Magazin
Type	A press release, press conference or response to a media enquiry/interview
Link	Link

Title	Speech AI for Styrian Dialect on "Radio Steiermark"
Type	A broadcast e.g. TV/radio/film/podcast (other than news/press)
Link	Link

Title	Podcast in Oe1 DIGITAL Leben
Type	A broadcast e.g. TV/radio/film/podcast (other than news/press)
Link	Link

Title	Invited talk at Bielefeld University
Type	A talk or presentation

Title	Newspaper Article on AI for Austrian German: Der Standard
Type	A press release, press conference or response to a media enquiry/interview
Link	Link

Title	GEED Graz Electrical Engineering Days
Type	Participation in an open day or visit at my research institution
Link	Link

Title	Special Session at "Phonetics and Phonology in Europe" 2021
Type	Participation in an activity, workshop or similar
Link	Link

Title	Initiation of the "Graz-Vienna Speechworkshop" Series
Type	Participation in an activity, workshop or similar

Title	Newspaper Article in Kleine Zeitung on AI for Styrian Dialect
Type	A press release, press conference or response to a media enquiry/interview
Link	Link

Title	Podcast about our work on Conversational Speech
Type	A broadcast e.g. TV/radio/film/podcast (other than news/press)
Link	Link

Title	AI and Dialect? Radio Interview in Oe3
Type	A broadcast e.g. TV/radio/film/podcast (other than news/press)
Link	Link

Scientific Awards

Title	Guest Professorship teaching the course: Speaker charisma: Analysis and training of acoustic-prosodic features within a sex-sensitive framework
Type	Attracted visiting staff or user to your research group
Level of Recognition	Regional (any country)

Title	Invited Speaker at "Ringvorlesung: Vielfalt im Zentrum der Forschung"
Type	Personally asked as a key note speaker to a conference
Level of Recognition	Regional (any country)

Title	Jury member of "Das österreichische Wort des Jahres"
Type	Prestigious/honorary/advisory position to an external body
Level of Recognition	National (any country)

Title	Invited participant to the student-meets-experts event at DAGA 47. Jahrestagung fuer Akustik 2021
Type	Personally asked as a key note speaker to a conference
Level of Recognition	Continental/International

Title	Guest Professorship teaching the course: Experimental Methods in Phonetics
Type	Attracted visiting staff or user to your research group
Level of Recognition	Regional (any country)

Title	Speech Communication Editor
Type	Appointed as the editor/advisor to a journal or book series
Level of Recognition	Continental/International

Fundings

Title	Reisekostenzuschuss für Interspeech 2023
Type	Studentship
Start of Funding	2023
Funder	Austrian Research Association

Title	Förderungsbeitrag für die Tagungsteilnahme
Type	Studentship
Start of Funding	2023
Funder	Land Steiermark

Title	ERASMUS+ Short-Term Mobility WASP Summer School 2024
Type	Studentship
Start of Funding	2024
Funder	ERASMUS+ Short-Term Mobility International Office - Welcome Center, TU Graz

Title	Prof. Margaret Zellers - teaching
Type	Fellowship
Start of Funding	2024
Funder	University of Graz

Title	Doktoratsfertigstellungsstipendium
Type	Research grant (including intramural programme)
Start of Funding	2024
Funder	Literar Mechana

Title	ICPhS 2023 Reisekostenübernahme Land Steiermark
Type	Studentship
Start of Funding	2023
Funder	Land Steiermark

Cross-layer language models for conversational speech
