Cross-layer language models for conversational speech
Disciplines
Electrical Engineering, Electronics, Information Engineering (40%); Linguistics and Literature (60%)
Keywords
Conversational Speech, Automatic Speech Recognition, Language Modeling, Speech Perception, Prosody, Communicative Functions
For a long time, speech scientists focused on carefully pronounced speech, but interest has increasingly shifted to language as it occurs in natural conversations. There are two reasons for this. First, from a technological point of view, there is an increasing demand for social robots, which need to use language naturally in order to become more interactional and social. Second, linguists have become more interested in natural conversations because they reveal insights into how speech is processed in the brain that go beyond those from controlled experiments. In this project, we aim to improve the automatic recognition of conversational speech, to increase our knowledge about the human production and perception of conversational speech, and to expand our knowledge and resources for conversational Austrian German. On the basis of conversational speech and chat corpora from German and Austrian speakers, we develop cross-layer language models that include acoustic and semantic contextual information, as humans do. These models will be informed by quantitative phonetic corpus studies and tested in automatic speech recognition (ASR) and speech perception experiments. For the linguistic studies, speech technology will be used for automatic annotation, acoustic feature extraction and data analysis. The linguistic knowledge gained will in turn be incorporated into the language models. This approach requires an interdisciplinary team of engineers and linguists working closely together. The PI, Dr. Barbara Schuppler (Graz University of Technology), is a young interdisciplinary speech scientist who has shown in two previous FWF projects that her cross-layer principle achieves good results for pronunciation and prosody modelling. The proposed project gives her the opportunity to extend the cross-layer concept to language models and to establish a research group on conversational speech in Austria. The national partners, Prof. Dina El Zarka (Department of Linguistics, University of Graz) and Dr. Roman Kern (Know-Center GmbH), bring long-standing experience to the project. Together, they cover the disciplines of speech technology, linguistics, phonetics and natural language processing.
Over the last decade, conversational speech has received a lot of attention among speech scientists. On the one hand, accurate automatic speech recognition (ASR) systems are essential for conversational systems and social robots, which are expected to become more interactional and social rather than solely transactional. On the other hand, linguists study natural conversations because they reveal insights into how speech processing works that go beyond those from controlled experiments. This project investigates conversational speech in order to enhance our linguistic knowledge of conversational Austrian German and to use this knowledge to improve ASR systems. For this purpose, the GRASS corpus, a large-scale database of Austrian German conversations, has been annotated for communicative functions; these annotations are suitable for a newly introduced method for the quantitative analysis of conversational dynamics. Our work demonstrates that prosodic variation in conversational speech is systematic and linked to semantic and pragmatic context. But how sensitive are ASR systems to prosodic cues and conversational context? Our work suggests that integrating both data-driven and theory-driven components, including linguistic knowledge, can improve ASR, particularly for short utterances. When comparing how ASR systems and humans transcribe the same conversational utterances, we find that both struggle with the same characteristics of conversational speech (e.g., disfluent sentences, dialectal pronunciation, fast speech rate), though to different degrees. Finally, the project delivers valuable methods for speech technologists working with low-resource languages and dialects and with small datasets that show a high degree of variation (e.g., pathological speech, child speech).
- Technische Universität Graz - 50%
- Universität Graz - 22%
- Technische Universität Graz - 28%
- Roman Kern, Technische Universität Graz, associated research partner
- Dina El Zarka, Universität Graz, associated research partner
Research Output
- 5 Citations
- 44 Publications
- 1 Methods & Materials
- 3 Software
- 12 Disseminations
- 6 Scientific Awards
- 6 Fundings
-
0
Title (When) Does it harm to be incomplete? Human and automatic speech recognition of syntactically disfluent structures Type Journal Article Author Lennkh S Journal Speech Communication -
0
Title What the Filler? Both ASR Systems and Humans Struggle More With Other Kinds of Disfluencies Than With Filler Particles Type Conference Proceeding Abstract Author Eckert L Conference Interspeech 2025 -
0
Title Prominence-aware automatic speech recognition for conversational speech Type Conference Proceeding Abstract Author Kubin G. Conference Interspeech 2024 -
0
Title Context is all you need? Low-resource conversational ASR profits from context, coming from the same or from the other speaker Type Conference Proceeding Abstract Author Linke J. Conference Interspeech 2024 -
0
Title Continuous prediction of backchannel timing for human-robot interaction Type Conference Proceeding Abstract Author Hagmueller M. Conference Interspeech 2024 -
2022
Title Information-theoretic approaches in model reduction and machine learning Type Postdoctoral Thesis Author Bernhard Geiger -
2025
Title Slicer - A Tool for Efficient Stimuli Extraction from Large Speech Corpora Type Conference Proceeding Abstract Author Eckert L Conference Forum Acusticum Euronoise 2025 -
2025
Title Uncertainty prediction for prominence classification with chroma features Type Conference Proceeding Abstract Author Linke J. Conference ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pages 1 - 5 -
2025
Title Uncertainty prediction for prominence classification with chroma features Type Conference Proceeding Abstract Author Linke J. Conference Event 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP -
2025
Title Turn-taking annotation for quantitative and qualitative analyses of conversation Type Other Author Kelterer A. Pages 1 - 41 -
2024
Title On the Role of Priors in Bayesian Causal Learning DOI 10.1109/tai.2024.3522867 Type Journal Article Author Geiger B Journal IEEE Transactions on Artificial Intelligence Pages 1439-1445 -
2024
Title On Disfluency and Non-lexical Sound Labeling for End-to-end Automatic Speech Recognition DOI 10.21437/interspeech.2024-2157 Type Conference Proceeding Abstract Author Meng Y Pages 1270-1274 -
2024
Title Towards causal data science for non-independent data Type Postdoctoral Thesis Author Roman Kern -
2023
Title Reconsidering Read and Spontaneous Speech: Causal Perspectives on the Generation of Training Data for Automatic Speech Recognition DOI 10.3390/info14020137 Type Journal Article Author Gabler P Journal Information Pages 137 -
2023
Title Using Kaldi for Automatic Speech Recognition of Conversational Austrian German DOI 10.48550/arxiv.2301.06475 Type Preprint Author Linke J -
2025
Title Uncertainty prediction for prominence classification with chroma features DOI 10.1109/icassp49660.2025.10887992 Type Conference Proceeding Abstract Author Linke J Pages 1-5 -
2025
Title Cross-layer models for conversational speech Type Postdoctoral Thesis Author Barbara Schuppler -
2025
Title What’s so complex about conversational speech? A comparison of HMM-based and transformer-based ASR architectures DOI 10.1016/j.csl.2024.101738 Type Journal Article Author Linke J Journal Computer Speech & Language Pages 101738 -
2025
Title What's so complex about conversational speech? Prosodic Prominence and Speech Recognition Challenges Type PhD Thesis Author Julian Linke -
2022
Title Analyzing the different meanings of laughter in conversational speech Type Other Author Schmallegger E. -
2022
Title Speaker interpolation based data augmentation for Automatic Speech Recognition Type Other Author Kerle L. -
2022
Title Text Complexity in the Digital Humanities - A Case Study on 18th Century Periodicals Type Other Author Geiger B -
2024
Title Breath sounds and their relationship to turn-taking in conversational speech Type Other Author Menrath A. -
2024
Title Modelling Backchannels for Human-Robot Interaction Type Other Author Paierl M. -
2024
Title Towards Improving ASR Outputs of Spontaneous Speech with LLMs Type Conference Proceeding Abstract Author Karner M. Conference 20th Conference on Natural Language Processing (KONVENS 2024) Pages 339 - 348 -
2024
Title Version Control for Speech Corpora Type Conference Proceeding Abstract Author Boehm M. Conference 20th Conference on Natural Language Processing (KONVENS 2024) Pages 303 - 308 -
2023
Title creapy: A Python-based tool for the detection of creak in conversational speech Type Conference Proceeding Abstract Author Paierl M Conference 20th International Congress on Phonetic Sciences (ICPhS) Pages 1716-1720 -
2023
Title Points of maximum grammatical control - The prosody of a turn-holding practice Type Conference Proceeding Abstract Author Kelterer A Conference 20th International Congress on Phonetic Sciences (ICPhS) Pages 3467-3471 -
2023
Title Speaker interpolation based data augmentation for automatic speech recognition Type Conference Proceeding Abstract Author Kerle L. Conference Proceedings of the 20th International Congress of Phonetic Sciences - ICPhS 2023 Pages 3126 - 3130 -
2023
Title Speechcake: Version control for speech corpora Type Other Author Dumitru V.A. -
2023
Title 10 Years of GRASS development: Experiences from annotating a large corpus of conversational Austrian German Type Conference Proceeding Abstract Author Kelterer A. Conference Österreichische Linguistiktagung: Austrian Meeting on Digital Linguistics: Recent Developments in Austria - Institut fuer Linguistik, Graz, Austria -
2023
Title Using word-level features for prosodic prominence detection in conversational speech Type Conference Proceeding Abstract Author Kubin G. Conference Proceedings of the 20th International Congress of Phonetic Sciences - ICPhS 2023 Pages 3101 - 3105 -
2023
Title Single Channel Source Separation in the Wild -- Conversational Speech in Realistic Environments Type Conference Proceeding Abstract Author Berger E. Conference ITG-Fachbericht 312: Speech Communication Pages 96 - 100 -
2023
Title What do self-supervised speech representations encode? An analysis of languages, varieties, speaking styles and speakers DOI 10.21437/interspeech.2023-951 Type Conference Proceeding Abstract Author Kadar M Pages 5371-5375 -
2023
Title (Dis)agreement and Preference Structure are Reflected in Matching Along Distinct Acoustic-prosodic Features DOI 10.21437/interspeech.2023-1538 Type Conference Proceeding Abstract Author Kelterer A Pages 4768-4772 -
2023
Title Exploring Graph Theory Methods For the Analysis of Pronunciation Variation in Spontaneous Speech DOI 10.21437/interspeech.2023-1398 Type Conference Proceeding Abstract Author Geiger B Pages 596-600 -
2022
Title An analysis of prosodic boundaries across speaking styles in two varieties of German DOI 10.1016/j.specom.2022.05.002 Type Journal Article Author Ludusan B Journal Speech Communication Pages 93-106 -
2022
Title How prosody affects ASR performance in conversational Austrian German DOI 10.21437/speechprosody.2022-40 Type Conference Proceeding Abstract Author Schuppler B Pages 195-199 -
2022
Title To laugh or not to laugh? The use of laughter to mark discourse structure DOI 10.18653/v1/2022.sigdial-1.8 Type Conference Proceeding Abstract Author Ludusan B Pages 76-82 -
2021
Title Developing an Annotation System for Communicative Functions for a Cross-Layer ASR System Type Conference Proceeding Abstract Author Kelterer A. Conference ESSLLI Workshop "Integrating Perspectives on Discourse Annotation" (DiscAnn) -
2021
Title Prosodic cues to agreement and disagreement prefaces in Austrian German conversations DOI 10.21437/tai.2021-22 Type Conference Proceeding Abstract Author Kelterer A Pages 107-111 -
2020
Title Towards automatic annotation of prosodic prominence levels in Austrian German DOI 10.21437/speechprosody.2020-204 Type Conference Proceeding Abstract Author Linke J Pages 1000-1004 -
2020
Title Automatic Speech Segmentation using KALDI Type Other Author Wasserfall S. -
2020
Title An Analysis of Prosodic Prominence Cues to Information Structure in Egyptian Arabic DOI 10.21437/interspeech.2020-2322 Type Conference Proceeding Abstract Author Kelterer A Pages 1883-1887
-
2023
Title Tool for Analysis of Self-supervised Speech Representations DOI 10.21437/interspeech.2023-951 Type Improvements to research infrastructure
-
2024
Title Newspaper Article on AI for Austrian German: Der Standard Type A press release, press conference or response to a media enquiry/interview -
2023
Title GEED Graz Electrical Engineering Days Type Participation in an open day or visit at my research institution -
2021
Title Special Session at "Phonetics and Phonology in Europe" 2021 Type Participation in an activity, workshop or similar -
2021
Title Initiation of the "Graz-Vienna Speechworkshop" Series Type Participation in an activity, workshop or similar -
2025
Title Podcast about our work on Conversational Speech Type A broadcast e.g. TV/radio/film/podcast (other than news/press) -
2025
Title Newspaper Article in Kleine Zeitung on AI for Styrian Dialect Type A press release, press conference or response to a media enquiry/interview -
2023
Title MINKT Labor, a super science space for children Type Participation in an activity, workshop or similar -
2025
Title Newspaper Article on AI for Dialect: Klipp Das Magazin Type A press release, press conference or response to a media enquiry/interview -
2025
Title AI and Dialect? Radio Interview in Oe3 Type A broadcast e.g. TV/radio/film/podcast (other than news/press) -
2025
Title Speech AI for Styrian Dialect on "Radio Steiermark" Type A broadcast e.g. TV/radio/film/podcast (other than news/press) -
2025
Title Podcast in Oe1 DIGITAL Leben Type A broadcast e.g. TV/radio/film/podcast (other than news/press) -
2024
Title Invited talk at Bielefeld University Type A talk or presentation
-
2023
Title Jury member of "Das österreichische Wort des Jahres" Type Prestigious/honorary/advisory position to an external body Level of Recognition National (any country) -
2023
Title Guest Professorship teaching the course: Speaker charisma: Analysis and training of acoustic-prosodic features within a sex-sensitive framework Type Attracted visiting staff or user to your research group Level of Recognition Regional (any country) -
2023
Title Invited Speaker at "Ringvorlesung: Vielfalt im Zentrum der Forschung" Type Personally asked as a key note speaker to a conference Level of Recognition Regional (any country) -
2021
Title Invited participant to the student-meets-experts event at DAGA 47. Jahrestagung fuer Akustik 2021 Type Personally asked as a key note speaker to a conference Level of Recognition Continental/International -
2019
Title Guest Professorship teaching the course: Experimental Methods in Phonetics Type Attracted visiting staff or user to your research group Level of Recognition Regional (any country) -
2019
Title Speech Communication Editor Type Appointed as the editor/advisor to a journal or book series Level of Recognition Continental/International
-
2024
Title ERASMUS+ Short-Term Mobility WASP Summer School 2024 Type Studentship Start of Funding 2024 Funder ERASMUS+ Short-Term Mobility International Office - Welcome Center, TU Graz -
2023
Title ICPhS 2023 Reisekostenübernahme Land Steiermark Type Studentship Start of Funding 2023 Funder Land Steiermark -
2024
Title Prof. Margaret Zellers - teaching Type Fellowship Start of Funding 2024 Funder University of Graz -
2024
Title Doktoratsfertigstellungsstipendium Type Research grant (including intramural programme) Start of Funding 2024 Funder Literar Mechana -
2023
Title Reisekostenzuschuss für Interspeech 2023 Type Studentship Start of Funding 2023 Funder Austrian Research Association -
2023
Title Förderungsbeitrag für die Tagungsteilnahme Type Studentship Start of Funding 2023 Funder Land Steiermark