Voice conversion for the processing of pathological speech
Disciplines
Electrical Engineering, Electronics, Information Engineering (30%); Computer Sciences (30%); Clinical Medicine (40%)
Keywords
Voice Conversion,
Outcome Prediction,
Laryngectomy,
Deep Learning,
Speech Pathology,
Voice Disorder
Speaking is something many of us take for granted, but for people with speech impairments, even simple conversations can be a daily challenge. They may struggle to express themselves, feel frustrated when misunderstood, and often experience social isolation. This affects not only their relationships but also their job prospects and overall well-being. Our research project aims to change that by developing advanced speech technology that supports clinical decision-making and helps people with speech impairments improve how they sound when speaking. The technology we focus on is called voice conversion (VC). Simply put, VC allows a speech recording to be modified so that it sounds as if spoken by someone else, while the words stay the same. Many people might know this technology from deepfake videos, where voices are manipulated to imitate celebrities or public figures, often raising ethical concerns about misuse such as fraud or misinformation. In our project, however, we use VC responsibly and positively: to give people with speech impairments a voice that sounds more natural, expressive, and easier to understand.
We are working toward two key goals. Our first goal is to help people whose speech remains altered or artificial even after treatment. This includes individuals using devices like an electrolarynx, which produces a mechanical voice, and those using other substitute voices, such as a speaking valve voice, where air is redirected from the windpipe to the food pipe through a small valve, or a burp voice, where air is swallowed and released in a controlled way. These voices may sound strained and effortful and lack natural melody. People living with chronic hoarseness face similar challenges, as their speech may sound rough, breathy, or fatigued. In all these cases, speech may lack tone, emotion, or clarity, making it harder for others to understand the speaker or connect with them emotionally.
Imagine having a heartfelt phone call, only to sound robotic, monotone, or hard to understand: that's the experience many of these individuals face daily. We are developing technology that not only processes speech audio but also integrates, in real time, biosensor data and video capturing facial expressions and gestures. The aim is to restore not just the sound of the voice, but also its emotional richness and personal character, whether in face-to-face conversations or phone calls. As a second goal, we plan to use VC to predict how a person's voice might sound after medical treatment for a speech disorder. For example, someone might wonder how their voice will change after surgery or therapy. By training artificial intelligence models to simulate these changes, we aim to give doctors and patients a realistic preview of the likely outcomes, supporting more informed treatment decisions. Our mission is to give people with speech impairments back their voice, not just as a communication tool, but as a full, expressive part of who they are.
- Barbara Schuppler, Technische Universität Graz, national collaboration partner
- Franz Pernkopf, Technische Universität Graz, national collaboration partner
- Martin Hagmüller, Technische Universität Graz, associated research partner
- Tomoki Toda, Nagoya University, Japan