Objective differentiation of dysphonic voice quality types
Disciplines
Electrical Engineering, Electronics, Information Engineering (60%); Clinical Medicine (40%)
Keywords
Voice disorders, Dysphonia, Voice assessment, Laryngeal high-speed videos, Speech processing, Acoustics
Voice problems are not yet completely understood, partly because conventional methods of clinical examination are limited. The aim of this research project is to create computerized methods that automatically and reliably recognize and identify voice problems in microphone recordings of patients' voices. Videos of the throat and microphone recordings of 230 patients with voice problems are obtained. Super-slow-motion videos with 4000 images per second are used because the vocal folds, which are located in the larynx, normally vibrate very fast during voice production, more than 100 times per second. To visualize irregularities, videos showing two seconds of vocal fold vibration are slowed down to a playing time of five minutes. The researchers investigate how the vocal fold vibration relates to the sound of the voice. In particular, three voice types receive attention. First, so-called vocal fry is a very low-pitched type of voice. This voice type may be reminiscent of the Strohbass singing register. Sometimes it is possible to hear individual pulses of the vocal folds in these voices. Second, extrapulsed voices are investigated. These may be compared to a common phenomenon in the human heartbeat, namely extrasystoles. Just as a human heart may stumble every now and then, extra pulses may occur in the voice. Frequently occurring extra pulses may be a sign of a voice problem and may be perceived as sounding raspy. Third, phase differences between the left and the right vocal fold may occur in voice disorders. To understand phase differences, one could try to bounce two basketballs with the left and the right hand simultaneously and notice that the balls hardly ever hit the ground at exactly the same time, so that two separate bouncing sounds are audible every time the balls hit the ground. Such timing differences also occur in vocal folds and are a sign of a voice problem. 
Interestingly, instead of hearing the two vocal folds separately, listeners have reported hearing a certain type of rumbling in voices with phase differences. This is mainly because the vocal folds vibrate at a much higher frequency than basketballs bounce. A multifaceted approach is chosen to investigate how the vocal folds vibrate and how abnormal voices sound to listeners. The project comprises computer-based science, patient data, and auditory experiments. The results may be applied to improve clinical voice quality assessment.
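The slow-motion figures quoted above can be checked with a little arithmetic. The following is a minimal sketch that uses only the numbers stated in the text (4000 images per second, two seconds of recording, five minutes of playing time, 100 vibrations per second); the function and variable names are hypothetical and not taken from the project's software.

```python
# Values taken from the project description; all names are hypothetical.
FRAME_RATE_HZ = 4000        # images recorded per second
RECORDED_SECONDS = 2        # duration of recorded vocal fold vibration
PLAYBACK_SECONDS = 5 * 60   # playing time after slowing down
VIBRATION_RATE_HZ = 100     # vibration rate named in the text as a lower bound


def slow_motion_summary(frame_rate_hz, recorded_s, playback_s, vibration_hz):
    """Return basic slow-motion figures for a high-speed recording."""
    total_frames = frame_rate_hz * recorded_s
    return {
        "total_frames": total_frames,
        "slowdown_factor": playback_s / recorded_s,
        "playback_fps": total_frames / playback_s,
        "frames_per_cycle": frame_rate_hz / vibration_hz,
    }


summary = slow_motion_summary(
    FRAME_RATE_HZ, RECORDED_SECONDS, PLAYBACK_SECONDS, VIBRATION_RATE_HZ
)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under these figures the two-second recording holds 8000 images and is stretched by a factor of 150, which corresponds to a playback rate of roughly 27 images per second. At 100 vibrations per second each vibration cycle is captured in 40 images, and faster vibration leaves fewer images per cycle.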
Verbal communication is one of the most significant human achievements. It relies on the functioning of the voice box, particularly the vibration of the vocal cords. This vibration gives the voice its tone, similar to how a vibrating string gives a guitar its sound. Voice disorders may disrupt this normal vibration, making speaking difficult. Clinicians use cameras to examine patients' vocal folds and listen to the nuances of the voice. However, vocal cords vibrate rapidly, making them difficult to see, and both visual and auditory assessments can be subjective. This project aimed to address these challenges through innovative technology and methods. The techniques and main findings are the following. First, researchers used high-speed cameras to slow down video recordings of vocal fold vibration by a factor of 160. This allowed for observation of minute details otherwise missed, especially features of irregularity. Second, microphone recordings were analyzed to understand how vocal sounds relate to vibrations. This led to an improved understanding of how different vocal fold conditions affect voice sound. Third, the project involved advanced simulations of vocal fold vibrations and the auditory process, further pinpointing critical features of phonatory dysfunctions. Finally, the project leveraged artificial intelligence and machine learning (AIML). In particular, recent advances in speech technology (cf. Siri, Alexa, etc.) were adapted to create more realistic simulations of pathological voices, and AIML mimicking human vision was used to automate video analysis, reducing the need for manual review and facilitating the implementation of slow-motion video analysis in clinical settings. Specific voice types were investigated. First, diplophonia is a condition in which different regions of the vocal folds vibrate at distinct rates, causing a doubled voice. Software was developed to objectively measure how frequently this occurs in speech. 
Second, vocal fry and creaky voice are characterized by separated sound pulses, similar to the sounds of a frying pan, a creaky door, or popping popcorn. This research clarified that such voices either have a low vibration rate or show other disturbances that create only the illusion of pulse separation. Third, researchers investigated timing differences between vocal fold regions and extra pulses similar to extrasystoles in heartbeats. In summary, this research has greatly enhanced our understanding of vocal fold mechanics and voice perception. By combining high-speed video technology, computer simulations, and AI, the project tackled key challenges in diagnosing and treating voice disorders. The findings have the potential to transform clinical practice, providing more accurate and reliable diagnostics via digital twinning and decision support, ultimately leading to better treatment outcomes and an improved quality of life for individuals with voice problems.
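To make the idea of extra pulses concrete, the sketch below flags a pulse as "extra" when the intervals both before and after it are clearly shorter than the typical interval. This is an illustrative assumption only, not the detection method used in the project; the function name, the threshold `short_fraction`, and the synthetic pulse times are all hypothetical.

```python
from statistics import median


def find_extra_pulses(pulse_times_s, short_fraction=0.7):
    """Return indices of pulses squeezed between two unusually short intervals.

    A pulse is flagged when the interval before it and the interval after it
    are both shorter than `short_fraction` times the median interval.
    The threshold is a hypothetical choice for illustration.
    """
    intervals = [b - a for a, b in zip(pulse_times_s, pulse_times_s[1:])]
    if not intervals:
        return []
    limit = short_fraction * median(intervals)
    flagged = []
    for i in range(1, len(pulse_times_s) - 1):
        before = intervals[i - 1]  # interval ending at pulse i
        after = intervals[i]       # interval starting at pulse i
        if before < limit and after < limit:
            flagged.append(i)
    return flagged


# Synthetic example: regular pulses every 10 ms (100 per second)
# plus one additional pulse inserted at 34 ms.
regular = [k * 0.010 for k in range(8)]
pulse_times = sorted(regular + [0.034])
extra = find_extra_pulses(pulse_times)
print(extra)  # → [4]
```

Requiring both neighbouring intervals to be short avoids also flagging the regular pulse that follows the extra one. The rule misses extra pulses that fall very close to a regular pulse, so it should be read as a teaching aid rather than a clinical detector.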
Research Output
- 21 Citations
- 29 Publications
- 2 Methods & Materials
- 6 Scientific Awards
Publications
- 2024
  Title: Auditory perception of impulsiveness and tonality in vocal fry; DOI: 10.61782/fa.2023.0426; Type: Conference Proceeding Abstract; Author: Devaraj V; Pages: 4719-4724
- 2021
  Title: Modelling of Amplitude Modulated Vocal Fry Glottal Area Waveforms Using an Analysis-by-Synthesis Approach; DOI: 10.3390/app11051990; Type: Journal Article; Author: Devaraj V; Journal: Applied Sciences; Pages: 1990
- 2021
  Title: Fitting synthetic to clinical kymographic images for deriving kinematic vocal fold parameters: Application to left-right vibratory phase differences; DOI: 10.1016/j.bspc.2020.102253; Type: Journal Article; Author: Bulusu S; Journal: Biomedical Signal Processing and Control; Pages: 102253
- 2021
  Title: Modelling sagittal and vertical phase differences in a lumped and distributed elements vocal fold model; DOI: 10.1016/j.bspc.2020.102309; Type: Journal Article; Author: Drioli C; Journal: Biomedical Signal Processing and Control; Pages: 102309
- 2021
  Title: Synthesis and Analysis-By-Synthesis of Modulated Diplophonic Glottal Area Waveforms; DOI: 10.1109/taslp.2021.3053387; Type: Journal Article; Author: Aichinger P; Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing; Pages: 914-926
- 2019
  Title: Detection of extra pulses in synthesized glottal area waveforms of dysphonic voices; DOI: 10.1016/j.bspc.2019.01.007; Type: Journal Article; Author: Aichinger P; Journal: Biomedical Signal Processing and Control; Pages: 158-167
- 2019
  Title: Analysis and Synthesis of Vocal Flutter and Vocal Jitter; DOI: 10.21437/interspeech.2019-1998; Type: Conference Proceeding Abstract; Author: Schoentgen J; Pages: 2518-2522
- 2024
  Title: Deep Learning-Based Detection of Glottis Segmentation Failures; DOI: 10.3390/bioengineering11050443; Type: Journal Article; Author: Aichinger P; Journal: Bioengineering (Basel, Switzerland)
- 2018
  Title: Detection of Diplophonation in Audio Recordings of German Standard Text Readings; DOI: 10.1016/j.jvoice.2018.06.009; Type: Journal Article; Author: Aichinger P; Journal: Journal of Voice
- 2019
  Title: Tracking of multiple fundamental frequencies in standard text readings of diplophonic speakers; Type: Conference Proceeding Abstract; Author: Aichinger P; Conference: International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications; Pages: 125-128
- 2019
  Title: Perturbation of cycle lengths and cycle peak amplitudes in diplophonic voices; Type: Conference Proceeding Abstract; Author: Aichinger P; Conference: International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications; Pages: 121-124
- 2019
  Title: A glottal area waveform model for multi-pulsed vocal fry; Type: Conference Proceeding Abstract; Author: Aichinger P; Conference: International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications; Pages: 133-136
- 2019
  Title: Extracting kinematic vocal fold parameters from videokymograms via simulation of clinical data; Type: Conference Proceeding Abstract; Author: Bulusu S; Conference: International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications; Pages: 141-144
- 2019
  Title: Modelling longitudinal phase differences in a lumped and distributed elements vocal fold model; Type: Conference Proceeding Abstract; Author: Aichinger P; Conference: International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications; Pages: 137-140
- 2019
  Title: Analysis and synthesis of vocal flutter and vocal jitter; Type: Conference Proceeding Abstract; Author: Aichinger P; Conference: Annual Conference of the International Speech Communication Association, INTERSPEECH; Pages: 2518-2522
- 2019
  Title: Aerodynamics and Lumped-Masses Combined with Delay Lines for Modeling Vertical and Anterior-Posterior Phase Differences in Pathological Vocal Fold Vibration; Type: Conference Proceeding Abstract; Author: Aichinger P; Conference: Annual Conference of the International Speech Communication Association, INTERSPEECH; Pages: 2503-2507
- 2019
  Title: Characterization of turbulence noise in breathy human phonation; Type: Conference Proceeding Abstract; Author: Aichinger P; Conference: ICA 2019 and EAA Euroregio; Pages: 3139-3146
- 2021
  Title: Neural network based estimation of vocal fold kinematic parameters from digital videokymograms; Type: Conference Proceeding Abstract; Author: Bulusu S; Conference: Advances in Quantitative Laryngology, Voice and Speech Research (AQL)
- 2021
  Title: Artificial high-speed videos of normal and dysphonic vocal fold vibration; Type: Conference Proceeding Abstract; Author: Aichinger P; Conference: International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications; Pages: 93-96
- 2019
  Title: Characterization of turbulence noise in breathy human phonation; DOI: 10.18154/rwth-conv-239381; Type: Other; Author: Aichinger P
- 2022
  Title: A Modelling Study on the Comparison of Predicted Auditory Nerve Firing Rates for the Personalized Indication of Cochlear Implantation; DOI: 10.3390/app12105168; Type: Journal Article; Author: Aichinger P; Journal: Applied Sciences; Pages: 5168
- 2022
  Title: Simulated Laryngeal High-Speed Videos for the Study of Normal and Dysphonic Vocal Fold Vibration; DOI: 10.1044/2022_jslhr-21-00673; Type: Journal Article; Author: Aichinger P; Journal: Journal of Speech, Language, and Hearing Research (JSLHR); Pages: 2431-2445
- 2021
  Title: Fitting a biomechanical model of the folds to oscillatory patterns with AP and LR asymmetries observed in high speed video data; Type: Conference Proceeding Abstract; Author: Aichinger P; Conference: International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications; Pages: 89-92
- 2021
  Title: Objective detection of amplitude modulation in glottal area waveforms; Type: Conference Proceeding Abstract; Author: Aichinger P; Conference: International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications; Pages: 15-18
- 2023
  Title: Performance evaluation of 3D neural networks applied to high-speed videos for glottis segmentation in difficult cases; Type: Conference Proceeding Abstract; Author: Aichinger P; Conference: International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications; Pages: 87-90
- 2023
  Title: Sinusoidal Modelling of Vocal Fold Medial Surface Vibration Trajectories; Type: Conference Proceeding Abstract; Author: Aichinger P; Conference: Conference on Advances in Quantitative Laryngology, Voice and Speech Research (AQL)
- 2023
  Title: Kinematics of Vocal Fold Vibration in Double-Pulsed Phonation; Type: Conference Proceeding Abstract; Author: Aichinger P; Conference: Annual Symposium of the Voice Foundation
- 2023
  Title: Biomechanics and acoustics of voice production; Type: PhD Thesis; Author: Lehoux, Sarah
- 2023
  Title: Auditory Perception of Impulsiveness and Tonality in Vocal Fry; DOI: 10.3390/app13074186; Type: Journal Article; Author: Devaraj V; Journal: Applied Sciences
Methods & Materials
- 2022
  Title: Synthesizer for videos of vocal fold vibration; Type: Model of mechanisms or symptoms - human; Public Access
- 2019
  Title: Diplophonia rate (DR) extractor; Type: Physiological assessment or outcome measure; Public Access
Scientific Awards
- 2023
  Title: Becoming Ap.Professor; Type: Honorary Degree; Level of Recognition: Regional (any country)
- 2023
  Title: Senior member of IEEE; Type: Awarded honorary membership, or a fellowship, of a learned society; Level of Recognition: Continental/International
- 2023
  Title: Associate Editor for IEEE/ACM Transactions on Audio, Speech, and Language Processing; Type: Appointed as the editor/advisor to a journal or book series; Level of Recognition: Continental/International
- 2022
  Title: Sarah Lehoux visited the lab for one week in 2022; Type: Attracted visiting staff or user to your research group; Level of Recognition: Continental/International
- 2022
  Title: Guest editor for Biomedical Signal Processing and Control; Type: Appointed as the editor/advisor to a journal or book series; Level of Recognition: Continental/International
- 2018
  Title: Attracted Jean Schoentgen to temporarily join the lab in Vienna; Type: Attracted visiting staff or user to your research group; Level of Recognition: Continental/International