PhD defence: Automatic Speech Recognition in noisy environments and with heavy accents

Date of news: 9 September 2021

Automatic Speech Recognition (ASR) has become better and better. However, ASR still performs poorly when speech is heavily accented or recorded in noisy environments. Language technologist Yang Sun investigated novel ways to combine existing models and systems to make ASR more robust. He will defend his dissertation on 13 September.

Busy street in Hong Kong

Automatic speech recognition (ASR) is the process that maps speech signals to a sequence of words. While ASR has become reasonably accurate in quiet environments and with standard pronunciations, performance degrades rapidly when speech is heavily accented or recorded in noisy environments. This is problematic, since noise-free environments are rare outside laboratories and recording studios. Moreover, the variety of noises encountered in practice is large.

Besides background noises, ASR also struggles with characteristics of individual speakers and speaking habits, such as accent. The difficulty is caused not only by deviations in pronunciation, but also by different vocabularies and even grammars that are used by speakers with different language and accent backgrounds. To this day, even the most powerful systems fall short dramatically when compared with human performance. It is therefore essential to investigate theoretical and practical approaches for improving noise- and accent robustness of ASR for real-world speech applications.

In previous research, improving the performance of ASR systems was usually achieved by training the systems on the tasks for which they were designed, and these improvements seldom generalize to other tasks and conditions. For example, a noise-robust system may perform poorly on clean speech, and an accent-robust system may degrade the recognition of accent-free speech.


The research of Yang Sun addressed noise robustness in ASR by employing system combination and accent robustness by discovering the pronunciation variants that affect recognition accuracy most.

To improve noise robustness, Sun focused on system-combination approaches to harness the strengths of different systems. For that purpose, he developed methods to determine the confidence that the output of a given system is correct. This makes it possible for all systems to contribute, regardless of their overall performance. For accent robustness, he studied the pronunciation variants that characterize 15 accents in Mandarin Chinese. Next, he investigated methods for adding pronunciation variants to the lexicon such that the recognition accuracy for accented speech improved, without deteriorating the accuracy for standard Mandarin. In addition, an accent classifier was built to enable accent-specific enhancement of the ASR system.


With respect to noise robustness, the core contributions of Sun’s system combination technique are two-fold. First, several transformations are introduced for improving the strengths of multiple systems that may differ substantially in approach and performance level. Second, a dynamic weighting algorithm was developed to allow dynamic and adaptive adjustment of the importance of each component system in diverse scenarios.
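The idea of confidence-based, dynamically weighted system combination can be illustrated with a minimal sketch in the style of ROVER voting. This is an illustration only, not Sun's actual algorithm: the alignment, the confidence scores, and all names here are assumptions.

```python
# Hedged sketch: combine aligned word hypotheses from several ASR
# systems by weighted voting, where each system's vote at each
# position is weighted by its (hypothetical) confidence score.
from collections import defaultdict

def combine_hypotheses(aligned_hyps, confidences):
    """aligned_hyps: list of word sequences (one per system), already
    aligned to equal length; '' marks a deletion.
    confidences: per-system, per-position confidence in [0, 1]."""
    combined = []
    for pos in range(len(aligned_hyps[0])):
        votes = defaultdict(float)
        for sys_idx, hyp in enumerate(aligned_hyps):
            # Each system's vote counts proportionally to its
            # confidence, so weak systems can still contribute.
            votes[hyp[pos]] += confidences[sys_idx][pos]
        best = max(votes, key=votes.get)
        if best:  # skip positions where a deletion wins
            combined.append(best)
    return combined

# Three toy systems disagree at two positions; confidence decides.
hyps = [["the", "cat", "sat"],
        ["the", "hat", "sat"],
        ["a",   "cat", "sat"]]
conf = [[0.9, 0.4, 0.9],
        [0.8, 0.9, 0.8],
        [0.3, 0.7, 0.9]]
print(combine_hypotheses(hyps, conf))  # → ['the', 'cat', 'sat']
```

A dynamic weighting scheme, as described above, would additionally adapt these confidence weights per utterance or per acoustic condition rather than keeping them fixed.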

With respect to accent robustness, a data-driven approach was developed to adapt the lexicon, so that the recognizer becomes more tolerant of the pronunciation variations associated with different accents. The adaptation approach was shown to be efficient and effective at improving robustness against all 15 Mandarin accents at the same time.
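A data-driven lexicon adaptation of this kind can be sketched as follows. The sketch assumes a simple word-to-pronunciations lexicon and a frequency threshold for admitting variants; the threshold, the phone notation, and the selection criterion are all illustrative, not the thesis's actual method.

```python
# Hedged sketch: add an accented pronunciation variant to the
# lexicon only when it occurs often enough in accented data,
# so rare mispronunciations do not inflate the lexicon.
def adapt_lexicon(lexicon, observed_variants, min_count=5):
    """lexicon: word -> set of pronunciations (phone strings).
    observed_variants: (word, pronunciation) -> count observed
    in accented training data. min_count is an assumed threshold."""
    adapted = {word: set(prons) for word, prons in lexicon.items()}
    for (word, pron), count in observed_variants.items():
        if word in adapted and count >= min_count:
            adapted[word].add(pron)
    return adapted

# Toy example: an accent that realizes retroflex "sh" as "s".
lexicon = {"shi": {"sh i"}}            # canonical entry
observed = {("shi", "s i"): 12,        # frequent variant: added
            ("shi", "x i"): 1}         # rare variant: rejected
print(adapt_lexicon(lexicon, observed))
```

Keeping only frequent variants is one way to improve accuracy on accented speech without deteriorating accuracy on standard pronunciations, since the decoder's search space grows only where the data justify it.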

Fundamental knowledge

Sun’s thesis shows that fundamental knowledge of the internal operation of speech recognition systems and linguistic knowledge are at least as important for improving the accuracy of automatic speech recognition as training with ever larger amounts of data.

Yang Sun's defence is on 13 September, starting at 14.30. Unfortunately, it is not possible to attend the ceremony spontaneously and without invitation. You can follow the PhD defence via a livestream at the following link: Academiezaal. If you have any questions regarding the livestream or attending a defence, contact