(Automatic) Speech Recognition
Course module: LET-REMA-LCEX10
Credits (ECTS): 6
Language of instruction: English
Offered by: Radboud University; Faculty of Arts; Graduate School
Contact person for the course: dr. L.F.M. ten Bosch
dr. C. Tejedor Garcia
Academic year: 2020
PER 3-PER 4 (25/01/2021 to 31/08/2021)
Starting block
Course mode
Registration using OSIRIS: Yes
Course open to students from other faculties: Yes
Waiting list: No
Placement procedure: -
This course provides an overview of the theoretical and practical issues related to automatic speech recognition, a rapidly advancing research field bridging AI, linguistics and computational modeling.

From a theoretical perspective, students will be able to understand and explain important aspects of ASR such as feature extraction, acoustic modeling (using Deep Neural Networks or Gaussian mixture models), training and decoding algorithms (Baum-Welch, Viterbi), the pronunciation vocabulary, language modeling, keyword spotting, evaluation procedures, and the scalability of the approach.
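
As an illustration of what an evaluation procedure can look like, the sketch below computes the word error rate (WER), a widely used ASR metric. The course description does not say which measure is used, so this is an assumption, and the function name `word_error_rate` and the example sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical recognizer output: one substitution ("sat" -> "sit")
# and one deletion (the second "the") against six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")
```

Because insertions are counted, WER can exceed 1 when the hypothesis is much longer than the reference.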

From a practical point of view, students will familiarize themselves with speech decoding techniques via existing ASR software, web interfaces or Deep Neural Networks, so that they are able to perform mid-scale ASR experiments. A number of default experiments are pre-specified, but students are encouraged to explore their own research questions, e.g., the discovery of structure in speech, the use of prediction in speech decoding, accent detection, word search, entropy, or the relation between speech and artificial intelligence. The focus is on gaining insight into speech decoding, not on developing implementations.
In principle, each student writes their own scientific report, but collaboration may be allowed, depending on group size.
In the first part of the course, relevant aspects of speech production, speech perception, and acoustics are addressed. Thereafter, we discuss the theoretical and practical aspects of Automatic Speech Recognition. 

Speech production is a process in which a spoken message (e.g. a sequence of words or word-like units) is transformed into a continuous speech signal (a spoken utterance). Due to the individual anatomical characteristics of each speaker's vocal tract, speech signals that represent the same message usually show a tremendous amount of variation.
Next, we address the question of how the reverse process might take place, i.e., the decoding of a spoken utterance into discrete elements (e.g., words). In particular, we will explore various approaches to Automatic Speech Recognition (ASR), including ASR based on hidden Markov models (HMMs) and ASR based on artificial neural networks.
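
To make the idea of HMM-based decoding concrete, here is a minimal sketch of Viterbi decoding on a toy two-state model. The states ("sil", "speech"), the quantized observations ("low", "high" energy) and all probabilities are made up for illustration and are not taken from the course materials. A real recognizer works with continuous acoustic features and far larger state spaces.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, state path) of the most likely state sequence."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    # back[t][s]: predecessor of s on that best path (unused for t = 0).
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev_state, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev_state

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Toy model (all values are illustrative assumptions).
states = ("sil", "speech")
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3},
           "speech": {"sil": 0.2, "speech": 0.8}}
emit_p = {"sil": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.3, "high": 0.7}}

log_prob, path = viterbi(["low", "high", "high", "low"],
                         states, start_p, trans_p, emit_p)
print(path, math.exp(log_prob))
```

Working in log-probabilities avoids numerical underflow on long utterances. The sketch assumes that all probabilities are non-zero, since `math.log(0)` raises an error.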

Lectures provide students with background knowledge on ASR. Students define their own research questions, which can be answered by means of ASR technology. Students meet on a regular basis to present their progress to fellow students and to discuss any challenges encountered. A scientific report is written on the research carried out. Depending on the topics chosen, this report is written individually or as a team.

Presumed foreknowledge

Test information


Assumed previous knowledge
Knowledge of statistics and scripting/programming.

Required materials
Title: Additional materials to be announced during the course. These materials will depend on the precise topic of the ASR thesis, and are essential for a thesis of sufficient quality.

Recommended materials
Title: Speech and Language Processing. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.
Author: Jurafsky, D. and J.H. Martin
Publisher: London: Pearson.

Instructional modes

One 2-hour lecture/seminar per week

Test weight: 100
Test type: Project
Opportunities: Block PER 4, Block PER 4

Minimum grade