LET-REMA-LCEX10
(Automatic) Speech Recognition
Course module: LET-REMA-LCEX10
Credits (ECTS): 6
Category: -
Language of instruction: English
Offered by: Radboud University; Faculty of Arts; Graduate School
Lecturer(s)
Coordinator: dr. L.F.M. ten Bosch
Examiner: dr. L.F.M. ten Bosch
Lecturer: dr. L.F.M. ten Bosch
Contact person for the course: dr. L.F.M. ten Bosch
Lecturer: dr. C. Tejedor Garcia
Academic year: 2022
Period: PER 3-PER 4 (30/01/2023 to 03/09/2023)
Starting block: PER 3
Course mode: full-time
Remarks: Open for master students Data Science, Artificial Intelligence, Computer Science
Registration using OSIRIS: Yes
Course open to students from other faculties: Yes
Pre-registration: No
Waiting list: No
Placement procedure: -
Aims
This course provides an overview of the theoretical and practical issues related to automatic speech recognition, a rapidly advancing research field bridging AI, linguistics and computational modeling.

From a theoretical perspective, students will be able to understand and explain important aspects of ASR such as feature extraction, acoustic modeling (using Deep Neural Networks or Gaussian mixture models), training and decoding algorithms (Baum-Welch, Viterbi), the pronunciation vocabulary, language modeling, keyword spotting, evaluation procedures, and the scalability of the approach.
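
As an illustration of an evaluation procedure, the sketch below computes the word error rate (WER), a metric commonly used to score ASR output. The course description does not say which metrics are covered, so the choice of WER, the function name `word_error_rate`, and the example sentences are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution (sat -> sit) and one deletion (the).
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.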

From a practical point of view, students will familiarize themselves with speech decoding techniques via existing ASR software, web interfaces or Deep Neural Networks, so that they are able to perform mid-scale ASR experiments. A number of default experiments are pre-specified, but students are encouraged to explore their own research questions, e.g., the discovery of structure in speech, the use of prediction in speech decoding, accent detection, word search, entropy, or the relation between speech and artificial intelligence. The focus is on gaining insight into speech decoding, not on developing implementations.
In principle, each student writes their own scientific report, but students may be allowed to work together, depending on group size.
Content
In the first part of the course, relevant aspects of speech production, speech perception, and acoustics are addressed. Thereafter, we discuss the theoretical and practical aspects of Automatic Speech Recognition. 

Speech production is a process in which a spoken message (e.g., a sequence of words or word-like units) is transformed into a continuous speech signal (a spoken utterance). Because of the individual anatomical characteristics of each speaker's vocal tract, speech signals that represent the same message usually show a tremendous amount of variation.
Next, we address the question of how the reverse process might take place, i.e., the decoding of a spoken utterance into discrete elements (e.g., words). In particular, we will explore various approaches to Automatic Speech Recognition (ASR), including ASR based on hidden Markov models (HMMs) and ASR based on artificial neural networks.
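
To make HMM-based decoding concrete, here is a minimal sketch of the Viterbi algorithm on a toy model. The two states (`sil`, `speech`), the two observation symbols (`low`, `high`), and all probabilities are hypothetical values chosen for illustration. A real recognizer would use acoustic feature vectors and far larger state spaces.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence and its log probability."""
    # best[t][s]: log probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


def to_log(table):
    """Convert a nested dict of probabilities to log probabilities."""
    return {k: {k2: math.log(v) for k2, v in row.items()} for k, row in table.items()}


# Hypothetical toy model: silence vs. speech, observed as low/high energy frames.
states = ["sil", "speech"]
log_start = {"sil": math.log(0.8), "speech": math.log(0.2)}
log_trans = to_log({"sil": {"sil": 0.7, "speech": 0.3},
                    "speech": {"sil": 0.2, "speech": 0.8}})
log_emit = to_log({"sil": {"low": 0.9, "high": 0.1},
                   "speech": {"low": 0.2, "high": 0.8}})

path, log_prob = viterbi(["low", "high", "high"], states, log_start, log_trans, log_emit)
print(path, math.exp(log_prob))
```

Working in log probabilities avoids numerical underflow on long utterances, where a product of many small probabilities would otherwise round to zero.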

Lectures provide students with background knowledge on ASR. Students define their own research questions, which can be answered by means of ASR technology. Students meet on a regular basis to present their progress to fellow students and to discuss any challenges encountered. A scientific report is written on the research carried out. Depending on the topics chosen, this report is written individually or as a team.
Level
Research master.
Presumed foreknowledge
Knowledge of statistics and scripting/programming.
Knowledge of AI, deep learning, and/or data science will be helpful.
Test information
Assessment is based on an individual thesis. This thesis is based either on the student's own experiments or on experiments carried out in a team.
Specifics

Required materials
Literature
Title: Additional materials to be announced during the course. These materials will depend on the precise topic of the ASR thesis, and are essential for a thesis of sufficient quality.

Recommended materials
Book
Title: Speech and Language Processing. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.
Author: Jurafsky, D. and J.H. Martin
Publisher: London: Pearson.

Instructional modes
lecture/seminar

General
One 2-hour lecture/seminar per week

Tests
Assignment
Test weight: 100
Test type: Project
Opportunities: Block PER 4, Block PER 4

Minimum grade
5.5