LET-REMA-LCEX10
(Automatic) Speech Recognition
Course module: LET-REMA-LCEX10
Credits (ECTS): 6
Category: -
Language of instruction: English
Offered by: Radboud University; Faculty of Arts; Graduate School
Lecturer(s)
Examiner
dr. L.F.M. ten Bosch
Lecturer
dr. L.F.M. ten Bosch
Contact person for the course
dr. L.F.M. ten Bosch
Academic year: 2017
Period
PER 3-PER 4  (05/02/2018 to 31/08/2018)
Starting block
PER 3
Course mode
full-time
Remarks: -
Registration using OSIRIS: Yes
Course open to students from other faculties: Yes
Pre-registration: No
Waiting list: No
Placement procedure: -
Aims
After successful completion of this course, students have a detailed overview of the steps involved in training and testing an automatic continuous speech recognizer (ASR system).

From a theoretical perspective, students must be able to understand and explain design considerations regarding feature extraction, acoustic modeling (using Deep Neural Networks or Gaussian mixture models), training and decoding algorithms (Baum-Welch, Viterbi), the pronunciation vocabulary, language modeling, evaluation procedures, and the scalability of the approach.

From a practical point of view, students are expected to have familiarized themselves with the basic use of the Hidden Markov Model Toolkit (HTK) so that they are able to implement a simple ASR system. For the scientific report, students are encouraged to reuse ideas from ASR to address decoding or classification problems that they specify themselves, such as singing type classification, prediction models in speech, accent detection, speech and big data, or speech and artificial intelligence.
In principle, each student writes his/her own scientific report, but students may cooperate on a report, depending on group size.
Content
In the first part of the course, relevant aspects of speech production, speech perception, and acoustics are addressed. Thereafter, we discuss the theoretical and practical aspects of Automatic Speech Recognition.

Speech production is a process in which a message (a sequence of one or more discrete symbols, e.g. a sequence of words or phonemes) is transformed into a continuous speech signal (a spoken utterance). Due to anatomical differences in the articulatory organs, as well as individual behavioural differences in how these organs are used during speech production, speech signals that represent the same message may show a tremendous amount of variation.

Next, we address the question of how the reverse process might take place, i.e., the decoding of a spoken utterance in terms of discrete elements (e.g. words, phonemes). In particular, we will explore various approaches to Automatic Speech Recognition (ASR), including ASR based on hidden Markov models (HMMs) and ASR based on artificial neural networks.
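
As an illustration of the decoding step, the sketch below shows Viterbi decoding for a small discrete-observation HMM. It is a minimal example under stated assumptions, not the HTK implementation used in the course: the function name `viterbi`, the two states "a" and "b", the observation symbols "x" and "y", and all probabilities are hypothetical toy values chosen only to make the example runnable.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete-observation HMM."""
    # delta[s]: best log-probability of any state path that ends in s
    # after the observations processed so far.
    delta = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_delta, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            best_prev = max(states, key=lambda p: delta[p] + log_trans[p][s])
            new_delta[s] = delta[best_prev] + log_trans[best_prev][s] + log_emit[s][obs]
            pointers[s] = best_prev
        backpointers.append(pointers)
        delta = new_delta
    # Trace back from the best final state to recover the state sequence.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return delta[last], path

def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical toy model: two states standing in for phoneme-like units.
states = ["a", "b"]
log_start = to_log({"a": 0.6, "b": 0.4})
log_trans = to_log({"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.4, "b": 0.6}})
log_emit = to_log({"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.2, "y": 0.8}})

best_log_prob, best_path = viterbi(["x", "x", "y"], states,
                                   log_start, log_trans, log_emit)
print(best_path)  # prints ['a', 'a', 'b']
```

Working in log-probabilities avoids numerical underflow on long utterances. A real recognizer applies the same recursion to continuous acoustic features and to a much larger state space built from the pronunciation vocabulary and the language model.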

Lectures provide students with background knowledge on ASR. Students define their own research questions, which can be answered by means of ASR technology. Students meet on a regular basis to present their progress to fellow students and to discuss any challenges encountered. A scientific report is written on the research carried out. Depending on the topics chosen, this report is written individually or as a team.
Assumed previous knowledge
Knowledge of statistics and scripting/programming.

Required materials
Literature
Title: Additional materials to be announced during the course. These materials are essential for a thesis of sufficient quality.

Recommended materials
Book
Title: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.
Author: Jurafsky, D. and J.H. Martin
Publisher: London: Pearson, 2009. Obligatory: Chapters 9 and 10.

Instructional modes
Lecture/Seminar
Attendance mandatory: Yes

Tests
Assignment
Test weight: 100
Test type: Project
Opportunities: Block PER 4, Block PER 4

Minimum grade
5.5