After successful completion of this course, the student must have a detailed overview of the steps entailed in training and testing an automatic continuous speech recognizer (ASR system).
From a theoretical perspective, students must be able to understand and explain design considerations regarding feature extraction, acoustic modeling (using Deep Neural Networks or Gaussian mixture models), training and decoding algorithms (Baum-Welch, Viterbi), the pronunciation vocabulary, language modeling, evaluation procedures, and the scalability of the approach.
From a practical point of view, students are expected to have familiarized themselves with the basic use of the Hidden Markov Model Toolkit (HTK) so that they are able to implement a simple ASR system. For the scientific report, students are encouraged to reuse ideas from ASR to address decoding or classification problems that they specify themselves, such as singing type classification, prediction models in speech, accent detection, speech and big data, or speech and artificial intelligence.
In principle, each student writes his/her own scientific report, but cooperation may be allowed, depending on group size.
In the first part of the course, relevant aspects of speech production, speech perception, and acoustics are addressed. Thereafter, we discuss the theoretical and practical aspects of Automatic Speech Recognition.
Speech production is a process in which a message (a sequence of one or more discrete symbols, e.g. a sequence of words or phonemes) is transformed into a continuous speech signal (a spoken utterance). Because of anatomical differences in the articulatory organs, as well as individual behavioural differences in how these organs are used during speech production, speech signals that represent the same message may show a tremendous amount of variation.
Next, we address the question of how the reverse process might take place, i.e., the decoding of a spoken utterance into discrete elements (e.g. words, phonemes). In particular, we will explore various approaches to Automatic Speech Recognition (ASR), including ASR based on hidden Markov models (HMMs) and ASR based on artificial neural networks.
Lectures provide students with background knowledge on ASR. Students define their own research questions, which can be answered by means of ASR technology. Students meet on a regular basis to present their progress to fellow students and to discuss any challenges encountered. A scientific report is written on the research carried out. Depending on the topic chosen, this report is written individually or as a team.