OSIRIS - Course offerings LET-REMA-LCEX10 2022

Course module

LET-REMA-LCEX10

Credits (ECTS)

Category

Language of instruction

English

Offered by

Radboud University; Faculty of Arts; Graduate School;

Lecturer(s)

Coordinator		dr. L.F.M. ten Bosch
Examiner		dr. L.F.M. ten Bosch
Lecturer		dr. L.F.M. ten Bosch
Contact person for the course		dr. L.F.M. ten Bosch
Lecturer		dr. C. Tejedor Garcia

Academic year

2022

Period

PER 3-PER 4

(30/01/2023 to 03/09/2023)

Starting block

PER 3

Course mode

full-time

Remarks

Open for master students Data Science, Artificial Intelligence, Computer Science

Registration using OSIRIS

Yes

Course open to students from other faculties

Yes

Pre-registration

Waiting list

Placement procedure

Aims

This course provides an overview of the theoretical and practical issues related to automatic speech recognition, a rapidly advancing research field bridging AI, linguistics and computational modeling.

From a theoretical perspective, students will be able to understand and explain important aspects of ASR, such as feature extraction, acoustic modeling (using deep neural networks or Gaussian mixture models), training and decoding algorithms (Baum-Welch, Viterbi), the pronunciation vocabulary, language modeling, keyword spotting, evaluation procedures, and the scalability of the approach.
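As an illustration of what an evaluation procedure can look like, the sketch below computes the word error rate (WER) of a recognized sentence against a reference transcript. WER is a widely used ASR metric, but the course description does not name a specific metric, so its use here is an assumption. The function name `word_error_rate` and the example sentences are likewise hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j]: minimum number of substitutions, deletions and insertions
    # needed to turn the first i reference words into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                      # deletion
                dist[i][j - 1] + 1,                      # insertion
                dist[i - 1][j - 1] + substitution_cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the").
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.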

From a practical point of view, students will familiarize themselves with speech decoding techniques via existing ASR software, web interfaces or deep neural networks, so that they are able to perform mid-scale ASR experiments. A number of default experiments are pre-specified, but students are encouraged to explore their own research questions, e.g., the discovery of structure in speech, the use of prediction in speech decoding, accent detection, word search, entropy, or the relation between speech and artificial intelligence. The focus is on gaining insight into speech decoding, not on developing implementations.
In principle, each student writes their own scientific report, but students may work together, depending on group size.

Content

In the first part of the course, relevant aspects of speech production, speech perception, and acoustics are addressed. Thereafter, we discuss the theoretical and practical aspects of Automatic Speech Recognition.

Speech production is a process in which a spoken message (e.g., a sequence of words or word-like units) is transformed into a continuous speech signal (a spoken utterance). Because of the individual anatomical characteristics of each speaker's vocal tract, speech signals that represent the same message usually show a tremendous amount of variation.
Next, we address the question of how the reverse process might take place, i.e., the decoding of a spoken utterance in terms of discrete elements (e.g., words). In particular, we will explore various approaches to Automatic Speech Recognition (ASR), including ASR based on hidden Markov models (HMMs) and ASR based on artificial neural networks.
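To give an impression of decoding with an HMM, the sketch below applies the Viterbi algorithm to a toy model that labels a sequence of observations with the most likely sequence of hidden states. The two states ("sil", "speech"), the two observation symbols ("low", "high") and all probabilities are invented for illustration. They are not taken from the course materials, and real ASR systems work with far larger models and continuous acoustic features.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence and its log-probability."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev

    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {k2: math.log(v2) for k2, v2 in v.items()} for k, v in table.items()}


# Toy model (all values are assumptions made for this example).
states = ["sil", "speech"]
log_start = {"sil": math.log(0.8), "speech": math.log(0.2)}
log_trans = to_log({
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.2, "speech": 0.8},
})
log_emit = to_log({
    "sil": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
})

path, log_prob = viterbi(["low", "high", "high", "low"], states,
                         log_start, log_trans, log_emit)
# path is ["sil", "speech", "speech", "sil"]
```

Working with log-probabilities avoids numerical underflow on long observation sequences, which is why the sketch sums logarithms instead of multiplying probabilities.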

Lectures provide students with background knowledge on ASR. Students define their own research questions, which can be answered by means of ASR technology. Students meet on a regular basis to present their progress to fellow students and to discuss any challenges encountered. A scientific report is written on the research carried out. Depending on the topics chosen, this report is written individually or as a team.

Level

Research master.

Presumed foreknowledge

Knowledge of statistics and scripting/programming.
Knowledge of AI, deep learning, and/or data science will be helpful.

Test information

Assessment will take place on the basis of an individual thesis. This thesis is based either on the student's own experiments or on experiments carried out in a team.

Specifics

Required materials

Literature

Title

Additional materials to be announced during the course. These materials will depend on the precise topic of the ASR thesis, and are essential for a thesis of sufficient quality.

Recommended materials

Book

Title

Speech and Language Processing. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.

Author

Jurafsky, D. and J.H. Martin

Publisher

London: Pearson.

Instructional modes

lecture/seminar

General

1 x 2 hours lecture/seminar a week

Tests

Assignment

Test weight

100

Test type

Project

Opportunities

Block PER 4, Block PER 4

Minimum grade

5.5