CLST Internships

NOTE: This page is about internships and master's projects for students; it does not list job opportunities.

Overview:

  • Text analytics, Iris Hendrickx
  • e-Learning - Language Learning through a computer that listens, Helmer Strik
  • e-Health: diagnosis and therapy through a speaking and listening computer, Helmer Strik
  • Computational Psycholinguistics, Stefan Frank
  • Acoustic & phonemic variation, Louis ten Bosch
  • Video analysis of sign language, Onno Crasborn

----------------------------------------------------------------------------------------------

Theme: Text analytics, Iris Hendrickx

Text analytics for career planning

To categorise university personnel, the HR department works with a large number of function descriptions and classifications. Each employee is assigned a function and a scale/level. This job classification system is called UFO (https://www.vsnu.nl/en_GB/job_classification_ufo.html). Employees and their supervisors are interested in how they can plan their careers, not only by moving up levels within the same function, but also by moving to different functions with similar descriptions and competences.

In this project you will take a text-analytics approach in which all function descriptions are compared and clustered using similarity measures (e.g. based on word2vec), in order to discover similarities between jobs, and possible career paths, that are new and have so far not been considered by the HR department.
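As a minimal sketch of the comparison step, the code below turns each function description into a TF-IDF bag-of-words vector and ranks the other functions by cosine similarity. This is a dependency-free stand-in for the word2vec embeddings mentioned above, not the method the project prescribes; using averaged word2vec vectors instead would only change how the vectors are built. The function titles and descriptions are hypothetical, and clustering could be built on top of the same similarity scores.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a description and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Map each function title to a sparse TF-IDF vector (word -> weight)."""
    tokenized = {title: tokenize(text) for title, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many descriptions does each word occur?
    df = Counter(word for tokens in tokenized.values() for word in set(tokens))
    vectors = {}
    for title, tokens in tokenized.items():
        tf = Counter(tokens)
        # Smoothed IDF: words occurring in every description get the lowest weight.
        vectors[title] = {
            word: count * (math.log((1 + n_docs) / (1 + df[word])) + 1)
            for word, count in tf.items()
        }
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(docs: dict[str, str], k: int = 3) -> dict[str, list[tuple[str, float]]]:
    """For every function, list the k other functions with the most similar description."""
    vectors = tfidf_vectors(docs)
    neighbours = {}
    for title, vec in vectors.items():
        scores = [(other, cosine(vec, other_vec))
                  for other, other_vec in vectors.items() if other != title]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        neighbours[title] = scores[:k]
    return neighbours


# Hypothetical, made-up descriptions (not taken from UFO).
docs = {
    "Policy officer": "advises management on policy and writes policy reports",
    "Communication advisor": "advises management on communication and writes reports for staff",
    "Lab technician": "prepares experiments and maintains laboratory equipment",
}

for title, similar in most_similar(docs, k=1).items():
    print(title, "->", similar)
```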

Since all job descriptions are available in XML format, your task will comprise (a) extracting information from the XML files, (b) analysing the similarities between job descriptions, and (c) reporting on the results and potential career paths.
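Step (a) could look roughly like the sketch below, which uses Python's standard `xml.etree.ElementTree` to collect one record per function. The actual structure of the UFO XML files is not described here, so the element and attribute names (`function`, `id`, `title`, `level`, `description`) are hypothetical placeholders to be adapted to the real files.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout: the element names <function>, <title>, <level> and
# <description> are assumptions, not the actual UFO file format.
SAMPLE_XML = """
<functions>
  <function id="F01">
    <title>Policy officer</title>
    <level>10</level>
    <description>
      Advises management on policy and
      writes policy reports.
    </description>
  </function>
  <function id="F02">
    <title>Lab technician</title>
    <level>7</level>
    <description>Maintains laboratory equipment.</description>
  </function>
</functions>
"""


def extract_functions(xml_text: str) -> list[dict[str, str]]:
    """Collect id, title, level and a whitespace-normalized description per function."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for func in root.iter("function"):
        records.append({
            "id": func.get("id", ""),
            "title": (func.findtext("title") or "").strip(),
            "level": (func.findtext("level") or "").strip(),
            # Collapse line breaks and indentation inside the description text.
            "description": " ".join((func.findtext("description") or "").split()),
        })
    return records


for record in extract_functions(SAMPLE_XML):
    print(record)
```

The resulting descriptions can then be fed into any similarity measure for step (b).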

Contact details: Iris Hendrickx i.hendrickx@let.ru.nl

Text analytics of Dreams

Dreams have been studied from many different perspectives, such as psychoanalysis, neuroscience and history. We are interested in the textual content of dreams and aim to investigate what people dream about. We have already performed a first study focused on the automatic analysis of dreams, using the dream reports gathered in DreamBank (www.dreambank.net), an online collection of over 20,000 dream descriptions.

One currently dominant theory assumes that the content of dreams reflects a person's daily life and personal concerns. Previous studies of dream descriptions have shown that around 75-80% of dream content relates to everyday life and daily activities (Domhoff and Schneider, 2008).

For this project there are several possible directions for further research. To what extent can we build an automatic classifier that predicts whether a text is a dream description or not? And how well can humans make this distinction? Another direction is to look into the narrative structure of dreams: do dreams have such a structure?
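To make the classification question concrete, here is a minimal sketch of one possible baseline: a multinomial naive Bayes classifier with add-one smoothing, written with the Python standard library only. Naive Bayes is our choice for illustration, not the approach used in the earlier study, and the training sentences are invented toy examples; a real experiment would contrast DreamBank reports with other (non-dream) narrative texts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior P(class)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # words unseen in training carry no evidence here
                count = self.word_counts[c][word]
                score += math.log((count + 1) / (self.totals[c] + len(self.vocab)))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Invented toy examples for illustration only (not DreamBank data).
train_texts = [
    "I was flying over my old school and suddenly my teeth fell out",
    "I dreamt I was back in the house where I grew up but the rooms kept changing",
    "suddenly I was running but my legs would not move",
    "The meeting is moved to Tuesday at ten in room two",
    "Add the flour and stir until the dough is smooth",
    "The train to Utrecht departs from platform five",
]
train_labels = ["dream", "dream", "dream", "other", "other", "other"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("suddenly I was flying and my teeth fell out"))
```

The same labelled texts could also be shown to human annotators, so that human and automatic accuracy can be compared on identical items.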

Domhoff, G. and Schneider, A. (2008). Studying dream content using the archive and search engine on DreamBank.net. Consciousness and Cognition, 17(4):1238–1247.

Hendrickx, I., Onrust, L., Kunneman, F., Hürriyetoğlu, A., van den Bosch, A., and Stoop, W. (2017). Unraveling reported dreams with text analytics. Digital Humanities Quarterly, accepted. Draft version: https://arxiv.org/abs/1612.03659

Contact details: Iris Hendrickx i.hendrickx@let.ru.nl

Theme: e-Learning - Language Learning through a computer that listens
Theme: e-Health: diagnosis and therapy through a speaking and listening computer

Our group carries out research on the use of ICT and, in particular, Language and Speech Technology (L&ST) on:
- e-Learning: in the context of Computer Assisted Language Learning (CALL); and
- e-Health: to the benefit of people with communicative disabilities
See, e.g., the projects listed at http://hstrik.ruhosting.nl/projects/.

Within the projects of our research group there are several possibilities for internships, lab rotations, thesis research, etc.; see http://hstrik.ruhosting.nl/research-topics/
You can, for example, assist in our research and experiments, or set up your own experiments. In general, we try to finalize the research plan together with you (the student), taking into account your background, interests, etc., since we believe that this benefits both you and us.

If you are interested or have questions, please contact Helmer Strik, e.g. by e-mail (‘w.strik -at- let.ru.nl’); we can then provide more information about current possibilities.

Theme: Neural network models of human sentence processing

Artificial neural networks currently underlie the most successful natural language processing applications. They have also long been popular with psycholinguists and cognitive scientists because of their apparent ability to simulate human language use. In this project, you will use modern, large-scale neural networks trained on realistic text corpora and investigate to what extent they predict aspects of sentence processing. Possible questions include: Which neural network architectures match human behaviour best? Do networks trained on different languages match human performance differences between these languages? Can networks that are trained on multiple languages simultaneously account for aspects of multilingual sentence processing?
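One widely used way to link a language model to human sentence processing is word surprisal, -log2 P(word | preceding words), which can be compared with measures such as word-by-word reading times. The sketch below computes surprisal from a tiny bigram count model; this model is only a stand-in for the large neural networks the project is about, and the computation of surprisal from a network's next-word probabilities would be the same. The toy corpus is invented.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts, with <s> marking the sentence start."""
    contexts, bigrams = Counter(), Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])  # every token except the last serves as a context
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, vocab


def surprisal(sentence, model):
    """Return (word, surprisal in bits) per word, i.e. -log2 P(word | previous word)."""
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + sentence.lower().split()
    n_outcomes = len(vocab) + 1  # known words plus one slot for any unseen word
    result = []
    for prev, word in zip(tokens[:-1], tokens[1:]):
        # Add-one smoothing so unseen continuations still get a non-zero probability.
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + n_outcomes)
        result.append((word, -math.log2(p)))
    return result


# Toy training corpus, invented for illustration.
corpus = ["the dog chased the cat", "the cat chased the dog", "the dog slept"]
model = train_bigram(corpus)
for word, bits in surprisal("the dog slept", model):
    print(f"{word}\t{bits:.2f}")
```

Less predictable words receive higher surprisal; comparing such values across network architectures or languages is one way to operationalize the questions above.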

Some experience with Python programming is necessary for this project.

Contact: Stefan Frank, s.frank@let.ru.nl

Theme: Acoustic & phonemic variation

Project 1: Modelling Continuity in the Speech Signal

Problem: The speech signal is characterized by an enormous amount of variation: acoustic realizations of the same word are, in general, very different. Variation occurs within a single speaker as a result of speaking in different moods (happy, sad, ...), speaking style and rate (formal, informal, sloppy, careful), having a cold, and so on. Variation also occurs between speakers (e.g. male vs. female speakers, different physiological properties of the vocal tract, accents). How human listeners deal with this variation is still largely unknown. It is a fundamental issue in speech research, since for successful communication a listener must be able to identify words even when they are pronounced in a way the listener has never heard before (e.g. by a new speaker). Variation in speech is constrained, however: speech is the result of articulatory movements that are actually quite slow, due to the ballistic properties of the vocal organs. As a result, speech contains a large amount of continuity, which is very helpful for identifying spoken words in the stream of speech sounds. In this project we will focus on these continuity properties of speech.

Task: The data used in this project is Dutch. The first step consists of choosing one of the available Dutch corpora, investigating a number of options for creating features from the speech waveform, and automatically segmenting words from their surrounding context so that acoustic word realizations can be compared. The second step is to compare two acoustic realizations of a word, one from a matching and one from a non-matching context. In this comparison we will investigate to what extent these contextual matches fit, relative to other, arbitrarily chosen non-contextual matches. This project will focus primarily on the conceptual/linguistic issues of variation in speech.
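Two realizations of the same word rarely have the same duration, so they cannot be compared frame by frame directly. A common technique for this, assumed here rather than prescribed by the project, is dynamic time warping (DTW), which finds the cheapest monotonic alignment between two sequences of feature frames. The sketch below uses toy two-dimensional frames; in practice the frames would be acoustic feature vectors (e.g. MFCCs) computed from the waveform.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames of equal dimension."""
    return math.dist(a, b)


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost of the best monotonic alignment of two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    # cost[i][j]: cheapest alignment of the first i frames of seq_a with the first j of seq_b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Each step may advance in seq_a, in seq_b, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy 2-dimensional "feature frames" (hypothetical); real input would be e.g. MFCC vectors.
slow = [[0.0, 1.0], [0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
fast = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
reversed_word = list(reversed(fast))
print(dtw_distance(slow, fast), dtw_distance(slow, reversed_word))
```

A slower realization of the same frame trajectory aligns at zero cost, whereas a differently shaped trajectory does not; the same distance could be used to compare a word token from a matching context against one from a non-matching context.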

Contact details: Louis ten Bosch, l.tenbosch@let.ru.nl, 024-3616069

Project 2: Automatic Speech Recognition using Deep Neural Networks

Problem: A neural network is an algorithm that performs computations via many artificial neurons organized in one or more layers. This hierarchical, layered architecture is inspired by the structure of the human cortex. Nowadays, Deep Neural Networks (DNNs), i.e. networks with more than 3 layers, are often used in applications that simulate complex human tasks, such as simulation of human behavior, image recognition (e.g. the decoding of captchas), and speech recognition. A DNN can be considered a technical tool, but in this project we will focus on the question of how to use a DNN to extract information about the deep structure in speech data. To that end, we will use it as a tool to search for phonemic structure in a speech corpus (TIMIT, a database of spoken American English).
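The layered computation described above can be illustrated with a forward pass through a small, untrained feedforward network: each layer computes weighted sums of its inputs plus a bias, hidden layers apply a non-linearity (ReLU here), and a softmax turns the final scores into a probability for each phone class. The layer sizes, the 13-dimensional input frame and the five phone labels are hypothetical choices for illustration; with random weights the output probabilities are meaningless until the network is trained.

```python
import math
import random


def relu(vector):
    return [max(0.0, x) for x in vector]


def softmax(vector):
    shift = max(vector)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in vector]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: each output neuron is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights scaled by 1/sqrt(n_in), zero biases."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(frame, layers):
    """Propagate one feature frame through all layers and return class probabilities."""
    activation = frame
    for i, (weights, biases) in enumerate(layers):
        activation = dense(activation, weights, biases)
        if i < len(layers) - 1:  # non-linearity on hidden layers only
            activation = relu(activation)
    return softmax(activation)


rng = random.Random(0)
PHONES = ["aa", "iy", "s", "t", "sil"]  # hypothetical subset of phone labels
sizes = [13, 32, 32, 32, len(PHONES)]   # input frame, three hidden layers, output layer
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
frame = [rng.uniform(-1.0, 1.0) for _ in range(sizes[0])]  # stand-in for one acoustic frame
posteriors = forward(frame, layers)
print(dict(zip(PHONES, (round(p, 3) for p in posteriors))))
```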

Task: The first part of this project is a literature study of advances in techniques based on Deep Neural Networks since 2014. The second part focuses on recognizing phone-like units in continuous speech from TIMIT, using Deep Neural Networks of different types.
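The project text does not name an evaluation measure, but phone recognition on TIMIT is commonly scored with the phone error rate (PER): the minimum number of substitutions, deletions and insertions needed to turn the reference phone sequence into the recognized one, divided by the length of the reference. A minimal sketch, with a made-up reference and hypothesis:

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference phone r
                curr[j - 1] + 1,         # insertion of hypothesis phone h
                prev[j - 1] + (r != h),  # substitution (free if the phones match)
            )
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """Edit distance normalized by the number of reference phones."""
    return edit_distance(ref, hyp) / len(ref)


# Made-up example: one substitution (ae -> eh) and one deletion (final sil).
reference = ["sil", "dh", "ah", "k", "ae", "t", "sil"]
hypothesis = ["sil", "dh", "ah", "k", "eh", "t"]
print(phone_error_rate(reference, hypothesis))
```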

Contact details: Louis ten Bosch, l.tenbosch@let.ru.nl, 024-3616069

Theme: Video analysis of sign language

Project: Visual search in a sign language dictionary

Sign language dictionaries contain videos showing the form of signs, along with written information about their meaning and use. These dictionaries are in high demand worldwide, because sign languages such as Sign Language of the Netherlands (NGT) have many non-native learners: not only hearing interpreters and teachers, but also, for example, hearing family members of deaf children. A current limitation of these dictionaries is that users can only search for those properties of the form of signs (their ‘phonology’) that have been coded manually, such as the type of movement, the location of the sign, and which fingers are active.

This project aims to apply AI techniques (computer vision by way of OpenPose, and machine learning) to build search tools for users of sign language dictionaries. Can we remove the need for manual coding of signs altogether? How can we allow for more user-friendly and flexible search routines, both within and across languages? Could the same tools benefit research on sign language? These are research questions you could work on while improving your knowledge of and skills in various AI techniques.
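As a very rough sketch of what a visual search could start from, the code below ranks dictionary entries by their distance to a query sign, using body keypoints per video frame. It assumes the keypoints have already been extracted with OpenPose and are given in its flat per-person format [x0, y0, c0, x1, y1, c1, ...] (as in the `pose_keypoints_2d` field of its JSON output), in the BODY_25 layout where index 1 is the neck and indices 2 and 5 are the right and left shoulders. Normalizing by neck position and shoulder width removes differences in signer position and size. The glosses and the averaging over frames are our own simplifications.

```python
import math

# Indices in OpenPose's BODY_25 keypoint layout (assumed input format).
NECK, R_SHOULDER, L_SHOULDER = 1, 2, 5


def parse_keypoints(flat):
    """Turn a flat list [x0, y0, c0, x1, y1, c1, ...] into (x, y) pairs; confidences are dropped."""
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 3)]


def normalize(points):
    """Express keypoints relative to the neck, scaled by shoulder width."""
    nx, ny = points[NECK]
    (rx, ry), (lx, ly) = points[R_SHOULDER], points[L_SHOULDER]
    scale = math.hypot(rx - lx, ry - ly) or 1.0  # avoid division by zero if shoulders are missing
    return [((x - nx) / scale, (y - ny) / scale) for x, y in points]


def sign_vector(frames):
    """Average the normalized keypoints over all frames into one fixed-length vector."""
    normalized = [normalize(parse_keypoints(f)) for f in frames]
    n_frames = len(normalized)
    vector = []
    for k in range(len(normalized[0])):
        vector.append(sum(frame[k][0] for frame in normalized) / n_frames)
        vector.append(sum(frame[k][1] for frame in normalized) / n_frames)
    return vector


def search(query_frames, dictionary, k=3):
    """Rank dictionary entries (hypothetical gloss -> list of frames) by distance to the query."""
    query = sign_vector(query_frames)
    scored = [(gloss, math.dist(query, sign_vector(frames))) for gloss, frames in dictionary.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]
```

Averaging over frames discards the movement of a sign, which is linguistically important, and treats missing keypoints (which OpenPose reports as zeros) as real positions; a serious search tool would compare whole keypoint trajectories, for example with time-series alignment or learned embeddings.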

The datasets at our disposal include the lexicons of the various sign languages in the Global Signbank lexical database, which range from a few hundred to a few thousand lexical items. The Deaf Support Group of the Humanities Lab of the Faculty of Arts is available to support you in accessing and enriching the data where needed.


Contact details: Onno Crasborn, o.crasborn@let.ru.nl