CLST Internships
NOTE: This page is about internships and Master's projects for students, not about job opportunities.
Overview
- Text analytics (Iris Hendrickx)
- e-Learning: Language Learning through a computer that listens (Helmer Strik)
- e-Health: diagnosis and therapy through a speaking and listening computer (Helmer Strik)
- Computational Psycholinguistics (Stefan Frank)
- Acoustic & phonemic variation (Louis ten Bosch)
Text analytics
Text analytics for career planning
To categorise university personnel, the HR department works with a large number of function descriptions and classifications. Each employee is assigned a function and a scale/level. This job classification system is called UFO. Employees and their supervisors are interested in how to plan careers not only by moving up in level within the same function, but also by moving to different functions with similar descriptions and competences.
In this project a text analytical approach will be taken in which all function descriptions are compared and clustered using similarity measures (e.g., with word2vec), in order to find similarities between jobs, and hence career paths, that are novel and have so far not been taken into account by the HR department.
Since all job descriptions are available in XML format, your task will comprise a) information extraction from the XML files, b) text analytics on similarities of job descriptions, c) reporting on results and potential career paths.
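Steps a) and b) could be prototyped roughly as follows. This is a minimal sketch, not the project's actual pipeline: the XML layout (`function` elements with a `name` attribute and a `description` child) and the three sample descriptions are invented for illustration, and plain TF-IDF cosine similarity is used as a simple stand-in for the word2vec-based measures mentioned above.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical XML layout; the real UFO export may use different tags.
SAMPLE_XML = """
<functions>
  <function name="Lecturer">
    <description>teaches courses and supervises students in research projects</description>
  </function>
  <function name="Researcher">
    <description>conducts research projects and supervises students</description>
  </function>
  <function name="Financial administrator">
    <description>manages budgets and processes invoices</description>
  </function>
</functions>
"""

def extract_descriptions(xml_text):
    """Step a) return {function name: description text} from the XML."""
    root = ET.fromstring(xml_text)
    return {
        node.get("name"): node.findtext("description", default="")
        for node in root.iter("function")
    }

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Turn {name: text} into {name: {term: tf-idf weight}}."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    n_docs = len(docs)
    vectors = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        # Smoothed inverse document frequency, so no weight is ever zero.
        vectors[name] = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors

def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_similar(name, vectors):
    """Step b) rank all other functions by similarity to `name`."""
    return sorted(
        ((other, cosine(vectors[name], vec))
         for other, vec in vectors.items() if other != name),
        key=lambda pair: pair[1],
        reverse=True,
    )

descriptions = extract_descriptions(SAMPLE_XML)
vectors = tfidf_vectors(descriptions)
ranking = most_similar("Lecturer", vectors)
for other, score in ranking:
    print(f"Lecturer vs {other}: {score:.2f}")
```

On real data, the ranking for each function would be the starting point for step c): function pairs with high similarity but different classifications are candidate career moves to report on.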
Contact
Researcher: Iris Hendrickx
Email: iris.hendrickx@ru.nl
Text analytics of Dreams
Dreams have been studied from many different perspectives, such as psychoanalysis, neuroscience, and history. We are interested in the textual content of dreams and aim to investigate what people dream about. We have already performed a first study that focused on the automatic analysis of dreams. We used dream reports from the collections gathered in DreamBank, an online collection of over 20,000 dream descriptions.
One currently dominant theory assumes that the content of dreams reflects a person's daily life and personal concerns. Previous studies on dream descriptions have shown that around 75-80% of dream content relates to everyday life and daily activities (Domhoff and Schneider, 2008).
For this project there are several possible directions for further research. To what extent can we build an automatic classifier that predicts whether a text is a dream description or not? And how well can humans make this distinction? Another direction is to look into the narrative structure of dreams: do dreams have such a structure?
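As a rough illustration of the classifier question, a bag-of-words baseline could look like the sketch below. It is only one possible starting point: the choice of multinomial Naive Bayes is an assumption, and the four training sentences are made up for this example and do not come from DreamBank.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.word_counts = {label: Counter() for label in self.labels}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.labels:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokenize(text):
                if token in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

# Invented toy examples, for illustration only.
train_texts = [
    "I was flying over my old school and suddenly I was falling",
    "I dreamed that my teeth fell out and then I was in a strange house",
    "The meeting starts at ten and the agenda is attached",
    "Please send the invoice before the end of the month",
]
train_labels = ["dream", "dream", "other", "other"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("suddenly I was in my old house"))
print(model.predict("send the agenda before the meeting"))
```

A realistic experiment would need a proper collection of non-dream texts for comparison and a held-out test set, and the same test items could then be shown to human judges to address the second question.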
Domhoff, G. and Schneider, A. (2008). Studying dream content using the archive and search engine on DreamBank.net. Consciousness and Cognition, 17(4):1238-1247.
Iris Hendrickx, Louis Onrust, Florian Kunneman, Ali Hürriyetoğlu, Antal van den Bosch, and Wessel Stoop. Unraveling reported dreams with text analytics. Digital Humanities Quarterly, 2017, accepted. (draft version at: https://arxiv.org/abs/1612.03659 )
Contact
Researcher: Iris Hendrickx
Email: iris.hendrickx@ru.nl
e-Learning: Language Learning through a computer that listens
e-Health: diagnosis and therapy through a speaking and listening computer
Our group carries out research on the use of ICT and, in particular, Language and Speech Technology (L&ST) on:
- e-Learning: in the context of Computer Assisted Language Learning (CALL); and
- e-Health: to the benefit of people with communicative disabilities
See, e.g., the projects mentioned on http://hstrik.ruhosting.nl/projects/.
Within the projects of our research group there are several possibilities for internships, lab rotations, thesis research, etc.; see http://hstrik.ruhosting.nl/research-topics/
You can, e.g., assist in our research and experiments, or set up your own experiments. In general, we try to finalize the research plans together with you (the student), taking into account your background, interests, etc., since we believe that this is advantageous for both you and us.
Contact
If you are interested, have questions, etc., you can contact Helmer Strik for more information about current possibilities.
Researcher: Helmer Strik
Email: helmer.strik@ru.nl
Neural network models of human sentence processing
Artificial neural networks currently underlie the most successful natural language processing applications. They have also long been popular with psycholinguists and cognitive scientists because of their apparent ability to simulate human language use. In this project, you will use modern, large-scale neural networks trained on realistic text corpora and investigate to what extent they predict aspects of sentence processing. Possible questions include: Which neural network architectures match human behaviour best? Do networks trained on different languages match human performance differences between these languages? Can networks that are trained on multiple languages simultaneously account for aspects of multilingual sentence processing?
Note: Some experience with Python programming is necessary for this project.
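To give an idea of the kind of Python involved, the sketch below computes per-word surprisal, -log2 P(word | context), which is one quantity often used to relate a language model's predictions to human reading data. Both choices are assumptions made for illustration: the project description does not name a specific measure, and a tiny add-one smoothed bigram model trained on three invented sentences stands in for a large neural network.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams, their left contexts, and the vocabulary."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
        vocab.update(words[1:])
    return contexts, bigrams, vocab

def surprisal(sentence, model):
    """Return [(word, surprisal in bits)] under the smoothed bigram model."""
    contexts, bigrams, vocab = model
    words = ["<s>"] + sentence.lower().split()
    result = []
    for prev, word in zip(words, words[1:]):
        # Add-one smoothing; assumes the test words occur in the training vocabulary.
        prob = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
        result.append((word, -math.log2(prob)))
    return result

# Invented toy corpus, for illustration only.
corpus = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "the dog saw the cat",
]
model = train_bigram(corpus)

for word, bits in surprisal("the dog chased the mouse", model):
    print(f"{word:>7}  {bits:.2f} bits")
```

In an actual study the per-word values from a trained network would be compared against human measures such as reading times; swapping the bigram model for a neural network only changes how the word probability is obtained.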
Contact
Researcher: Stefan Frank
Email: stefan.frank@ru.nl
Acoustic & phonemic variation
Project 1: Modelling Continuity in the Speech Signal
Problem: The speech signal is characterized by an enormous amount of variation: acoustic realizations of the same word are in general very different. Variation occurs within a single speaker as a result of speaking in different moods (happy, sad, ...), speaking style and rate (formal, informal, sloppy, careful), having a cold, and so on. Variation also occurs between speakers (e.g. male, female; different physiological properties of the vocal tract; accents). How human listeners deal with this variation is still largely unknown. It is a fundamental issue in speech research, since for successful communication a listener must be able to identify words even when they are pronounced in a way the listener has never heard before (e.g. by a novel speaker). Variation in speech is constrained: speech is the result of articulatory movements, which are actually quite slow due to the ballistic properties of the vocal organs. As a result, speech contains a large amount of continuity, which is very helpful for identifying spoken words in the stream of speech sounds. In this project we will focus on these continuity properties of speech.
Task: The project uses Dutch speech data. The first step of the task consists of choosing one of the corpora that are available for Dutch, investigating a number of options for creating features from the speech waveform, and comparing acoustic word realizations by automatically segmenting words from their surrounding context. The second step of the task is to compare two acoustic realizations of a word, one from a matching and one from a non-matching context. In this comparison we will investigate how well these contextual matches fit, relative to other arbitrarily chosen non-contextual matches. This project will focus primarily on the conceptual/linguistic issues of variation in speech.
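Comparing two acoustic realizations of a word means comparing two feature sequences of different lengths. One common way to do this is dynamic time warping (DTW); the project description does not prescribe a method, so the sketch below is only an assumption about how such a comparison might be set up. The one-dimensional "feature" sequences are invented toy values, where real input would be frame-level acoustic features.

```python
import math

def euclidean(a, b):
    """Distance between two feature frames of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    """Cost of the best monotonic alignment between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_a advances, seq_b frame repeated
                cost[i][j - 1],      # seq_b advances, seq_a frame repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# Invented toy sequences: one frame per list entry, one feature per frame.
reference = [[0.0], [1.0], [2.0], [1.0], [0.0]]
slower = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [0.0]]
different = [[2.0], [2.0], [0.0], [0.0], [2.0]]

print(dtw_distance(reference, slower))
print(dtw_distance(reference, different))
```

Because DTW lets frames repeat, a slower rendition of the same trajectory aligns at no cost, while a different trajectory does not. This makes it a convenient baseline for asking whether a contextual match fits better than arbitrarily chosen alternatives.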
Contact
Researcher: Louis ten Bosch
Email: louis.tenbosch@ru.nl
Project 2: Automatic Speech Recognition using Deep Neural Networks
Problem: A neural network is an algorithm that performs computations via many artificial neurons organized in one or more layers. This hierarchical layered architecture is inspired by the structure of the human cortex. Nowadays Deep Neural Networks (DNNs), i.e. networks with more than 3 layers, are often used in applications that simulate complex human tasks, such as simulation of human behavior, image recognition (e.g. the decoding of captchas), and speech recognition. A DNN can be considered a technical tool, but in this project we will focus on the problem of how to use a DNN in such a way that we can extract information about the deep structure in speech data. To that end, we will use it as a tool to search for phonemic structure in a speech corpus (TIMIT, a database of spoken American English).
Task: The first part of this project is a literature study of the advances in techniques based on Deep Neural Networks since 2014. The second part of this project will focus on the recognition of phone-like units in continuous speech from TIMIT by using Deep Neural Networks of different types.
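To make the layered computation concrete, the sketch below runs a single feature frame through a small feed-forward network with four weight layers and turns the output into a probability distribution over phone labels. Everything specific here is an assumption for illustration: the weights are random and untrained, the 13-dimensional input and hidden layer sizes are arbitrary, and the four labels are placeholders, not the TIMIT phone set.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def affine(layer, x):
    """Weighted sum plus bias for every neuron in the layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def forward(layers, frame):
    """Hidden layers use ReLU; the output layer uses softmax."""
    hidden = frame
    for layer in layers[:-1]:
        hidden = relu(affine(layer, hidden))
    return softmax(affine(layers[-1], hidden))

PHONES = ["aa", "iy", "s", "sil"]  # placeholder labels, not the TIMIT inventory
rng = random.Random(0)
sizes = [13, 32, 32, 32, len(PHONES)]  # input, three hidden layers, output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

frame = [rng.gauss(0.0, 1.0) for _ in range(sizes[0])]  # stand-in for one speech frame
posteriors = forward(layers, frame)
best_phone = PHONES[posteriors.index(max(posteriors))]
print(best_phone, [round(p, 3) for p in posteriors])
```

In practice the weights would be trained on labelled frames with a deep learning library, and the activations of the hidden layers are where one would look for emerging phonemic structure.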
Contact
Researcher: Louis ten Bosch
Email: louis.tenbosch@ru.nl