Student Projects

Here you can find suggestions for Master projects in computer science, information science, and artificial intelligence. If you are interested in one of these topics, we are confident that we can put together a lighter version for a Bachelor project.

 

  • Single-trial classification of neuroimaging data: how to interpret the parameters

    Classification methods are a new approach to data analysis in cognitive neuroscience. However, it is often unclear how to interpret the parameters of the obtained classifiers. In this project, you will apply classification methods to neuroimaging data (MEG and/or fMRI) and analyze the classifier parameters. The results will be compared with those of more conventional methods, such as statistical testing. The goal is to show that classification methods provide more insight into neuroimaging data when interpreted correctly.
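    Why raw classifier weights can mislead is easiest to see on synthetic data. The sketch below is not the project's method. It assumes two simulated channels: channel 0 carries a class effect plus noise that it shares with channel 1, and channel 1 carries no class effect at all. A linear classifier (Fisher's discriminant, chosen here because it has a closed form) is compared with per-channel Welch t-tests. It also shows one commonly used correction, multiplying the weights by the data covariance to get an "activation pattern". All function names are illustrative.

```python
import numpy as np

def fisher_lda_weights(X, y):
    """Fisher's linear discriminant: w = S_w^{-1} (m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    # Pooled within-class covariance matrix
    S_w = ((len(X0) - 1) * np.cov(X0, rowvar=False)
           + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X) - 2)
    return np.linalg.solve(S_w, X1.mean(axis=0) - X0.mean(axis=0))

def welch_t(X, y):
    """Conventional mass-univariate test: one Welch t-statistic per channel."""
    X0, X1 = X[y == 0], X[y == 1]
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    se = np.sqrt(X1.var(axis=0, ddof=1) / len(X1) + X0.var(axis=0, ddof=1) / len(X0))
    return diff / se

def activation_pattern(X, w):
    """Covariance-transformed weights: Cov(X) w (proportional to the class-mean difference here)."""
    return np.cov(X, rowvar=False) @ w

# Synthetic data: channel 0 = class effect + shared noise, channel 1 = shared noise only.
rng = np.random.default_rng(0)
n = 400
y = np.repeat([0, 1], n // 2)
shared = rng.normal(size=n)
X = np.column_stack([
    y + shared + 0.3 * rng.normal(size=n),
    shared + 0.3 * rng.normal(size=n),
])

w = fisher_lda_weights(X, y)
t = welch_t(X, y)
a = activation_pattern(X, w)
print("weights:", w.round(2))
print("t-stats:", t.round(2))
print("pattern:", a.round(2))
```

    Channel 1 gets a large negative weight because the classifier uses it to cancel the shared noise, even though its t-statistic is near zero. The activation pattern is close to zero on channel 1 and so agrees with the t-test about where the effect lives.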

    Contact: Marcel van Gerven or Tom Heskes

  • Single-trial detection of evoked potentials for audiology


    Here, the goal is to use machine learning techniques to detect evoked potentials using a limited amount of EEG data. This is of use in audiology, where the P1-N1-P2 auditory potential can be used to estimate hearing loss. In this project, you will work together with machine learning researchers at the computer science department as well as audiologists at the university medical center.
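    A minimal baseline for deciding whether a response is present in a small number of EEG trials is a matched filter with a sign-flip permutation test. This is a standard statistical detector, not the machine learning method the project will develop. The P1-N1-P2 template, the sampling rate, the noise level and the number of trials are all hypothetical. The test assumes zero-mean, symmetric noise that is independent across trials. Under the null hypothesis of no evoked potential, flipping the sign of individual trials then leaves the distribution of the statistic unchanged.

```python
import numpy as np

def p1_n1_p2_template(t):
    """Hypothetical P1-N1-P2 shape: peaks near 50 ms (+), 100 ms (-) and 180 ms (+)."""
    def bump(center, width):
        return np.exp(-0.5 * ((t - center) / width) ** 2)
    return bump(0.05, 0.01) - 1.5 * bump(0.10, 0.015) + 1.2 * bump(0.18, 0.025)

def detection_statistic(trials, template):
    """Project the trial average onto the unit-norm template (a matched filter)."""
    return float(trials.mean(axis=0) @ template / np.linalg.norm(template))

def sign_flip_pvalue(trials, template, n_perm=999, seed=None):
    """Permutation p-value: under the null, random per-trial sign flips are exchangeable."""
    rng = np.random.default_rng(seed)
    observed = detection_statistic(trials, template)
    exceed = 0
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(trials.shape[0], 1))
        if detection_statistic(signs * trials, template) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

fs = 500                                    # sampling rate in Hz (assumed)
t = np.arange(0.0, 0.3, 1 / fs)             # 0-300 ms after stimulus onset
template = p1_n1_p2_template(t)
rng = np.random.default_rng(1)
trials = template + 3.0 * rng.normal(size=(40, t.size))   # 40 noisy synthetic trials
p_value = sign_flip_pvalue(trials, template, seed=2)
print(f"p = {p_value:.3f}")
```

    Averaging N trials reduces the noise standard deviation by a factor of sqrt(N). That factor is why the amount of data matters, and it is what a learned detector would try to beat.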

    Contact: Marcel van Gerven or Tom Heskes

  • Collaborative Filtering Techniques


    Collaborative filtering is the technique of making predictions about the preferences of a user (filtering) by collecting the preference information from many other users (collaborating). One of the applications of collaborative filtering is building recommender systems for movies. Several approaches that build a profile of the user have been considered in the literature. The task of the student will be to implement, compare, and possibly improve some of these approaches and apply them to a real-world dataset. As an expert in collaborative filtering, you can join the Netflix competition (http://www.netflixprize.com/) and go for the grand prize of one million dollars.
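    One widely used approach that builds a profile per user is matrix factorization. Each user u and each item i get a low-dimensional vector, and a rating is predicted as the global mean plus the dot product p_u · q_i. The sketch below fits these vectors by stochastic gradient descent on the observed ratings only. It is one possible approach, not a summary of the literature the project will cover. The toy ratings and hyperparameters are hypothetical.

```python
import numpy as np

def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Fit r_ui ~ mu + p_u . q_i by stochastic gradient descent.

    ratings: list of (user, item, rating) triples for the observed entries only.
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.normal(size=(n_users, k))          # user profiles
    Q = 0.1 * rng.normal(size=(n_items, k))          # item profiles
    mu = float(np.mean([r for _, _, r in ratings]))  # global mean rating
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - (mu + P[u] @ Q[i])
            p_old = P[u].copy()                      # use the pre-update user vector for Q
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return mu, P, Q

def predict(mu, P, Q, u, i):
    return float(mu + P[u] @ Q[i])

# Toy ratings (1-5) for 4 users and 4 items; (0, 1), (1, 3) and (3, 0) are missing.
ratings = [(0, 0, 5), (0, 2, 1), (0, 3, 1),
           (1, 0, 5), (1, 1, 5), (1, 2, 1),
           (2, 0, 1), (2, 1, 1), (2, 2, 5), (2, 3, 5),
           (3, 1, 1), (3, 2, 5), (3, 3, 5)]
mu, P, Q = factorize(ratings, n_users=4, n_items=4)
print(round(predict(mu, P, Q, 0, 1), 2), round(predict(mu, P, Q, 3, 0), 2))
```

    User 0 agrees with user 1 on every item both have rated, so the model should predict a high rating for the item only user 1 has seen. User 3 should get a low one.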

    Contact: Adriana Birlutiu or Tom Heskes

  • Machine learning for Malaria research

    Malaria infects between 300 and 500 million people every year and causes between one and three million deaths annually. Control of malaria is becoming increasingly difficult as both the parasite and the mosquito vector are developing resistance to anti-malaria drugs and insecticides. A good understanding of transcriptional regulation of genes in Plasmodium falciparum, the deadliest species of the parasite that causes malaria in humans, is important for devising new ways to disrupt the parasite's life cycle. A number of methods that combine gene expression and genome sequence data to infer transcriptional regulatory models have been developed; see [1] for a review of some of these methods and for biological background. However, one more source of data is available for Plasmodium falciparum, namely the genome sequences of related species. Since functionally relevant DNA sequences are conserved among related species, this data source is expected to be useful for building more accurate transcriptional regulatory models. The goal of this project is to build and test a classifier that combines Plasmodium falciparum gene expression and genome sequence data with genome sequence data of one or more related parasites.

    [1] Gardner, T.S., Faith, J.J. (2005). Reverse-engineering transcription control networks. Physics of Life Reviews, 2, 65-88.
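    Before any classifier is trained, each gene has to be turned into one feature vector that mixes the three data sources. The sketch below is a simplified, hypothetical encoding. It uses exact-match motif counts in the P. falciparum promoter, and a conservation score defined as the fraction of related species whose orthologous promoter also contains the motif. Real work would more likely use position weight matrices and aligned orthologous sequences. The motifs and sequences shown are made up.

```python
def count_motif(seq, motif):
    """Number of (possibly overlapping) exact occurrences of motif in seq."""
    m = len(motif)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == motif)

def conservation_score(motif, ortholog_promoters):
    """Fraction of orthologous promoters (one per related species) containing the motif."""
    if not ortholog_promoters:
        return 0.0
    return sum(motif in s for s in ortholog_promoters) / len(ortholog_promoters)

def gene_features(expression, promoter, ortholog_promoters, motifs):
    """Concatenate expression profile, motif counts and cross-species conservation."""
    features = list(expression)
    features += [count_motif(promoter, m) for m in motifs]
    features += [conservation_score(m, ortholog_promoters) for m in motifs]
    return features

# Hypothetical gene: two expression measurements, one P. falciparum promoter,
# and the promoters of its orthologs in two related species.
features = gene_features([0.1, 0.2], "ACGACG", ["ACGTT", "GGGG"], ["ACG", "TTT"])
print(features)
```

    Stacking such vectors for all genes gives an ordinary design matrix, so any standard classifier can then be trained on it.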

    Contact: Rasa Jurgelenaite or Tom Heskes

     
  • Multi-class classification for fraud detection

    Using a dataset of past transactions, classifiers can learn to distinguish fraudulent transactions from legitimate ones. In a previous project, "simple" binary classifiers were tested. The goal of this project is to study multi-class classifiers that can distinguish between different grades of fraudulence (i.e., green/orange/red instead of just green/red). The project is a collaboration with Digital Security and the company First8. Check out this project description (in Dutch) for more information.
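    The step from a binary classifier to a three-grade one can be illustrated with multinomial (softmax) logistic regression, the direct multi-class generalization of binary logistic regression. This is a minimal sketch, not the classifier the project will choose. The two standardized features and the synthetic clusters for green/orange/red are hypothetical.

```python
import numpy as np

LABELS = np.array(["green", "orange", "red"])

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)        # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_classes, lr=0.1, n_iter=3000, l2=1e-3):
    """Multinomial logistic regression trained by batch gradient descent."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])        # append a bias column
    W = np.zeros((d + 1, n_classes))
    Y = np.eye(n_classes)[y]                    # one-hot targets
    for _ in range(n_iter):
        P = softmax(Xb @ W)
        W -= lr * (Xb.T @ (P - Y) / n + l2 * W)
    return W

def predict(W, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return softmax(Xb @ W).argmax(axis=1)

# Synthetic transactions with two hypothetical standardized features.
rng = np.random.default_rng(0)
centers = np.array([[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]])
y = np.repeat([0, 1, 2], 100)
X = centers[y] + 0.5 * rng.normal(size=(300, 2))
W = fit_softmax(X, y, n_classes=3)
pred = predict(W, X)
print("training accuracy:", np.mean(pred == y))
print(LABELS[pred[:5]])
```

    The grades are ordered, so an ordinal model (one that exploits green < orange < red) is a natural alternative worth comparing against this unordered multi-class baseline.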

    Contact: Tom Heskes or Erik Poll

  • Working Tomorrow


    Working Tomorrow is a program of LogicaCMG that aims to test the usefulness of new technological advancements. Several of their suggested Master projects involve artificial intelligence in general and machine learning/data mining in particular.

    Contact: Tom Heskes

  • Machine learning and Bioinformatics


    Several suggestions for bachelor and master projects on the interface between machine learning and bioinformatics can be found on Elena's page. Topics include: fast condensed nearest neighbor with hit miss networks, graph based feature selection, protein function assignment based on shared interacting domain patterns, and predicting protein subcellular localization.

    Contact: Elena Marchiori


  • Decomposition of orthologs in PPI networks using articulations


    The amount of data on protein-protein interaction (PPI) networks is growing exponentially. Many new data mining methods try to extract important biological information from PPI networks. One possible approach is to compare the PPI networks of distinct species to discover functional protein modules that are evolutionarily conserved. However, these methods work with whole PPI networks and, because of the computational burden, must rely on heuristics to make the computation feasible. Recently, a method for decomposing networks was introduced [1,2], in which the regions of the decomposed PPI networks cover conserved modules; this work also introduced a new notion of modular network alignment. The method has also been shown to enhance search strategies. The main goal of the project is to improve this state-of-the-art decomposition method and to demonstrate its generality across different protein network alignment methods.

    [1] Jancura, P., Heringa, J., Marchiori, E. (2008). Dividing Protein Interaction Networks by Growing Orthologous Articulations. PRIB 2008. Springer.
    [2] Jancura, P., Heringa, J., Marchiori, E. (2008). Divide, Align and Full-Search for Discovering Conserved Protein Complexes. EvoBio 2008, LNCS 4973, pp. 73-84.
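    The decomposition in [1] is built around articulations: nodes whose removal splits a network into separate pieces. The sketch below shows only that generic graph primitive, found with the classic Hopcroft-Tarjan depth-first search. It does not implement the published orthology-guided decomposition. The protein names in the example are placeholders.

```python
from collections import defaultdict

def articulation_points(edges):
    """Hopcroft-Tarjan DFS: nodes whose removal disconnects their component."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    disc, low, result = {}, {}, set()
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in graph[u]:
            if v == parent:
                continue
            if v in disc:                       # back edge to an ancestor
                low[u] = min(low[u], disc[v])
            else:                               # tree edge
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent is not None and low[v] >= disc[u]:
                    result.add(u)               # v's subtree cannot bypass u
        if parent is None and children > 1:
            result.add(u)                       # DFS root with several subtrees

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return result

edges = [("A", "B"), ("B", "C"), ("C", "A"),    # triangle A-B-C
         ("C", "D"),                            # link between the two triangles
         ("D", "E"), ("E", "F"), ("F", "D")]    # triangle D-E-F
print(sorted(articulation_points(edges)))       # C and D separate the two triangles
```

    For genome-scale PPI networks, the recursion should be rewritten with an explicit stack to stay within Python's recursion limit.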

    Contact: Pavol Jancura and Elena Marchiori

     

  • A "World" Learning Robot

    The focus of this project is on a computational implementation of the knowledge representation developed at the RU in past years. This includes the definition of a memory model and an adaptation of an existing learning algorithm using self-organizing memory maps. The software implementation will be capable of interpreting events in an observed world by means of reasoning, including abductive inference. For testing purposes, an artificial world embedded in a computer game is defined.
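    As a reference point for the "existing learning algorithm", the sketch below trains a standard one-dimensional Kohonen self-organizing map. It assumes that a plain SOM is the starting point. The RU memory model and the abductive reasoning layer are not represented, and the map size, schedules and clustered toy data are hypothetical.

```python
import numpy as np

def train_som(data, n_nodes=10, n_epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 1-D Kohonen self-organizing map with a Gaussian neighbourhood."""
    rng = np.random.default_rng(seed)
    weights = data[rng.choice(len(data), n_nodes, replace=False)].astype(float)
    grid = np.arange(n_nodes)                  # node positions on the 1-D map
    total_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                          # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.1)          # shrinking neighbourhood width
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)       # pull BMU and neighbours toward x
            step += 1
    return weights

def quantization_error(data, weights):
    """Mean distance from each observation to its best-matching unit."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
data = np.repeat(centers, 100, axis=0) + 0.2 * rng.normal(size=(300, 2))
weights = train_som(data)
print(round(quantization_error(data, weights), 3))
```

    Neighbouring nodes on the map end up representing similar inputs. A memory model could build on this topology-preserving property when relating new observations to stored ones.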

    Contact: Janos Sarbo

     

  • Correctness of Problem Elicitation

    As opposed to "hard" computer science (informatica), soft computer science (informatiekunde) does not have methods for proving the formal correctness of specifications. This gap is most apparent in problem elicitation, the initial conceptualization of a problem. The goal of this research is to apply a knowledge representation model developed at the RU as a type system, enabling correctness proofs of soft specifications. The project includes the development of case studies, as well as the introduction of a methodology that is also suitable for a computational realization.

    Contact: Janos Sarbo

     

  • Mobile Robot Localization

    Robotics is an interesting application domain for machine learning techniques that are able to deal with the inherent uncertainty in perception and action. We are interested in ways to estimate a robot's position relative to an external frame of reference (i.e., a map of the environment). In other words, how does a robot learn about its position in the environment, and how can it use this information to determine its actions? The robot will be built from a LEGO Mindstorms NXT set. Programming is to be done in Matlab using a toolbox that establishes a Bluetooth connection between a PC and the robot. Possible research questions to consider:
     - How can sensor data be mapped into probability distributions over robot locations?
     - How can environment maps be learned and represented?
     - What are the assumptions, robustness, and scalability of various architectures?
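    A common answer to the first question is a discrete Bayes (histogram) filter, often called Markov localization. The robot keeps a probability distribution over grid cells. It multiplies that distribution by the sensor likelihood after each reading and shifts it through a noisy motion model after each move. The sketch below is written in Python rather than with the Matlab toolbox the project uses. The one-dimensional cyclic corridor, door/wall map and noise probabilities are hypothetical.

```python
import numpy as np

def sense(belief, world, measurement, p_hit=0.8, p_miss=0.2):
    """Bayes update: weight each cell by the likelihood of the sensor reading."""
    likelihood = np.where(world == measurement, p_hit, p_miss)
    posterior = belief * likelihood
    return posterior / posterior.sum()

def move(belief, step, p_exact=0.8, p_under=0.1, p_over=0.1):
    """Motion update on a cyclic 1-D grid; odometry may under- or overshoot by one cell."""
    return (p_exact * np.roll(belief, step)
            + p_under * np.roll(belief, step - 1)
            + p_over * np.roll(belief, step + 1))

world = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])   # 1 = door, 0 = wall (hypothetical map)
belief = np.full(len(world), 1 / len(world))       # start with no idea where we are
true_pos = 0
for t in range(8):
    belief = sense(belief, world, world[true_pos])  # noiseless reading of the true cell
    if t < 7:
        belief = move(belief, 1)
        true_pos = (true_pos + 1) % len(world)
print(int(belief.argmax()), true_pos)
```

    Replacing the grid with a set of weighted samples gives a particle filter. That is one way to address the scalability question for larger maps.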

    Contact: Perry Groot and Marcel van Gerven


  • Efficient inference methods for the Promedas medical diagnostic decision support system 

    Promedas is one of the largest and most accurate diagnostic decision support systems for internal medicine. The system consists of a network of approximately 4000 diagnoses and 4000 tests and uses a Bayesian network to compute the probabilities of the diagnoses. Exact computation in a network of this size is intractable and requires efficient approximation methods. In this project, a number of novel approximate inference methods, such as Belief Propagation, are evaluated against a gold standard computed through extensive Monte Carlo simulation.
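    The evaluation setup (approximate answer versus a reference answer) can be sketched on a toy network small enough to also solve exactly. The sketch assumes noisy-OR links between diagnoses and tests, a common modelling choice for diagnostic networks, not necessarily the one Promedas uses. All probabilities are hypothetical. Exact posteriors come from enumerating all 2^3 diagnosis configurations, which is impossible at the scale of 4000 diagnoses (2^4000 configurations). Likelihood weighting, one simple Monte Carlo scheme, is then checked against them. Belief Propagation itself is not shown.

```python
import itertools
import numpy as np

# Hypothetical toy network: 3 diagnoses, 2 tests, noisy-OR links (assumed).
prior = np.array([0.05, 0.02, 0.10])        # P(diagnosis j present)
link = np.array([[0.8, 0.0, 0.3],           # P(test i positive | only diagnosis j present)
                 [0.0, 0.9, 0.4]])
leak = np.array([0.01, 0.01])               # P(test i positive | no diagnosis present)

def p_positive(D):
    """Noisy-OR: P(test positive) for each row of D (configurations x diagnoses)."""
    fail = np.prod((1.0 - link)[None, :, :] ** D[:, None, :], axis=2)
    return 1.0 - (1.0 - leak)[None, :] * fail

def evidence_weight(D, tests):
    """P(observed test results | diagnoses) for each row of D."""
    p = p_positive(D)
    return np.prod(np.where(tests[None, :] == 1, p, 1.0 - p), axis=1)

def exact_posterior(tests):
    """Enumerate all 2^n configurations; only feasible for tiny n."""
    D = np.array(list(itertools.product([0, 1], repeat=len(prior))))
    joint = np.prod(np.where(D == 1, prior, 1.0 - prior), axis=1) * evidence_weight(D, tests)
    return joint @ D / joint.sum()

def likelihood_weighting(tests, n_samples=200_000, seed=0):
    """Sample diagnoses from the prior and weight each sample by the evidence."""
    rng = np.random.default_rng(seed)
    D = (rng.random((n_samples, len(prior))) < prior).astype(int)
    w = evidence_weight(D, tests)
    return w @ D / w.sum()

tests = np.array([1, 0])                    # test 0 positive, test 1 negative
exact = exact_posterior(tests)
approx = likelihood_weighting(tests)
print(exact.round(3), approx.round(3))
```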

    Contact: Bert Kappen

  • Probabilistic Structured Output Learning 

    Regression and classification are among the most frequently encountered tasks in machine learning. Usually the inputs are represented as feature vectors, and the outputs are real-valued scores (in the case of regression) or class labels (in the case of classification). However, in many situations we face a much more general problem: estimating dependencies between sets of objects. Consider an example where the training set consists of pairs of images constituting inputs and outputs, respectively. Each pair of images shows two people who are "related" to each other (e.g., friends or classmates). Given a new image of a person as input, the learning algorithm would output an image that is not necessarily contained in the training set but shares features with the training examples (e.g., shape of the head, hair colour). Thus, the algorithm would generalize and predict what your best friend might look like! The aim of this project is to develop novel algorithms that can learn from structured inputs and predict structured outputs. The application area will be bioinformatics (e.g., protein structure prediction) or machine vision (e.g., image recognition).

    Contact: Evgeni Tsivtsivadze, Botond Cseke, and Tom Heskes

  • Semi-Supervised Preference Learning

    Labeled data for training machine learning algorithms is usually scarce and laborious to obtain. Situations in which only a limited amount of labeled data and a large amount of unlabeled data are available to the learning algorithm are typical of many real-world problems (e.g., in bioinformatics and information retrieval). The aim of semi-supervised learning is to use the information contained in the unlabeled examples to boost the performance of the learning algorithm. During this project we will develop and apply semi-supervised methods to various problems in bioinformatics (e.g., remote homology detection) or in information retrieval (e.g., learning how to rank documents retrieved by a query). Another important aspect of this project is a rigorous analysis of the computational complexity of the developed and existing semi-supervised algorithms. The focus here will be on improving and optimizing the code so that the methods scale to datasets containing millions of training examples.
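    How unlabeled examples can carry information is easiest to see in a classic graph-based method, label propagation. Labels spread along edges of a similarity graph while the known labels are held fixed. The sketch below handles classification only. Preference learning and ranking would need a different loss, so treat it as an illustration of the semi-supervised idea rather than of this project's methods. The chain graph is a hypothetical example. The dense matrix product per iteration is exactly the kind of cost the complexity analysis would need to reduce, for example by using sparse graphs.

```python
import numpy as np

def label_propagation(W, labels, n_iter=1000):
    """Iterative label propagation on a graph, with the known labels clamped.

    W: symmetric (n, n) non-negative affinity matrix with no isolated nodes.
    labels: integer class index per node, or -1 for unlabeled nodes.
    """
    labeled = labels >= 0
    classes = np.unique(labels[labeled])
    F = np.zeros((len(labels), len(classes)))
    F[labeled, np.searchsorted(classes, labels[labeled])] = 1.0
    Y0 = F[labeled].copy()
    P = W / W.sum(axis=1, keepdims=True)       # row-normalised transition matrix
    for _ in range(n_iter):
        F = P @ F                              # each node averages its neighbours' scores
        F[labeled] = Y0                        # clamp the known labels
    return classes[F.argmax(axis=1)]

# Chain graph 0-1-2-3-4-5; only the two end points are labeled.
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0
labels = np.array([0, -1, -1, -1, -1, 1])
result = label_propagation(W, labels)
print(result)
```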

    Contact: Evgeni Tsivtsivadze and Tom Heskes

  • Multi-task self-taught Learning

    The aim of this project is to build algorithms that can learn simultaneously from different tasks while taking into account a vast amount of unlabeled data. It bridges two ideas: multi-task learning and self-taught learning. Self-taught learning uses unlabeled data to improve the performance of the learning algorithm. However, unlike the semi-supervised approach (see the Semi-Supervised Preference Learning project), self-taught learning does not assume that the unlabeled data has the same class labels or distribution as the labeled data. The intuition is that unlabeled objects still contain "higher-level feature representations" (e.g., angles and shapes in images, or syntactic patterns in documents) that are also encountered in the labeled examples. Learning to recognize such basic patterns from unlabeled data helps the algorithm distinguish between different objects and makes the learning task easier. On the other hand, many real-world problems require the estimation of multiple models, and in these situations learning different tasks simultaneously can be advantageous. For example, machine vision problems require separate models for faces, hands, etc. With large amounts of unlabeled data freely available for training, the goal of this project is to develop multi-task algorithms that take advantage of the unlabeled data using a self-taught learning approach.
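    The overall pipeline can be sketched in three steps: learn a representation from unlabeled data, encode the few labeled examples with it, and fit several tasks on the shared encoding. The sketch makes several simplifying assumptions. PCA stands in for the sparse-coding style feature learning usually associated with self-taught learning. The unlabeled data shares only the low-dimensional structure of the labeled data, not its labels. The tasks share nothing but the representation, whereas genuine multi-task methods would also couple the task parameters. The data-generating matrix and task weights are hypothetical.

```python
import numpy as np

def learn_basis(unlabeled, k):
    """Top-k principal directions of the unlabeled data (a stand-in for sparse coding)."""
    mean = unlabeled.mean(axis=0)
    _, _, Vt = np.linalg.svd(unlabeled - mean, full_matrices=False)
    return mean, Vt[:k]

def encode(X, mean, basis):
    """Represent inputs by their coordinates in the learned basis."""
    return (X - mean) @ basis.T

def fit_tasks(Z, Y, l2=1e-2):
    """Ridge regression for all tasks at once; column t of Y holds the targets of task t."""
    Zb = np.hstack([Z, np.ones((len(Z), 1))])
    return np.linalg.solve(Zb.T @ Zb + l2 * np.eye(Zb.shape[1]), Zb.T @ Y)

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 20))                 # hypothetical 3-D structure in 20-D inputs
unlabeled = rng.normal(size=(2000, 3)) @ B + 0.05 * rng.normal(size=(2000, 20))
mean, basis = learn_basis(unlabeled, k=3)

latent = rng.normal(size=(50, 3))            # only 50 labeled examples
X = latent @ B + 0.05 * rng.normal(size=(50, 20))
Y = np.column_stack([latent @ np.array([1.0, -2.0, 0.5]),    # task 1
                     latent @ np.array([0.0, 1.0, 1.0])])    # task 2
Z = encode(X, mean, basis)
W = fit_tasks(Z, Y)
pred = np.hstack([Z, np.ones((len(Z), 1))]) @ W
print(np.mean((pred - Y) ** 2, axis=0).round(4))
```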

    Contact: Evgeni Tsivtsivadze and Tom Heskes