# Research projects Spring 2024

**How to apply**

If you are interested in applying for a position, please send your one-page CV, motivation letter and a list of your obtained grades to: ellis-fellowships@ru.nl, with the subject line containing the title of the project and your s-number. **The deadline for applying to these projects is December 1st, 2023.**

This round consists of the following projects:

- Nonparametric Causal Regression Model for Heterogeneous Treatment Effects Estimation for Personalized Treatments: Benchmark Study
- Machine Learning Classifiers for Stratification of Alzheimer’s Disease and Mild Cognitive Impairment from resting-state fMRI
- Timeseries Analysis of Multivariate Count Data
- Extracting Structured Policy Representations from Transformer Networks
- Finding Structured Policies via Monte Carlo Tree Search
- Utilizing DVS Cameras in Neuromorphic Platforms for Eye Tracking
- Implementing Stochastic Spiking Neural Networks on FPGA for pattern recognition
- Federated Neuromorphic learning for edge AI and IoT
- Diving deeper into Autoencoders for anomaly detection
- Enriching large language models with factual knowledge for conversational AI
- Enhancing High-Intensity Precipitation Nowcasting with Transformer Models and Deep Generative Models
- Leveraging Weak Annotations in AI-based Metastasis Detection from CT Images
- Phonological scoring of non-standard speech
- Channel set invariance for neural networks
- Adaptation Strategies for Block-Toeplitz Regularized Linear Discriminant Analysis
- Robust (causal) inference using mathematical models of dynamical systems in biology
- Robust Reinforcement Learning via Uncertainty Quantification
- Shielded Reinforcement Learning under Delayed Observations
- Deep Learning-Based Control for Stochastic Dynamical Systems

## Nonparametric Causal Regression Model for Heterogeneous Treatment Effects Estimation for Personalized Treatments: Benchmark Study

**Supervisor**: dr. Parisa Naseri

### Project description

Precision medicine, or personalized medicine, is an innovative approach to tailoring disease prevention and treatment that takes into account differences in individuals’ characteristics. The increasing amount of available medical data provides the opportunity to make inferences at the resolution of the individual. The ultimate aim of many statistical studies is to predict the effects of interventions, and intervention in medicine is a question of causal effects [6].

The average treatment effect (ATE) has long been studied as a measure of causal effect, assuming the same effect size for the entire population under study. However, this is a simplifying assumption that may not hold, and estimating the ATE may gloss over the heterogeneity between individuals. It is therefore important to estimate treatment effects for each individual, or for similar subgroups of patients; this is called the heterogeneous treatment effect (HTE) [7].

The application of statistical learning tools to causal inference has led to significant improvements in the estimation of HTE. These improvements stem from the predictive power of advanced nonparametric regression models, adapted to causal inference. Random Forests [1] and Bayesian Additive Regression Trees (BART) [3] are tree-based statistical learning tools for causal analysis on large datasets. A more recent and popular tree-based method for HTE estimation is Causal Forests (CF), a causal implementation of Random Forests [9]. An extension of the BART framework to causal analysis is Bayesian Causal Forests (BCF) [4], specifically designed to discriminate between prognostic and moderating effects of the covariates in HTE estimation. A different stream of contributions, which does not focus on a particular regression model, is that of meta-learners: algorithms designed to estimate HTE with any suitable off-the-shelf supervised regression model (e.g., random forests, neural networks) [5, 8]. A recent model called Shrinkage Bayesian Causal Forests, an extension of BCF, leads to improved HTE estimates and inferences on prognostic and moderating factors [2].

Many advanced machine learning tools for estimating HTE have been proposed in recent years, but there has been limited translational research into the real-world healthcare domain. To fill this gap, we will review and compare recent HTE estimation methodologies and perform benchmark experiments to test the feasibility of these methods for personalized treatment development.
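As a concrete illustration of the meta-learner idea, the sketch below implements a toy T-learner on synthetic data: one outcome model is fitted per treatment arm, and the HTE estimate is the difference of their predictions. The linear regressors and the data-generating process are illustrative assumptions, not the benchmark study's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: covariate x, binary treatment t, outcome y with a
# treatment effect that varies with x (true heterogeneous effect = 1 + x).
n = 2000
x = rng.uniform(0, 1, n)
t = rng.integers(0, 2, n)
y = 2.0 * x + t * (1.0 + x) + rng.normal(0, 0.1, n)

# T-learner: fit one outcome model per treatment arm, then take the
# difference of their predictions as the HTE estimate.
coef_treated = np.polyfit(x[t == 1], y[t == 1], 1)
coef_control = np.polyfit(x[t == 0], y[t == 0], 1)

def estimate_hte(x_new):
    return np.polyval(coef_treated, x_new) - np.polyval(coef_control, x_new)

tau_hat = estimate_hte(np.array([0.0, 0.5, 1.0]))  # true effects: 1.0, 1.5, 2.0
```

Any off-the-shelf regressor (random forests, neural networks) can replace the linear fits, which is exactly the flexibility the meta-learner literature [5, 8] exploits.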

### References

[1] Leo Breiman. Random forests. Machine learning, 45:5–32, 2001.

[2] Alberto Caron, Gianluca Baio, and Ioanna Manolopoulou. Shrinkage bayesian causal forests for heterogeneous treatment effects estimation. Journal of Computational and Graphical Statistics, 31(4):1202–1214, 2022.

[3] Hugh A Chipman, Edward I George, and Robert E McCulloch. Bart: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1):266–298, 2010.

[4] P Richard Hahn, Jared S Murray, and Carlos M Carvalho. Bayesian regression tree models for causal inference: Regularization, confounding, and heterogeneous effects (with discussion). Bayesian Analysis, 15(3):965–1056, 2020.

[5] Sören R Künzel, Jasjeet S Sekhon, Peter J Bickel, and Bin Yu. Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the national academy of sciences, 116(10):4156–4165, 2019.

[6] Benjamin Lam, Mario Masellis, Morris Freedman, Donald T Stuss, and Sandra E Black. Clinical, imaging, and pathological heterogeneity of the alzheimer’s disease syndrome. Alzheimer’s research & therapy, 5(1):1–14, 2013.

[7] Yaobin Ling, Pulakesh Upadhyaya, Luyao Chen, Xiaoqian Jiang, and Yejin Kim. Emulate randomized clinical trials using heterogeneous treatment effect estimation for personalized treatments: Methodology review and benchmark. Journal of Biomedical Informatics, page 104256, 2022.

[8] Xinkun Nie and Stefan Wager. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2):299–319, 2021.

[9] Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242, 2018.

## Machine Learning Classifiers for Stratification of Alzheimer’s Disease and Mild Cognitive Impairment from resting-state fMRI

**Supervisors**: dr. Parisa Naseri and prof. Tom Heskes

### Project description

Multivariate pattern analysis and statistical machine learning (ML) techniques have gained popularity in the neuroimaging community. Neuroscientists are interested in studying the functional-connectivity patterns of brains at rest and how these are affected by neurodegenerative diseases like Alzheimer’s disease (AD).

AD is a progressive disease that has become a global crisis. It is estimated that more than 30 million people exhibit signs of AD, and this number is projected to rise to 90 million by 2050 [2]. Given the irreversible nature of AD, early diagnosis and intervention are therefore crucial. Mild Cognitive Impairment (MCI) has been defined as the prodromal stage of AD, a period in which patients exhibit memory-related dysfunction that does not interfere with their daily life [1]. As brain connectivity at rest has been shown to be sensitive to AD progression [4], it is considered a potentially useful non-invasive biomarker of the disease. However, it is challenging to use for diagnostic or prognostic purposes at the individual level.

ML approaches have the potential to overcome this issue by combining several features in a single classifier. Most previous studies have applied support vector machines (SVMs). However, it has been shown that the Gaussian process logistic regression (GP-LR) approach has two significant advantages over kernel SVMs: it provides a principled estimate of predicted class membership, and a differentiable objective function that can be used to set hyper-parameters. From the predicted probability estimates we can quantify the uncertainty in model predictions, which is important for decision making in clinical settings. Moreover, decision thresholds can be tuned towards stronger specificity or sensitivity in case of asymmetric misclassification costs. For instance, at preclinical stages such as MCI, differentiating normal ageing from pathological cognitive decline in large populations requires highly specific discriminators, whereas highly sensitive discriminators may be more important for predicting the time of conversion from preclinical to clinical AD [3]. The aim of this project is to investigate the efficacy of multivariate statistical ML techniques, including SVM and GP-LR, for patient stratification from functional-connectivity patterns of brains at rest.
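The threshold-tuning idea can be made concrete with a small sketch on hypothetical predicted probabilities (the simulated scores and the 0.9 specificity target are illustrative assumptions, not project data): sweep the decision threshold and pick the most sensitive operating point that still meets a specificity requirement.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical predicted class probabilities from a probabilistic
# classifier (e.g. GP-LR); true labels: 1 = patient, 0 = control.
y_true = np.array([0] * 50 + [1] * 50)
p_hat = np.clip(np.where(y_true == 1,
                         rng.normal(0.70, 0.15, 100),
                         rng.normal(0.30, 0.15, 100)), 0.0, 1.0)

def sens_spec(y, p, threshold):
    """Sensitivity and specificity at a given decision threshold."""
    pred = (p >= threshold).astype(int)
    sens = np.mean(pred[y == 1] == 1)
    spec = np.mean(pred[y == 0] == 0)
    return sens, spec

# Sweep thresholds and keep the one with specificity >= 0.9 that
# maximizes sensitivity (a "highly specific" screening regime).
thresholds = np.linspace(0.0, 1.0, 101)
candidates = [(t, *sens_spec(y_true, p_hat, t)) for t in thresholds]
best = max((c for c in candidates if c[2] >= 0.9), key=lambda c: c[1])
```

For the "highly sensitive" regime mentioned above, one would instead constrain sensitivity and maximize specificity.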

### References

[1] Marilyn S Albert, Steven T DeKosky, Dennis Dickson, Bruno Dubois, Howard H Feldman, Nick C Fox, Anthony Gamst, David M Holtzman, William J Jagust, Ronald C Petersen, et al. The diagnosis of mild cognitive impairment due to alzheimer’s disease: recommendations from the national institute on aging-alzheimer’s association workgroups on diagnostic guidelines for alzheimer’s disease. Alzheimer’s & dementia, 7(3):270–279, 2011.

[2] Deborah E Barnes and Kristine Yaffe. The projected effect of risk factor reduction on alzheimer’s disease prevalence. The Lancet Neurology, 10(9):819–828, 2011.

[3] Edward Challis, Peter Hurley, Laura Serra, Marco Bozzali, Seb Oliver, and Mara Cercignani. Gaussian process classification of alzheimer’s disease and mild cognitive impairment from resting-state fmri. NeuroImage, 112:232–243, 2015.

[4] Michael D Greicius, Gaurav Srivastava, Allan L Reiss, and Vinod Menon. Default-mode network activity distinguishes alzheimer’s disease from healthy aging: evidence from functional mri. Proceedings of the National Academy of Sciences, 101(13):4637–4642, 2004.

## Timeseries Analysis of Multivariate Count Data

**Supervisors**: Dr. Max Hinne and Dr. Yuliya Shapovalova

### Project Description

Multivariate time series analysis plays a pivotal role across various scientific domains. One example is in epidemiology, where modelling the number of disease outbreaks across various cities is of paramount importance. These outbreak counts manifest as discrete time series. When modelling these observations, it is especially important to consider the potential correlations between cities, for instance, due to the disease being spread via traffic [1].

Although there is extensive literature on modelling multivariate time series data, the primary focus has been on continuous observations with a static correlation structure. Recent advancements in our research groups have begun to address these gaps. Shapovalova et al. [2] devised a state-space model with Poisson-distributed observations, while Huijsdens et al. [3] formulated a Bayesian non-parametric model based on Wishart processes that emphasizes dynamic correlation structures.

This project intends to bridge these research areas, concentrating on the following questions:

- How can the Poisson likelihood from the state-space model be integrated with the Wishart process model?
- Does accounting for dynamic correlations offer improvements in model fit and predictive capabilities compared to static correlations?
- How does our model perform relative to baseline strategies that traditionally treat observed counts as continuous variables?

Given the high-dimensional nature of the resulting model and its potential application to extensive time series data, ensuring computational efficiency is crucial. To this end, we utilize the Blackjax framework [4] to implement our approximate inference algorithms.
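For intuition, here is a minimal NumPy sketch of the kind of count time series involved: a univariate Poisson autoregression, together with the log-likelihood that an approximate inference algorithm (e.g. a sampler from Blackjax) would target. The model form and parameters are illustrative assumptions; the project itself concerns the multivariate state-space and Wishart-process models cited above.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(2)

# Simulate a univariate Poisson autoregression (INGARCH(1,0)-style):
# lambda_t = omega + alpha * y_{t-1},  y_t ~ Poisson(lambda_t).
omega, alpha, T = 1.0, 0.5, 500
y = np.zeros(T, dtype=int)
y[0] = rng.poisson(omega)
for t in range(1, T):
    y[t] = rng.poisson(omega + alpha * y[t - 1])

def log_likelihood(omega, alpha, y):
    """Log-likelihood of the counts under the Poisson autoregression."""
    lam = np.empty(len(y))
    lam[0] = omega
    lam[1:] = omega + alpha * y[:-1]
    lam = np.maximum(lam, 1e-12)
    log_fact = np.array([lgamma(k + 1) for k in y])  # log(y!)
    return float(np.sum(y * np.log(lam) - lam - log_fact))

ll_true = log_likelihood(1.0, 0.5, y)
ll_independent = log_likelihood(2.0, 0.0, y)  # ignores the autoregression
```

Comparing the two log-likelihoods shows why modelling the serial dependence matters: the autoregressive model fits the simulated counts better than an independent Poisson model with the same mean.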

The chosen student will collaborate closely with the Uncertainty in Complex Systems group at the Artificial Intelligence department, and the Data Science group at the Institute for Computing and Information Sciences, working alongside PhD students engaged in related projects.

### Requirements

Experience with / interest in:

- Time series analysis
- Approximate Bayesian inference
- Gaussian processes

### References

[1] Agosto A, Giudici P. A Poisson autoregressive model to understand COVID-19 contagion dynamics. Risks. 2020; 8(3):77. https://doi.org/10.3390/risks8030077

[2] Shapovalova Y, Baştürk N, Eichler M. Multivariate count data models for time series forecasting. Entropy. 2021; 23(6):718. https://doi.org/10.3390/e23060718

[3] Huijsdens H, Leeftink D, Geerligs L, Hinne M. Inference of Wishart processes using sequential Monte Carlo, forthcoming.

[4] Lao, J., & Louf, R. (2022). BlackJAX: Library of samplers for JAX. Astrophysics Source Code Library, ascl-2211.

## Extracting Structured Policy Representations from Transformer Networks

**Supervisors**: Maris Galesloot MSc & Dr. Nils Jansen

### Motivation

In sequential decision-making environments with partial observability, that is, environments where the underlying state is not fully observable, we often require memory in our optimal decision-making policies. This memory represents the history of actions taken and observations returned by the environment. In such problems, a typical approach is to approximate and learn such a memory representation using recurrent neural networks (RNNs). However, in sequential processing domains like natural language processing, RNNs have been outperformed by a more recent architecture called the Transformer. These networks do not learn a recurrent update function but instead learn to represent the relative importance between inputs via so-called attention heads (Vaswani et al., 2017).

As in other fields, Transformers have been shown to surpass RNNs as the underlying architecture in various sequential decision-making problems, such as the reinforcement learning (RL) problem (Esslinger et al., 2022). For example, in RL, the Decision Transformer is able to solve a range of tasks from a large dataset collected from each individual task (Chen et al., 2021). However, the resulting policies are not interpretable, and no guarantee can be given on their performance. For RNNs, we can discretize the memory update (e.g., via quantization) to arrive at a structured representation of the policy (Koul et al., 2019). Such techniques can also be used to improve training efficiency and to compute a policy that satisfies a performance requirement (Carr et al., 2021).

### Challenge

RNNs are recurrent architectures with an internal memory update function: they essentially represent a continuous memory update that we can discretize with a quantization function (Koul et al., 2019). Transformers, in contrast, require all inputs simultaneously for the attention mechanism to operate. It is therefore unclear how the policy of a Transformer can be given a structured representation in a similar way.
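A toy sketch of the quantization step for RNNs (in the spirit of Koul et al., 2019, though not their actual quantized-bottleneck implementation): each dimension of a continuous memory vector is snapped to a small set of levels, so the memory can only take finitely many states.

```python
import numpy as np

def quantize(h, levels=3):
    """Map a continuous memory vector in [-1, 1] to discrete levels.

    With 3 levels each dimension becomes -1, 0 or 1, so an n-dim memory
    collapses to at most levels**n distinct states, from which a
    finite-state policy representation can be extracted.
    """
    centers = np.linspace(-1.0, 1.0, levels)
    idx = np.argmin(np.abs(h[:, None] - centers[None, :]), axis=1)
    return centers[idx]

h = np.tanh(np.array([0.93, -0.07, -0.81]))  # e.g. an RNN hidden state
h_q = quantize(h)  # -> [ 1.,  0., -1.]
```

The challenge above is precisely that a Transformer has no such single recurrent state vector to quantize.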

### Goal

Our aim is to find a structured policy representation from a Transformer policy.

### Overview

We start by looking into variants such as auto-regressive Transformer networks, which can output a representation of the history at each pass through the network. This way, the network is trained to represent the history implicitly. However, note that this direction is a suggestion. The student is encouraged to devise their own approach to tackle the problem.

### Main tasks

This project will be conducted in collaboration with the daily supervisors. The main tasks of the project involve:

- Review literature and formulate the problem statement
- Replicate results from previous work and identify gaps
- Implement the proposed methods to fill the gaps
- Design experiments to evaluate the new methods
- Write a paper on the main findings

### Embedding

This project will be conducted at the Department of Software Science (SWS) as part of the LAVA-LAB research group (see https://lava-lab.org/). The student is encouraged to actively participate in this group (including weekly meetings with the whole team) and will be able to work closely with the PhD student Maris Galesloot who is currently researching related topics. Furthermore, funding is available for visiting at least one workshop or conference on a related topic to interact with top researchers in the field.

### References

Carr, S., Jansen, N., and Topcu, U. (2021). Task-aware verifiable rnn-based policies for partially observable markov decision processes. J. Artif. Intell. Res., 72:819–847.

Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., Abbeel, P., Srinivas, A., and Mordatch, I. (2021). Decision transformer: Reinforcement learning via sequence modeling. In Ranzato, M., Beygelzimer, A., Dauphin, Y. N., Liang, P., and Vaughan, J. W., editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 15084–15097.

Esslinger, K., Platt, R., and Amato, C. (2022). Deep transformer q-networks for partially observable reinforcement learning. CoRR, abs/2206.01078.

Koul, A., Fern, A., and Greydanus, S. (2019). Learning finite state representations of recurrent policy networks. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R., editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998–6008

## Finding Structured Policies via Monte Carlo Tree Search

**Supervisors**: Maris Galesloot MSc & Dr. Nils Jansen

### Motivation

Computing optimal decision-making policies for autonomous agents in partially observable environments, that is, environments where the underlying state is not directly observable, is generally a hard problem. Recently, techniques employing recurrent neural networks (RNNs) have seen great success in these environments (Ni et al., 2022). Such techniques learn approximate representations of the history to find good policies from experiences collected by simulating the model (Heess et al., 2015; Hausknecht and Stone, 2015). By neglecting the model, these methods generally require a large number of environment interactions. Furthermore, the computed policies are not interpretable and lack guarantees on their performance. To combat this, we (1) learn structured policy representations from RNNs and (2) iteratively improve the learned policy with model-based information (Carr et al., 2019).

### Challenge

Combining model-based approaches with deep learning techniques gives rise to the question of how to efficiently design the optimization objective for the RNN policy. Methods for learning structured policies are able to combine the information from the model to speed up learning (Carr et al., 2021). However, these methods have so far relied on sub-optimal learning targets computed directly from the model instead of optimizing towards an objective, for instance, by maximizing the expected rewards.

### Goal

Inspired by AlphaZero (Silver et al., 2017; Anthony et al., 2017), we aim to develop a search method equipped with neural networks that iterates over the space of memory-based policies, optimizing an objective aimed at maximizing expected rewards. Similar methods are able to solve difficult problems with a fully observable state but are understudied in partially observable environments. Secondly, we aim to use information on the performance of the computed policy on the model to steer further search iterations. (For an illustration of the AlphaZero approach, see Zhang & Yu, 2020.)

### Overview

The idea is to combine a well-known variant of the Monte Carlo tree search algorithm (Silver and Veness, 2010) with an RNN architecture to search for a good policy and an associated history representation. Then, we discretize the memory representation to arrive at a finite-memory policy in a structured format (Koul et al., 2019). After implementing the sketched framework, we will look into how to exploit the structured policy to improve the learning procedure, for instance by exploiting the information obtained from exactly evaluating the policy on the model.
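For reference, the action-selection rule at the heart of Monte Carlo tree search variants such as the one of Silver and Veness (2010) can be sketched as a UCB1 rule. The implementation below is a generic illustration under that assumption, not the project's method.

```python
import math

def ucb1_select(counts, values, c=1.4):
    """Pick the child maximizing value estimate + exploration bonus.

    counts[i]: number of visits to child i; values[i]: mean return of
    child i. Unvisited children are always expanded first.
    """
    total = sum(counts)
    best, best_score = None, -math.inf
    for i, (n, v) in enumerate(zip(counts, values)):
        if n == 0:
            return i  # expand unvisited children first
        score = v + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best

# A rarely tried child can win on the exploration bonus alone:
choice = ucb1_select(counts=[100, 2, 50], values=[0.60, 0.50, 0.55])  # -> 1
```

In the project, the value estimates at the tree nodes would additionally be informed by the RNN's learned history representation.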

### Main tasks

This project will be conducted in collaboration with the daily supervisors. The main tasks of the project involve:

- Review literature and formulate the problem statement
- Replicate results from previous work and identify gaps
- Implement the proposed methods to fill the gaps
- Design experiments to evaluate the new methods
- Write a paper on the main findings

### Embedding

This project will be conducted at the Department of Software Science (SWS) as part of the LAVA-LAB research group (see https://lava-lab.org/). The student is encouraged to actively participate in this group (including weekly meetings with the whole team) and will be able to work closely with the PhD student Maris Galesloot who is currently researching related topics. Furthermore, funding is available for visiting at least one workshop or conference on a related topic to interact with top researchers in the field.

### References

Anthony, T., Tian, Z., and Barber, D. (2017). Thinking fast and slow with deep learning and tree search. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R., editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5360–5370.

Carr, S., Jansen, N., and Topcu, U. (2021). Task-aware verifiable rnn-based policies for partially observable markov decision processes. J. Artif. Intell. Res., 72:819–847.

Carr, S., Jansen, N., Wimmer, R., Serban, A. C., Becker, B., and Topcu, U. (2019). Counterexample-guided strategy improvement for pomdps using recurrent neural networks. In Kraus, S., editor, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 5532–5539. ijcai.org.

Hausknecht, M. J. and Stone, P. (2015). Deep recurrent q-learning for partially observable mdps. In 2015 AAAI Fall Symposia, Arlington, Virginia, USA, November 12-14, 2015, pages 29–37. AAAI Press.

Heess, N., Hunt, J. J., Lillicrap, T. P., and Silver, D. (2015). Memory-based control with recurrent neural networks. CoRR, abs/1512.04455.

Koul, A., Fern, A., and Greydanus, S. (2019). Learning finite state representations of recurrent policy networks. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.

Ni, T., Eysenbach, B., and Salakhutdinov, R. (2022). Recurrent model-free RL can be a strong baseline for many pomdps. In ICML, volume 162 of Proceedings of Machine Learning Research, pages 16691–16723. PMLR.

Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T. P., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., and Hassabis, D. (2017). Mastering the game of go without human knowledge. Nat., 550(7676):354–359.

Silver, D. and Veness, J. (2010). Monte-carlo planning in large POMDPs. In NIPS, pages 2164–2172. Curran Associates, Inc.

Zhang, H., Yu, T. (2020). AlphaZero. In: Dong, H., Ding, Z., Zhang, S. (eds) Deep Reinforcement Learning. Springer, Singapore. https://doi.org/10.1007/978-981-15-4095-0_15

## Utilizing DVS Cameras in Neuromorphic Platforms for Eye Tracking

**Supervisors**: Marzieh Hassanshahi (marzieh.hassanshahi@donders.ru.nl) and Dr. Mahyar Shahsavari

### Project explanation

Eye tracking is a critical technology in various domains, such as human-computer interaction, psychology, and neuroscience. Traditional eye-tracking systems have limitations in terms of speed, power consumption, and spatial resolution. This research proposal aims to investigate the integration of Dynamic Vision Sensors (DVS) cameras into neuromorphic platforms for real-time, low-power, high-resolution eye tracking. The project's goal is to develop an efficient and accurate eye tracking system that leverages the advantages of neuromorphic computing and DVS cameras to advance our understanding of visual processing and enable novel applications.

The human visual system is a remarkable model of efficiency and precision, capable of processing vast amounts of visual information with minimal energy consumption. Replicating these capabilities in artificial systems has been a long-standing goal in the field of neuromorphic engineering. Neuromorphic platforms, inspired by the human brain's structure and function, are designed to perform complex cognitive tasks efficiently and in real-time.

Dynamic Vision Sensors (DVS) are a novel type of event-based camera that operates differently from traditional frame-based cameras. Instead of capturing frames at fixed time intervals, DVS cameras output events whenever a pixel's brightness changes significantly. This event-driven approach offers several advantages, including low power consumption, high temporal resolution, and low latency. Integrating DVS cameras into neuromorphic platforms could revolutionize eye tracking technology, making it faster, more energy-efficient, and capable of tracking eye movements with higher precision.
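The event-generation principle can be emulated in a few lines. The sketch below is an illustrative software emulation, not how a physical DVS pixel circuit works: it converts a frame sequence into (time, y, x, polarity) events whenever a pixel's log-intensity changes beyond a threshold.

```python
import numpy as np

def frames_to_events(frames, threshold=0.2):
    """Emit DVS-style events from a frame sequence (toy emulation).

    An event (t, y, x, polarity) fires whenever the log-intensity of a
    pixel has changed by more than `threshold` since its last event.
    """
    eps = 1e-6
    log_ref = np.log(frames[0] + eps)  # per-pixel reference level
    events = []
    for t, frame in enumerate(frames[1:], start=1):
        log_now = np.log(frame + eps)
        diff = log_now - log_ref
        fired = np.abs(diff) >= threshold
        for y, x in zip(*np.nonzero(fired)):
            events.append((t, int(y), int(x), 1 if diff[y, x] > 0 else -1))
        log_ref[fired] = log_now[fired]  # reset reference where fired
    return events

# Two 2x2 frames: only the brightening pixel at (0, 0) triggers an event.
f0 = np.array([[0.5, 0.5], [0.5, 0.5]])
f1 = np.array([[1.0, 0.5], [0.5, 0.5]])
events = frames_to_events(np.stack([f0, f1]))  # -> [(1, 0, 0, 1)]
```

Static scene regions produce no events at all, which is the source of the low power consumption and low latency described above.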

### Research Objectives

The primary objectives of this research are as follows:

- Integration of DVS Cameras: Investigate the hardware and software requirements for integrating DVS cameras into existing neuromorphic platforms.
- Development of Eye Tracking Algorithms: Develop novel algorithms that utilize DVS camera data for real-time eye tracking, including saccade, fixation, and smooth pursuit detection.
- Performance Evaluation: Evaluate the accuracy, speed, and power efficiency of the proposed eye tracking system in comparison to traditional eye-tracking methods.
- Applications: Explore potential applications of the neuromorphic eye tracking system in areas such as human-computer interaction, cognitive neuroscience, and assistive technologies.

### Methodology

Hardware Integration

- Select an appropriate DVS camera model and interface it with a neuromorphic platform (e.g., SpiNNaker, Loihi, FPGA, or Jetson Nano).
- Develop any necessary hardware and software interfaces to enable real-time communication between the DVS camera and neuromorphic platform.

Eye Tracking Algorithms

- Develop event-based algorithms to process DVS camera data for eye tracking, including feature extraction and event interpretation.
- Implement real-time algorithms for detecting and tracking eye movements, including saccades, fixations, and smooth pursuits.

### Performance Evaluation

- Conduct experiments to compare the performance of the neuromorphic eye tracking system with traditional eye-tracking methods using standardized eye-tracking datasets.
- Measure the system's power consumption, latency, and accuracy under various conditions and stimulus types.

### Applications

- Explore potential applications of the neuromorphic eye tracking system, such as gaze-based human-computer interfaces, cognitive load assessment, and eye-controlled assistive technologies.
- Collaborate with researchers in relevant fields to apply the system to specific use cases and gather user feedback.

### Expected Outcomes

This research is expected to yield several outcomes:

- A functional neuromorphic eye tracking system that integrates DVS cameras, capable of real-time eye movement detection and tracking.
- Novel algorithms for processing DVS camera data and interpreting events for eye tracking applications.
- Comprehensive performance evaluations demonstrating the advantages of the proposed system over traditional eye-tracking methods.
- Insights into potential applications of the neuromorphic eye tracking system in various domains.

### References

Hengyi Lv et al., Dynamic Vision Sensor Tracking Method Based on Event Correlation Index, Hindawi, 2021

Anastasios N. Angelopoulos, et al., Event-Based Near-Eye Gaze Tracking Beyond 10,000 Hz, IEEE Transactions on Visualization and Computer Graphics, 2022.

## Implementing Stochastic Spiking Neural Networks on FPGA for pattern recognition

**Supervisors**: Dr. Leila Bagheriye and Dr. Mahyar Shahsavari

### Project description

Although Deep Neural Networks (DNNs) have already shown superiority in many real-world applications, the high density of neuron computations makes power consumption the main design challenge in implementing DNN hardware. To address the power problem of DNNs, Spiking Neural Networks (SNNs) have been proposed, which reduce power consumption through spike-based transmission. However, it is difficult to implement large-scale SNNs because of the intrinsically non-differentiable neuron operations. In this project, we propose to use the Stochastic Computing (SC) method to build an SNN neuron model. An SC-based SNN can not only improve computational efficiency but also lower the SNN design barrier compared with non-SC SNN methods.

SC-based methods use simple logic gates to perform common operations such as addition and multiplication. The data processed in SC is usually represented as a bit stream encoding a probability value (between 0 and 1). However, the result of an SC operation depends on the correlation between the input bit streams: if the bit streams have high cross-correlation, the bit stream generation has low randomness. In other words, we need some form of random number generation to obtain stochastic behaviour in SNNs. To reduce area overhead while retaining the benefits of the SNN approach and adding randomness, we propose to replace the conventional operators in neuron models such as the LIF model with SC-based operators, such as adders and multipliers.
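A minimal sketch of unipolar stochastic computing, where a probability is encoded as a random bit stream and a single AND gate implements multiplication; the stream length is an illustrative choice, and in hardware the random bits would come from an RNG circuit rather than a software generator.

```python
import numpy as np

rng = np.random.default_rng(3)

def to_bitstream(p, length=10_000):
    """Encode probability p as a unipolar stochastic bit stream."""
    return (rng.random(length) < p).astype(np.uint8)

# In unipolar SC, one AND gate multiplies two values, provided the
# input streams are uncorrelated -- which is why independent random
# number generation matters for the stochastic SNN.
a = to_bitstream(0.8)
b = to_bitstream(0.5)
product = np.bitwise_and(a, b).mean()  # estimates 0.8 * 0.5 = 0.4
```

If the two streams were identical (maximally correlated), the AND gate would output `min(a, b)` rather than the product, which is the correlation problem described above.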

To verify the proposed stochastic SNN design method, the design will be implemented on field-programmable gate arrays (FPGAs) or neuromorphic hardware, and its performance will be evaluated on the MNIST image recognition dataset. This stochastic SNN framework aims to achieve higher accuracy than other SNN designs and accuracy comparable to that of their ANN counterparts. Hence, the proposed SNN design can be an effective alternative for achieving high accuracy in hardware-constrained applications, and opens new horizons in developing brain-inspired large-scale platforms for edge devices and IoT applications.

### References

Fabio Galán-Prado, et al., ‘Compact Hardware Synthesis of Stochastic Spiking Neural Networks’ International Journal of Neural Systems Vol. 29, No. 08, 1950004 (2019)

Khadeer Ahmed, et al., Probabilistic Inference Using Stochastic Spiking Neural Networks on A Neurosynaptic Processor, IJCNN, 2015.

## Federated Neuromorphic learning for edge AI and IoT

**Supervisor**: Dr. Mahyar Shahsavari

### Project Description

Edge AI and the Internet of Things (IoT) are developing rapidly, connecting billions of devices such as sensors, actuators, robots, and autonomous vehicles, which generate massive amounts of data. To preserve data privacy and to reduce data traffic and network latency, federated learning (FL) [1][2] proposes decentralized, local data processing. Despite the benefits of edge AI, it faces two fundamental challenges. First, modern AI algorithms depend intrinsically on complex learning methods and, more importantly, on rich training datasets; the limited size of the local datasets available to edge devices inevitably makes training more difficult. Second, machine learning algorithms are computationally intensive and energy consuming, which hampers energy-constrained edge devices from training and analyzing data locally. FL addresses the first challenge: as reported in [3] [4], multiple collaborating devices locally train a machine learning model on their own data, in parallel, without uploading raw data to a server. The edge devices only upload parameters or gradients to a central server for global model aggregation. Owing to its data-privacy properties, FL has recently been applied in privacy-sensitive medical applications, e.g., medical image classification [5].
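The aggregation step described above can be sketched as FedAvg-style weighted parameter averaging (after McMahan et al. [2]); the client parameter vectors below are toy values for illustration.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg-style aggregation: the server averages client model
    parameters weighted by local dataset size; raw data never leaves
    the clients -- only the parameters are uploaded."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three clients with locally trained parameter vectors (toy values):
w_global = fedavg(
    client_weights=[np.array([1.0, 2.0]),
                    np.array([3.0, 4.0]),
                    np.array([5.0, 6.0])],
    client_sizes=[10, 10, 20],
)  # -> [3.5, 4.5]
```

One open question for this project is what the analogue of this parameter-averaging step should be when the local models are spiking networks trained with local learning rules on neuromorphic hardware.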

Neuromorphic platforms can learn from data without being explicitly programmed for each separate task, and they can use local learning to learn from local data in real time. [6] shows that neuromorphic platforms can be used in this setting and discusses lead federated neuromorphic learning. In this research, we aim to design a neuromorphic system and investigate the requirements for using it as an edge AI device. Which properties such a system should support in order to implement FL is part of this research.
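The aggregation step at the heart of FL can be sketched in a few lines. The following is a minimal, hypothetical FedAvg-style simulation in plain numpy, using a least-squares model as a stand-in for whatever the edge devices actually train; all names and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])       # hypothetical "ground-truth" model

# Private datasets held by three edge devices (never sent to the server).
clients = []
for _ in range(3):
    X = rng.normal(size=(20, 2))
    clients.append((X, X @ w_true))

def local_train(w, X, y, lr=0.1, steps=5):
    # A few local gradient steps on a least-squares loss (a stand-in for the
    # actual on-device model, e.g. a spiking network on neuromorphic hardware).
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

def fedavg_round(w_global, clients):
    # One FL round: every device trains locally, only the resulting parameters
    # are sent back, and the server averages them weighted by dataset size.
    updates = [local_train(w_global.copy(), X, y) for X, y in clients]
    sizes = [len(y) for _, y in clients]
    return np.average(updates, axis=0, weights=sizes)

w = np.zeros(2)
for _ in range(20):
    w = fedavg_round(w, clients)
```

Note that only model parameters cross the network; the raw data `(X, y)` never leaves a client, which is the privacy argument made above.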

### References

[1] Konečný, J. et al. Federated learning: strategies for improving communication efficiency (2016) (https://arxiv.org/abs/1610.05492).

[2] B. McMahan et al. Communication-efficient learning of deep networks from decentralized data. AISTATS (2017).

[3] Li, B. et al. Random sketch learning for deep neural networks in edge computing. Nat. Comput. Sci. 1, 221–228 (2021).

[4] Lim, W. Y. B. et al. Decentralized edge intelligence: A dynamic resource allocation framework for hierarchical federated learning. IEEE Trans. Parallel Distrib. Syst. 33, 536–550 (2022).

[5] Froelicher, D. et al. Truly privacy-preserving federated analytics for precision medicine with multiparty homomorphic encryption. Nat. Commun. 12, 5910 (2021).

[6] Helin Yang et al. Lead federated neuromorphic learning for wireless edge artificial intelligence. Nat. Commun. 13, 4269 (2022).

## Diving deeper into Autoencoders for anomaly detection

**Supervisors**: Roel Bouman and prof. Tom Heskes

### Project Description

Anomaly detection is the study of finding data points that do not fit the expected structure of the data. Anomalies can be caused by unexpected processes generating the data. In chemistry an anomaly might be caused by an incorrectly performed experiment, in medicine a certain disease might induce rare symptoms, and in predictive maintenance an anomaly can be indicative of early system failure. Depending on the application domain, anomalies have different properties, and may also be called by different names.

One of the more popular methods for anomaly detection is the well-known autoencoder [1, 2]. Autoencoders can be used to detect anomalies by training them on a reference dataset while minimizing the reconstruction loss, and subsequently classifying samples with a high reconstruction loss as anomalies. When anomaly detection is applied in the unsupervised setting, we encounter a general problem: the absence of labels. Without labels, the training data for the autoencoder will contain the anomalies we want to detect. An autoencoder, even with only a few anomalous samples in the training data, will learn to reconstruct them [3, 4]. This lowers their reconstruction loss, making anomalies harder to detect.
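As a minimal illustration of this scoring scheme (not the project's method), the sketch below trains a small scikit-learn network to reconstruct its input through a bottleneck and flags test samples whose reconstruction error exceeds a quantile-based threshold; the data, the dimensions, and the threshold are synthetic assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 10))                 # reference data, assumed normal
X_test = np.vstack([rng.normal(size=(50, 10)),       # 50 normal test samples
                    rng.normal(5.0, 1.0, (5, 10))])  # 5 injected anomalies

# Autoencoder: the network is trained to reproduce its own input through a
# 3-unit bottleneck by minimizing the reconstruction (squared) loss.
ae = MLPRegressor(hidden_layer_sizes=(3,), max_iter=2000, random_state=0)
ae.fit(X_train, X_train)

# Per-sample reconstruction error; a high error marks a candidate anomaly.
errors = np.mean((ae.predict(X_test) - X_test) ** 2, axis=1)
flagged = errors > np.quantile(errors[:50], 0.95)
```

The failure mode discussed above corresponds to moving the injected anomalies into `X_train`: the network then partially learns to reconstruct them, shrinking the error gap this detector relies on.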

While this problem has been observed by several researchers [3, 4], the extent to which it affects the detection of anomalies has not previously been quantified. It is therefore unknown at what percentage of anomalies in the training data the method starts breaking down, and to what extent.

Several extensions to the autoencoder architecture have been proposed in recent years aiming to alleviate this problem in several domains [3, 5, 6]. These extensions however have not been thoroughly compared.

The goal of this project is to: I) quantify the loss of performance of autoencoders in the presence of anomalies in the training data, using targeted simulations and real-world data; II) assess the efficacy of existing methods in alleviating the detection problems described above; III) use the lessons learned to provide guidelines and to develop new methodology for better detection of anomalies.

### Requirements

We search for MSc students in computer science, artificial intelligence, mathematics, or a related discipline with a strong interest in exploring the fundamentals of machine learning. You are highly motivated, open-minded, interested in pursuing an academic career, and preferably proficient in the Python programming language.

### References

1. Japkowicz, Nathalie, Catherine Myers, and Mark Gluck. "A novelty detection approach to classification." IJCAI. Vol. 1. 1995.

2. Sakurada, Mayu, and Takehisa Yairi. "Anomaly detection using autoencoders with nonlinear dimensionality reduction." Proceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data analysis. 2014.

3. Astrid, Marcella, et al. "Learning not to reconstruct anomalies." arXiv preprint arXiv:2110.09742 (2021).

4. Gong, Dong, et al. "Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.

5. Zhou, Chong, and Randy C. Paffenroth. "Anomaly detection with robust deep autoencoders." Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 2017.

6. Bergmann, Paul, et al. "Improving unsupervised defect segmentation by applying structural similarity to autoencoders." arXiv preprint arXiv:1807.02011 (2018).

## Enriching large language models with factual knowledge for conversational AI

**Supervisor**: Dr. Faegheh Hasibi

### Project Description

Access to information is becoming increasingly conversational, and developing conversational assistants is largely dependent on Large Language Models (LLMs). Generative LLMs, such as ChatGPT, tend to hallucinate information that they have seen less frequently during training and have limited memorization of less popular factual knowledge (e.g., information about less popular places, people, and products). Conversational assistants, on the other hand, need to provide highly personalized and accurate content to their users, which often involves handling less popular knowledge. How can we help LLM-based conversational agents provide accurate information about less popular factual knowledge?

Research has shown that instead of solely relying on LLMs’ memories (i.e., their parametric knowledge), we can augment them with relevant retrieved information and mitigate their low performance on questions about less popular entities. The challenge, however, is augmenting LLMs in conversational scenarios, where we have a rather long and personal conversation history, and questions are dependent on the context of the conversation.

In this project, you will use LLM augmentation methods for developing conversational models. You will explore how to efficiently augment and fine-tune parts of LLMs using Parameter-Efficient Fine-Tuning (PEFT) methods and improve the performance of these models on less popular knowledge.
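One widely used PEFT method is LoRA (see the references below), which freezes the pretrained weight matrix and learns only a low-rank update. A minimal numpy sketch of the adapted forward pass, with purely illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 8, 2              # hypothetical layer sizes and rank

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))               # second factor, initialised to zero
alpha = 4.0                            # LoRA scaling hyperparameter

def forward(x, B):
    # Adapted layer: base path plus the low-rank update (alpha/r) * B A x.
    # During fine-tuning, gradients flow only into A and B, so the number of
    # trainable parameters is r * (d_in + d_out) instead of d_in * d_out.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
y0 = forward(x, B)   # with B = 0 the adapter is a no-op at initialisation
```

Because `B` starts at zero, the adapted model initially matches the frozen base model exactly, which is what makes LoRA fine-tuning stable to start.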

### Requirements

Basic knowledge in deep learning, interest in large language models and information retrieval.

### References

Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, and Hannaneh Hajishirzi. 2023. When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics.

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In Proceedings of the Tenth International Conference on Learning Representations (ICLR).

## Enhancing High-Intensity Precipitation Nowcasting with Transformer Models and Deep Generative Models

**Supervisor**: Dr. Yuliya Shapovalova

### Project Description

In the face of climate change, understanding and predicting extreme weather events has become more crucial than ever. We anticipate significant shifts in the distribution of meteorological variables, resulting in more frequent and severe weather anomalies, such as high-intensity precipitation events and prolonged droughts. Our project seeks to harness the potential of deep learning to address these challenges.

In the realm of precipitation nowcasting, we're at a pivotal moment where radar data and deep learning intersect. Recent research has demonstrated that deep generative models, when equipped with tailored loss functions and additional features, outperform conventional numerical weather prediction methods (NWM) and earlier neural network-based approaches (e.g., UNet, MetNet) in predicting high-intensity precipitation events [1].

Moreover, the emergence of transformer architectures has sparked excitement in the world of sequence-to-sequence modeling. A recent study by Bai et al. (2022) [2] introduced "Rainformer," a transformer-based architecture showcasing strong performance for precipitation nowcasting. However, what remains unexplored is how Rainformer stacks up against the formidable deep generative models we've seen in Cambier van Nooten et al. (2023)[1].

### Project Objectives

This Master's thesis project offers two research directions:

1) In-depth model comparison: Conduct an extensive comparative analysis between the deep generative model [1] and the Rainformer [2]. Extend Rainformer by incorporating additional features to ensure a fair and insightful comparison. This approach aims to unveil the strengths and weaknesses of each model, helping us understand which one excels in the context of high-intensity precipitation prediction.

2) Efficiency through sparsity: Dive into the world of neural network sparsity, a key step towards real-world applicability. We will explore the "lottery ticket hypothesis," a fascinating concept in deep learning. This hypothesis suggests that a subset of a neural network's initial weights can, when trained independently, achieve similar or even superior performance compared to the fully trained network. We will apply this concept to deep generative models for precipitation nowcasting, examining the efficiency gains associated with introducing sparsity and its impact on performance.
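The pruning step behind the lottery ticket hypothesis can be sketched compactly: keep the largest-magnitude weights of a trained network and rewind them to their initial values. The snippet below is a schematic numpy illustration with stand-in weight matrices, not a full train-prune-retrain loop:

```python
import numpy as np

rng = np.random.default_rng(0)
W_init = rng.normal(size=(64, 64))                         # weights at initialisation
W_trained = W_init + rng.normal(scale=0.5, size=(64, 64))  # stand-in for trained weights

def lottery_ticket_mask(trained, sparsity):
    # Keep the largest-magnitude trained weights; prune the rest.
    k = int(trained.size * (1 - sparsity))
    thresh = np.sort(np.abs(trained).ravel())[::-1][k - 1]
    return (np.abs(trained) >= thresh).astype(float)

mask = lottery_ticket_mask(W_trained, sparsity=0.8)
# The "winning ticket": surviving weights rewound to their initial values,
# which would then be retrained with the mask held fixed.
ticket = mask * W_init
```

In the sparse-GAN setting of [3], the same masking idea is applied to generator and discriminator weights throughout training rather than once after it.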

Domain experts from KNMI will join this project to help along the way, especially with the interpretation of the results.

### References

[1] Cambier van Nooten, C., Schreurs, K., Wijnands, J. S., Leijnse, H., Schmeits, M., Whan, K., & Shapovalova, Y. (2023). Improving precipitation nowcasting for high-intensity events using deep generative models with balanced loss and temperature data: a case study in the Netherlands. Artificial Intelligence for the Earth Systems.

[2] Bai, C., Sun, F., Zhang, J., Song, Y., & Chen, S. (2022). Rainformer: Features extraction balanced network for radar-based precipitation nowcasting. IEEE Geoscience and Remote Sensing Letters, 19, 1-5.

[3] Liu, S., Tian, Y., Chen, T., & Shen, L. (2023). Don’t be so dense: sparse-to-sparse GAN training without sacrificing performance. International Journal of Computer Vision, 1-14.

## Leveraging Weak Annotations in AI-based Metastasis Detection from CT Images

**Supervisors**: Max de Grauw and Alessa Hering

### Clinical Problem

Computed Tomography (CT) scans provide a comprehensive 3D view of the body's interior and serve as a primary imaging modality for detecting metastases – malignant growths that often hint at the spread of cancer. Given their minute size, often just a few millimeters across, metastases present a considerable challenge to discern amidst the multitude of CT image slices. Currently, expert radiologists dedicate significant time and effort to interpreting these scans. The aim of automatic metastasis detection, therefore, emerges not just as a quest for accuracy but also as an avenue to optimize valuable human resources in the medical domain.

### Solution

To develop an AI-based model for automatic and efficient detection of metastases in CT images. While automatic lesion detection models have been studied before, only a few studies have explored the potential of utilizing expansive datasets with weak annotations. This project aims to use the vast amounts of weakly annotated data (e.g., the DeepLesion dataset) and diagnostic reports to improve the efficiency and accuracy of metastasis detection algorithms.

By reducing the need for exhaustive manual annotations and expert intervention, this approach could pave the way for a new era in medical imaging where clinicians are aided by intelligent systems, making diagnostic procedures faster and more precise.

### Approach

**Primary goal: Image-based Metastasis Detection**

- Development: Build a baseline detection model to detect metastases in CT images, using a mixture of fully and partially annotated training data.
- Evaluation: Evaluate the model using metrics such as accuracy, sensitivity, specificity, and the area under the ROC curve.

**Secondary goal: Incorporating Radiology Reports**

- Self-supervised Learning: Use information extracted from radiology reports (such as lesion type, location, and size) to guide the training of the image-based model, providing it with context and helping it focus on regions of interest.
- Extension & Integration: Extend the baseline model to incorporate insights gained from the diagnostic reports.
- Evaluation: Assess the integrated model on a separate test set to ensure its generalizability and compare its performance to that of the purely image-based model.

### Data

For this project, three datasets will be available: one from Radboudumc and two that are publicly available. Together, they comprise 42,128 lesions from 8,770 patients.

### Requirements

- Students with a major in computer science, biomedical engineering, artificial intelligence, physics, or a related area in the final stage of master level studies are invited to apply.
- Affinity with programming in Python, specifically in the PyTorch framework.
- Interest in deep learning and medical image analysis.

## Phonological scoring of non-standard speech

**Supervisor**: Dr. Michael Tangermann

### Context

While deep learning approaches have significantly advanced language applications such as keyword spotting in audio recordings and natural language processing of text documents, these solutions for the masses cannot be applied directly to non-standard speech signals, e.g., audio recordings from patients with language production deficits after stroke (aphasia).

Building upon an existing keyword spotting solution for such aphasic speech recordings, the student's task is to develop and evaluate a machine learning approach capable of delivering phonological scores of speech, which can enrich or even replace the scoring of an expert. Emphasis will be on the sample-efficient use of patients' speech data, e.g., by using pre-trained models.

### Research question

How can existing pre-trained language models be modified to serve for the scoring of non-standard speech of patients with language disorders?

### Skills / background required

- Very proficient in Python
- Machine learning, with rich hands-on experience in deep learning, ideally on natural speech data
- Experience in using a compute cluster

## Channel set invariance for neural networks

**Supervisors**: Pierre Guetschel and Dr. Michael Tangermann

### Problem

Despite the well-established standards for EEG electrode layout like the 10-10 montage, EEG recordings obtained in different labs or across studies are not straightforward to compare. There are usually important differences between the datasets, but also within them. These differences can be:

- the use of different channel sets or sensor placement routines
- different noise distributions, or changes of the noise structure over time
- temporarily unusable channels (electrical connection lost, faulty wire, …)

For this reason, spatial filters can hardly be re-used between sessions, let alone between datasets, which is an obstacle for transfer learning approaches in EEG decoding problems. The existing solutions for transfer learning with heterogeneous channel sets are:

- train a model on the common channel subset → restrictive, if a new recording has a faulty channel, as the whole model has to be re-trained without this channel
- interpolate the missing channels → obtained full channel set may not be full rank any more
- source reconstruction techniques → assumptions need to be made for source reconstruction, their choice is non-trivial

### Objective

The objective of this project is to help develop a neural network architecture that is invariant to the set of channels used. Such an architecture would enable an end-to-end learning approach across multiple datasets with heterogeneous channel sets. Ideally, this architecture would be able to:

- handle EEG signals containing an arbitrary number of channels
- produce results independently of the channels used
- reach the same performance as channel-dependent architectures (like EEGNet) when the channel set is fixed / complete
- seamlessly ignore corrupted channels, and degrade gracefully
- cope in real time with potential noise distribution shifts
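One possible starting point (an assumption of this sketch, not a prescribed design) is a Deep-Sets-style construction: apply a shared encoder to every channel and pool with a permutation-invariant operation, so the embedding is independent of channel order and count:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 64                                       # time samples per channel (hypothetical)
W_enc = rng.normal(scale=0.1, size=(16, n_samples))  # encoder weights shared by all channels

def embed(eeg):
    # eeg: (n_channels, n_samples), with an arbitrary number of channels.
    # Every channel passes through the same encoder; mean-pooling across
    # channels then yields a fixed-size embedding that is invariant to the
    # order and number of channels present.
    h = np.tanh(eeg @ W_enc.T)   # (n_channels, 16)
    return h.mean(axis=0)        # (16,)

x32 = rng.normal(size=(32, n_samples))   # a 32-channel recording
x8 = x32[:8]                             # the same recording with channels missing
```

Plain mean pooling discards electrode positions entirely, so a realistic architecture would likely also need to encode spatial information per channel; the sketch only demonstrates the invariance property itself.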

### Skills required

- Good programming skills in Python
- Experience with the pytorch library
- *Optional:* experience with deploying pipelines on GPU clusters
- *Optional:* experience with EEG/BCI data

## Adaptation Strategies for Block-Toeplitz Regularized Linear Discriminant Analysis

**Supervisor**: Dr. Michael Tangermann

### Project Description

While most machine learning methods make the assumption that data is i.i.d., the brain signal features collected within a brain-computer interface (BCI) experiment and even over multiple BCI sessions typically change over time. This non-stationarity is a substantial problem for a pre-trained, but otherwise fixed classification model. In patient training for rehabilitation purposes, e.g., after a stroke has induced language deficits, changes of brain signal features over time should not be considered a problem. Instead, they are desirable, as this form of non-stationarity may reflect the expected training-induced effect caused by, e.g., a more efficient use of the spared brain networks, more efficient cognitive or behavioural strategies of the patient. Recently we have shown how an auditory BCI protocol can provide a successful language training with medium to large effect size for chronic stroke patients with aphasia [1]. This BCI-based rehabilitation protocol makes use of a linear classification model to discriminate between two classes of auditory evoked responses, so-called target and non-target word-induced event-related potentials (ERPs). This system made use of a shrinkage-regularized linear discriminant analysis (sLDA) model for classification, and non-stationarity was compensated for by supervised adaptation of the class means and the covariance matrix.

[1] Musso et al. (2022). Aphasia recovery by language training using a brain–computer interface: a proof-of-concept study. Brain Communications, Volume 4, Issue 1, fcac008, https://doi.org/10.1093/braincomms/fcac008
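A minimal sketch of the kind of supervised adaptation described above, using exponential-moving-average updates of a class mean and the shared covariance (illustrative dimensions and learning rate, not the exact scheme used in [1]):

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, mu1 = np.zeros(4), np.ones(4)   # initial class means (hypothetical features)
cov = np.eye(4)                      # initial shared covariance

def lda_weights(mu0, mu1, cov):
    # Binary LDA: w = Sigma^{-1} (mu1 - mu0), bias at the class-mean midpoint.
    w = np.linalg.solve(cov, mu1 - mu0)
    return w, -w @ (mu0 + mu1) / 2

def adapt(mu, cov, x, eta=0.02):
    # Supervised adaptation: exponential-moving-average update of the mean of
    # the class that labelled sample x belongs to, plus the shared covariance.
    mu = (1 - eta) * mu + eta * x
    d = (x - mu)[:, None]
    cov = (1 - eta) * cov + eta * (d @ d.T)
    return mu, cov

# Simulate non-stationarity: the target-class feature mean drifts from 1 to 2.
for _ in range(300):
    x = rng.normal(loc=2.0, scale=1.0, size=4)
    mu1, cov = adapt(mu1, cov, x)

w, b = lda_weights(mu0, mu1, cov)    # classifier re-derived from adapted statistics
```

With block-Toeplitz regularization, the covariance update would additionally be projected onto the structured estimator of [2]; comparing the cost of such updates is part of this thesis.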

### Task

Very recent research has shown that the performance of sLDA can be improved by different regularization approaches, block-Toeplitz regularization with tapering of the covariance matrix [2] being one of them. As the resulting LDA classification model is capable of learning equally well from smaller datasets, the adaptation procedure should be revised to optimize classification performance under non-stationary conditions. This thesis investigates the computational costs of covariance updates and simulates different updating/adaptation strategies on existing ERP-BCI data, with the goal of providing an efficient adaptation strategy for future patients with aphasia who train with the BCI-based rehabilitation system.

[2] Sosulski, J., & Tangermann, M. (2022). Introducing block-Toeplitz covariance matrices to remaster linear discriminant analysis for event-related potential brain–computer interfaces. Journal of Neural Engineering, 19(6), 066001. https://doi.org/10.1088/1741-2552/ac9c98

### Skills required

- Strong mathematical background and intuition, specifically on linear algebra
- Machine learning
- Good programming skills in Python (familiarity with numpy, sklearn)
- Experience in ERP data analysis (e.g., by accomplished course SOW-BKI 323 or SOW-MKI 74)

## Robust (causal) inference using mathematical models of dynamical systems in biology

**Supervisors**: Dr. Tom Claassen and Dr. Inge Wortel

### Project Description

In this interdisciplinary project, you will combine elements of causal inference theory and mathematical biology to help improve our understanding of dynamical systems.

From physics and engineering to ecology, virology, epidemiology, and biochemistry: many scientific fields use mathematical models to reason about dynamical processes. This is especially important for systems in which multiple entities interact. Such systems rapidly become too complex for us to intuit outcomes based on qualitative assumptions alone; and good models are crucial to develop hypotheses, make predictions, or reliably calculate key system parameters from available data. A good example can be found in the models of viral dynamics developed in the 1990s and 2000s [1]. These models, which are similar to the familiar predator-prey models in biology, were incredibly impactful because they helped understand the complex interplay between HIV and the immune system, explained why HIV patients rapidly became resistant to drugs and why they needed life-long therapy, and inspired better treatment strategies.

But despite the benefits of this approach, it also has its limitations. One important problem was discovered about ten years ago, when it was shown that seemingly minor extensions to some models could completely flip their (causal) implications. This has raised concerns about the validity of our interpretations: if very similar models can lead to opposite conclusions, how do we know which of our modelling predictions are actually true (or rather ‘robust’) [2]?

In this project we want to tackle this issue by building on promising recent work [3], which proposed ways to detect cases where model extensions are robust in their predictions – and tools to select the best model in cases where they are not. Specifically, you will use available models for viral dynamics to simulate data and try to answer one or more of the following research questions:

- Can we use the insights from [3] to select between different infection models based on data?
- Can we recognize when our (causal) model interpretations are robust?
- Given a dataset and a set of models with competing predictions, to what extent can we reduce uncertainty in predictions?
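For concreteness, the basic target-cell-limited model of viral dynamics [1] can be simulated in a few lines; the parameter values below are illustrative, not fitted to data:

```python
import numpy as np

def simulate(lam=10.0, d=0.1, beta=0.01, delta=0.5, p=5.0, c=3.0,
             T0=100.0, I0=0.0, V0=1e-3, dt=0.01, t_end=200.0):
    # Target-cell-limited model of viral dynamics (cf. [1]):
    #   dT/dt = lam - d*T - beta*T*V   (target cells)
    #   dI/dt = beta*T*V - delta*I     (infected cells)
    #   dV/dt = p*I - c*V              (free virus)
    # Forward-Euler integration; all parameter values are illustrative.
    T, I, V = T0, I0, V0
    for _ in range(int(t_end / dt)):
        dT = lam - d * T - beta * T * V
        dI = beta * T * V - delta * I
        dV = p * I - c * V
        T, I, V = T + dt * dT, I + dt * dI, V + dt * dV
    return T, I, V

T, I, V = simulate()   # with these values the infection persists rather than dying out
```

Lowering `beta` enough makes the infection die out instead of persisting, one example of the qualitative model predictions whose robustness under model extension this project would examine.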

### References

[1] Perelson (2002). Modelling viral and immune system dynamics. Nature Reviews Immunology, https://www.nature.com/articles/nri700

[2] de Boer (2012). Which of our modeling predictions are robust? PLOS Computational Biology, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002593#s7

[3] Blom and Mooij (2022). Robustness of model predictions under extension. Conference on Uncertainty in Artificial Intelligence, https://arxiv.org/pdf/2012.04723v2.pdf

## Robust Reinforcement Learning via Uncertainty Quantification

**Supervisors**: Marnix Suilen, Dr. Nils Jansen, and Dr. Jurriaan Rot

### Introduction

Markov decision processes (MDPs) are the standard model for decision-making under uncertainty [6]. The uncertainty in an MDP stems from the probability distributions that govern which successor state the agent reaches after making a decision. When these probability distributions are unknown, the well-known machine-learning technique of reinforcement learning (RL) can be used to explore the problem and learn a decision-making policy from collected data [8].

Robust MDPs extend the MDP framework by instead allowing for sets of probability distributions [5, 9]. In this case, the agent does not know with what probability a successor state is chosen exactly but only has knowledge of the bounds on this probability. As such, the agent has to reason over the best-case or worst-case distribution in this set. Under certain structural assumptions on these uncertainty sets, efficient, robust dynamic programming solutions exist for robust MDPs [3]. Furthermore, robust MDPs have been used in RL as intermediate models in the learning process, where the uncertainty sets represent a form of confidence around an estimated probability distribution [4, 7]. Robust dynamic programming is then used on these intermediate models to find optimistic policies that encourage exploration.

Figure 1: A robust MDP with interval uncertainty sets also known as an interval MDP (left). After choosing an action, the successor state is determined by a probability distribution with probabilities inside the given intervals. The set of all probability distributions at such a state-action pair in an interval MDP is described by a convex polytope (right).
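For interval uncertainty sets like those in Figure 1, the worst-case step of robust dynamic programming [3] admits a simple greedy solution: push the free probability mass toward the lowest-valued successors. A minimal numpy sketch:

```python
import numpy as np

def worst_case_expectation(values, lower, upper):
    # Inner step of robust value iteration for an interval MDP: minimise the
    # expected successor value over all distributions p with
    # lower <= p <= upper and sum(p) = 1, by greedily assigning the free
    # probability mass to the lowest-valued successors first.
    p = np.array(lower, dtype=float)
    budget = 1.0 - p.sum()                 # probability mass left to distribute
    for i in np.argsort(values):           # lowest successor values first
        extra = min(upper[i] - p[i], budget)
        p[i] += extra
        budget -= extra
    return float(np.dot(values, p))

# Two successor states with values 0 and 10, each with interval [0.2, 0.8]:
# the adversary puts as much mass as allowed (0.8) on the value-0 state.
v = worst_case_expectation(np.array([0.0, 10.0]),
                           np.array([0.2, 0.2]),
                           np.array([0.8, 0.8]))
```

Robust value iteration applies this computation at every state-action pair; sorting in the opposite order yields the best-case (optimistic) value used by exploration-encouraging RL methods.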

### Project Goal

This project is all about finding a way to quantify the amount of uncertainty in a robust MDP, as a means to monitor the learning progress of RL algorithms. Robust MDPs can be seen as an uncountable set of MDPs that differ in their transition probabilities. The natural interpretation is that the larger the uncertainty sets, the more uncertainty there is ‘in’ the robust MDP. Hence, we want to measure the size of this uncertainty.

### Approach

To quantify the uncertainty in robust MDPs, we aim to lift distance metrics between two MDPs [2] to a set of MDPs. These distance metrics may then be computed by standard algorithms, or, more interestingly, we attempt to use deep RL on a new MDP that precisely captures the distance metric [1].

Concretely, you will be working on:

- Defining a measure for the amount of uncertainty in robust MDPs by studying literature on similarity measures between MDPs from the AI and machine learning communities.
- Designing and implementing an approach to compute this distance via deep RL.
- Conducting an experimental evaluation that shows the effectiveness of the approach in the context of a standard RL algorithm using robust MDPs.

### Embedding of the fellowship

The student working on this project will, besides ELLIS, be embedded in the daily activities of the research groups of Nils Jansen and Jurriaan Rot. The topic is part of the ERC starting grant “DEUCE: Data-Driven Verification and Learning Under Uncertainty”, and the student would directly join the ongoing discussions with Ph.D. student Marnix Suilen. We conduct weekly meetings with the whole team, and facilitate an open and cooperative research culture, where students are directly involved. There will be the possibility and funding to visit at least one workshop or conference on related topics to interact with top researchers in the field.

### References

[1] Norman Ferns and Doina Precup. Bisimulation metrics are optimal value functions. In UAI, pages 210–219. AUAI Press, 2014.

[2] Javier García, Álvaro Visus, and Fernando Fernández. A taxonomy for similarity metrics between Markov decision processes. Mach. Learn., 111(11):4217–4247, 2022.

[3] Garud N. Iyengar. Robust dynamic programming. Math. Oper. Res., 30(2):257–280, 2005.

[4] Thomas Jaksch, Ronald Ortner, and Peter Auer. Near-optimal regret bounds for reinforcement learning. J. Mach. Learn. Res., 11:1563–1600, 2010.

[5] Arnab Nilim and Laurent El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Oper. Res., 53(5):780–798, 2005.

[6] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics. Wiley, 1994.

[7] Marnix Suilen, Thiago D. Simão, David Parker, and Nils Jansen. Robust anytime learning of Markov decision processes. In NeurIPS, 2022.

[8] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, 1998.

[9] Wolfram Wiesemann, Daniel Kuhn, and Berç Rustem. Robust Markov decision processes. Math. Oper. Res., 38(1):153–183, 2013.

## Shielded Reinforcement Learning under Delayed Observations

**Supervisors**: Merlijn Krale, Dr. Thiago D. Simão and Dr. Nils Jansen

### Motivation

Reinforcement learning (RL) has become a widespread tool for solving complex sequential decision-making problems: RL agents can currently beat humans in games ranging from Stratego to Dota 2 [1,2], and are even getting used for self-driving cars [3].

However, RL is often unsafe in the sense that agents may take harmful actions while exploring. This limits how RL can be deployed in real-life situations: a self-driving car, for example, should have some guarantees on how it will behave in safety-critical situations, even while it is still learning.

### Challenges

One way of alleviating this problem is via shields, which prevent the agent from taking unsafe actions [4]. By only intervening in safety-critical situations, shields allow for strict safety specifications while the advantages of RL can still be leveraged. Shielding has been well studied in fully observable environments [4,5,6], but in real life, agents must often rely on delayed and partial observations. For example, the sensors of a self-driving car may be slow and error-prone, or an automated stock trader may have to act on partial and delayed knowledge about the stock market.

### Existing Methods

Shielding under delayed observations and shielding under partial observations have been studied separately. [7] constructs a shield for delayed observations by solving a delayed two-player game [8]. However, such games are hard to solve for long delays, and this approach still assumes full observability of previous states. In contrast, [5] represents the problem as a partially observable Markov decision process (POMDP) and creates shields based on the support of the belief space. However, this method does not take delayed observations into account.

### Goal

This project aims to create shields that account for both delayed and partial observability. A potential approach is to rely on belief support (like in [5]) but update this support with the knowledge that observations are delayed. Part of the challenge is to determine how such updates can be performed efficiently.
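The belief-support idea of [5] and a naive extension to delayed observations can be sketched as follows; the toy model and the handling of the delayed case are illustrative assumptions, and improving on this naive propagation is exactly what the project is about:

```python
def support_update(support, action, trans, obs_fn, obs=None):
    # One step of belief-support tracking in a POMDP: keep every state
    # reachable from the current support under `action`; if an observation
    # is available, additionally filter for consistency with it. Passing
    # obs=None models a delayed (not yet received) observation, so the
    # support is merely propagated forward, a simple baseline.
    nxt = {s2 for s in support for s2 in trans.get((s, action), ())}
    if obs is None:
        return nxt
    return {s2 for s2 in nxt if obs in obs_fn.get(s2, ())}

# Tiny hypothetical model: two rooms, a 'move' action, and a noisy sensor.
trans = {("left", "move"): {"right"}, ("right", "move"): {"left", "right"}}
obs_fn = {"left": {"wall"}, "right": {"wall", "open"}}

support = {"left"}
support = support_update(support, "move", trans, obs_fn)              # observation delayed
support = support_update(support, "move", trans, obs_fn, obs="open")  # observation arrives
```

A shield built on this support would then permit only actions that are safe for every state the support contains, which is why keeping the support as small as possible matters.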

### Main Tasks

- Review literature and formulate the problem statement;
- Design a method for shielding under partial and delayed observations;
- Implement the method in a reusable manner;
- Design experiments to evaluate the new methods; and
- Write a paper reporting the main findings.

### Embedding

This project will be conducted at the Department of Software Science (SWS) as part of the LAVA-LAB research group (see https://lava-lab.org/). The student is encouraged to actively participate in this group (including weekly meetings with the whole team) and will be able to work with multiple Ph.D. students currently researching related topics. The project is set up in collaboration with Thiago D. Simão (Eindhoven University of Technology), and may involve a number of (funded) research visits to his group. Furthermore, funding is available for visiting at least one workshop or conference on a related topic to interact with top researchers in the field.

### References

[1] C. Berner, G. Brockman, B. Chan, V. Cheung, P. Debiak, C. Dennison, D. Farhi, Q. Fischer, S. Hashme, C. Hesse, R. Józefowicz, S. Gray, C. Olsson, J. Pachocki, M. Petrov, H. P. de Oliveira Pinto, J. Raiman, T. Salimans, J. Schlatter, J. Schneider, S. Sidor, I. Sutskever, J. Tang, F. Wolski, and S. Zhang. Dota 2 with large scale deep reinforcement learning. CoRR, abs/1912.06680, 2019.

[2] J. Pérolat, B. D. Vylder, D. Hennes, E. Tarassov, F. Strub, V. de Boer, P. Muller, J. T. Connor, N. Burch, T. W. Anthony, S. McAleer, R. Elie, S. H. Cen, Z. Wang, A. Gruslys, A. Malysheva, M. Khan, S. Ozair, F. Timbers, T. Pohlen, T. Eccles, M. Rowland, M. Lanctot, J. Lespiau, B. Piot, S. Omidshafiei, E. Lockhart, L. Sifre, N. Beauguerlange, R. Munos, D. Silver, S. Singh, D. Hassabis, and K. Tuyls. Mastering the game of stratego with model-free multiagent reinforcement learning. CoRR, abs/2206.15378, 2022.

[3] R. Chopra and S. S. Roy. End-to-end reinforcement learning for self-driving car. In B. Pati, C. R. Panigrahi, R. Buyya, and K.-C. Li, editors, Advanced Computing and Intelligent Engineering, pages 53–61, Singapore, 2020. Springer Singapore. ISBN 978-981-15-1081-6.

[4] M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, and U. Topcu. Safe reinforcement learning via shielding. In AAAI, pages 2669–2678. AAAI Press, 2018.

[5] N. Jansen, B. Könighofer, S. Junges, A. Serban, and R. Bloem. Safe reinforcement learning using probabilistic shields (invited paper). In CONCUR, volume 171 of LIPIcs, pages 3:1–3:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.

[6] W. Yang, G. Marra, G. Rens, and L. D. Raedt. Safe reinforcement learning via probabilistic logic shields. In IJCAI, pages 5739–5749. ijcai.org, 2023.

[7] F. C. Córdoba, A. Palmisano, M. Fränzle, R. Bloem, and B. Könighofer. Safety shielding under delayed observation. In ICAPS, pages 80–85, 2023.

[8] M. Chen, M. Fränzle, Y. Li, P. N. Mosaad, and N. Zhan. What’s to come is still unsure - synthesizing controllers resilient to delayed interaction. In ATVA, volume 11138 of Lecture Notes in Computer Science, pages 56–74. Springer, 2018.

## Deep Learning-Based Control for Stochastic Dynamical Systems

**Supervisors**: Thom Badings and Dr. Nils Jansen

### Motivation

Autonomous systems are increasingly deployed in safety-critical settings. These systems must accomplish their task safely without the intervention of a human operator. For example, consider an unmanned aerial vehicle (UAV) whose task is to deliver a package to a designated target area. The UAV should not crash into obstacles or run out of battery, while operating in an environment that is inherently uncertain. In other words, the UAV must behave safely, despite uncertainty about the UAV's dynamics and the environment. How can we design a policy (a.k.a. controller) for the UAV that guarantees the satisfaction of a given control task, despite various sources of uncertainty?

### Challenge

Autonomous systems, such as this UAV, can be modeled as stochastic dynamical systems, which can be seen as continuous-state/action Markov decision processes (MDPs). Because of the continuous nature of these models, computing a policy that provably satisfies a given complex control task is very difficult. Thus, the challenge is to develop novel control design methods that can solve complex control tasks in continuous spaces despite various sources of uncertainty. One promising solution is to use learning-based methods from machine learning to learn policies from data (Chang et al., 2019). Such methods have shown enormous potential to find policies that solve complex and nonlinear control tasks, for example, using reinforcement learning (Sutton and Barto, 2018). However, verifying that these policies satisfy certain performance specifications is extremely difficult, especially when neural networks are involved. The combination of deep learning and verification for solving stochastic control problems is a particularly understudied research area.
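To make the modeling viewpoint concrete, a stochastic dynamical system can be simulated as a discrete-time continuous-state/action MDP of the form x_{k+1} = A x_k + B u_k + w_k. The sketch below is purely illustrative: the double-integrator matrices, noise level, and state-feedback policy are assumptions for demonstration, not part of the project description.

```python
import numpy as np

# Illustrative discrete-time stochastic dynamical system:
#   x_{k+1} = A x_k + B u_k + w_k,  w_k ~ N(0, NOISE_STD^2 I).
# The state x = (position, velocity) follows double-integrator dynamics.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([0.005, 0.1])   # control input enters through acceleration
NOISE_STD = 0.01             # standard deviation of the process noise

def step(x, u, rng):
    """One transition of the continuous-state/action MDP."""
    w = rng.normal(0.0, NOISE_STD, size=2)  # stochastic disturbance
    return A @ x + B * u + w

rng = np.random.default_rng(0)
x = np.array([0.0, 0.0])
for _ in range(50):
    u = -1.0 * x[0] - 0.5 * x[1]  # simple stabilizing state feedback
    x = step(x, u, rng)
```

Even for this two-dimensional linear system, proving that a policy keeps the state inside a safe set with high probability requires reasoning over the continuous state space and the noise distribution, which is exactly where the verification challenge arises.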

### Goal

The main goal of this project is to study novel ways in which deep learning and verification can be combined to solve complex control tasks in continuous spaces and under stochastic uncertainty. In particular, the project focuses on leveraging deep learning methods (in particular neural networks) as mechanisms to guess (or learn) policies, while using verification to formally verify the performance of those policies.

### Possible solutions

A common approach to solve complex control tasks for stochastic dynamical systems is to create an abstraction of the continuous dynamics into a finite MDP (Badings et al., 2022). One idea, inspired by Abate et al. (2022), is to use neural networks to learn better abstractions of stochastic dynamical systems. Another idea is to use neural networks to learn so-called certificate functions (such as Lyapunov functions), which are a classical method to verify the performance of a dynamical system. However, finding such certificate functions is extremely hard in practice, which makes the use of learning methods promising (Lechner et al., 2022).
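The abstraction idea can be illustrated with a deliberately simplified sketch: grid a one-dimensional state space into intervals and estimate, per cell, the probability of landing in each cell under the noisy dynamics. The dynamics, grid resolution, and sample count below are illustrative assumptions; the cited papers use more sophisticated, formally sound abstraction schemes.

```python
import numpy as np

# Illustrative 1D stochastic dynamics: x' = 0.9 * x + w, w ~ N(0, 0.1^2).
# We abstract the interval [-1, 1] into N_CELLS intervals and estimate the
# transition probabilities between cells by Monte Carlo sampling.
N_CELLS = 10
EDGES = np.linspace(-1.0, 1.0, N_CELLS + 1)  # cell boundaries
N_SAMPLES = 5000
rng = np.random.default_rng(42)

def dynamics(x, w):
    return 0.9 * x + w

# Transition matrix of the finite abstraction, with one extra column for
# "outside the gridded region" (treated as an absorbing sink state).
P = np.zeros((N_CELLS, N_CELLS + 1))
centers = 0.5 * (EDGES[:-1] + EDGES[1:])  # representative point per cell
for i, c in enumerate(centers):
    w = rng.normal(0.0, 0.1, size=N_SAMPLES)
    x_next = dynamics(c, w)
    idx = np.digitize(x_next, EDGES) - 1          # index of target cell
    inside = (idx >= 0) & (idx < N_CELLS)
    counts = np.bincount(idx[inside], minlength=N_CELLS)
    P[i, :N_CELLS] = counts / N_SAMPLES
    P[i, N_CELLS] = 1.0 - inside.mean()           # mass leaving the grid
```

Once such a finite MDP is built, off-the-shelf probabilistic model checkers can compute reachability probabilities on it; the research question is how learned components (e.g., neural abstractions or neural certificates) can make this pipeline scale while preserving guarantees.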

### Main tasks

The tasks related to this project are on the intersection between machine learning, control theory, and probabilistic verification. While prior knowledge in control and verification is not a prerequisite, we do look for a student who is interested in learning more about these fields. The main tasks involved in this project are:

- Review literature and formulate the problem statement.
- Develop and implement new methods that alleviate the limitations of existing methods.
- Design experiments to evaluate the new methods.
- Write a paper with the main conclusions.

### Embedding of the fellowship

The student working on this project will, besides ELLIS, be embedded in the research group of Nils Jansen. The student will directly join ongoing discussions with Thom Badings and other PhD students in the group. We hold weekly meetings with the whole team and foster an open and cooperative research culture in which students are directly involved. Funding is available to visit at least one workshop or conference on related topics and interact with top researchers in the field.

### Learn more

To learn more about abstraction for stochastic dynamical systems, we suggest browsing the paper Badings et al. (2022). To learn more about neural certificates, you can look at Lechner et al. (2022). In case of any questions, you can reach out to Thom Badings.

### References

A. Abate, A. Edwards, and M. Giacobbe. Neural abstractions. In NeurIPS, 2022.

T. S. Badings, L. Romao, A. Abate, D. Parker, H. A. Poonawala, M. Stoelinga, and N. Jansen. Robust control for dynamical systems with non-gaussian noise via formal abstractions. JAIR, 2022.

Y. Chang, N. Roohi, and S. Gao. Neural Lyapunov control. In NeurIPS, pages 3240–3249, 2019.

M. Lechner, D. Zikelic, K. Chatterjee, and T. A. Henzinger. Stability verification in stochastic control systems via neural network supermartingales. In AAAI, pages 7326–7336. AAAI Press, 2022.

R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.