The Donders Institute (DI) Network’s Policy for Research Data and Software Management (hereafter referred to as this policy or RDSM policy) unifies and expands on the primary research data management policies of Radboud University, Radboudumc, and the Max Planck Society (MPG). Next to presenting the policy and providing guidelines, this document also describes the ambitions regarding data and software management, and open science of the DI Network for the period 2025-2030. The partners within the DI Network will use this policy and their primary institutional policy to shape their internal RDSM procedures. The policy addresses all forms of research output, including both research data and software.
Research Data Management (RDM) is an important aspect of research: it ensures that research data are safe and secure and remain valuable in the long term.
You can find all the information of the RDSM policy on this page or in the document you can download here.
The DI Network aims to support sustainable research at both the societal and the environmental level. Therefore, our RDSM policy aims to enhance societal trust in science in the long term by avoiding mishandling of data or misconduct in scientific work. In addition, we practice the open science standards that have been adopted by research institutes all over the world. By applying FAIR open science practices, we ensure access for researchers from organisations that do not have access to state-of-the-art facilities. Further, our RDSM policy recognizes the environmental footprint of data storage. Partners will therefore take this into consideration when drafting and implementing the sections of their policy and the scope of data archiving.
Putting this RDSM policy into practice comes with responsibilities for all layers of the Donders Institute Network, e.g. management, partners, individual researchers and support staff. This requires investments in RDSM resources, infrastructure and time across all these layers. In addition, it is recommended that these investments are aligned via the Donders Board and/or at the partner level via the partner management and/or the partner data stewards.
For whom this policy is intended
This policy applies to researchers, lab managers, research assistants, technical and administrative personnel and all designated staff who conduct studies and collect and/or process research data and/or create research software within the DI Network (hereafter referred to as researcher).
Research data and research software
The definition of research data includes all information, digital and non-digital, that was generated during the research process and used to form scientific conclusions. This encompasses information that is used for forming and testing hypotheses as well as information upon which conclusions are based. If the information contributes to results in a scientific report, it is research data. In the DI Network, typical data can take the form of:
- Empirical data (observational and experimental data, for example behavioural, physiological, neural data)
- Transcripts
- Methodologies
- Models
- Simulated data
- Observation reports
- Stimuli for experiments
- Structured literature searches
- Audio/video recordings
- Questionnaires
Generated research software also falls within the scope of this policy. The definition of research software includes software that is created during research, as scientific output of a specific study, or for a research purpose in general. Code and software from third parties that are used in the research process do not fall under this definition.
In the DI Network, typical software can take the form of:
- Analysis code
- Code for conducting an experiment
- Code for web-applications
- Compiled applications including the source code
Research data and software management
Research data and software management (RDSM) is the management of research data and software along their respective life cycles (see figures). RDSM includes a variety of activities such as planning, collecting, processing, analysing, archiving, and publishing. Research software often plays a central role in generating and analysing data. Therefore, good management of research software is equally important. Research software differs from research data in some respects. For example, unlike data, which can be archived and published, software needs to be regularly maintained and updated. The figure below highlights the similarities and differences between research data and software lifecycles.
Having good RDSM has many advantages. It allows the researcher to make informed decisions about the research data and software and helps to keep the research project organised. This means that the researcher can keep the data and software safe, prevent unauthorized access, and avoid data or software loss. Moreover, the researcher can save time, conform to standards of scientific integrity, adhere to open science principles, and facilitate the reuse of data or software.
FAIR
FAIR stands for Findable, Accessible, Interoperable, and Reusable. The FAIR principles are key guiding principles to achieve good RDSM. All partners in the DI Network strongly adhere to the FAIR principles.
Findable
(Meta)data should be discoverable by humans and machines. To a large extent, this is achieved by placing (meta)data in a trusted digital repository. The data are referenced with a persistent identifier (e.g. DOI or handle) and the metadata include, among other things, the identifier of the data.
Accessible
It should be clear how humans and machines can gain access to data. Access should be possible over the internet using open and free protocols (e.g. HTTPS). Metadata should be accessible, even when the data are not.
Interoperable
All research data must be stored and shared in generally accessible, open file formats to ensure long-term usability and exchange across different platforms. File formats that only work with certain software may only be used if no open alternative exists and, in such cases, a widely accepted open format conversion must be provided whenever possible. Researchers are responsible for selecting formats that support interoperability and reusability, ensuring long-term accessibility across different systems and programs. Researchers are strongly encouraged to populate their datasets with standardized and controlled vocabularies and ontologies relevant to their field (e.g., MeSH terms, ICD disease codes, etc.), as this facilitates the integration of multiple datasets. Additionally, it is important to specify in the metadata which vocabulary was used.
Reusable
Research data should be reusable by others. Therefore, documentation on the content, context, and structure of the data should be included. This also includes information about where the data come from and how they have been handled or changed over time, by whom, and for what purposes. In addition, a (machine-readable) license or data use agreement that clearly describes the terms and conditions for the (re)use of the data should be included.
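As an illustration only, the FAIR requirements described above can be read as a minimal completeness check on a dataset's metadata. The Python sketch below is not part of the policy. The field names (`identifier`, `access_conditions`, `vocabulary`, `description`, `provenance`, `license`) and the function `missing_fair_fields` are hypothetical and do not correspond to the schema of any particular repository.

```python
# Hypothetical field names, chosen for illustration; real repositories
# define their own metadata schemas.
REQUIRED_FIELDS = {
    "identifier": "Findable: persistent identifier such as a DOI or handle",
    "access_conditions": "Accessible: how access to the data can be obtained",
    "vocabulary": "Interoperable: controlled vocabulary used in the dataset",
    "description": "Reusable: content, context, and structure of the data",
    "provenance": "Reusable: origin of the data and how they were changed",
    "license": "Reusable: license or data use agreement",
}


def missing_fair_fields(metadata: dict) -> list[str]:
    """Return the required fields that are absent or empty in `metadata`."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(metadata.get(field) or "").strip()
    ]


example = {
    "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "access_conditions": "open access",
    "vocabulary": "MeSH",
    "description": "Behavioural responses from a reaction time task",
    "license": "",  # left empty to show how a gap is reported
}

for field in missing_fair_fields(example):
    print(f"{field}: {REQUIRED_FIELDS[field]}")
```

A check of this kind only confirms that fields are filled in. Whether the identifier resolves, or whether the license suits the data, still needs review by the researcher or data steward.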
Open Science
Open Science is a movement in which academic outputs such as data and publications are made available as openly as possible, encouraging reuse. Open Science is becoming more prominent in research. By practicing Open Science, research becomes more reliable and transparent. Good RDSM and applying the FAIR principles can contribute to Open Science. Researchers are therefore encouraged to follow associated practices from an early stage in the research data lifecycle.
Roles and responsibilities
Detailed roles and responsibilities in research data and software management are outlined in the organisational policy and/or partner-specific procedures. In general, the responsibility is shared between the Director of Research, the data steward, and the researcher.
The current Director of Research of the partner is responsible for the development and implementation of, and compliance with, this policy and other applicable organisational RDSM policies.
The data steward or other RDSM professional (in this document further referred to as data steward) assists in complying with the RDSM policy within the institute and monitors this compliance on a regular basis. At the partner level, additional tasks of the data steward can differ.
All researchers are responsible for effective and accurate data management, and consequently adherence to policy and procedures that apply to the partner where they perform their research activities. For PhD candidates or trainees, additional regulations may apply regarding research data that are collected for a dissertation or thesis. The responsibility of adhering to such regulations is shared with their supervisor(s).
Promoting compliance with policy
At the partner level, the data steward monitors compliance with this policy and other applicable RDSM policies and procedures. Monitoring may be set up in different ways. Examples of monitoring include reviewing data management plans, randomly selecting a subset of studies and checking the compliance with appropriate storage locations, checking the usage of repositories for archiving and publishing, etc. In case of non-compliance, the data steward first informs the responsible researcher(s) and advises them on the actions that are necessary to address this non-compliance. If non-compliance persists, the data steward informs the Director of Research.
Moreover, given that RDSM is a collaborative effort in which multiple perspectives are brought together, the data stewards will seek feedback from researchers on the feasibility and clarity of the RDSM guidelines, on how they align with researchers' work processes, and on areas that require attention or can be improved. The end goal is that RDSM processes are integrated within work processes instead of being added on top of them. Such an approach is paramount to promoting and optimising responsible and effective RDSM.
RDSM in practice
Research planning and design
Taking proper RDSM into account during project design and initiation will help researchers in handling their data efficiently.
Control of research data
Generally, control of the research data lies with the institution or organisation that takes responsibility for the initiation, management, and/or financing of the research (e.g. the sponsor of the study), and whose research staff is involved in generating the research data. The controller is authorized to exercise rights to the research data and is accountable for the data. When research data are generated in collaborations between institutions or through external funding, other conditions on control over research data may apply. Therefore, it must be clearly agreed upon who controls the research data before a research project starts. If this is not clear to the researcher, they should contact the data steward.
Agreements
In the case of collaborations with external parties it may be that an agreement is needed to ensure data or software ownership, data protection and open science, proper acknowledgements, description of IP rights and protection of privacy of the participants. Data stewards must be contacted in case of a planned collaboration with external parties. In consultation with the applicable legal department and privacy officers it is assessed whether and/or which agreement is needed. Agreements can only be signed by persons that are authorized to do so (e.g. Director of Research or dean).
Ethics
Human participants
All research with human participants requires an independent ethical review. Research can only start after a positive decision of the ethics committee. Research performed at the DI Network is reviewed by various acknowledged, independent ethical committees. Research that falls under the remit of the Dutch Medical Research Involving Human Subjects Act (WMO) is reviewed by the Medical Ethical Reviewing Committee Oost Nederland. Other research is reviewed by the Ethics Committee of Social Sciences (DCC, DCCN, MPI), the Ethics Assessment Committee Humanities (CLS), or the Research Ethics Committee (DCN).
The informed consent procedure, which ensures that participants are properly informed and provide their consent to participation, is included within the review by the applicable ethical committee. The informed consent emphasizes voluntariness and describes the rights of the participants. In addition, participants are informed about the open science policy and how their privacy is protected. Finally, the importance of open science and sharing of data is explained and consented to. Therefore, before data collection from human participants begins, it must be determined under what access conditions data will be published so that research participants can be properly informed and consent to this. This is typically done within a Data Management Plan.
Animal research
For research conducted with animals, the researcher first needs to apply for a project license for the planned experiments from the Dierexperimenten Commissie (DEC), which is then evaluated/approved by the Centrale Commissie Dierproeven (CCD).
Personal data
Researchers may collect, analyse or otherwise process personal data. This refers to information that can either directly identify a living individual or which, when combined with other information, can be traced back to a living individual. Personal data can be present within the research data themselves, but also in research-related data. Whenever this is the case, the personal data in the research become subject to the General Data Protection Regulation (GDPR). Researchers in their role as controller, joint controller, or processor of these personal data must comply with this regulation and are expected to take into account the consequences that this regulation has for their research throughout the research process, from research planning to data acquisition to data publication. Each partner has dedicated privacy support and procedures in place that facilitate achieving compliance.
Among others, there are three main guiding principles:
- Personal data can only be processed when a clearly defined legal basis exists, and participants are informed about their privacy rights.
- Researchers should follow data minimisation practices, meaning that they should only collect, store and share personal data that is required for the research project.
- Personal data are pseudonymised or anonymised to the extent possible. If this is not (fully) possible, e.g. because it would result in a loss of information or impact the usability of the data, the identifiable data must be archived or published under conditions specified in the informed consent.
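The data minimisation and pseudonymisation principles above can be illustrated with a short Python sketch. It is one possible approach, not a method prescribed by this policy: it assumes a keyed hash (HMAC-SHA256) is acceptable for deriving pseudonyms. The function names `new_key` and `pseudonymise`, the `sub-` prefix, and the example records are all hypothetical.

```python
import hashlib
import hmac
import secrets


def new_key() -> bytes:
    """Generate a random secret key; store it separately from the research data."""
    return secrets.token_bytes(32)


def pseudonymise(participant_id: str, key: bytes, length: int = 12) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    digest = hmac.new(key, participant_id.encode("utf-8"), hashlib.sha256)
    # Truncating keeps pseudonyms short; a longer value lowers collision risk.
    return "sub-" + digest.hexdigest()[:length]


key = new_key()
records = [
    {"participant": "jane.doe@example.org", "reaction_time_ms": 412},
    {"participant": "john.roe@example.org", "reaction_time_ms": 388},
]

# Replace the direct identifier and keep only the fields needed for analysis.
pseudonymised = [
    {
        "participant": pseudonymise(r["participant"], key),
        "reaction_time_ms": r["reaction_time_ms"],
    }
    for r in records
]
print(pseudonymised)
```

As long as the key exists, the pseudonyms can be linked back to individuals, so data treated this way remain personal data under the GDPR. The key itself would be handled as research-related data and stored apart from the dataset.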
If the processing of personal data is likely to entail a high privacy risk (e.g. because special categories of personal data or sensitive personal data are processed), it may be necessary to perform a Data Protection Impact Assessment (DPIA). Depending on the partner, the data steward or other privacy support staff can run a quick screening (pre-DPIA) to assess whether a full DPIA is required.
Data or Software Management Plan
To promote best practices in the management of research data and software, the DI Network strongly advises all researchers to develop a Data (or Software) Management Plan (DMP) for any project that involves the (re)use or generation of research data and software. A DMP ensures that data and software are well-organised, secure, and lead to FAIR outputs, thereby enhancing the quality and impact of the research. Furthermore, a DMP supports estimating RDSM-related financial (and environmental) costs and taking them into account when applying for funding. Evaluating the need for research ethics, privacy assessments, and (legal) agreements is also part of a DMP. Researchers are strongly advised to prepare a DMP before data collection or software generation and to update this plan during the research if necessary. DMPs are considered dynamic documents, meaning they should be updated whenever research plans change. Researchers are encouraged to discuss them with colleagues or supervisors. DMPs are often mandatory for projects receiving specific funding. Researchers are generally expected to use institutional templates and tools to develop their DMPs, aligning with funder and institutional requirements. Within the DI Network each partner provides their own procedures for writing and reviewing DMPs, guidance on templates and tools to use, and conditions under which providing a DMP is mandatory.
Data collection, processing and analysis
While research is being carried out, research data must be stored at a facility that is adequate in terms of availability (the data may not inadvertently be lost), integrity (the data may not inadvertently be modified), and confidentiality (the data may not inadvertently be made available to unauthorised persons).
Data are stored in such a way that they can be accessed by at least the main researcher and one other authorized member of the partner. This is to maintain, if needed, access to the data when the researcher is absent or leaves the partner.
Each partner will provide a detailed overview of appropriate storage locations via their internal communication channels.
Partners in the DI Network generally do not allow storing and sharing research data (in particular sensitive data allowing identification of human research participants) in personal (commercial) cloud drives or services, external (USB) drives, devices that are not institutionally managed, and private storage locations within the institutional infrastructure.
Collecting data
When research data are acquired using measurement devices in labs or at external locations, it is sometimes not possible to immediately store data in appropriate facilities. In this case, data should be transferred to secure storage locations without undue delay ensuring data security, prevention of data loss and privacy protection. Where necessary, partners provide tooling and guidance to facilitate transfer of acquired research data to adequate storage locations. In situations where an immediate transfer of acquired research data to an adequate storage location is not possible, the researcher must consult the data steward and/or other technical support staff to discuss secure temporary storage solutions.
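A transfer from a measurement computer to a storage location can be checked for integrity by comparing checksums before and after the copy. The Python sketch below shows the idea under stated assumptions and does not describe the tooling of any partner. The function names `sha256_of` and `transfer_verified` are hypothetical, and temporary directories stand in for the lab computer and the storage location.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_verified(source: Path, destination_dir: Path) -> Path:
    """Copy `source` into `destination_dir` and verify the copy by checksum."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / source.name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Checksum mismatch after copying {source}")
    return target


# Demonstration with temporary directories standing in for the lab
# computer and the institutional storage location.
with tempfile.TemporaryDirectory() as tmp:
    lab_file = Path(tmp) / "lab" / "recording.dat"
    lab_file.parent.mkdir()
    lab_file.write_bytes(b"example measurement data")
    stored = transfer_verified(lab_file, Path(tmp) / "storage")
    print(stored.name, sha256_of(stored))
```

In a workflow like this, the local copy on the measurement device would be removed only after the check has succeeded, so that a failed transfer does not lead to data loss.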
Reusing existing data or code
Partners in the DI Network strongly encourage the reuse of existing datasets or code in new research. It is only allowed to reuse data or code for which a license or data sharing agreement is available. Researchers who plan to reuse existing data must be aware of and comply with the conditions under which the data or code are shared, and respect intellectual property rights. If such conditions are unclear, the researcher should try to clarify these with the controller or provider of the data. It is generally strongly recommended to obtain existing data through reliable sources, such as trusted repositories that assign a persistent identifier to a dataset and provide clear information regarding conditions under which the data can be used and how long the data will remain available.
Non-digital data
Any non-digital research data (e.g., handwritten notes, paper questionnaires) must be digitised as soon as possible. Paper originals must be securely stored in a locked cabinet. Each partner will define their specific preservation periods for the paper originals, after which they can be disposed of. This preservation period gives the researcher time to fix any issues, such as skewed, missing, or unreadable pages that may be present in the digitised version. If it is not possible to digitise the paper documents, then the researcher must store these in a locked cabinet where only authorised personnel can access them, preferably after consultation with the data steward.
Research-related data
Research-related data include but are not limited to administrative personal data that the researcher collects and cannot dispose of during the course of their research, for example completed consent forms, applications to and approvals from ethics committees and grant providers, data management plans, reimbursement forms, and encrypted key files that must be preserved for pseudonymized datasets. Such data do not qualify as research data, as they are not used to form scientific conclusions. Therefore, different storage and archiving procedures apply to such data. Research-related data must always be stored separately from the research data. Further partner-specific guidelines are communicated through the relevant channels of each DI Network partner.
Processing and analysing data
Data processing and analysis is the phase in which research materials are examined to generate the insights that result in the research findings. The instruments, methods, code, and visualizations used for analysis should be documented and preserved to support written or published research outputs and to make the data reusable.
Data analysis in the DI Network often requires computing power that cannot be provided by standard network devices. The DI Network therefore houses infrastructure that provides the necessary computing power for data analysis, such as the High-Performance Compute (HPC) Clusters at DCCN, MPI, and CLS. Alternatively, access to external compute infrastructure is available; for example, DCC uses the national supercomputer Snellius. Access to computing infrastructure is arranged at a partner level, but shared usage of computing resources takes place (e.g. DCC is a partner within the HPC cluster).
Researchers should discuss with the data steward or other technical support staff if they are in need of additional computing power and/or storage space during the processing and analysis phase of their project.
Data archiving and publishing
The archiving and publication of research data underlying a scientific publication are relevant throughout the research process to ensure that data are preserved for the long term, also after the research is finished. For the sake of transparency and scientific integrity it is vital that research data can be independently verified to exist and are described in sufficient detail to enable replication of the data collection and analysis. Archiving and publication are done through data archives or repositories. Within the DI Network the Radboud Data Repository or MPI for Psycholinguistics Archive are available for this. Other trusted repositories are also allowed. Researchers should follow their partner’s guidelines on which data archives are approved and recommended.
The following must at least be archived:
- the raw data in their original form (the unprocessed version of the data that has not been manipulated in a way that would limit further analyses).
- the code and/or a clear description supporting the path from raw data to results presented in a publication, and - where suitable - particular intermediate data representations.
All research data must be preserved for at least 10 years after the date of the scientific publication. After this initial preservation period, the value and use of the data can be evaluated and the preservation period extended if necessary. There could be ethical or legal obligations to initially choose a longer preservation period, such as 15 years (medical research) or 25 years (research with medication).
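The preservation periods mentioned above translate into simple date arithmetic. The Python sketch below computes the earliest date on which a preservation period could be reviewed. It is a sketch, not an official calculation: the category labels and the function `earliest_review_date` are hypothetical, and the handling of 29 February is an assumption made for this example.

```python
from datetime import date

# Minimum preservation periods in years, taken from the text above.
# The category names are hypothetical labels chosen for this sketch.
PRESERVATION_YEARS = {
    "default": 10,
    "medical_research": 15,
    "research_with_medication": 25,
}


def earliest_review_date(publication_date: date, category: str = "default") -> date:
    """Return the earliest date on which the preservation period can end."""
    years = PRESERVATION_YEARS[category]
    try:
        return publication_date.replace(year=publication_date.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 1 March.
        return date(publication_date.year + years, 3, 1)


print(earliest_review_date(date(2025, 6, 1)))
print(earliest_review_date(date(2024, 2, 29), "medical_research"))
```

Because the periods are minimums, the resulting date marks when the value and use of the data can be evaluated, not when the data must be deleted.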
We strive towards open access publication of research data, as this introduces the fewest barriers and thus maximises the potential for reuse of the research data. However, if there are valid ethical concerns, privacy limitations, security issues, contractual agreements, or other serious grounds, additional restrictions may be placed on how open the data are made available. The general principle is that research data and software must be made available as open as possible, as closed as necessary.
When data are archived or published, they must meet the Findable, Accessible and Reusable criteria of the FAIR principles, and the Interoperable criterion should be applied as much as possible.
DI Network and RDSM
Data stewards in the network
Each partner in the DI Network has at least one data steward who represents the partner within the network. Specific tasks and time allocations of the data stewards are organised at a partner level. At the network level, data stewards team up and have regular meetings to share knowledge and experience and discuss opportunities to further develop RDSM within the DI Network. In addition, they organise activities creating awareness for RDSM within the DI Network. Furthermore, DI Network data stewards closely work together with other research support staff (e.g. lab coordinators, IT staff) to optimize the use of shared facilities and resources in line with good RDSM practices.
RDSM in cross-partner collaborations
As part of the DI Network's Research Strategy, the partners aim to collaborate with one another on interdisciplinary research areas across four different themes that range from neurobiology and genetics to language behaviour. In addition to the unique opportunities and insights that these collaborative interdisciplinary research endeavours generate, they also bring a handful of challenges that require coordination in the area of RDSM and data and knowledge sharing among the partners.
In cross-partner collaborations in the DI Network, data sharing between partners is often required. As three legally separate entities are represented in the DI Network (RU, Radboudumc, MPI), this inevitably means that data sometimes need to be transferred or made accessible from one organisation to another. Therefore, several types of agreements exist that set out necessary arrangements for which data will be shared, how they can be processed, for how long, and what safeguards apply for data security and privacy (in case personal data are shared).
It is essential that researchers consult data stewards (and other relevant support channels) as early as possible when planning new research that involves multiple organisations (both within the DI Network and external organisations) so that the control of the data and the necessary legal arrangements can be clearly identified.
At the same time, DI Network data stewards together with the DI Network management and relevant support staff, such as privacy officers, will focus on taking away as many administrative barriers as possible to further enhance data sharing within the DI Network. This includes e.g. development of a clear procedure to set up such agreements, work on making standardized templates available that can be used within the network and looking into shared adequate storage facilities for research data.
Societal impact and public outreach
The DI Network is committed to increasing societal impact and outreach by actively engaging with the public, increasing scientific literacy within society and societal understanding among scientific researchers. As part of this, the Donders CityLab has been established, providing researchers the opportunity to run micro-experiments that are both educational and entertaining for participants and provide an opportunity for rapid large-scale data collection for researchers. It is envisioned that larger CityLab experiments can take place in the future, which may bring new challenges for RDSM. DI Network data stewards actively support these new ways of data collection and the RDSM challenges that they may pose. Furthermore, data stewards will explore possibilities to connect to society about topics related to research data management (e.g. open science, data security, or privacy).
Citizen science
Citizen science is a part of open science that increasingly receives national and international recognition. In citizen science, citizens make an active contribution to scientific research, for example through formulating research questions, collecting data, or analysing data. Citizen science fits in the DI Network research strategy where we are increasingly prioritising societal impact of our research, allowing the public to take note of our research and actively participate in data collection. Citizen science brings new challenges in RDSM, privacy, and ethics. The DI Network data stewards are committed to following developments in citizen science and where necessary to be involved in initiatives arising within the network.