What data should be archived
This webpage provides information on the following aspects of data archiving:
- Two perspectives: scientific integrity and reuse of data
- Minimum and maximum retention periods
- Do I actually have research data (to archive)?
Two perspectives
There are two considerations that help you decide what data you are going to archive at a minimum: do you want to archive your data for the sake of research integrity, or do you want to be able to share them with other researchers?
Perspective of scientific integrity | Perspective of reuse of data |
All raw, processed and analysed data | Final versions of analysed data; if possible, also raw and processed data |
Documentation and/or codebooks necessary for understanding the data | Documentation and/or codebooks necessary for understanding the data |
Readme.txt file for understanding the structure and content of the deposit | Readme.txt file for understanding the structure and content of the deposit |
Informed consent forms and information brochure | Blank copy of the consent form and a copy of the information brochure |
Approval by an ethics committee | |
Data management plan | |
Audit trails (see note 1) | |
- Raw data are the original data that you have collected but have not yet processed or analysed. For instance: audio files, archives, observations, field notes and data from experiments. Data that you have not collected yourself and that you are reusing may also be considered raw data.
- Processed data are the data that you have digitised, translated, transcribed, cleaned, validated, checked and/or anonymised.
- Analysed data are the models, graphs, tables, texts and so on that you have created based on the raw and the processed data, and that are intended to aid in the discovery of useful information, the presentation of conclusions, and decision-making.
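The two columns of the table above can also be read as a checklist. The following is a minimal sketch in Python, not an official tool. The item labels are shortened from the table, and the names `REQUIRED_ITEMS` and `missing_items` are hypothetical. Raw and processed data are left out of the reuse list because the table marks them as "if possible".

```python
# Minimum deposit contents per perspective, paraphrased from the table above.
# The labels and names below are illustrative assumptions, not an official schema.
REQUIRED_ITEMS = {
    "integrity": [
        "raw data",
        "processed data",
        "analysed data",
        "documentation or codebooks",
        "readme",
        "informed consent forms and information brochure",
        "ethics committee approval",
        "data management plan",
        "audit trail",
    ],
    "reuse": [
        "final analysed data",
        "documentation or codebooks",
        "readme",
        "blank consent form and information brochure",
    ],
}


def missing_items(perspective, present):
    """Return the required items that are not yet in the deposit, in table order."""
    required = REQUIRED_ITEMS[perspective]  # raises KeyError for an unknown perspective
    return [item for item in required if item not in present]
```

For example, `missing_items("reuse", {"readme", "final analysed data"})` lists the documentation and the blank consent form as still to be added.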
On the following page, you can read more about documentation, codebooks, and other files that can facilitate understanding of the data over the long term.
To decide whether or not a data file should be archived, the following aspects may be considered:
- How significant is the file for research?
- Is the information unique?
- How useable is the file?
- Is the file related to other permanent files?
- What is the timeframe covered by the information?
- How much will it cost to maintain the files in perpetuity?
1 An audit trail is a transparent description of the steps taken from the start of a research project to the development and reporting of findings.
Minimum and maximum retention periods
According to Radboud University's research data management policy, the minimum retention period for archiving research data is 10 years. In its policy, a research institute may opt for a longer minimum.
Radboud University's policy does not set a maximum retention period, but an individual research institute may choose to do so.
If such a maximum does apply, any dataset archived for longer than the maximum will be subject to deletion. It is best to discuss this prospect with the archive in question well in advance of the end of the period. Most data archives strive to retain data over the long term, and those that do may have policies in place prohibiting the deletion of data.
Keep in mind that a dataset may have been cited by others. Therefore, the metadata of that dataset must remain preserved and accessible, and it should state why any files in the dataset are subject to deletion or have been deleted, as the case may be.
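To make the minimum retention period concrete, the sketch below computes the date until which a dataset must at least be kept. It assumes the 10-year minimum described above and allows a longer institute minimum. The function name, the constant and the handling of 29 February are illustrative choices and are not part of the policy.

```python
from datetime import date

# Minimum retention period in years, as stated in the policy described above.
MINIMUM_RETENTION_YEARS = 10


def retention_end_date(archived_on, retention_years=MINIMUM_RETENTION_YEARS):
    """Return the date until which a dataset must at least be retained.

    An institute may set a longer minimum, so larger values are accepted;
    values below the university-wide minimum are rejected.
    """
    if retention_years < MINIMUM_RETENTION_YEARS:
        raise ValueError("retention period is shorter than the 10-year minimum")
    target_year = archived_on.year + retention_years
    try:
        return archived_on.replace(year=target_year)
    except ValueError:
        # Archived on 29 February and the target year is not a leap year:
        # fall back to 28 February (an assumption made for this sketch).
        return archived_on.replace(year=target_year, day=28)
```

A call such as `retention_end_date(date(2024, 3, 15))` returns 15 March 2034. Where an institute also sets a maximum, the same calculation can give the date to raise with the archive in advance.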
Do I actually have research data?
From the perspective of some disciplines, you may wonder whether your research actually involves research data. Research data is all information, digital and non-digital, generated as part of the scientific process, on which scientific conclusions are based.
Usually, every researcher presenting original research has research data. Original research is research that is not exclusively based on summary, review, or interpretation of earlier publications.
In some disciplines, data is a fairly straightforward concept: survey data, interview transcripts and statistical data; or observational data (observation of phenomena in their natural setting), experimental data (active intervention by the researcher), or simulations (imitating a real-world process using computer models). As a researcher, you need to make sure that you archive these data for the long term as well. For other types of data, as explained below, this is less clear.
Data as part of a publication
If your data is part of a publication (e.g. in a table or as a reference list to primary sources), it might feel as if there is nothing to archive for the long term in a data repository.
However, data repositories are often a better place for sustainable archiving than journal platforms, especially when those data repositories are free of charge and openly accessible (which journals often are not). In some cases, extensive, structured reference lists become valuable databases in their own right, and could or should be archived in a data repository.
Secondary data
If your data consists of scans or transcripts of archival documents, these data are often owned by someone else. This can also be the case when you reuse publicly available secondary data such as statistical data (although this depends on the license). At a minimum, you should refer to the original source, including an identifier for the data. Additionally, make agreements with the owner on long-term archiving of the data and, if relevant, on making the data publicly available, as public data is in everyone's interest. Do not forget to archive and (if allowed) make publicly available any derived data, such as analysis schemes, scripts and codebooks.
Annotations
Annotations of texts might become databases in their own right, and could or should be archived in a data repository, with a reference to the original source. If those annotations include the original text, make sure to check copyright issues with the owner.
A specific type of publication
There is also research output (secondary research) that does not involve research data at all, such as:
- Review publications, which are generally based on existing published papers and do not report any original research done by the authors of the review paper.
- Perspective papers, which usually present a personal point of view on fundamental concepts or prevalent ideas in a field.
- Opinion papers, which present the personal point of view of the author on the methods, outcomes or interpretation of a single study.
- Commentaries, which are usually short publications intended to draw readers' attention to a previously published report.