During your research, it is useful to document your data according to the FAIR principles, so that you work transparently and the data can be understood by yourself and others in the future. Documentation is an ongoing process that covers the whole project, from planning your research to completing it.
Planning your research
Documentation on practical issues can be added at the very beginning of the project and can serve as a reference document throughout. For example, record agreements or decisions on file names, folder structure, versioning and team workflow.
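An agreed file naming convention is easier to follow when it is written down as a rule that can be applied mechanically. The sketch below is one possible convention, not a prescribed one: the pattern (project, description, ISO date, two-digit version) and the function name `build_file_name` are assumptions made for illustration.

```python
import re
from datetime import date


def build_file_name(project: str, description: str, version: int,
                    day: date, extension: str) -> str:
    """Build a file name following a hypothetical team convention:
    <project>_<description>_<YYYY-MM-DD>_v<NN>.<extension>
    """
    def slug(text: str) -> str:
        # Lowercase the text and replace every run of characters
        # other than a-z and 0-9 with a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (f"{slug(project)}_{slug(description)}_"
            f"{day.isoformat()}_v{version:02d}.{extension.lstrip('.')}")


# Example with made-up project details.
print(build_file_name("Project X", "Interview notes", 2, date(2024, 3, 1), ".txt"))
# prints "project-x_interview-notes_2024-03-01_v02.txt"
```

Putting the date in ISO format and zero-padding the version number keeps files in chronological and version order when sorted by name.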
During your research
The period in which you collect and work on your data is the best time to add most of your documentation. For example, record the concepts, variables and codes in your dataset in detail in a codebook, or add notes to your literature review. In a readme you can explain how you transformed the data: did you transcribe or recode it, did you aggregate datasets, data or variables, how did you deal with missing values, and did you anonymise the data?
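A first draft of a codebook can be generated from the dataset itself and then completed by hand. The following is a minimal sketch under stated assumptions: the sample data, the variable names and the missing-value codes (`""` and `"NA"`) are invented for illustration, and the `description` field is left as a placeholder to be filled in by the researcher.

```python
import csv
import io


def draft_codebook(rows, missing_codes=("", "NA")):
    """Return one draft codebook entry per variable, listing the
    codes observed in the data and the number of missing values."""
    codebook = {}
    for row in rows:
        for variable, value in row.items():
            entry = codebook.setdefault(
                variable,
                {"description": "TODO", "values": set(), "missing": 0},
            )
            if value in missing_codes:
                entry["missing"] += 1
            else:
                entry["values"].add(value)
    return codebook


# Made-up sample data standing in for a real CSV file.
sample = io.StringIO("respondent_id,gender,age\n1,F,34\n2,M,NA\n3,F,\n")
codebook = draft_codebook(csv.DictReader(sample))

for variable, entry in codebook.items():
    print(variable, sorted(entry["values"]), "missing:", entry["missing"])
```

The draft only lists what occurs in the data. The meaning of each variable and code, and how missing values were handled, still has to be written down by the researcher.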
Completing the research
When archiving and/or publishing your data, include documentation that explains the context of the dataset and how the research was conducted. This ensures that your data is understandable and verifiable, and that the research can be reproduced if necessary. You can do this by adding sufficient metadata, and also by documenting the history of the project, its objectives, hypotheses, methods, etc. Combine this with the documentation you wrote in the previous phases of your project.
Documentation requirements
The documentation of your research should, at a minimum, consist of:
- A readme file: this is the first file users need to open and should therefore contain everything they need to know. It describes the context, content and structure of the dataset or clearly refers to other documentation files that do so, such as a codebook.
- Metadata: make full use of the metadata fields found in most archives and repositories. Some metadata fields, such as title, description and keywords, may come to mind quickly, but the language, time period and location of the dataset are also very useful for users.
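Before depositing a dataset, it can help to check that none of the metadata fields mentioned above has been left empty. The sketch below assumes the metadata is kept as a simple dictionary. The key names and the example values are hypothetical and do not correspond to the schema of any particular archive or repository.

```python
# Fields named in the list above; the key spellings are an assumption.
REQUIRED_FIELDS = ("title", "description", "keywords",
                   "language", "time_period", "location")


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Made-up record in which two fields were forgotten.
record = {
    "title": "Example survey dataset",
    "description": "Placeholder description of the dataset.",
    "keywords": ["example", "survey"],
    "language": "en",
    "time_period": "",
}

print(missing_metadata(record))
# prints ['time_period', 'location']
```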