Costing data management

Creating, managing and sharing research data carries a cost, requiring the allocation of staff time and other resources. Generally, good data management costs less when it is carried out throughout the project rather than applied retrospectively.

Researchers are encouraged to consider the resource implications of their research project and to include these costs in research proposals and data management plans at the earliest opportunity.

Data management costs vary depending on the amount and type of digital data handled, the activities performed and the resources required. In the planning phase:
  • Identify data management tasks in your project: common activities to prepare data for analysis and sharing include data cleansing, transcription, anonymisation, copyright clearance, documentation writing, and metadata creation.
  • Determine staff time that must be spent on the task: identify who will be responsible for each of the tasks and the total amount of time spent on the task.
  • Establish additional resources needed: data management activities require purchase of additional resources, either as a one-off payment or recurring cost.
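Taken together, these planning steps amount to a simple activity-based estimate: for each task, multiply the staff time by an hourly rate, then add any one-off purchases and any recurring costs over the period they apply. The minimal Python sketch below assumes hypothetical task names, hours, rates and a three-year duration. None of these figures come from the UK Data Service tool, the LCRDM guide or any real project.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """One data management task (all figures below are hypothetical)."""
    name: str
    staff_hours: float               # total staff time the task is expected to take
    hourly_rate: float               # assumed cost per staff hour
    one_off: float = 0.0             # one-off purchases, e.g. a software licence
    recurring_per_year: float = 0.0  # recurring costs, e.g. storage space

    def cost(self, years: int) -> float:
        # Staff cost + one-off purchases + recurring costs over the given period
        return (self.staff_hours * self.hourly_rate
                + self.one_off
                + self.recurring_per_year * years)


def total_cost(activities: list[Activity], years: int) -> float:
    """Sum the cost of every activity over the given number of years."""
    return sum(a.cost(years) for a in activities)


# Hypothetical example tasks for a three-year interview-based project
activities = [
    Activity("transcription", staff_hours=120, hourly_rate=30.0),
    Activity("anonymisation", staff_hours=40, hourly_rate=45.0),
    Activity("storage", staff_hours=10, hourly_rate=45.0, recurring_per_year=500.0),
    Activity("metadata creation", staff_hours=20, hourly_rate=45.0, one_off=300.0),
]

years = 3
for a in activities:
    print(f"{a.name}: {a.cost(years):.2f}")
print(f"total: {total_cost(activities, years):.2f}")  # total: 8550.00
```

Keeping the one-off and recurring amounts separate from staff time mirrors the distinction in the planning steps, so a change in project duration only affects the recurring part of the estimate.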
Examples of additional research data management costs:
  • Extra storage space is needed during or after the research.
  • Specialised storage is needed during or after the research, for example refrigerators.
  • Expertise needs to be hired, for instance in data management, ICT, legal issues, ethical issues and/or data security.
  • Specific software is needed to manage, secure, share or store the data.
  • Preparing your data for long-term storage (including transcriptions, proper metadata and documentation) is time-consuming.
  • You need extra help conducting and/or transcribing interviews.
The UK Data Service has developed a simple activity-based costing tool, aimed at the social sciences.
Inspired by this tool, the National Coordination Point Research Data Management (LCRDM) developed its own data management costs guide, a practical overview of possible costs per activity within each phase of the data life cycle.