Data Management and Sharing Plans

This guide will take you through each of the elements you need to consider when creating a data management and sharing plan.

Data Documentation

Data documentation, also known as metadata, helps you understand your data in detail, and also helps other researchers find, use, and properly cite your data.

Various metadata standards are available for particular file formats and disciplines. Create a ReadMe.txt file, a codebook, or a coding manual to accompany your data files. As you collect or create your data, take the following steps:

  • Make a note of all file names and formats associated with the project, how the data is organized, how the data was generated (including any equipment or software used), and information about how the data has been altered or processed.
  • Include an explanation of codes, abbreviations, or variables used in the data or in the file naming structure.
  • Keep notes about where you got the data so that you and others can find it.

Things to document about your data:

  • Title: Name of the dataset or research project that produced it
  • Creator: Names and addresses of the organization or people who created the data
  • Identifier: Number used to identify the data, even if it is just an internal project reference number
  • Dates: Key dates associated with the data, including project start and end dates, data modification date, data release date, and time period covered by the data
  • Subject: Keywords or phrases describing the subject or content of the data
  • Funders: Organizations or agencies who funded the research
  • Rights: Any known intellectual property rights held for the data
  • Language: Language(s) of the intellectual content of the resource, when applicable
  • Location: Where the data relates to a physical location, record information about its spatial coverage
  • Methodology: How the data was generated, including equipment or software used, experimental protocol, and other details you might record in a lab notebook
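
The fields listed above can be captured in a plain-text ReadMe.txt file. The following is a minimal Python sketch of one way to generate such a file, not a required format. The function names (build_readme, write_readme), the README_FIELDS dictionary, and every placeholder value are assumptions made for illustration.

```python
from pathlib import Path

# Field names mirror the list above. Every value is a placeholder to
# replace with your own project details (none come from a real dataset).
README_FIELDS = {
    "Title": "<name of the dataset or research project>",
    "Creator": "<names and addresses of people or organization>",
    "Identifier": "<project or dataset reference number>",
    "Dates": "<project start/end, modification, release, period covered>",
    "Subject": "<keywords describing the data>",
    "Funders": "<funding organizations or agencies>",
    "Rights": "<known intellectual property rights>",
    "Language": "<language(s) of the content>",
    "Location": "<spatial coverage, if applicable>",
    "Methodology": "<how the data was generated>",
}


def build_readme(fields: dict) -> str:
    """Return ReadMe text with one 'Field: value' line per entry."""
    return "".join(f"{name}: {value}\n" for name, value in fields.items())


def write_readme(folder: Path, fields: dict) -> Path:
    """Write ReadMe.txt into `folder` (created if missing) and return its path."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "ReadMe.txt"
    path.write_text(build_readme(fields), encoding="utf-8")
    return path
```

Calling write_readme(Path("my_project"), README_FIELDS) would create my_project/ReadMe.txt with one line per field, ready to be filled in by hand.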

Data Documentation Resources

Data dictionaries, ReadMe.txt files, and codebooks are all good ways of documenting your data. ReadMe.txt files provide information about your data files and help ensure that anyone using them can interpret them correctly. Data dictionaries are often used to describe each element of your dataset, such as the variable names and values in your spreadsheets. Codebooks are more detailed than data dictionaries: they might include information found in a ReadMe.txt file, descriptions of the elements of your dataset, and the instruments used to gather the data (surveys, interview questions).
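
As an illustration of a data dictionary, the Python sketch below builds a starting skeleton from the text of a CSV file: one entry per variable (column), a few of the values observed for it, and an empty description for you to complete. The function name build_data_dictionary, the output keys, and the max_values limit are assumptions for this example, not part of any standard.

```python
import csv
import io


def build_data_dictionary(csv_text: str, max_values: int = 5) -> list:
    """Return one entry per CSV column: name, sample values, blank description."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    # Collect up to `max_values` distinct non-empty values per column.
    seen = {name: [] for name in columns}
    for row in reader:
        for name in columns:
            value = row[name]
            if value in (None, ""):
                continue  # skip missing cells
            if value not in seen[name] and len(seen[name]) < max_values:
                seen[name].append(value)
    return [
        {
            "variable": name,
            "observed_values": "; ".join(seen[name]),
            "description": "",  # to be written by the researcher
        }
        for name in columns
    ]
```

The description field is left blank on purpose: the meaning of each variable and code has to come from the person who collected the data.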

The following resources provide additional information on how to create these documents:

Data Organization

Data organization means keeping a consistent folder and file structure, using sustainable file formats, and following an established file naming convention. If you need a refresher on file naming conventions and sustainable file formats, please visit the previous page: Data File Naming & Management.

File Structure

When organizing your data, use a consistent file structure so that you'll always be able to find your files. Your ReadMe.txt file should record the file structure you decide on for your project in addition to your other data documentation (file naming conventions, abbreviations, variables, etc.). Place the ReadMe.txt file at the very top of your file structure hierarchy so it is easy to locate.

Create separate folders for your raw data, processed data, code and outputs, and documentation to avoid confusion. All file names should follow the file naming convention you have established. 
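
The folder layout described above could be set up with a short script. This is a minimal Python sketch under stated assumptions: the folder names in PROJECT_FOLDERS and the function name create_project_structure are hypothetical, and code and outputs are split into two folders here although they could equally share one.

```python
from pathlib import Path

# Hypothetical folder names; rename them to match your own naming convention.
PROJECT_FOLDERS = ("raw_data", "processed_data", "code", "outputs", "documentation")


def create_project_structure(root: Path, folders=PROJECT_FOLDERS) -> list:
    """Create `root` with one subfolder per name and a top-level ReadMe.txt."""
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name in folders:
        folder = root / name
        folder.mkdir(exist_ok=True)
        created.append(folder)
    # Record the structure in ReadMe.txt at the top of the hierarchy,
    # without overwriting a ReadMe.txt that already exists.
    readme = root / "ReadMe.txt"
    if not readme.exists():
        listing = "".join(f"  {name}/\n" for name in folders)
        readme.write_text("Folder structure:\n" + listing, encoding="utf-8")
    return created
```

Running create_project_structure(Path("my_project")) would create the five folders and a ReadMe.txt that lists them, which you can then extend with the rest of your documentation.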

File Structure Resources

There are several protocols that can be followed for structuring your files. 

  • TIER Protocol from Open Science Framework. The OSF includes a clonable template of their TIER Protocol that creates a new hierarchy of folders matching their specifications. It works well with OSF as well as GitHub, Google Drive, and Dropbox.
  • Reproducible Science template is a more complicated file structure that follows Cookiecutter Data Science. It provides a standardized, flexible project structure for conducting and sharing data science work.