Scholarly Communications

In 2003, ACRL defined scholarly communication as "the system through which research and other scholarly writings are created, evaluated for quality, disseminated to the scholarly community, and preserved."

Best practices in data storage and preservation entail several factors:

1. Data Organization
2. Data Documentation
3. Data backup and security
4. Using appropriate file formats

Read more below on how to implement these practices. To learn how to submit datasets for publication in W&M Publish, see our library data services page.

Data Organization

 

Keeping track of versions of documents and datasets is critical. Strategies include file version control, a consistent directory structure, and consistent file naming.

Directory structure naming conventions: 

  • The top-level folder should include the project title, unique identifier, and date.
  • The substructure should have a clear and documented naming convention, such as numbering or naming the experiment runs, dataset versions, and/or researchers.
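The directory conventions above can be sketched in a few lines of Python. This is only an illustration, not a W&M requirement: the separator characters, the lowercase style, the zero-padded run numbers, and the project title and identifier shown are all hypothetical choices.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def top_level_folder(title: str, identifier: str, start: date) -> str:
    """Combine project title, unique identifier, and date into one folder name."""
    return f"{slugify(title)}_{slugify(identifier)}_{start:%Y%m%d}"


def run_folder(run_number: int) -> str:
    """Zero-pad experiment run numbers so folders sort in run order."""
    return f"run-{run_number:03d}"


# Hypothetical project details, used only to show the resulting names.
print(top_level_folder("Marsh Sediment Survey", "MSS-0042", date(2024, 3, 1)))
print(run_folder(7))
```

Whatever pattern you choose, write it down in a short README in the top-level folder so that collaborators can follow it.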

File naming conventions

  • Reserve the 3-letter file extension for application-specific codes, for example, formats like .wrl, .mov, and .tif.
  • Identify the activity or project in the file name
  • Many disciplines have recommendations, for example: DOE’s Atmospheric Radiation Measurement (ARM) program
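As one possible way to combine file naming with version tracking, the sketch below builds names of the form project_activity_vNN.ext and works out the next version number from the names already in a folder. The pattern, the function names, and the sample file names are assumptions made for illustration; a disciplinary standard, where one exists, should take precedence.

```python
from pathlib import Path


def versioned_name(project: str, activity: str, version: int, extension: str) -> str:
    """Build a name such as 'mss_core-scan_v02.tif'.

    The extension is passed through unchanged so it stays application-specific.
    """
    return f"{project}_{activity}_v{version:02d}.{extension.lstrip('.')}"


def next_version(existing: list[str], project: str, activity: str, extension: str) -> str:
    """Return the name for the next version, given the names already in a folder."""
    prefix = f"{project}_{activity}_v"
    versions = [
        int(Path(name).stem[len(prefix):])
        for name in existing
        if name.startswith(prefix) and Path(name).stem[len(prefix):].isdigit()
    ]
    # With no earlier versions, numbering starts at 1.
    return versioned_name(project, activity, max(versions, default=0) + 1, extension)


# Hypothetical folder contents.
files = ["mss_core-scan_v01.tif", "mss_core-scan_v02.tif", "notes.txt"]
print(next_version(files, "mss", "core-scan", "tif"))
```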

Data Documentation: Implementing metadata to properly communicate about your data

Data documentation, also known as metadata, helps you understand your data in detail, and also helps other researchers find, use, and properly cite your data.

Various metadata standards are available for particular file formats and disciplines. General guidelines are provided below. For help in documenting your data, email scholcomm@lists.wm.edu.


Important things to do while you collect or create your data

  • Make a note of all file names and formats associated with the project, how the data is organized, how the data was generated (including any equipment or software used), and information about how the data has been altered or processed.
  • Include an explanation of codes, abbreviations, or variables used in the data or in the file naming structure.
  • Keep notes about where you got the data so that you and others can find it.


Things to document about your data:

Title: Name of the dataset or research project that produced it

Creator: Names and addresses of the organization or people who created the data

Identifier: Number used to identify the data, even if it is just an internal project reference number

Dates: Key dates associated with the data, including project start and end dates, data modification date, data release date, and time period covered by the data

Subject: Keywords or phrases describing the subject or content of the data

Funders: Organizations or agencies who funded the research

Rights: Any known intellectual property rights held for the data

Language: Language(s) of the intellectual content of the resource, when applicable

Location: Where the data relates to a physical location, record information about its spatial coverage

Methodology: How the data was generated, including equipment or software used, experimental protocol, other things you might include in a lab notebook
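One lightweight way to keep this documentation with the data is a small machine-readable record stored alongside the files. The sketch below is an assumption-laden illustration: the JSON layout, the lowercase field keys, and every value shown are hypothetical placeholders, and it does not follow any particular metadata standard.

```python
import json

# Field names taken from the list above; lowercase keys are an arbitrary choice.
DOCUMENTED_FIELDS = [
    "title", "creator", "identifier", "dates", "subject",
    "funders", "rights", "language", "location", "methodology",
]


def missing_fields(record: dict) -> list[str]:
    """Return the documented fields that are absent or empty in the record."""
    return [field for field in DOCUMENTED_FIELDS if not record.get(field)]


# Hypothetical example values.
record = {
    "title": "Marsh Sediment Survey",
    "creator": [{"name": "Example Researcher", "address": "Example University"}],
    "identifier": "MSS-0042",
    "dates": {"project_start": "2024-03-01", "project_end": "2025-02-28"},
    "subject": ["sediment", "tidal marsh"],
    "funders": ["Example Funding Agency"],
    "rights": "Rights held by the research team",
    "language": ["en"],
    "location": "Example Creek tidal marsh",
    "methodology": "Core samples logged with example software v1.0",
}

print(json.dumps(record, indent=2))
print("Missing:", missing_fields(record))
```

Language and location apply only to some datasets, so an empty value there may be acceptable; the check simply reports what has not been filled in.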

Data storage

You have several options for storage and processing of your data during the active phase of research:

  • WMApps accounts are available to faculty upon request and include unlimited storage in Google Drive. Google Drive is for non-sensitive data types only.
  • Faculty have access to Box, which provides 100 GB of storage space with security features for sensitive data storage.
  • Contact IT for further support.


Data backup

  • Make 3 copies (e.g. original + external/local + external/remote)
  • Copies should be geographically distributed (local vs. remote) and may include:
      • Personal computer hard drives
      • External hard drives
      • Departmental or university servers
      • Tape backup systems
      • Cloud storage: several commercial options are available; each has different requirements, encryption, and storage fees
  • CDs and DVDs aren't recommended because they fail frequently.

Data security

Unencrypted storage is ideal so that you and others can easily read your data, but if encryption is required because the data is sensitive:

  • Keep passwords and keys on paper (2 copies) and in a PGP (Pretty Good Privacy) encrypted digital file.
  • Don't rely on third-party encryption alone.

Uncompressed storage is also ideal, but if you need to compress files to conserve space, limit compression to your third backup copy.

To make sure your backup system is working properly, test your system periodically. Try to retrieve data files and make sure you can read them.
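One common way to confirm that a retrieved backup is readable and unchanged is to compare checksums of the original and the copy. The sketch below uses SHA-256 from Python's standard library; the function names and the temporary sample files are hypothetical, and a checksum comparison is offered as one option rather than a prescribed procedure.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """Return True when both files have identical contents."""
    return sha256_of(original) == sha256_of(backup)


# Demonstration with throwaway files standing in for a dataset and its backup.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "data.csv"
    backup = Path(tmp) / "data_backup.csv"
    original.write_bytes(b"site,depth_cm\nA,12\n")
    backup.write_bytes(original.read_bytes())
    intact = backup_matches(original, backup)

    # Simulate a corrupted backup by changing one value.
    backup.write_bytes(b"site,depth_cm\nA,13\n")
    damaged = backup_matches(original, backup)

print("Intact copy matches:", intact)
print("Damaged copy matches:", damaged)
```

A matching checksum shows that the copy is identical to the original; it is still worth opening a few files to confirm the software can read them.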

Need more help? The UK Data Archive provides additional guidelines on data storage, backup, and security.

Consideration of file formats

As technology changes, researchers should plan for both hardware and software obsolescence and consider the longevity of their file format choices to ensure long-term readability and access.

File formats more likely to be accessible in the future have the following characteristics:

  • Non-proprietary
  • Open, documented standard
  • Common usage by research community
  • Standard representation (ASCII, Unicode)
  • Unencrypted
  • Uncompressed

Examples of preferred file format choices include:

  • ODF, not Word
  • ASCII, not Excel
  • MPEG-4, not Quicktime
  • TIFF or JPEG2000, not GIF or JPG
  • XML or RDF, not RDBMS

Consider migrating your data into a format with the above characteristics, in addition to keeping a copy in the original software format. If you deposit your data in a repository, your files may be migrated to newer formats, so that they’re usable to future researchers.
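To find candidates for migration, a short script can flag files whose extensions suggest one of the less-preferred formats listed above. The mapping from extensions to formats below is an assumption made for illustration (for example, treating .doc and .docx as Word), and it is not a complete inventory of formats.

```python
from pathlib import Path

# Assumed extensions for the less-preferred formats, paired with the
# preferred alternative named in the list above.
SUGGESTED_ALTERNATIVE = {
    ".doc": "ODF",
    ".docx": "ODF",
    ".xls": "ASCII",
    ".xlsx": "ASCII",
    ".mov": "MPEG-4",
    ".gif": "TIFF or JPEG2000",
    ".jpg": "TIFF or JPEG2000",
    ".jpeg": "TIFF or JPEG2000",
}


def flag_formats(file_names: list[str]) -> dict[str, str]:
    """Map each file in a less-preferred format to its suggested alternative."""
    flagged = {}
    for name in file_names:
        suffix = Path(name).suffix.lower()
        if suffix in SUGGESTED_ALTERNATIVE:
            flagged[name] = SUGGESTED_ALTERNATIVE[suffix]
    return flagged


# Hypothetical file names.
for name, suggestion in flag_formats(["report.DOCX", "scan.tif", "clip.mov"]).items():
    print(f"{name}: consider also keeping a copy as {suggestion}")
```

The script only reports candidates; the conversion itself, and checking that nothing was lost in it, remains a manual step.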

W&M Libraries Data Services

William & Mary Libraries will archive datasets and their documentation in W&M ScholarWorks, supporting the data through changing technologies, media, and data formats. William & Mary Libraries provides guidance and support for all aspects of the data lifecycle, from planning a data management strategy through preserving data at the conclusion of the project, and works with researchers to ensure this process includes appropriate documentation and requirements for data integrity.

A standardized metadata record for the data will be added to the repository. This record includes a standardized data citation with a Digital Object Identifier (DOI) to provide permanent linking and access, gives credit to the research team, and enables users to obtain a copy of the product. W&M Publish provides public, open access to the deposited data via full-text searchable records, making them discoverable through Google, Google Scholar, and other large search engines.

Read more on how to submit your data here

Dataset Submission Instructions

  1. Contact your liaison librarian (or Hargis Library for VIMS) to discuss your research data needs and procedures for file transfer.
  2. Complete the Dataset Information Form (below), including all information that is available and applicable.
  3. Submit the Dataset Information Form and data to your librarian as instructed. The librarian will provide further information.
  4. Upon receipt of a confirmation email from the librarian, update all other online references and citations to the dataset with the assigned DOI. Include the DOI in all future references to the dataset.