GUIDELINES FOR DEPOSITORS: Preparing Datasets

Version 4.0 July 2020

File Management

It is important that data are correctly prepared before they are deposited with the ADS. Data cleaning can take a digital archivist a considerable amount of time and can lead to higher deposit charges. Your Data Management Plan should set out the details of the file management system that you will use during your project. This should include your choice of file formats, a file naming strategy, a logical file structure and a version control system.

Choosing File Formats

Format is a fundamental characteristic of a digital file that governs its ability to be used effectively. The ADS is able to accept most major file formats.

When choosing file formats you should consider if they are:

  • suitable to record your data
  • stable, not under constant revision
  • supported on various hardware
  • supported by various operating systems (Windows/Macintosh/Unix/Linux)
  • supported by comprehensive public documentation
  • ideally free of legal restrictions on their use
  • popular (widely used formats are more likely to remain supported)

Data Structure

A logical file structure is essential to any dataset because it makes data easier to retrieve, makes searches more effective, helps control access, and reduces the risk of data loss. When deciding upon a file structure it is useful to first decide what the primary data of the project are.

In archaeology this often comes down to organising the collection by one of the following:

  1. Material
    • Types of material culture
    • Archaeological samples (bones, soils, genetic samples, etc)
  2. Location
    • Where data are grouped by region or archaeological site
  3. Chronology
    • Time period
    • Year

The following points highlight key considerations when designing your data structure:

  • Create a template
  • Record and share your hierarchy
  • Determine the level of granularity that you want to use – how many subfolders within subfolders
  • Minimize temporary folders and files
  • Use a sensible file naming system (see below)
  • Record any changes and inform all team members
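
As an illustration of the "create a template" and "record and share your hierarchy" points, the following is a minimal Python sketch rather than an ADS tool. The folder names in TEMPLATE (a location-first hierarchy with material subfolders) and the function names create_structure and describe_structure are hypothetical and should be replaced with whatever suits your project.

```python
from pathlib import Path

# Hypothetical template: top level organised by location (site),
# second level by material. Replace with your own project hierarchy.
TEMPLATE = {
    "site_a": ["ceramics", "samples", "documentation"],
    "site_b": ["ceramics", "samples", "documentation"],
}


def create_structure(root, template):
    """Create the folder hierarchy under root and return the created paths."""
    created = []
    for top, subfolders in template.items():
        for sub in subfolders:
            path = Path(root) / top / sub
            # parents=True builds intermediate folders; exist_ok=True makes
            # the function safe to run again on an existing project.
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


def describe_structure(template):
    """Return a plain-text listing of the hierarchy to share with the team."""
    lines = []
    for top, subfolders in template.items():
        lines.append(top + "/")
        for sub in subfolders:
            lines.append("  " + sub + "/")
    return "\n".join(lines)
```

Keeping the template in one place means that every new site or season starts with the same layout, and the text produced by describe_structure can be saved alongside the data as a record of the hierarchy.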


File Naming

File names contain contextual information about the file and should be considered from the very outset of a project. It is important that your Data Management Plan defines your file naming system and all project members are familiar with the system.

File Naming Conventions

  • File names should use only:
    • alpha-numeric characters (a-z, 0-9)
    • hyphen (-)
    • underscore (_)
  • Both upper and lower case characters can be used but:
    • keep files within your project consistent
    • ensure supplied documentation accurately reflects the case of your filenames
    • CAPITALS ARE HARD TO READ and affect ordering
  • Individual file names should be unique within a dataset
  • Files must have a file extension
    • normally 3 characters long
    • should be lowercase
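
The conventions above can be checked automatically before deposit. The following Python sketch is one possible approach, not an official ADS validator: the function names check_file_name and find_duplicates are hypothetical, and comparing names case-insensitively when looking for duplicates is an assumption made here because some file systems do not distinguish case. The "normally 3 characters" guidance for extensions is not enforced.

```python
import re
from collections import Counter


def check_file_name(name):
    """Return a list of problems found in one file name (empty if none)."""
    base, dot, ext = name.rpartition(".")
    if not dot or not base or not ext:
        return ["missing base name or file extension"]
    problems = []
    # Base name: only a-z (either case), 0-9, hyphen and underscore.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", base):
        problems.append("base name uses characters other than a-z, 0-9, - or _")
    # Extension: alphanumeric and lowercase.
    if not re.fullmatch(r"[A-Za-z0-9]+", ext):
        problems.append("extension uses non-alphanumeric characters")
    elif ext != ext.lower():
        problems.append("extension is not lowercase")
    return problems


def find_duplicates(names):
    """Return names that occur more than once in a dataset.

    Assumption: names are compared case-insensitively.
    """
    counts = Counter(name.lower() for name in names)
    return sorted(name for name, count in counts.items() if count > 1)
```

For example, check_file_name("trench1_plan.jpg") returns an empty list, while "site plan.jpg" (contains a space) and "trench1_plan.JPG" (uppercase extension) are each reported with one problem.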

Version Control

Naming files consistently makes it much easier to keep track of which version is the most up to date, particularly when multiple people contribute to a file.

Version Control Tips

  • Add a draft or version number to the file name and/or the date
  • Initials in file names can tell you who worked on the file last
  • Clean out older drafts of the same data

It is wise to keep older drafts until the final version is complete, but whether to keep old versions of files and data after that is debatable. Ask yourself: are you ever going to look at them again?
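
As a rough illustration of the tips above, the Python sketch below builds a file name that carries a version number, a date and the editor's initials, and picks the highest version from a list of names. The pattern used here (stem, then "_v" and a two-digit version, then an ISO date, then initials) is an assumption chosen for this example, not a format required by the ADS, and the function names are hypothetical.

```python
import re
from datetime import date

# Matches the "_v03_" part of names built by versioned_name below.
VERSION_RE = re.compile(r"_v(\d+)_")


def versioned_name(stem, version, initials, on, extension):
    """Build a name such as 'context_register_v03_2020-07-15_ab.csv'."""
    # A zero-padded version and an ISO date (YYYY-MM-DD) keep names in
    # order when they are sorted alphabetically.
    return f"{stem}_v{version:02d}_{on.isoformat()}_{initials.lower()}.{extension.lower()}"


def latest_version(names):
    """Return the name with the highest version number, or None."""
    best_name = None
    best_version = -1
    for name in names:
        match = VERSION_RE.search(name)
        if match and int(match.group(1)) > best_version:
            best_version = int(match.group(1))
            best_name = name
    return best_name
```

For example, versioned_name("context_register", 3, "AB", date(2020, 7, 15), "csv") gives "context_register_v03_2020-07-15_ab.csv", which uses only the characters permitted by the file naming conventions.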

Case Study Example

In the blog post "The Silbury Hill Archive: the light at the end of the tunnel", ADS Digital Archivist Jenny O'Brien highlights how good version control can help reduce the time spent on archive preparation.


Appraisal and Selection

Just as for physical archives, it is not possible, or desirable, for all digital data to be kept forever. Before depositing data, it is therefore important that an appraisal and selection assessment is carried out on the collection. Your Data Management Plan should include an Appraisal and Selection Policy.

Points to consider when appraising and selecting data for deposit:

  • Relevance
  • Scientific or Historical Value
  • Uniqueness
  • Potential for Redistribution
  • Non-Replicable
  • Economic Case
  • Full Documentation

The ADS Guidance on the Selection of Material for Deposit and Archive provides more detailed information on how to develop a managed approach to appraising and selecting datasets for long term curation.
