
Guidelines for Depositors (Version 3.0 September 2015)

Contents

Introduction to the Guidelines
Why Deposit Data?
Depositing with the ADS
What to Deposit
How to Deposit
Costs
Preparing Collections for Deposit
Data Management Plans
File Management (Formats, Structure, Naming, Versioning)
Metadata
Selection and Retention
File-level Metadata Requirements
Documents
Databases, Spreadsheets and Statistics
Raster Images
Geophysics and Remote Sensing
CAD and Vector Images
Geographical Information Systems
Video and Audio
Virtual Reality
Photogrammetry
Collection-level Metadata Requirements
Deposit Check List
Downloads
Acknowledgements


Preparing Collections for Deposit

Data Management Plans (DMP)

Good data management from the very beginning of a project can be key to its success and makes preserving data and preparing it for deposit with ADS much easier. Creating a data management and sharing plan helps you consider how research data will be managed during the research process and shared afterwards with the wider research community.

Effective data management and sharing plans should consider data throughout its life-cycle. An archaeological data life-cycle model divides the research process into a number of tasks: project planning; data collection; data analysis; data archiving; data discovery; leading to data reuse and re-analysis. Imagining data being reused by someone else may cause you to approach the creation and design of your data in a new light. Moreover, studies show that reuse of data is the single surest way of maintaining its integrity and of tracking errors and problems with it.

A typical data management plan should establish the:

  • types of data being produced
  • file structure, versioning and naming strategy
  • metadata to be collected
  • standards and quality assurance measures
  • ethical and legal issues or restrictions on data sharing
  • copyright and intellectual property rights of data
  • data management roles and responsibilities
  • plans for sharing data between project members
  • data storage and back-up measures during the project
  • costs and resources needed
  • long term archiving of, and access to, data

ADS has further information on Data Management and Sharing Plans and the Digital Curation Centre has produced a useful Data Management Plan Check List and an online tool to help you develop a data management plan.



File Management

It is important that data is correctly prepared before it is deposited with ADS. Data cleaning can take a digital archivist a lot of time and can lead to higher deposit charges. Your Data Management Plan should set out the details of the file management system that you will use during your project. This should include your choice of file formats, a file naming strategy, a logical file structure and a version control system.

Choosing File Formats

Format is a fundamental characteristic of a digital file that governs its ability to be used effectively. Without strong format typing a digital file is merely an undifferentiated string of bits. The information encoded in a file can only be interpreted properly and rendered in human-sensible form if that file format is known.

Choosing the right file format for your data is essential. A file format must be able to contain the type of data that is relevant to your research purpose, and it must also be suitable for long term archiving.

When choosing file formats you should consider if they are:

  • suitable to record your data
  • stable, not under constant revision
  • supported on various hardware
  • supported by various operating systems (Windows/Macintosh/Unix/Linux)
  • supported by comprehensive public documentation
  • ideally free of legal restriction in their use
  • popular, as widely used formats are more likely to remain supported

The ADS is able to accept most major file formats and publishes a table of preferred and accepted formats. This information is also repeated by data type in the section File-level Metadata Requirements.

File Structure

A logical file structure is essential to any dataset because it makes data easy to retrieve, both for project members and for digital archivists once the collection is deposited. Following a logical file structure throughout a project saves time when preparing data for archiving, makes searches more effective and can be used to control access. Adhering to a predefined file structure also reduces data loss (‘where did I save that file?’) and gives each file an absolute location. This is particularly relevant when using software such as GIS, where the file structure needs to remain consistent so that files can still be retrieved from the geodatabase.

When deciding upon a file structure it is useful first to decide what the primary data of the project are. In archaeology this often comes down to the collection being organised by one of the following:

Material – in the widest sense, so everything from types of material culture to archaeological samples (bones, soils, genetic samples, etc.)

Location – where data are grouped by region or archaeological site

Chronology – although there is often so much temporal overlap, with sites or material spanning several periods, that it is difficult to create distinct sets of data

The following points highlight key considerations when designing your data structure:

  • Create a template – if you work on multiple projects use the same structure for each.
  • Record your hierarchy and share it with team members.
  • Determine the level of granularity that you want to use – one folder holding lots of files makes it hard to find individual items, but too much granularity can make it just as hard to find the correct folder.
  • Try to keep your data hierarchy clean of temporary folders and files.
  • Use a sensible file naming system (see below).
  • It is important to acknowledge that research designs can change, and therefore so too can the file structure – remember to record any changes and inform all team members.
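The first two points above (create a template, record the hierarchy) can be sketched in a few lines of Python. This is only an illustration: the folder names in TEMPLATE and the record file name folder_structure.txt are hypothetical choices, not an ADS requirement, and a real project would substitute its own hierarchy.

```python
from pathlib import Path

# Hypothetical template: these folder names are illustrative only.
TEMPLATE = [
    "documentation",
    "data/site_a/images",
    "data/site_a/gis",
    "data/site_b/images",
    "reports",
]


def create_structure(root, template=TEMPLATE):
    """Create each template folder under root and return the folder paths."""
    root = Path(root)
    created = []
    for relative in template:
        folder = root / relative
        # parents=True builds intermediate folders; exist_ok=True makes
        # the function safe to run again on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def write_hierarchy_record(root, template=TEMPLATE):
    """Write the hierarchy to a plain-text file that can be shared with the team."""
    record = Path(root) / "folder_structure.txt"
    record.write_text("\n".join(sorted(template)) + "\n", encoding="utf-8")
    return record
```

Reusing the same template list for each new project gives every project the same layout, and the text record can be updated alongside the folders whenever the structure changes.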

File Naming

File names contain contextual information about the file so we know what it is without having to open it. File names can also be used to order files thus affecting file retrieval. File naming should be considered from the very outset of a project. It is important that your Data Management Plan defines your file naming system and all project members are familiar with the system.

The ADS primary servers utilise a UNIX operating system, and as with many other operating systems, there are certain file naming conventions which should be adhered to when transferring files.

ADS File Naming Conventions

  • File names should use only alpha-numeric characters (a-z, 0-9), the hyphen (-) and the underscore (_). No other punctuation or special characters should be included within the filename.
  • Use the underscore character to imply a space within your file name. Spaces in file names cause particular problems when files are transferred to our UNIX server.
  • A full stop (.) should only be used as a separator between the file name and the file extension and should not be used elsewhere within the file name.
  • Files must have a file extension to help the ADS and future users of the resource determine the file type. File extensions are normally 3 characters long and should be lower case.
  • Both upper and lower case characters can be used in a file name but keep files within your project consistent and ensure that supplied documentation accurately reflects the case of your filenames. Once files are moved over to a case-sensitive operating system such as UNIX, report.doc would become a different file to Report.doc. Also remember CAPITALS ARE HARD TO READ and affect ordering.
  • Individual file names, regardless of file structure, should be unique within a dataset.
  • Keep file names consistent. Descriptive or non-descriptive file names can be used. A descriptive file name helps explain the contents of the file. For example: TSA04_final_report_v3.pdf (version 3 of final site report for site TSA04 as a pdf file), or 12102004_trench_1.tif (digital photograph of trench 1 taken on 12/10/2004). A non-descriptive file name might be a unique ID number allocated to an image within an accompanying image catalogue database. Non-descriptive file names are acceptable as long as their content is adequately described in accompanying metadata or database.
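The conventions above lend themselves to an automated check before deposit. The following Python sketch is one possible reading of them, not an official ADS tool: the function names are hypothetical, and treating names that differ only in letter case as duplicates is an assumption made here to avoid the report.doc / Report.doc confusion described above.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Letters, digits, hyphen and underscore only.
ALLOWED_STEM = re.compile(r"[A-Za-z0-9_-]+")
ALLOWED_EXTENSION = re.compile(r"[A-Za-z0-9]+")


def check_file_name(name):
    """Return a list of problems found in one file name (empty if none)."""
    problems = []
    stem, dot, extension = name.rpartition(".")
    if not dot:
        # No full stop at all, so the whole name is the stem.
        stem, extension = name, ""
        problems.append("has no file extension")
    elif not ALLOWED_EXTENSION.fullmatch(extension):
        problems.append("extension is empty or has invalid characters")
    elif extension != extension.lower():
        problems.append("extension should be lower case")
    if " " in stem:
        problems.append("contains a space; use an underscore instead")
    if "." in stem:
        problems.append("uses a full stop inside the name")
    # Spaces and full stops are reported above, so mask them here.
    remainder = stem.replace(" ", "_").replace(".", "_")
    if not ALLOWED_STEM.fullmatch(remainder):
        problems.append("is empty or has characters other than "
                        "letters, digits, '-' and '_'")
    return problems


def find_duplicate_names(paths):
    """Return file names used more than once, ignoring folders and case."""
    counts = Counter(PurePosixPath(p).name.lower() for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)
```

For example, check_file_name("TSA04_final_report_v3.pdf") returns an empty list, while check_file_name("site report.pdf") reports the space. find_duplicate_names expects paths written with forward slashes and compares only the final file name, since names should be unique regardless of file structure.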

Version Control

Being consistent with what you call files makes keeping track of which version is the most up to date much easier, particularly when you have multiple people contributing to a file.

Version Control Tips

  • Add a draft or version number to the file name and/or the date
  • Initials in file names can tell you who worked on the file last
  • Clean out older drafts of the same data

It is wise to keep older drafts until the final version is made but whether you want to keep old versions of files and data is debatable. You have to ask yourself: are you ever going to look at them again? See the section on selection and retention for more information.
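The version control tips can be applied consistently by generating file names rather than typing them. The Python sketch below is one way to do this under stated assumptions: the zero-padded version number, the YYYYMMDD date order and the position of the initials are choices made for this example (they keep names in order when sorted), not ADS rules, and both function names are hypothetical.

```python
import re
from datetime import date

# Matches a version tag such as "_v3" or "_v03" followed by "_" or ".".
VERSION_PATTERN = re.compile(r"_v(\d+)(?=[_.])")


def versioned_name(base, version, extension, initials=None, on=None):
    """Build a name from a base, a version number and optional date and initials."""
    parts = [base, f"v{version:02d}"]
    if on is not None:
        # Year-month-day order sorts chronologically (an assumption here).
        parts.append(on.strftime("%Y%m%d"))
    if initials:
        parts.append(initials.lower())
    return "_".join(parts) + "." + extension.lower()


def latest_version(names):
    """Return the name with the highest version number, or None if none has one."""
    versioned = []
    for name in names:
        match = VERSION_PATTERN.search(name)
        if match:
            versioned.append((int(match.group(1)), name))
    return max(versioned)[1] if versioned else None
```

A call such as versioned_name("TSA04_final_report", 3, "pdf", initials="JB", on=date(2015, 9, 14)) gives TSA04_final_report_v03_20150914_jb.pdf, where the initials and date are made-up values. latest_version can then help identify which drafts are candidates for clearing out once the final version exists.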

Case Study Example

In the blog post ‘The Silbury Hill Archive: the light at the end of the tunnel’, ADS Digital Archivist Jenny O'Brien highlights how good version control can reduce the time spent on archive preparation.



Metadata

Metadata is data about data. Metadata makes it possible to discover and share data. In order for data to be useful they must be seen in context.

Good metadata, like a good library catalogue, helps readers to identify the available resources quickly, thus refining their research, and putting them in touch with the resources they need. However, for that to work effectively, the metadata has to be implemented accurately and in a standard format.

The Guidelines for Depositors distinguish between file-level and collection-level metadata. Both sets of metadata are very important, so please provide records that are as detailed and complete as possible.

Collection-level and file-level metadata requirements and advice are detailed in the next pages of the Guidelines for Depositors.

File-level metadata requirements are listed by data type and include Documents, Databases, Spreadsheets and Statistics, Raster Images, CAD and Vector Images, Geographical Information Systems, Geophysics and Remote Sensing, Virtual Reality, Video and Audio.

Case Study Example

Read about data lost through lack of documentation and metadata in the Guides to Good Practice case study the Newham Archive: A Case Study of the Loss of Digital Data.



Selection and Retention

Just as for physical archives, it is not possible, or desirable, for all digital data to be kept forever. Therefore before depositing data it is important that an appraisal and selection assessment is carried out on the collection. Your Data Management Plan should include an Appraisal and Selection Policy.

Points to consider when appraising and selecting data for deposit:

  • Relevance: The resource content fulfills the priorities stated in the funding or commissioning body’s current strategy, including any legal requirement to retain the data beyond its immediate use.
  • Scientific or Historical Value: Is the data scientifically, socially, or culturally significant? Assessing this involves inferring anticipated future use.
  • Uniqueness: The extent to which the resource is the only or most complete source of the information that can be derived from it, and whether it is at risk of loss if not accepted, or may be preserved elsewhere.
  • Potential for Redistribution: The reliability, integrity, and usability of the data files may be determined; these are received in formats that meet designated technical criteria; and Intellectual Property or human subjects issues are addressed.
  • Non-Replicable: It would not be feasible to replicate the data/resource or doing so would not be financially viable.
  • Economic Case: Costs may be estimated for managing and preserving the resource, and are justifiable when assessed against evidence of potential future benefits; funding has been secured where appropriate.
  • Full Documentation: the information necessary to facilitate future discovery, access, and reuse is comprehensive and correct; including metadata on the resource’s provenance and the context of its creation and use.

As the Silbury Hill case study shows, appraisal and selection should be carried out throughout your project, with old versions of files being deleted when they are no longer required.

The ADS Guidance on the Selection of Material for Deposit and Archive provides more detailed information on how to develop a managed approach to appraising and selecting datasets for long term curation. It should interest archaeologists from across the sector responsible for managing data or who work in data-intensive fields, and those supporting them in institutional repositories, data centres or archives.

