Preservation Policy

Version 1.4

Created Date: 30 September 2009
Last Updated: 27 April 2016
Review Due: April 2017 (unless significant change)
Authors: Tim Evans and Ray Moore
Maintained By: Tim Evans
Previous Version:ADS Preservation Policy 1.3.1

1. Principal Statement 1

'The Archaeology Data Service (ADS) supports research, learning and teaching with high quality and dependable digital resources. It does this by preserving digital data in the long term, and by promoting and disseminating a broad range of data in archaeology. The ADS promotes good practice in the use of digital data in archaeology, it provides technical advice to the research community, and supports the deployment of digital technologies.' 2

The long term preservation and reuse (reuse value in itself aids preservation) of digital data is then core to ADS activities in providing ‘high quality and dependable digital resources’ to its user community. The latter has broadened over time from a largely academic focus to encompass a range of groups with an interest in Archaeology including commercial archaeology, heritage organisations, museums, Further and Secondary Education, community archaeology and the interested public in general.

The ADS actively follows preservation and management strategies based on this policy with the aim of ensuring the authenticity, reliability and logical integrity of all resources entrusted to its care. It further endeavours to provide its user community with usable versions for research, teaching or learning, in perpetuity.

2. Contextual Links

This document systematizes an overview of archival practice developed by the ADS since its inception in 1996. It does not exist in isolation but as part of a suite of documents guiding good governance and practice by the ADS. Policy and strategy documents include

  • ADS Five Year Plan: October 2016 to October 2021 3 (strategy document)
  • ADS Risk Register4
  • ADS Collections Policy (6th Edition)5
  • ADS Repository Operations6
  • ADS Disaster Recovery Plan7

The ADS is further governed by the policy and strategy of its host institution, the University of York. Documents include

  • University of York Records Management Policy 20048
  • University of York Information Access and Security Policy9
  • University of York Legal Statements and linked policy and strategy documents therein10

As noted in the Collections Policy the ADS has agreements with a number of funding agencies that support archaeological research, to encourage funding recipients to offer their datasets for deposit11

  • Arts and Humanities Research Council (AHRC)
  • British Academy
  • Carnegie Trust
  • Natural Environment Research Council (NERC), for science-based archaeology
  • Historic England
  • Leverhulme Trust
  • Wellcome History of Medicine Project

The ADS has Service Level Agreements (SLA) with a number of organisations including

  • The UK Data Archive (UKDA)12 for provision of a remote deep storage facility
  • Internet Archaeology13, the online journal, which the ADS hosts and to which it provides a preservation service

The ADS has Memoranda of Understanding (MoU) with a number of external organisations concerned with preservation and reuse of data14 including

  • Association of British Geological Survey
  • The Association of Local Government Archaeological Officers (ALGAO)
  • The Council for British Archaeology (CBA)
  • The Royal Commission on the Ancient and Historical Monuments of Wales (RCAHMW)
  • The Royal Commission on the Historical Monuments of England (RCHME now part of Historic England)
  • The Collections Trust (formerly the Museums Documentation Association)
  • The National Trust
  • The Bedern Group

3. Preservation Objectives

The core objective of the long term preservation of digital data for reuse by a broad archaeological community has been described above.

The ADS endeavours to undertake long term preservation working within a framework conforming to the ISO (14721:2003) specification of a reference model for an Open Archival Information System (OAIS) as defined by a recommendation of the Consultative Committee for Space Data Systems15.

OAIS provides a conceptual framework in which to discuss and compare archives through developing a common language. It describes the responsibilities and interactions of Producers, Managers and Consumers of digital and paper records. It defines processes necessary for the ingest, long-term preservation and dissemination of information objects.

Specifically the model describes a series of ‘transformations, both logical and physical, of the Information Package and its associated objects as they follow a lifecycle from the Producer to the OAIS and from the OAIS to the Consumer’. These packages comprise

  • Submission Information Package (SIP): Supplied by a data Producer (creator or depositor) including documentation to facilitate archiving and reuse
  • Archival Information Package (AIP): Generated from the SIP and the long term preservation package managed within the OAIS including administrative, technical and reuse documentation
  • Dissemination Information Package (DIP): Generated from the SIP/AIP and made available to Consumers (users) including documentation to facilitate reuse.

Clearly OAIS influences archival policy and strategy significantly. OAIS does not prescribe preservation strategies, but the active management and lifecycle approaches tend toward migration in various forms rather than other techniques like emulation or technology preservation. The ADS uses a number of migration types for ongoing preservation

  • Normalisation: Data may already exist in, or is migrated to, widely supported open international standards such as ASCII (text) or TIFF (images).
  • Version migration: Data is migrated through successive versions of a format. For example, AutoCAD Release 9 (AC1004) has been migrated to AutoCAD Release 2010/11/12 (AC1024). Version migration may be the only option for preserving proprietary formats that don’t migrate to open standards. This is only practical where the software using proprietary formats is widely used within a community and accessible (affordable) to an archive. It is not practical for an archive to maintain a suite of limited use proprietary software.
  • Format migration: As well as normalisation data may be migrated to other formats for a number of reasons including dissemination. For example, a spatial dataset may be preserved as GML 3.2 but disseminated as an ESRI Shapefile. ESRI software sees wide usage amongst the archaeological community.
  • Refreshment: Migration between media which leave data (the bit stream) totally unchanged. For example, from one system to another.

Data that cannot be normalised and/or migrated between versions is not suited to long term preservation within the framework described.
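
The relationship between a supplied format and its preservation target can be sketched as a simple lookup. This is an illustrative sketch only: the table contains just the pairings mentioned as examples in this policy, and the format labels and the function name preservation_target are hypothetical, not ADS or PRONOM identifiers.

```python
from typing import Optional

# Illustrative table only. The pairings come from examples in this policy;
# the keys are hypothetical labels, not ADS or PRONOM identifiers.
PRESERVATION_TARGETS = {
    "jpg": "tiff",               # normalisation to an open image standard
    "dwg-ac1004": "dwg-ac1024",  # version migration within a proprietary format
    "gml-3.2": "gml-3.2",        # already held in a preservation format
}


def preservation_target(format_label: str) -> Optional[str]:
    """Return the preservation format for a supplied format, or None.

    None signals that no migration path is known, in which case the data
    would not be suited to long term preservation within this framework.
    """
    return PRESERVATION_TARGETS.get(format_label.lower())
```

A result of None corresponds to the case described above, where data can be neither normalised nor migrated between versions.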

As well as the physical process of preservation OAIS describes Preservation Description Information (PDI) as the ‘information which is necessary for adequate preservation of the Content Information and which can be categorized as Provenance, Reference, Fixity, and Context information’ which is preserved with an AIP

  • Provenance information: Concerned with ‘history’ and records, for example, ‘the principal investigator’.
  • Reference information: Concerned with unambiguously identifying content information through, for example, the provision of an ISBN number for a publication.
  • Fixity Information: A fixity value or checksum provides a simple way to protect the integrity of data by detecting errors in data. The MD5 (Message-Digest algorithm 5) and the SHA (Secure Hash Algorithm) are widely used cryptographic hash functions. Applying these algorithms to a file produces an (almost certainly) unique hash or checksum value and will consistently produce this value if a file is unchanged. Thus it provides a mechanism for validating and auditing data.
  • Context information: In terms of OAIS is concerned with environment. Examples include ‘why the Content Information was created and how it relates to other Content Information objects’.
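
By way of illustration, a fixity value of the kind described under Fixity Information can be generated with the Python standard library. This is a minimal sketch, not the program used by the ADS: the function name file_checksum, the default of SHA-256 and the chunk size are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hexadecimal digest of a file.

    The file is read in chunks so that large files do not need to fit
    in memory. The algorithm name is passed to hashlib.new, so any hash
    supported by the local Python build (e.g. "sha1") can be requested.
    """
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running the function twice on an unchanged file returns the same value; any change to the bit stream will, almost certainly, produce a different one.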

Documentation including metadata concerned with resource discovery and reuse is then an equally important part of an archival package.

The above defines two of the cornerstones for a successful archival strategy within an OAIS framework

  • Use of software (by Producers) supporting formats with clear migration paths for both preservation and reuse.
  • The existence of adequate documentation to facilitate ongoing preservation and reuse.

The other cornerstones are

  • Ongoing access to adequate hardware systems by skilled staff.
  • That robust backup/recovery strategies are in place.

It is widely recognised that there are inherent weaknesses associated with these last two points: equipment fails or needs replacing, skilled staff leave or are difficult to recruit, and digital media are notoriously frail, to name a few. These weaknesses can be quantified through risk assessment16 and lessened through forward planning including disaster recovery17 and systems budgets18.

In terms of reuse the ADS currently supports open access to its holdings (some data may be subject to a time limited embargo at the behest of a Producer or for legal reasons). The contents of most collections are available online. Because of bandwidth concerns larger files may only be available on request either as a specifically organised download or on portable media for which charges at cost may be made. The ADS is actively investigating various network technologies such as Point of Access (PoA) optical networks and Grid Computing seeking better mechanisms for disseminating 'big data'19.

In order to quantify and qualify success in reaching these stated objectives the ADS actively seeks compliance with two community driven initiatives for best practice:

1) Trustworthy Repositories Audit and Certification (TRAC): Criteria and Checklist20 authored by the US Centre for Research Libraries. The purpose of the checklist is identifying repositories capable of reliably managing digital collections. The ADS undertakes self-certification annually.

2) The Data Seal of Approval (DSA). The Data Seal of Approval was established by a number of institutions committed to the long-term archiving of research data. By assigning the seal, the DSA group seeks to guarantee the durability of the data concerned, but also to promote the goal of durable archiving in general. The Data Seal of Approval is granted to repositories that are committed to archiving and providing access to scholarly research data in a sustainable way. It is assigned by the DSA Board and renewed every year through a modification procedure.21

Since 2010 the ADS have been awarded the DSA22, and currently hold the Data Seal for the current guidelines (2014-2015). Looking forward, the current Data Seal (2014-2015) will be extended to the end of 2017 for existing holders; new guidelines for a 2016-2018 Seal are currently being finalised23.

4. Identification of Content

Content is driven by community; what the community is producing and what it wants to reuse. Also, as described above, the ADS uses migration in various forms as a long term preservation strategy. This influences which formats the ADS accept. Current practice with regard to content is set out in detail in the ADS Collections Policy (6th Edition)24. All projects are subject to the ADS Charging Policy25. Thus projects will need to build long term preservation costs into funding applications.

5. Procedural Accountability

ADS staff have established job descriptions which define roles and responsibilities. These are formalised following review by the University of York using the Higher Education Role Analysis (HERA) job evaluation methodology26.

Accountability pertaining to preservation and reuse falls to:

  • Director: Overall responsibility for financial management and for policy including compliance with legislation affecting digital preservation and its management.
  • Collections Development Manager: Responsible for approaching grant holders, negotiating with depositors and acquiring access to collections; managing collection services for the ADS; first point of contact for information about data deposition, joint cataloguing, or data access and re-use.
  • Applications Developer (Systems Management): Planning, selecting, purchasing and commissioning new computer equipment; evaluating, purchasing and the installation of software packages; overseeing system and network security of all ADS systems.
  • Communications and Access Manager: responsible for developing and managing ADS communication and access strategy, promoting the ADS, with overall responsibility for user services and outreach activities
  • Administrator: Responsible for essential administrative and financial management.
  • Application developer: Responsible for the development of software applications and user interfaces; accessioning, mounting, cataloguing, validation, conversion, migration and curation of data sets; undertaking data audits and discussion with clients (Producers); and answering user queries.
  • Digital Archivists: Responsible for accessioning, mounting, cataloguing, validation, conversion, migration and curation of data sets; development of user interfaces; undertaking data audits and discussion with clients (Producers); and answering user queries.
  • Digital archivist (Digital Preservation lead): Responsibilities include monitoring and developing management and preservation strategies for digital data; ensuring compliance with preservation best practice and certification; ensuring secure offsite back-up.
  • All staff: Accountable to their line managers for compliance with this policy and with related policies, strategies, standards and guidelines.

The ADS also has recourse to its Management Committee though it should be noted that this group acts in a purely advisory capacity and without legal liability27.

6. Guidance and Implementation

The ADS came into being in 1996 as one of the data services grouped under an Arts and Humanities Data Service (AHDS – no longer extant) umbrella. As such it was and still is very much involved in the lifecycle approach to long term preservation as, for example, defined by Neil Beagrie and Dan Greenstein then of the AHDS in their 1998 publication A Strategic Policy Framework for Creating and Preserving Digital Collections28.

The generally recognised categories of the lifecycle of digital assets are (equivalent OAIS functional entities in brackets)

  • Data creation (Administration)
  • Acquisition, retention or disposal (Ingest, Administration)
  • Preservation and management (Archival Storage, Data Management, Administration)
  • Access and use (Access, Administration)

The ADS maintain a purpose built Collections Management System (CMS) that is used to track and document potential and actual collections of data throughout this lifecycle. The CMS is modular and broadly follows the above flow with People, Tracking, Accessions and User Services modules. Additionally there are Assist (help) and Admin (input controls and security) modules.

6.1 Data creation

Lead role: Collections Manager
Policy document: Collections Policy

The pre-ingest period of a resource or potential resource is of major importance from the time a project is conceptualised. Whereas a well formed SIP aids repository processes, a poorly formed one may well preclude ingest (see 6.2). For a SIP to be well formed it must conform to a repository’s requirements. The ADS is active in a number of ways in providing guidance to potential depositors during this period including

  • Collections Policy29
  • Guides to Good Practice30
  • Advisory services31
  • Guidelines for depositors32

6.2 Acquisition, retention or disposal

Lead role: Digital Archivist (Preservation lead)
Policy document: Preservation Policy

A number of documents guide the process of ingesting a SIP including

  • Repository Operations33
  • Ingest Procedures (Ingest Manual)34
  • Data Procedures (dealing with specific data types and file formats)35
  • Procedure Checklists36
  • File formats table: delivery, preservation and presentation37
  • Security Overview38

The existence of a SIP and a signed deposit licence pertaining to it triggers accessioning. The licence grants a non-exclusive right to the ADS to distribute supplied data. Copyright is not transferred39.

The ADS uses the concept of a collection of digital objects to describe a discrete resource. Thus a collection may be related to a distinct project. Necessarily any number of accessions (SIPs) of related objects may be made into a collection as a project may be ongoing either submitting data in stages or providing reloads (sometimes known as editions). A producer may also deposit multiple collections pertaining to different projects. Collections and accessions are assigned identifiers which are unique within ADS systems.

As already described the ADS migrates files from a producer supplied SIP into its systems in various formats as part of a corresponding AIP (for preservation) and DIP (for dissemination). The retention of the significant properties of files is a primary concern during any migration as detailed in ADS Data Procedures. Copies of supplied files are also maintained in the same systems which are known within the ADS as the original files. These reflect files as delivered in terms of format and content but they may have been processed to, for example, remove spaces from file names (file names containing spaces are error-prone in scripted processing on Unix based systems).
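
The kind of file name processing mentioned above might look like the following. This is a sketch under stated assumptions: the policy says only that spaces are removed, so the choice of an underscore as the replacement character, and the function name sanitise_filename, are illustrative rather than a description of ADS practice.

```python
import re


def sanitise_filename(name: str) -> str:
    """Replace each run of whitespace in a file name with one underscore.

    Leading and trailing whitespace is dropped first. The underscore is
    an assumption made for this example.
    """
    return re.sub(r"\s+", "_", name.strip())
```

For example, sanitise_filename("site plan 2.dwg") gives "site_plan_2.dwg". In practice any such renaming would need to be recorded so that the original name remains recoverable.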

A formalised directory structure is built under folders reflecting collections and accessions identifiers. These comprise

  • original (contains files supplied in the SIP which may have seen some processing as described above)
  • preservation (contains the AIP data – see also admin)
  • dissemination (contains the DIP)
  • previous (contains data that has been updated by a depositor including previous editions of a resource)
  • admin (contains data concerned with the administration of a resource including licence information, collection level metadata, preservation metadata; in OAIS terms the Preservation Description Information noted earlier).
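
The folder layout listed above can be sketched as follows. The five folder names are taken from this policy; the nesting of an accession folder inside a collection folder, the identifiers, and the function name build_accession_tree are assumptions made for the example.

```python
from pathlib import Path

# Folder names as listed in this policy.
SUBFOLDERS = ("original", "preservation", "dissemination", "previous", "admin")


def build_accession_tree(root: Path, collection_id: str, accession_id: str) -> Path:
    """Create the formalised folders for one accession and return their parent.

    Assumes a <root>/<collection>/<accession>/ layout, which is an
    illustrative choice. Existing folders are left untouched.
    """
    base = root / collection_id / accession_id
    for name in SUBFOLDERS:
        (base / name).mkdir(parents=True, exist_ok=True)
    return base
```

Because existing folders are left in place, the function can be run again safely when a further accession is added to the same collection.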

All files within a collection are recorded in an extension to the ADS CMS known as the Object Management System (OMS). The OMS records a high level of technical metadata including physical location, filename, size, format (identified using the DROID40 software and recording MIME type and PRONOM identifier) and fixity value.

File migrations and processing (for example the migration of JPG files in the SIP to TIFF files in the AIP) are recorded in the CMS. The relationships between files within separate parts of the AIP, DIP and SIP are also recorded in the CMS. Thus the links between the various representations of an object, and any processes used to create them, are recorded and maintained.

Fixity values (checksums) are validated every 3 months. This is achieved using a manually initiated program within the CMS/OMS which calculates a new checksum for each file and compares it to the value stored within the database. Any discrepancies are investigated by the curatorial team.
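
The validation described above amounts to recomputing each checksum and comparing it with the stored value. The following is a minimal sketch rather than the CMS/OMS program itself: it assumes the stored values are available as a mapping of relative path to SHA-256 digest, and the names sha256_of and audit_fixity are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (read whole, for brevity)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_fixity(stored: Dict[str, str], root: Path) -> List[str]:
    """Return the relative paths that fail validation.

    A path fails when the file is missing or when its freshly computed
    checksum differs from the stored value.
    """
    discrepancies = []
    for relative_path, expected in sorted(stored.items()):
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            discrepancies.append(relative_path)
    return discrepancies
```

An empty list indicates that every file still matches its recorded fixity value; any path returned would be passed to the curatorial team for investigation.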

Occasionally files are included in a SIP that are not suitable for ingest either by accident, through the lack of a clear preservation path or inadequate documentation. These files are deleted following consultation with the producer.

Currently delivery media of fully accessioned SIPs are retained indefinitely as a record of the original deposition, but not with any guarantee of longevity. In certain circumstances media may be returned to a supplier, for example, where data has been provided on a portable hard drive. Retention of physical media is periodically reviewed by the curatorial team.

6.3 Preservation and management

Lead role: Digital Archivist (Preservation lead)
Policy document: Preservation Policy

6.3.1 Storage and resilience

The ADS maintain multiple copies of data in order to facilitate disaster recovery (i.e. to provide resilience). All data (AIP, DIP, SIPs) are maintained on the main ADS production server in the machine room of the Computing Service at the University of York. The Computing Service further back up this data to tape and maintain off site copies of the tapes. Currently the backup system uses Legato Networker and an Adic Scalar tape library. The system involves daily (over-night), weekly and monthly backups to a fixed number of media so tapes are recycled.

All data (AIP, SIP, DIPs) are synchronised once a week from the local copy in the University of York to a dedicated off site store maintained in the machine room of the UK Data Archive at the University of Essex41. This repository takes the form of a standalone server (see SLA) behind the University of Essex firewall. The server is running a RAID 5 disk configuration which allows rapid recovery from disk failure. In the interests of security outside access to this server is via an encrypted SSH tunnel from nominated IP addresses. Data is further backed up to tape by the UKDA (see UKDA Preservation Policy42).

CMS/OMS based data is currently maintained in an Oracle 10 database which, similarly to the above, is backed up to tape on a daily (over-night), weekly and monthly schedule.

It should be noted that preservation data is not compressed for storage by the ADS even though the saving on storage would be significant. Data compression is generally seen as something to be avoided for the preservation copies of files43. There are good reasons for this, even for lossless compression techniques: bit or byte losses can cause much more damage to files in formats that use compression, such as jpg or zip, than to uncompressed files44. This informs both the formats preferred for archiving and the approach to storage.
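
The difference in fragility can be demonstrated with a small experiment. This sketch uses the zlib module from the Python standard library as a stand-in for compressed formats in general; the sample text, the positions of the damaged bytes and the function names are arbitrary choices made for the example.

```python
import zlib
from typing import Optional


def flip_byte(data: bytes, index: int) -> bytes:
    """Return a copy of data with every bit of one byte inverted."""
    damaged = bytearray(data)
    damaged[index] ^= 0xFF
    return bytes(damaged)


def bytes_changed(original: bytes, damaged: bytes) -> int:
    """Count the positions at which two equal-length byte strings differ."""
    return sum(a != b for a, b in zip(original, damaged))


def try_decompress(compressed: bytes) -> Optional[bytes]:
    """Return the decompressed data, or None if the stream is unreadable."""
    try:
        return zlib.decompress(compressed)
    except zlib.error:
        return None


payload = b"context sheet 101: fill of ditch\n" * 200
packed = zlib.compress(payload)

# Damage to the uncompressed copy stays local to the byte that was hit.
raw_loss = bytes_changed(payload, flip_byte(payload, 50))

# The same damage to the compressed copy affects everything decoded after it.
recovered = try_decompress(flip_byte(packed, len(packed) // 2))
```

In the uncompressed copy exactly one byte is lost. In the compressed copy the original content can no longer be recovered, and the stream will typically be rejected outright.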

6.3.2 Data management

As already noted the ADS maintain a custom built Collections Management System (CMS) which has been developed to act as a data management system. Beyond detailing the accessioning of data into a collection, through the OMS extension it also holds technical metadata (such as file type and location) for files. The CMS schema also records metadata describing the processing of files such as the conversion of supplied files into versions for the AIP and DIP or later migrations to different versions and formats.

Through the development of the OMS, the technical metadata, process information and fixity or checksum (MD5, SHA-1, etc) for each file can be linked. This level of management enables short and long-term management of data such as auditing and versioning.

Data refreshment is an ongoing process. It is undertaken regularly (on a weekly basis) during the already noted synchronisation of locally held data to an off site data repository within the UKDA. This one way synchronisation compares checksum values at source and destination to detect change and acts accordingly. Data integrity is also ensured through a semi-automated validation of all checksum values; this is undertaken every three months via the CMS/OMS.
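
The comparison step of a one way synchronisation can be sketched as follows. This is an illustration only: the policy does not specify which tool performs the synchronisation or how changes are acted upon, so the rule used here (copy a file when it is absent from, or differs at, the destination, and never delete or copy back) and the function name plan_one_way_sync are assumptions.

```python
from typing import Dict, List


def plan_one_way_sync(source: Dict[str, str], destination: Dict[str, str]) -> List[str]:
    """List the paths to copy from source to destination.

    Both arguments map a relative path to its checksum. A path is copied
    when it is missing at the destination or when the checksums differ.
    Files present only at the destination are ignored, since this sketch
    never deletes anything or copies data back.
    """
    return sorted(
        path
        for path, checksum in source.items()
        if destination.get(path) != checksum
    )
```

Given a source of {"a": "1", "b": "2", "c": "3"} and a destination of {"a": "1", "b": "9", "d": "4"}, the plan is ["b", "c"]: "b" has changed, "c" is new, and "d" is left alone.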

As already described file migration between formats is a common activity during the accessioning process but can also occur throughout the lifecycle of a file. It may become necessary for a number of reasons including

  • version change (many formats change or evolve over time)
  • format obsolescence (a format is or is becoming deprecated)
  • another format becomes a more attractive preservation option

The ADS has recently (2015) successfully completed a migration of all CAD files to AutoCAD Release 2010/11/12 (AC1024). A report on this migration is being prepared for publication45.

An ongoing Technology Watch is maintained by ADS Curatorial staff and acted upon as and when necessary. As with migrations during accessioning it is important that the significant properties of a file are retained. However, it should be noted that in some cases significant properties may be altered in order to ensure ongoing preservation and usability (document formatting might be such a case). Because such migrations are likely to be complex, involving the DIPs or AIPs of multiple resources and multiple systems, a migration plan is drawn up before commencing operations.

These processes then carry on throughout the lifecycle of data held by the ADS. It was noted in the Principal Statement (1) at the beginning of this document that the avowed intention of the ADS is preservation ‘in perpetuity’. However, all lifecycles have a beginning and an end, and some are shorter than others. Thus the reality is that we can only talk about the foreseeable future and there are a number of reasons why a resource or part thereof might have a limited lifecycle including

  • There is a breach of the agreement detailed in the deposit licence that cannot be resolved (deposit licence46 clause 8.9.1)
  • A depositor (producer) no longer wishes to make a resource available (deposit licence ibid clause 8.9.2)
  • A resource was deposited with a formally agreed lifespan
  • A resource or part thereof no longer has a suitable migration path for ongoing preservation

In all such cases the ADS will endeavour to contact depositors (or their organisations) to discuss the situation. The data in question may be removed from ADS systems following discussion. It may be returnable to a depositor in certain circumstances (this service may be chargeable). End of life events will be detailed in the CMS.

The ADS maintains, and adds to when circumstances allow, a Preservation Legacy Fund. A proportion of the cost of each collection contributes to this fund. Should the ADS cease to be a viable organisation the Fund will be used to provide an exit strategy that ensures the ongoing preservation of the data in its care.

6.4 Access and use

Lead role: Communications and Access Manager
Policy document: Rights Management Framework

This section is concerned with the access and use of the Dissemination Information Package or DIP; finding a resource, rights management and receiving a data collection or part thereof. It is also concerned with the availability, reliability and security of delivery systems. As already noted reuse of data can aid preservation.

The dedicated post of Communications and Access Manager has responsibility for investigating ways of aiding and encouraging the use of ADS collections.

6.4.1 Prerequisites

Access and use of resources held by the ADS is governed by a legal and regulatory framework

  • A Deposit Licence for each resource47
  • Copyright and Liability Statements48
  • A Common Access Agreement49

6.4.2 Resource discovery

It should be noted that the ADS holds two distinct types of dissemination data

  • DIPs representing a discrete archive which contain files in various formats
  • Record level datasets or collections. These may be available as standalone searchable datasets or as part of the ADS union catalogue the contents of which range from national reference collections to single records describing the accessible part of a resource; the DIP.

The ADS uses a qualified Dublin Core metadata schema for describing the collections it holds which reflects its roots as a onetime AHDS Service Provider. Where practical various thesauri are used in order to standardise the terminology used to describe collections50. This record level data is currently stored in an Oracle 10 database and is available online through ArchSearch, the ADS union catalogue51.

The ADS provides metadata to many aggregators and portals via OAI-PMH and SOAP web services, including:

  • The Heritage Gateway52
  • Europeana53
  • Thomson Reuters Data Citation Index54
  • Keepers Registry55
  • NERC Data Catalogue Service56
  • MEDIN Data Discovery Portal57
  • ARIADNE Portal58
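
For readers unfamiliar with OAI-PMH, an aggregator harvests metadata by issuing HTTP requests with a small set of standard parameters. The sketch below builds a ListRecords request for unqualified Dublin Core (the oai_dc prefix that every OAI-PMH repository must support). The base URL and set name shown are placeholders, not ADS endpoints, and the function name list_records_url is hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc", set_spec: str = "") -> str:
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository's OAI-PMH endpoint. set_spec optionally
    restricts the harvest to one set defined by the repository.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint used purely for illustration.
example = list_records_url("https://repository.example.org/oai")
```

The repository responds with an XML document containing the requested records, which the aggregator then indexes in its own portal.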

The ADS also publishes a number of datasets as a Linked Data RDF based triple store59.

The ADS uses the Digital Object Identifier (DOI) System for uniquely identifying its digital content. The DOI System is an ISO International Standard and managed by an open membership consortium including both commercial and non-commercial partners.

DOIs are persistent identifiers which can be used to consistently and accurately reference digital objects and/or content. Within the ADS, DOIs are used to reference digital archives, and in the future selected individual digital files. The DOIs provide a way for the ADS resources to be cited in a similar fashion to traditional scholarly materials. DOIs can be thought of as a combination of a URL and an ISBN number.

Each DOI has metadata associated with it, such as subject, location (URL), publisher, creator, etc. While the metadata can change for a DOI, the actual DOI name will never change. This allows for an archive's DOI to be permanent while the actual location of the archive can change. In this sense, citing a DOI is much more robust and permanent than merely citing a URL, since the DOI will always resolve to the current location of the archive.
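
In practice a DOI is cited as a link through a resolver, which redirects to the current location of the resource. The sketch below uses the public doi.org resolver; the DOI shown is a made-up placeholder and the function name doi_to_url is hypothetical.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Return the resolver URL for a DOI name.

    Characters that are not safe in a URL path are percent-encoded; the
    slash separating the DOI prefix from its suffix is kept as it is.
    """
    return "https://doi.org/" + quote(doi, safe="/")


# Placeholder DOI, used purely for illustration.
example_url = doi_to_url("10.1234/example-archive")
```

Because the citation records the DOI name rather than a web address, the link continues to work if the archive later moves to a new location, provided the DOI metadata is updated.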

6.4.3 Rights management60

Access to the holdings of the ADS is free at the point of use to users for research and educational purposes. All users are required to accept the terms and conditions of the ADS Copyright and Liability statements and of the AHDS Common Access Agreement before they can use ArchSearch or access any of our archived data (see 6.4.1).

The ADS reserves the right to control the downloading of some or all resources by a system of user authentication at some point in the future.

6.4.4 Receiving data

ADS data is largely available online. Because of possible bandwidth issues some larger datasets may only be made available on request for a dedicated download. Some large datasets may be deemed as too big to deliver via a network but may be supplied on portable media. There may be charges for these services. Note charges would be for staff time in setting up deliveries and not for the data itself.

6.4.5 Security of delivery systems

A number of documents have relevance here

  • Systems Overview61
  • Risk Register62
  • Disaster Recovery Plan63

ADS delivery systems are split between a small number of physical production servers and a larger number of dedicated virtual servers hosted by the University of York IT Services (ITS). ADS delivery systems sit behind the University of York firewall; physical servers are based within the Computing Service machine room and are covered by a maintenance contract with a next business day service. All delivery systems (physical and virtual) are backed up to tape (as 6.3.1) and external hard drives.

Application upgrades and migrations between applications are planned and documented with ITS unless these constitute a minor operation.

6.4.6 Consumer access analysis

Analytics inform on consumer activity. They can be used to feed back into dissemination systems. Since 2013 the ADS has used Piwik (a free and open source web analytics application)64 to collect such data; the University of York Legal Statements65 cover this usage. Prior to this, web access statistics were generated using the Analog log file analyser package66.

6.4.7 Outage

Records are kept wherever possible of service downtime both organisational (ADS) and institutional (University of York). There is a scheduled maintenance period of Tuesdays 8-9am (UK time). Services may be unavailable during this period.

7. Glossary

Accession: A deposit into a Collection (ADS)

ADS: Archaeology Data Service

AHDS: Arts and Humanities Data Service (now defunct)

AHRC: Arts and Humanities Research Council

AIP: Archival Information Package (OAIS)

ALGAO: Association of Local Government Archaeological Officers

Big Data: An EH funded research project looking at preservation and management strategies for large datasets.

CBA: Council for British Archaeology

CCSDS: Consultative Committee for Space Data Systems (OAIS)

Checksum: see fixity metadata

CMS: Collections Management System (here an ADS system)

Collection: A collection consists of one or more Accessions

Consumer: A user of data (OAIS)

Context information: Concerned with environment. Examples include ‘why the Content Information was created and how it relates to other Content Information objects’ (OAIS)

Data integrity: ensuring data is whole or complete and continues in this state (see Fixity information)

Digital preservation: Ongoing managed activity to ensure continued access to authentic versions of content

DIP: Dissemination Information Package (OAIS)

DOI: Digital Object Identifier. Managed system for persistent identification of content-related entities on digital networks

DSA: Data Seal of Approval

Fixity information: Part of a PDI. A fixity value or checksum provides a simple way to protect the integrity of data by detecting errors in data. The MD5 (Message-Digest algorithm 5) and the SHA (Secure Hash Algorithm) are widely used cryptographic hash functions. Applying these algorithms to a file produces an (almost certainly) unique hash or checksum value and will consistently produce this value if a file is unchanged. Thus it provides a mechanism for validating and auditing data

Format migration: Moving data from one format to another. Particular attention should be paid here to maintaining the significant properties of files

GDAL: Geospatial Data Abstraction Library

GIS: Geographic Information System

GML: Geography Markup Language

HE: Historic England

MD5: Message-Digest algorithm 5 is a widely used cryptographic hash function. Used to generate checksum or fixity values. See also SHA-1

Metadata: Data about other data (e.g. fixity, PDI and resource discovery information)

MoU: Memoranda of Understanding

OAI: Open Archives Initiative

OAIS: Open Archival Information System

OMS: Object Management System. Data management extension of the CMS

Outage: System downtime whether planned or unplanned

NERC: Natural Environment Research Council

Normalisation: process of migrating files into widely supported open international standards

PDI: Preservation Description Information (OAIS)

PGDS: Parks and Gardens Data Service

Producer: A creator of data (OAIS)

Provenance information: Part of a PDI. Concerned with ‘history’ and records, for example, ‘the principal investigator’ (OAIS)

RAID: Redundant Array of Inexpensive Disks. A technology that provides high levels of storage reliability

RCAHMS: Royal Commission on the Ancient and Historical Monuments of Scotland

RCAHMW: Royal Commission on the Ancient and Historical Monuments of Wales

Reference information: Part of a PDI. Concerned with unambiguously identifying content information through, for example, the provision of an ISBN number for a publication (OAIS)

Refreshment: Migration between media which leaves data (the bit stream) totally unchanged. For example, from one system to another

SHA-1: Secure Hash Algorithm is a widely used cryptographic hash function. Used to generate checksum or fixity values. See also MD5

SIP: Submission Information Package (OAIS)

Significant properties: The essential characteristics of a digital object which must be preserved over time for the digital object to remain accessible and meaningful. Proper understanding of the significant properties of digital objects is critical to establish best practices and helps answer the fundamental question related to digital preservation (DPC)

SLA: Service Level Agreement

TRAC: Trustworthy Repositories Audit & Certification

UKDA: UK Data Archive

University of York: ADS host organisation

Version migration: Migrating data through successive versions of a format

Web Services: A software system designed to support interoperable machine-to-machine interaction over a network.
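
As an illustration of the Fixity information entry above, the following is a minimal sketch of how a checksum can be generated for a file and later compared against a recorded value. It is not a description of the ADS's own tooling: the function names file_checksum and verify_fixity, the default algorithm and the chunk size are all assumptions made for the example, which uses only Python's standard hashlib module.

```python
import hashlib


def file_checksum(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of the file at `path`.

    Hypothetical helper for illustration; `algorithm` is any name
    accepted by hashlib.new(), for example "md5" or "sha1".
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_checksum, algorithm="sha1"):
    """Return True if the file still matches its recorded checksum."""
    return file_checksum(path, algorithm) == recorded_checksum.lower()
```

In such a scheme the value returned by file_checksum would be stored alongside the file at ingest, and verify_fixity would be run during later audits; a False result indicates that the bit stream has changed since the checksum was recorded.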

Footnotes

[#1]  Beagrie, N., Semple, N., Williams, P. & Wright, R. 2008. Digital Preservation Policies Study Part 1: Final Report for JISC provides the structure of this document
[#2]  Mission Statement
[#3]  Agenda_15_03_2016/ADS_Five_Year_Plan_2016-21_v.1.4.pdf
[#4]  ADS Risk Register v1.10 (2016)
[#5]  http://archaeologydataservice.ac.uk/advice/collectionsPolicy
[#6]  http://archaeologydataservice.ac.uk/advice/RepositoryOperations
[#7]  ADS Disaster Recovery Plan
[#8]  http://www.york.ac.uk/recordsmanagement/rm/policy.htm
[#9]  http://www.york.ac.uk/about/departments/support-and-admin/information-services/information-policy/index/information-security-policy/
[#10]  http://www.york.ac.uk/docs/disclaimer/disclaimer.htm
[#11]  http://archaeologydataservice.ac.uk/advice/collectionsPolicy#section-collectionsPolicy-2.5.AcquisitionStrategies
[#12]  http://www.data-archive.ac.uk/
[#13]  http://intarch.ac.uk/
[#14]  http://archaeologydataservice.ac.uk/about/memorandaOfUnderstanding
[#15]  http://public.ccsds.org/publications/archive/650x0b1.pdf
[#16]  ADS Risk Register v1.10 (2016)
[#17]  ADS Disaster Recovery Plan
[#18]  The ADS has an annual systems budget for renewal of physical hardware. This budget is reviewed and set by the ADS Director, Administrator and Applications Development team.
[#19]  http://archaeologydataservice.ac.uk/research/bigData
[#20]  http://www.dcc.ac.uk/resources/repository-audit-and-assessment/trustworthy-repositories
[#21]  http://www.data-archive.ac.uk/media/57322/dsa_overview.pdf
[#22]  http://www.dcc.ac.uk/resources/case-studies/ads-dsa
[#23]  http://datasealofapproval.org/en/news-and-events/news/2016/2/3/extension-current-data-seal/
[#24]  http://archaeologydataservice.ac.uk/advice/collectionsPolicy
[#25]  http://archaeologydataservice.ac.uk/advice/chargingPolicy
[#26]  http://www.york.ac.uk/admin/hr/managers/role-evaluation/
[#27]  http://archaeologydataservice.ac.uk/about/management
[#28]  http://www.ukoln.ac.uk/services/papers/bl/framework/framework.html
[#29]  http://archaeologydataservice.ac.uk/advice/collectionsPolicy
[#30]  http://guides.archaeologydataservice.ac.uk/
[#31]  http://archaeologydataservice.ac.uk/advice/adviceForAHRCGrantApplicants
[#32]  http://archaeologydataservice.ac.uk/advice/guidelinesForDepositors
[#33]  ADS Repository Operations
[#34]  ADS Ingest Manual
[#35]  The ADS Data Procedures documents are maintained on the ADS internal wiki. These documents are reviewed on a periodic basis, or in the light of technological developments reported in, for example, a DPC Technology Watch report. PDF versions of the documents are presented below:

[#36]  ADS procedures are ensured via a number of internal checklists covering every aspect of creation of the SIP, AIP and DIP, and internal consistency checks.
[#37]  Formats used for SIP-AIP-DIP
[#38]  Security overview
[#39]  http://archaeologydataservice.ac.uk/attach/guidelinesForDepositors/ads_licence_form.pdf
[#40]  http://www.nationalarchives.gov.uk/information-management/manage-information/preserving-digital-records/droid/
[#41]  A responsible archive needs to maintain a copy of its data at a remote site. The ADS has reached a new 5 year agreement (with options to renew) with the UK Data Archive (UKDA) based at the University of Essex in Colchester (approximately 200 miles distant) to act as an off site repository. This repository takes the form of a standalone server (see SLA) behind the University of Essex firewall.
[#42]  http://www.data-archive.ac.uk/curate/preservation-policy
[#43]  http://www.erpanet.org/advisory/list.php?start=5&end=10
[#44]  http://old.hki.uni-koeln.de/people/herrmann/forschung/heydegger_archiving2008_40.pdf (presented at the Archiving 2008 conference)
[#45]  Green, K.; Niven, K.; Field, G. Migrating 2 and 3D Datasets: Preserving AutoCAD at the Archaeology Data Service. ISPRS Int. J. Geo-Inf. 2016, 5, 44. doi:10.3390/ijgi5040044
[#46]  http://archaeologydataservice.ac.uk/attach/guidelinesForDepositors/ads_licence_form.pdf
[#47]  http://archaeologydataservice.ac.uk/attach/guidelinesForDepositors/ads_licence_form.pdf
[#48]  http://archaeologydataservice.ac.uk/advice/termsOfUseAndAccess
[#49]  http://archaeologydataservice.ac.uk/advice/termsOfUseAndAccess
[#50]  http://archaeologydataservice.ac.uk/advice/depositCreate3#section-depositCreate3-2.3.Part3DocumentingTheProject
[#51]  http://archaeologydataservice.ac.uk/archsearch/
[#52]  http://www.heritagegateway.org.uk/gateway/
[#53]  http://www.europeana.eu/
[#54]  http://wokinfo.com/products_tools/multidisciplinary/dci/repositories/
[#55]  http://thekeepers.org/thekeepers/keepers.asp
[#56]  http://data-search.nerc.ac.uk/
[#57]  http://portal.oceannet.org/search/full
[#58]  http://www.ariadne-network.eu/
[#59]  http://data.archaeologydataservice.ac.uk/query/
[#60]  http://archaeologydataservice.ac.uk/advice/collectionsPolicy#section-collectionsPolicy-5.2.RightsManagement
[#61]  Systems overview
[#62]  ADS Risk Register v1.10 (2016)
[#63]  ADS Disaster Recovery Plan
[#64]  http://piwik.org/
[#65]  http://www.york.ac.uk/docs/disclaimer/disclaimer.htm
[#66]  http://archaeologydataservice.ac.uk/about/accessStatistics