Open Archival Information System (OAIS)
A brief overview
The development of the OAIS reference model has been pioneered by the Consultative Committee for Space Data Systems (CCSDS), an international body of space agencies of which NASA is a member, and the model has been accepted as an ISO standard (ISO 14721:2003)[1]. A technical recommendation is also available for consultation on the CCSDS website[2]. As a reference model, OAIS provides a conceptual framework within which to consider the functional requirements for an archival system suited to the long-term management and preservation of digital data. The framework can be applied to both proposed and existing archival systems, and it also offers a way of comparing systems by mapping discipline-specific jargon to OAIS terminology. Once mapped, such terminology becomes clear and unambiguous enough to be understood by people other than dedicated archival staff. The OAIS core entities and work flows within the model are shown in fig. 1 below (after CCSDS Fig. 4.1[3]).
The Submission Information Package (SIP)
Data producers create Submission Information Packages (SIPs). A SIP equates to a deposit of digital data plus any documentation and metadata the archive needs to preserve the data over the long term and to provide access for consumers (i.e. reuse). The SIP provides the basis for the Archival Information Package (AIP) and the Dissemination Information Package (DIP), both of which are generated by the archive. The process involves generating preservation and dissemination versions of the deposited data where necessary. For example, a Microsoft Word .doc file might be converted to an XML-based format such as an Open Office text document for long-term preservation and to PDF for dissemination. Metadata documenting this processing is added to the AIP, as is any relevant information from the SIP. Similarly, any resource discovery metadata and reuse documentation in the SIP is added to the DIP. Consequently, the metadata and documentation supplied as part of a SIP are of major importance when depositing data. The OAIS standard notes of the SIP that ‘Its form and detailed content are typically negotiated between the Producer and the OAIS’. In practice, most repositories offer guidelines to depositors about acceptable formats, delivery media, copyright issues and necessary documentation and metadata.
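The conversion step described above can be sketched as a simple lookup from submitted format to preservation and dissemination formats. This is only an illustration: the `MIGRATION_PATHS` table, the `plan_versions` function and the choice of `.odt` as the Open Office text extension are assumptions made for the example, not part of the OAIS model or of any particular archive's policy.

```python
from pathlib import PurePath

# Hypothetical migration table: submitted extension -> (preservation, dissemination).
# The .doc entry mirrors the Word example in the text; a real archive would
# maintain its own, much longer, policy.
MIGRATION_PATHS = {
    ".doc": (".odt", ".pdf"),
}


def plan_versions(filename: str) -> dict:
    """Return the file names planned for the AIP and DIP versions of a SIP file."""
    path = PurePath(filename)
    suffix = path.suffix.lower()
    # Formats with no entry are kept as submitted for both purposes.
    preservation, dissemination = MIGRATION_PATHS.get(suffix, (suffix, suffix))
    return {
        "submitted": path.name,
        "aip": path.with_suffix(preservation).name,
        "dip": path.with_suffix(dissemination).name,
    }


plan = plan_versions("site_report.doc")
print(plan)
```

The sketch only plans target names; the conversion itself, and the metadata recording it, would be handled by whatever tools the archive uses.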
In general the archival community are actively seeking to become compliant with the reference model through the process of certification (see Archival Strategies). It should, however, be noted that the audit checklists used for certification are a very recent development and, for the time being, a state of trust needs to exist between creator and archive.
Creating an Archival Information Package (AIP)
Data in the SIP should be in (or have clear migration paths to) suitable preservation formats and, together with the associated documentation, should be sufficient to support the creation of an Archival Information Package (AIP). The SIP is central to the relationship between the data producer and an OAIS-compliant archive: alongside the data itself, the documentation and metadata it contains inform both preservation and reuse.
The AIP should consist ‘of the Content Information and the associated Preservation Description Information (PDI), which is preserved within an OAIS’.
- The Content Information is defined as the ‘set of information that is the original target of preservation. It is an Information Object comprised of its Content Data Object and its Representation Information. An example of Content Information could be a single table of numbers representing, and understandable as, temperatures, but excluding the documentation that would explain its history and origin, how it relates to other observations, etc’.
- The PDI is the ‘information which is necessary for adequate preservation of the Content Information and which can be categorized as Provenance, Reference, Fixity, and Context information’[4].
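The two definitions above suggest a simple nested structure, sketched here with Python dataclasses. The class and field names are illustrative choices that follow the OAIS terms; the standard does not prescribe any serialisation, and the sample values (the temperature table, the identifier `example-archive:0001`) are invented for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ContentInformation:
    content_data_object: bytes        # the bits that are the target of preservation
    representation_information: str   # what is needed to interpret those bits


@dataclass
class PreservationDescriptionInformation:
    provenance: str   # history and origin of the content
    reference: str    # identifier(s) for the content
    fixity: str       # e.g. a checksum showing the content is unaltered
    context: str      # how the content relates to other information


@dataclass
class ArchivalInformationPackage:
    content_information: ContentInformation
    pdi: PreservationDescriptionInformation


# Invented sample data, echoing the "table of temperatures" example in the text.
table = b"day,temp_c\n1,12.5\n2,13.1\n"
aip = ArchivalInformationPackage(
    content_information=ContentInformation(
        content_data_object=table,
        representation_information="CSV, UTF-8; temp_c is degrees Celsius",
    ),
    pdi=PreservationDescriptionInformation(
        provenance="Deposited by the data producer; no migration performed",
        reference="example-archive:0001",
        fixity=hashlib.sha256(table).hexdigest(),
        context="One of a series of daily observations",
    ),
)
print(aip.pdi.fixity)
```

Keeping the history and context in the PDI, separate from the table itself, reflects the distinction drawn in the quoted definition of Content Information.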
With the provision of a well formed SIP an archive will have minimal problems in generating the AIP. Rich metadata is what makes the ongoing management of the data it describes possible, for example through the automated audit of data using fixity or checksum values or through migration as a batch process.
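As an illustration of an automated fixity audit, the following minimal sketch recomputes a checksum for each file and compares it with the value recorded at ingest. It assumes SHA-256 and a manifest held as a plain dictionary of path to checksum; the function names `sha256_of` and `audit` are hypothetical, and an actual archive may use different algorithms and storage.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest):
    """Return the paths that are missing or whose checksum no longer matches."""
    failures = []
    for path, recorded in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != recorded:
            failures.append(path)
    return failures


# Demonstration with a temporary file standing in for archived data.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "temperatures.csv"
    sample.write_text("day,temp_c\n1,12.5\n")
    manifest = {str(sample): sha256_of(sample)}  # recorded at ingest
    clean_result = audit(manifest)               # nothing has changed yet
    sample.write_text("day,temp_c\n1,99.9\n")    # simulate corruption
    changed_result = audit(manifest)             # the altered file is reported

print(clean_result, len(changed_result))
```

Because the audit needs only the manifest and the files, it can run unattended as a batch process.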
The Dissemination Information Package (DIP)
Data in the Submission Information Package (SIP) should also be in, or have migration paths to, formats suitable for dissemination for reuse. The submitted format can in many cases be the same for both preservation and dissemination. The SIP needs to contain any documentation that facilitates reuse including metadata relating to resource discovery, fitness for use, access, transfer and use. A well formed SIP will facilitate the generation of the Dissemination Information Package (DIP).
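A depositor or archive could check a SIP for the reuse documentation listed above before a DIP is generated. The sketch below assumes the SIP metadata is available as a dictionary keyed by category; the key names and the `missing_documentation` function are hypothetical and simply mirror the categories named in the text.

```python
# Categories of reuse documentation named in the text, as hypothetical keys.
REUSE_DOCUMENTATION = (
    "resource_discovery",
    "fitness_for_use",
    "access",
    "transfer",
    "use",
)


def missing_documentation(sip_metadata: dict) -> list:
    """List the categories that are absent or empty in the SIP metadata."""
    return [key for key in REUSE_DOCUMENTATION if not sip_metadata.get(key)]


# Invented sample metadata with two categories filled in and one left empty.
sip_metadata = {
    "resource_discovery": "Title, creator, subject keywords",
    "fitness_for_use": "Survey methodology and known limitations",
    "access": "",
}
gaps = missing_documentation(sip_metadata)
print(gaps)
```

An empty result would indicate that every category has at least some content; it says nothing about the quality of that content.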
Many of the formats noted as suitable for preservation are also suitable for dissemination and, in general, this is the ideal situation as datasets need only be stored once. However, there is an already noted problem in that archivists generally prefer simple file formats such as ASCII whilst users prefer the smaller file sizes of binary files.
Key points for data creation
- To undertake the long-term preservation and dissemination of data effectively, archival organisations need a well formed Submission Information Package (SIP).
- Consideration must be given to software and the formats it supports during data creation. Where long-term reuse is a goal there must be clear migration paths for both preservation and reuse.
- Inadequate documentation during data creation is the single biggest barrier to the future reuse of data. Documentation, including metadata, facilitates reuse as well as supporting in-house administration and management during a project. Any other documentation that may facilitate reuse should also be included in the SIP.
[1] http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=24683&ICS1=49&ICS2=140&ICS3
[2] http://public.ccsds.org/publications/archive/650x0b1.pdf
[3] http://public.ccsds.org/publications/archive/650x0b1.pdf
[4] See Section 1.7.2 Terminology of http://public.ccsds.org/publications/archive/650x0b1.pdf