The archive and the archiving body
Based on this analysis, the computer files that make up the Archive can be broadly split into three categories (Table 1), which are discussed further in the Archive files section.
- (i) Geophysics data that were collected in the field, then processed and analysed (see The life of geophysical data). These consist of working files in proprietary formats, preservation files that were exported so they can be migrated by an Archiving Body, and image files for quick browsing.
- (ii) Project material, consisting of additional files that are relevant to the overall project. These may comprise project field notes (e.g. as pdf) and the final report, which should be written according to accepted guidelines (see The life of geophysical data).
- (iii) Project documentation that contains all the metadata, both for the geophysical data themselves (e.g. traverse spacing, grid size) and for the project overall (e.g. dates, personnel, weather). Conceptually it is useful to think of metadata as that component of the Archive that can be stored in a database. In addition, information about georeferencing is crucial and is therefore listed here as a separate entity. The last part of the project documentation is the file description that outlines how the files are arranged in the Archive (e.g. using a sophisticated folder structure) and what naming conventions were used.
|Geophysics data|working files; preservation files; image files|
|Project material|project notes; final report|
|Project documentation|geophysics metadata; project metadata; georeferencing; file description|
Table 1: Individual components of the Archive
Once the Archive is formed, it can be deposited with an Archiving Body. Some Archiving Bodies have particular requirements for the folder structure of the archives that they ‘ingest’, and these have to be taken into account before the Archive can be deposited. There may even be cases where the layout that appears most appropriate for the geophysical data and the project is incompatible with the requirements of the Archiving Body, so that compromises have to be found. Given the comprehensive project documentation that will have been prepared as part of the Archive, it is relatively easy to extract the metadata that a particular Archiving Body requires for its own system. Regrettably, in many cases this is still a manual process (e.g. for OASIS) and solutions will have to be found, possibly based on specific XML schemata.
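The extraction of a required metadata subset could be automated along the following lines. This is only a minimal sketch: the field names, the `REQUIRED_FIELDS` list and the `<submission>` element are hypothetical and do not correspond to the schema of OASIS or of any other Archiving Body.

```python
import xml.etree.ElementTree as ET

# Hypothetical project documentation, as it might be read from the Archive.
project_metadata = {
    "project_name": "Example Survey",
    "survey_dates": "2011-05-02/2011-05-06",
    "personnel": "A. Surveyor",
    "weather": "dry, overcast",
    "traverse_spacing_m": "0.5",
    "grid_size_m": "20",
}

# Hypothetical list of the fields that one Archiving Body asks for.
REQUIRED_FIELDS = ["project_name", "survey_dates", "traverse_spacing_m"]


def extract_submission(metadata, required_fields):
    """Return an XML string holding only the requested metadata fields."""
    missing = [name for name in required_fields if name not in metadata]
    if missing:
        # Fail early so that gaps in the documentation are noticed before deposit.
        raise KeyError(f"missing metadata fields: {missing}")
    root = ET.Element("submission")  # assumed element name, not a real schema
    for name in required_fields:
        ET.SubElement(root, name).text = str(metadata[name])
    return ET.tostring(root, encoding="unicode")


xml_text = extract_submission(project_metadata, REQUIRED_FIELDS)
print(xml_text)
```

Keeping the full project documentation in one place and deriving each submission from it means that a different Archiving Body only needs a different field list, not a second copy of the metadata.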
Not all Archiving Bodies have the same functionality and several broad types can be distinguished.
- 1. In-House Archiving: a solution whereby the Archive, as a set of files or packed into a single zip/tar file, is maintained by the contractor or academic department itself. It is essential to refresh the media regularly (e.g. create a new DVD copy every year, copy data to a new hard drive) and keep equivalent copies in different places. Each refresh should be checked for copy errors and labelled with a new refresh number. See also Planning for the Creation of Digital Data.
- 2. File Repository: a commercial storage facility, sometimes also referred to as ‘Cloud Storage’, to which the Archive is submitted, for example via the Internet. The repository charges for the guaranteed long-term preservation of the Archive in its deposited form. Different access agreements are possible (e.g. only the depositor, or other parties with appropriate access credentials). The repository will have mechanisms in place to regularly refresh the Archive and keep copies in safe and secure locations.
- 3. Managed Archiving: in addition to the functions of a file repository, this provides migration and indexing of the content of the Archive. For this, the preservation files from the Archive are noted and regularly migrated to new formats as standards evolve. In addition, a subset of the metadata is used to describe the Archive and thereby make it part of the overall body of material held by this Archiving Body.
- 4. Accessible Archiving: making a managed archive available to other users, usually over the Internet. This can be through a simple web interface or via data interchange standards. Access agreements and policies are usually drawn up by the Archiving Body upon deposition of the Archive to specify who can access which parts of the Archive. Often only the preservation files are made accessible as they do not require specialised software. The difficulties related to access rights for different user groups and the granularity of file access were noted above (Issues when archiving).
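For In-House Archiving (type 1), the check for copy errors after each refresh can be automated by comparing checksums of the original files and the refreshed copy. The following is a minimal sketch, assuming that both copies are available as ordinary folders; the function names and the choice of SHA-256 are illustrative and are not prescribed by this guide.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_root):
    """Map each file's path (relative to the Archive root) to its checksum."""
    root = Path(archive_root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_refresh(original_root, refreshed_root):
    """List differences between the Archive and its refreshed copy."""
    original = build_manifest(original_root)
    refreshed = build_manifest(refreshed_root)
    problems = []
    for name, checksum in original.items():
        if name not in refreshed:
            problems.append(f"missing: {name}")
        elif refreshed[name] != checksum:
            problems.append(f"changed: {name}")
    for name in sorted(refreshed.keys() - original.keys()):
        problems.append(f"unexpected: {name}")
    return problems
```

An empty list returned by `verify_refresh` indicates that the refreshed copy matches the original file by file. The manifest could also be stored alongside the Archive together with the refresh number, so that later refreshes can be checked against it.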
These types of Archiving Bodies are not mutually exclusive; for example, In-House Archiving may be complemented by an automatically updated File Repository, or Accessible Archiving may host some of its Archives only through Managed Archiving. Nevertheless, one major distinction remains with regard to the treatment of the deposited Archive. File archiving (types 1 and 2) is static and does not make any changes to the files that form the Archive; it simply preserves them. In contrast, information archiving (types 3 and 4) attempts to maintain the information captured in the data files, for example by using preservation formats and migrating the files regularly. Clearly, there is a considerable range of functions that Archiving Bodies can offer, and the associated costs are linked to their level of service.