Documenting data creation and processing
In addition to file formats, it is also essential to develop documentation, including metadata, to facilitate preservation, data discovery and reuse. Documentation is one of the cornerstones of archival practice and should exist in-house in order to facilitate management of project data. As highlighted with the identification of file types, the process of documentation should be actively pursued from the outset of a project as it is often difficult to create retrospectively. Although documentation (as information) is often implicit within the files themselves, retaining this information separately not only facilitates resource discovery and data management but also allows the files themselves to undergo transformations (preservation or otherwise) without affecting the integrity of the associated metadata or documentation.
The metadata and documentation components that should be included in a marine archive are outlined in section Deciding What to Archive and a detailed metadata set is specified in section Metadata and Documentation. However, a number of metadata standards which are highly relevant to marine data also currently exist and will be discussed in the following section as well as in the sections focused on specific marine survey techniques.
Metadata for marine projects
As stated above, it is important that metadata is created while project data is being actively generated and processed. It is at these points that creators have the clearest idea of what information each file contains, where it was collected, how it was collected and how it was subsequently processed. As well as aiding data management and collection within a project, metadata and documentation are also designed to help others discover, re-use, interpret and manage data.
Metadata can be used to document many different aspects of a project at many different levels. It can be recorded broadly for the project as a whole or for datasets from specific techniques down through to metadata for specific files. When creating metadata it is important, where possible, to identify and adhere to relevant and established metadata and documentation standards. It is important to also realise that metadata standards not only vary according to the relevant techniques but also to geographic region. If long term preservation in a digital archive is one of the intended aims of a project it is therefore necessary to determine the metadata standards required by the archive in question. This section aims to identify a number of established metadata standards that are specifically relevant to marine project data.
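The levels described above (project, technique-specific dataset, individual file) can be pictured as a simple nested record. The following is an illustrative sketch only: the class names, field names and the `undocumented` helper are hypothetical and are not drawn from any metadata standard. A real project would record the elements required by its target archive.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileRecord:
    """File-level metadata (hypothetical fields)."""
    filename: str
    description: str


@dataclass
class DatasetRecord:
    """Metadata for the data from one survey technique (hypothetical fields)."""
    technique: str
    processing_notes: str
    files: List[FileRecord] = field(default_factory=list)


@dataclass
class ProjectRecord:
    """Project-level metadata (hypothetical fields)."""
    title: str
    region: str
    datasets: List[DatasetRecord] = field(default_factory=list)

    def all_filenames(self) -> List[str]:
        """List every file documented anywhere below the project level."""
        return [f.filename for d in self.datasets for f in d.files]


def undocumented(project: ProjectRecord, files_on_disk: List[str]) -> List[str]:
    """Return files that exist on disk but have no file-level record."""
    known = set(project.all_filenames())
    return sorted(name for name in files_on_disk if name not in known)
```

Running a check such as `undocumented` while data is still being generated is one way to catch files that would otherwise have to be documented retrospectively.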
General standards for marine data
Geographic metadata is identified as having special relevance to marine projects given the prominent geographic component of the acquisition techniques. Essentially all marine outputs, raw or processed, have a spatial element whether defined as a region or specific point. A number of standards covering geographic data have been developed at both national and international levels and it is advisable that data creators are aware of these as they are often referenced in marine specific metadata standards.
- ISO 19115:2003 Standard for Geographic Information – Metadata defines mandatory and conditional metadata sections, metadata entities, and metadata elements for describing geographic information. The standard also defines the minimum set of metadata required to serve a full range of applications (data discovery, determining data fitness for use, data access, data transfer, and use of digital data). In addition, optional metadata elements, to allow for a more extensive standard description of geographic data, are described which allows the extending of metadata sets to fit specialized needs.
- UK GEMINI (Association for Geographic Information, 2010) is the UK metadata standard (compliant with ISO 19115) for geo-spatial datasets.
- North American Profile (NAP) of ISO 19115 is the profile of the ISO standard developed for use in the United States and Canada.
It is worth noting that, under the INSPIRE directive, there are likely to be other regionalised specifications developed for European countries.
Although much of the information that appears important for the successful management and reuse of marine project data does not obviously fit into the ISO standard, this information can be accommodated by certain elements of the standard. For example, metadata about the equipment used, settings, methodology, accuracy and software, as described in detail in Section 3 of this guide, may fit into the Abstract element of UK GEMINI (or of another INSPIRE-compliant European equivalent), for which the specification gives the following usage notes:
- State what the ‘things’ are that are recorded
- State the key aspects recorded about these things
- State what form the data takes
- State any other limiting information, such as time period of validity of the data
- Add purpose of data resource where relevant (e.g. for survey data)
- Aim to be understood by non-experts
- Do not include general background information
- Avoid jargon and unexplained abbreviations.
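The usage notes above can be followed when assembling the free text of an Abstract from project records. The sketch below is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, the sentence templates are arbitrary, and the code produces only plain text, not GEMINI-encoded metadata. The abbreviation check is a crude heuristic (runs of two or more capital letters) intended to flag candidates for the "unexplained abbreviations" note, not a rule taken from the specification.

```python
import re


def build_abstract(things, aspects, form, purpose="", limits=""):
    """Assemble abstract text from the points listed in the usage notes."""
    parts = [
        f"This resource records {things}.",
        f"Key aspects recorded: {aspects}.",
        f"Form of the data: {form}.",
    ]
    # Purpose and limiting information are only added where relevant.
    if purpose:
        parts.append(f"Purpose: {purpose}.")
    if limits:
        parts.append(f"Limitations: {limits}.")
    return " ".join(parts)


def find_unexplained_abbreviations(text, explained=()):
    """Return all-capital tokens (2+ letters) not listed in `explained`."""
    found = re.findall(r"\b[A-Z]{2,}\b", text)
    unexplained = []
    for token in found:
        if token not in explained and token not in unexplained:
            unexplained.append(token)
    return unexplained
```

Any token returned by `find_unexplained_abbreviations` would then either be expanded in the abstract or moved to the associated documentation.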
Alternatively, the Additional Information Source element could be used to point to associated documentation such as a brief survey overview. The lack of a relation element in the ISO 19115 metadata set could be seen as a shortcoming, although such relationships could also be recorded in the associated documentation pointed to in the Additional Information Source element. Some ISO 19115 standards support a Lineage element which can be used to record ‘information about the events or source data used in the construction of the dataset’. This is of particular importance in the case of distributed archives, where source data and derived datasets might be archived with different organisations. Lineage, however, is only one of a number of possible relations a digital object or dataset might have.
Marine specific standards
As outlined in section 1.3 of this guide (‘Current Provisions and Initiatives’), a number of initiatives exist at both national and international level which are currently working on establishing standards, sharing metadata and enabling access to marine datasets. While a specific generic metadata set is described in section 3 of this guide, data creators should be aware that certain data centres will have specific requirements.
The International Hydrographic Organization (IHO) has produced the document ‘Spatial Data Infrastructures “The Marine Dimension”’ (2009), which advises that metadata should be created to characterize marine data properly and to facilitate discovery, retrieval and reuse of data, and that it should conform to the ISO 19115 standard to ensure full interoperability. In particular the document highlights the importance of recording the geographic reference systems used. In addition, a previous document (IHO 2008, chapter 5) outlines the metadata elements recommended by the IHO to be recorded in order to assess the quality of survey datasets.
In the UK, MEDIN have issued a number of documents that outline metadata elements that should be recorded when submitting data to their data centres. The MEDIN Data Guideline Structure outlines specific metadata that should be recorded when data is being collected and covers project-level details, survey details and station details (amongst others). Additionally, MEDIN have defined a ‘Standard Discovery Metadata’ set, a standards-compliant (INSPIRE, ISO 19115, GEMINI2) specification used to record dataset metadata in a standard and easily shareable way. The elements themselves are described in Seeley et al. (2009) and a number of tools are available from the MEDIN website for creating such metadata. The MEDIN website also provides a number of links to external marine standards.
In North America, a number of specifications exist in addition to the general geospatial specifications outlined in NAP ISO 19115 and the Content Standard for Digital Geospatial Metadata (CSDGM); supporting guides also exist to aid creation of this metadata. The US National Oceanographic Data Center (NODC) Data Submission guide outlines a minimum data creation metadata specification (section 4) for marine data and is similar in scope to the National Spatial Data Infrastructure (2005) document ‘Geospatial Positioning Accuracy Standards Part 5: Standards for Nautical Charting Hydrographic Surveys’. It is also worth noting that the Marine Metadata Interoperability (MMI) Project, which has been set up to provide guidance on marine metadata in North America, appears to have a number of developments in progress aimed at making metadata more easily exchangeable, and may in future significantly clarify and develop the current situation.
Documentation consists of anything that will facilitate preservation and reuse of a dataset. It could, for example, be published reports, brief grey literature reports or even a few scanned pages from a notebook. These might provide information missing from, supportive of, or more detailed than metadata records. They can often provide further contextual information about how a dataset fits together. Documentation may have particular relevance to marine project data where a number of survey techniques involve a series of traverses over a spatially defined area. Composite mosaics can be produced either as part of acquisition or as part of post processing. In the latter case it is clearly critical to document how data from each traverse relates to the others. The possibility exists to use an Additional Information Source or similar element in an ISO 19115 compliant metadata standard to point to such information. A robust and consistently applied file naming convention can also reinforce this.
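As an illustration of how a file naming convention can help relate traverses to one another, the sketch below checks file names against a pattern and groups them by traverse number. The convention itself (`<survey>_<technique>_T<traverse>_<raw|proc>.<extension>`), the example names and the function names are all hypothetical and are assumed purely for this example; no particular convention is prescribed by the standards discussed above.

```python
import re
from collections import defaultdict

# Hypothetical convention: SURVEY_technique_Tnnn_stage.ext
# e.g. BAY01_sss_T001_raw.xtf
PATTERN = re.compile(
    r"^(?P<survey>[A-Za-z0-9]+)_(?P<technique>[a-z]+)"
    r"_T(?P<traverse>\d{3})_(?P<stage>raw|proc)\.(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["traverse"] = int(info["traverse"])
    return info


def group_by_traverse(filenames):
    """Group conforming names by traverse number and collect the rest."""
    groups = defaultdict(list)
    rejected = []
    for name in filenames:
        info = parse_name(name)
        if info is None:
            rejected.append(name)
        else:
            groups[info["traverse"]].append(name)
    return dict(groups), rejected
```

Names that end up in the rejected list are those that do not follow the convention and would need either renaming or an explicit entry in the accompanying documentation.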