Planning for the creation of marine survey data
The steps involved in the acquisition and processing of survey data are described in detail in the Geophysical Data in Archaeology guide, in the section entitled The Life of Geophysical Data. That section covers the full lifecycle of survey data, from data acquisition and storage in instruments through to data composition, interpretation and reporting. This section of the Marine guide aims, in addition, to highlight some general issues with file types that are particularly relevant to marine survey data. Readers are nevertheless recommended to familiarise themselves with section 2 of the Geophysics guide in order to place the following sections in context.
Identifying file formats from the outset
As with any other data type, once the preservation of marine survey digital outputs has been identified as desirable, it is best approached as a task right from the initial planning stages of a project. During the project design phase the future of the data to be created should be given ample consideration and, where reuse is considered worthwhile, data must be in, or have clear migration paths to, formats suitable for long term preservation and dissemination (as identified in the Archiving marine survey data section of this guide). This section outlines general file types in relation to marine data; for a more detailed discussion see the general section on Planning for the Creation of Digital Data.
Binary and ASCII Formats
As with terrestrial geophysics, many of the software packages associated with marine survey data acquisition produce files that are both proprietary and binary (see Austin & Mitcham 2006, 14). ASCII / plain text based files, preferably in open standard formats, are preferred for preservation. Binary files, as discussed elsewhere in these Guides, are generally not seen as the best solution for the long term preservation of data except where the format is a well established standard.

As with terrestrial geophysics and other survey types such as laser scanning, a tension also exists between users and archivists of large datasets in terms of preferred formats. Users, and often data centres, generally express a preference for binary data in openly published formats because file sizes are significantly smaller, which makes handling and exchanging data easier. Many data centres, however, prefer to archive data internally as ASCII text (e.g. NODC), which is generally seen as the most stable of standards for preservation within a long term archival strategy. In many cases this tension can be resolved through normal archival practice, in which the working, dissemination or data exchange version of a file can differ from the final preservation version. Such considerations are best addressed at the start of a project so that a final migration strategy, where required, can be planned.
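The distinction between a compact binary working copy and an ASCII preservation copy can be illustrated with a minimal Python sketch. The soundings, the record layout (three little-endian 64-bit floats) and the CSV column names are all assumptions made for illustration and do not correspond to any particular acquisition package or data centre specification.

```python
import struct

# Hypothetical soundings: (easting, northing, depth) in metres.
soundings = [
    (512345.25, 5678901.50, -23.75),
    (512346.25, 5678902.50, -24.00),
    (512347.25, 5678903.50, -24.50),
]

# Assumed binary layout: three little-endian 64-bit floats (24 bytes per record).
RECORD = struct.Struct("<ddd")


def to_binary(points):
    """Pack each sounding into a fixed-width binary record."""
    return b"".join(RECORD.pack(*p) for p in points)


def from_binary(blob):
    """Unpack fixed-width records; this only works if the layout is documented."""
    return [RECORD.unpack_from(blob, offset)
            for offset in range(0, len(blob), RECORD.size)]


def to_ascii(points):
    """Write a self-describing, comma-delimited text version with a header row."""
    lines = ["easting,northing,depth"]
    # repr() of a Python float round-trips exactly through float().
    lines += [",".join(repr(v) for v in p) for p in points]
    return "\n".join(lines) + "\n"


def from_ascii(text):
    """Read the delimited text version back into tuples of floats."""
    rows = text.strip().split("\n")[1:]  # skip the header row
    return [tuple(float(v) for v in row.split(",")) for row in rows]


binary_blob = to_binary(soundings)            # working / exchange version
ascii_text = to_ascii(from_binary(binary_blob))  # migrated preservation version

print(len(binary_blob), "bytes as binary")
print(len(ascii_text.encode("ascii")), "bytes as ASCII")
```

The binary version can only be read by someone who knows the record layout, whereas the text version carries its own column names. The relative file sizes depend on the numeric precision written out, so the comparison printed here should not be generalised to real survey datasets.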
Beyond plain or delimited ASCII text, recent developments across many data types have highlighted a move by many software producers and users towards XML (eXtensible Markup Language) based formats, or at least towards an XML export facility being available in many software packages. The use of XML formats for marine data is largely confined to datasets used in GIS packages. Common GIS files such as ESRI Shapefiles and MapInfo files can be migrated to alternative supported formats such as the XML-based Geography Markup Language (GML). Outside of specific packages, tools such as the Geospatial Data Abstraction Library (GDAL/OGR) are available. GDAL/OGR is a cross platform C++ translator library for raster and vector geospatial data formats and is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation.
There are, however, a number of reasons why a format recognised as an open standard might be unsuitable for archiving. Formats using lossy compression (where data is lost as part of the compression process) are generally seen as unsuitable (see ‘Planning for the Creation of Digital Data’). An open standard also needs to be well and widely supported before it can be considered a reliable preservation format. Finally, even if a format is an open standard, the software available to read it might be proprietary and expensive, which can inhibit the potential for reuse.
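The difference between lossless and lossy handling of data can be shown with a short Python sketch. The depth values are hypothetical, and rounding to whole metres is used here only as a simple stand-in for a lossy process; it is not a real compression codec.

```python
import zlib

# Hypothetical depth values in metres.
depths = [-23.75, -24.0, -24.5, -25.25]
original = "\n".join(repr(d) for d in depths).encode("ascii")

# Lossless: zlib (DEFLATE) compression restores the original bytes exactly.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print("lossless round trip identical:", restored == original)

# Lossy stand-in: rounding to whole metres discards detail that
# cannot be recovered from the stored values afterwards.
quantised = [float(round(d)) for d in depths]
print("lossy values identical:", quantised == depths)
```

Once the sub-metre detail has been discarded there is no way to reconstruct it from the archived file, which is why lossy formats are generally avoided for preservation copies.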
‘Big data’ issues
In addition to file types, the volume of data often collected through marine surveys can also have implications throughout the data lifecycle and in particular in areas such as the storage and dissemination of datasets. The VENUS Preservation Handbook highlighted that, “unless large volumes of data are being transmitted onshore during the data acquisition phase of the project there will be a vulnerable window where data is only stored in a single location (possibly at sea)” (Archaeology Data Service 2008, 18). Data creators should adequately secure data during the creation stage as discussed in Planning for the Creation of Digital Data.
In addition to data transfer during the creation stage, the transfer of data to the archive and its dissemination to a wider audience can also be problematic, as large files are difficult to transfer. ‘Big Data’ is discussed in detail by Austin and Mitcham (2006), and data creators should be aware of these issues at every stage, from collecting and creating data through to compiling the project archive.
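One common way of confirming that a large file has survived a transfer intact is to compare checksums calculated before and after the copy. The following Python sketch is offered as an illustration only: the function names `sha256_of_file` and `copies_match` are hypothetical, the choice of SHA-256 and the one megabyte chunk size are assumptions, and none of this is a procedure prescribed by the sources cited above.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that very large survey files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source_path, copy_path):
    """Compare the checksums of an original file and its transferred copy."""
    return sha256_of_file(source_path) == sha256_of_file(copy_path)
```

A checksum recorded at sea alongside the raw data could, for example, be recalculated once the files reach shore or the archive, with `copies_match` (or a comparison against the recorded digest) flagging any file that changed in transit.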