Deciding what to archive
The selection of survey data for archiving is covered in detail in the Documenting and Archiving and Archive Files sections of the geophysical survey guide, and readers are advised to familiarise themselves with these sections. In summary:
- Documenting and Archiving covers the reasons for archiving survey data, the importance of raw data, the issues that may arise when archiving (e.g. whether data can be made available and how this should be documented) and the issues associated with proprietary ‘closed’ formats and open formats. The section also highlights the importance of creating and archiving adequate data documentation and metadata to ensure that the appropriate contextual and technical information exists to allow the data to be usable.
- Archive Files defines the main categories of data (working files, preservation files, images, project notes, etc.) that should make up the project archive.
These sections should be used in conjunction with the technique-specific sections of this guide to determine which types of data, file formats and metadata to select and archive for a marine dataset.
Preservation intervention points for marine data
Although the question of what to preserve is relevant to all data, it is particularly so for certain types of data generated by marine projects because of the size of the files involved and the cost of generating them. Many view the final publication data as the definitive dataset but, as highlighted in the geophysical survey guide, the raw data (or the rawest available, since acquired data has often been pre-processed) is deemed equally important. This is not to say that all data must be selected for archiving, and the notion of selection needs special attention when dealing with the often complex data collection methods involved in marine projects. Marine projects may feature a series of data lifecycle stages in which data is transformed by processes such as decimation, aggregation, recasting and annotation, in addition to being migrated from format to format. Within these lifecycles there may be more than one point at which intervention for the purposes of preservation is desirable. Equally, as long as the processing history is fully documented and repeatable, it may be unnecessary to keep certain intermediate datasets. Please see the general section on Preservation Intervention Points for a more detailed discussion.
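The selection reasoning above can be illustrated with a minimal sketch. It assumes a deliberately simple rule that is not prescribed by this guide: always keep the rawest and the final stage, and keep an intermediate stage only when it could not be regenerated from a documented, repeatable process. The `Stage` class, the function name and the example lifecycle are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """One step in a hypothetical marine data lifecycle."""
    name: str         # label for the dataset produced at this stage
    process: str      # transformation that produced it
    documented: bool  # is the processing step fully documented?
    repeatable: bool  # can the step be re-run from the preceding stage?


def preservation_candidates(stages: List[Stage]) -> List[str]:
    """Return the names of stages to keep under the assumed rule:
    keep the first (rawest) and last (final) stage, plus any
    intermediate stage that cannot be regenerated."""
    keep = []
    last = len(stages) - 1
    for i, stage in enumerate(stages):
        is_endpoint = i == 0 or i == last
        regenerable = stage.documented and stage.repeatable
        if is_endpoint or not regenerable:
            keep.append(stage.name)
    return keep


# Hypothetical lifecycle, invented for illustration only.
lifecycle = [
    Stage("raw", "acquisition", documented=True, repeatable=False),
    Stage("decimated", "decimation", documented=True, repeatable=True),
    Stage("annotated", "manual annotation", documented=False, repeatable=False),
    Stage("mosaic", "aggregation", documented=True, repeatable=True),
]

print(preservation_candidates(lifecycle))
```

In this sketch the decimated dataset is dropped because it can be regenerated from the raw data, whereas the manually annotated dataset is retained because its processing is neither documented nor repeatable. A real project would weigh further factors such as file size and the cost of reprocessing.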
As an example, much of the focus of development within the VENUS project was on the integration of the raw data streams as captured from in-water devices. The primary example here was the combination of data streams into a coherent and usable set of sampled photographic and navigational data which, alongside the relevant bathymetric datasets, could then be passed to the photogrammetric modelling stage of the data cycle. In addition to raw data acquisition, any project will include an analysis phase. For example, survey techniques normally involve a series of traverses over a spatially defined area, and composite mosaics can be produced either as part of acquisition or as part of a post-processing stage. The composite can then be fed into a range of geospatial tools including 3-D visualization, Geographical Information Systems (GIS) and Computer Aided Design (CAD) software.
Such processes of refining, combining and post-processing data streams often represent a series of downsampling stages for the raw data acquired from the devices. Processing data streams into ‘new’ datasets in this way raises a number of archival issues concerning selection and retention. These issues, and the associated Preservation Intervention Points, have been discussed in previous sections and are identified in the following data- and technique-specific sections.
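To illustrate why a downsampling stage matters for retention, the following minimal sketch applies a naive decimation (keeping every n-th sample, with no filtering) to a short list of depth readings. The values and the `decimate` function are invented for illustration and do not represent any particular survey package.

```python
def decimate(samples, factor):
    """Naive decimation: keep every `factor`-th sample, starting with the first."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return samples[::factor]


# Invented depth readings in metres along a single traverse.
depths = [10.2, 10.4, 10.9, 11.3, 11.1, 10.8, 10.5, 10.1]

reduced = decimate(depths, 4)
print(reduced)                     # the two samples that survive
print(max(depths), max(reduced))   # the deepest reading is lost after decimation
```

The decimated list cannot be used to recover the discarded readings, which is one reason the rawest available data remains a candidate for preservation even when a processed product exists.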
The file tables in the following section summarise a sample of data formats that are used in marine survey projects. Formats considered suitable for long-term preservation are highlighted, along with others likely to be appropriate to data acquisition, post-processing and dissemination. These tables are not intended to be exhaustive: there are undoubtedly other formats suited to preservation and other formats associated with the technologies under consideration. The tables have been generated from a combination of field observation with VENUS partners, ADS practice and English Heritage’s maritime division (Fort Cumberland), and the ‘Big Data’ survey of marine archaeological practice in the UK.