Creating the archive
Although the Archive can be gradually built up following the steps above, further work is usually necessary at the end of a survey project to finalise it. To ensure that nothing is forgotten, the items listed in Archive file can be worked through in turn.
All the files created during the project should be copied into corresponding folders of the hierarchical structure (see above). In particular, proprietary files that were kept in folders allocated by a software package should now be copied into an appropriately named folder within the hierarchical structure (e.g. GEOPLOT – see Code Snippet 1 above). This applies not only to geophysics files but also to all other files, for example GIS, CAD or graphics files. Some of the latter files (e.g. ArcMap mxd documents or svg files) usually only point to the files containing data (e.g. shapefiles) instead of actually incorporating the data. The appropriate folder structure therefore has to be maintained when copying files and if possible relative file paths should be stored (in ArcGIS, for example, this can be selected under File > Document Properties > Data Source Options > Store relative path names to data sources). Care must always be taken to copy all files that belong together, even if they may be hidden from view by settings of the operating system (e.g. cmd files).
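The copying step described above can be checked automatically. The following Python sketch is only an illustration, not part of any geophysics package: the function names are hypothetical, and it assumes Python 3.8 or later. It copies a working folder into the Archive while keeping the sub-folder structure and then reports any file that did not arrive, including files that the operating system hides from view.

```python
import shutil
from pathlib import Path


def list_relative_files(root):
    """Return every file below root as a path relative to root.

    Path.rglob("*") also returns files that the operating system
    normally hides from view (e.g. names starting with a dot).
    """
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def copy_into_archive(source, target):
    """Copy source into target, keeping all sub-folders.

    Returns the set of files that exist in source but are missing
    from target after the copy (empty if the copy is complete).
    """
    shutil.copytree(source, target, dirs_exist_ok=True)
    return list_relative_files(source) - list_relative_files(target)
```

Because sub-folders are reproduced unchanged, documents that store relative paths to their data sources should still resolve after the copy. A non-empty result from copy_into_archive indicates files that need to be copied by hand.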
A decision has to be made as to which files should be exported to preservation formats; Preservation files provides guidance on this topic. The minimum requirement is to create preservation files for the raw data (i.e. after grid assembly and prior to data improvement) and for the final processed results. However, it is desirable to export all data composites and if possible even all data grids into preservation formats. Preservation files should be created both in geophysics coordinates and in map coordinates and stored in appropriately named folders within the hierarchical structure (e.g. PRESERVE_XYZ and PRESERVE_XYZ_MapCoords; tacitly assuming that the former holds preservation files in geophysics coordinates).
It is inevitable that during the export from proprietary formats to preservation formats some information is lost (see Preservation files) and it is hence important to record these metadata separately as part of the Comprehensive Documentation. If not easily expressed in a database, a text document can be used to describe them.
Image exports should be made from all relevant data and stored in appropriate folders (e.g. called \Images) to show their relationship with the data from which they originate. This includes not only the geophysics composites in geophysics as well as map coordinates but also the GIS and CAD files that are used to convey interpretation of results. While the geophysics data can be pictured well by raster images (tiff, bmp, png; but not jpg!), GIS and CAD files often also contain text and line drawings and are therefore better visualised in open image formats that can represent both raster and vector information, such as pdf or svg. If geophysics images are created with georeferencing information (e.g. world files or GeoTIFF) other users can easily load them into a GIS environment. Further details about image files can be found in Image files.
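As an illustration of the georeferencing information mentioned above, a world file is a six-line text file stored next to the image. The Python sketch below writes one for the simplest case, assuming an image that is aligned with the map grid (no rotation) and has square pixels. The function name and parameters are hypothetical; the line order (pixel width, two rotation terms, negative pixel height, then the map coordinates of the centre of the top-left pixel) follows the usual world file convention.

```python
from pathlib import Path


def write_world_file(image_path, pixel_size, upper_left_x, upper_left_y):
    """Write a world file for an unrotated image with square pixels.

    upper_left_x / upper_left_y are the map coordinates of the outer
    corner of the top-left pixel; pixel_size is in map units.
    """
    image_path = Path(image_path)
    ext = image_path.suffix.lstrip(".").lower()
    if not ext:
        raise ValueError("image file name needs an extension")
    # Common naming convention: first and last letter of the image
    # extension plus "w" (tif -> tfw, png -> pgw, bmp -> bpw).
    world_path = image_path.with_suffix("." + ext[0] + ext[-1] + "w")
    values = [
        pixel_size,                     # pixel width in map units
        0.0,                            # rotation term (none assumed)
        0.0,                            # rotation term (none assumed)
        -pixel_size,                    # pixel height, negative: rows run downwards
        upper_left_x + pixel_size / 2,  # x of the centre of the top-left pixel
        upper_left_y - pixel_size / 2,  # y of the centre of the top-left pixel
    ]
    world_path.write_text("".join(f"{v:.6f}\n" for v in values))
    return world_path
```

For example, write_world_file("HCT04.png", 0.5, 1000.0, 2000.0) would create HCT04.pgw in the same folder as the image. Images that are rotated relative to the map grid need non-zero rotation terms, which this sketch does not calculate.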
Sometimes not all the information contained on paper-based field notes is easily captured in the electronic files that are created for a project; an example might be sketches of notable features in the survey area. It is useful to scan such records and store them in a suitable electronic format, for example pdf/a (see Project notes). These can then be put into a folder, for example called \Project\Notes_Scanned.
Another folder (e.g. \Project\Report) can hold the report that is created for the project (Project report), including the word-processing text file (e.g. doc) as well as a print copy (e.g. pdf). If these were initially stored in a different folder a copy should be placed in the Archive’s hierarchical folder structure.
Geophysics and project metadata
Some of the geophysics metadata are normally stored in files created by the proprietary geophysics software (e.g. size of data grids, line separation, reading interval) and are therefore incorporated into the Archive as part of those files (see above). However, it is recommended to include this information explicitly in the tables of metadata to make it more readily available. These tables might be created as an export of a database that has been set up to hold such information (see above), or the tables shown in Comprehensive documentation could be copied and filled with the relevant information, either as a spreadsheet or a text document, and saved in an appropriately named folder (e.g. \Project\Metadata). As mentioned before, there is as yet no agreed exchange format to transfer metadata directly to an Archiving Body's own database.
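Where no database is available, a metadata table can also be written as a plain text file with comma-separated values. The following Python sketch shows one possible way of doing this. The column names are purely hypothetical and should be replaced by the fields of the tables in Comprehensive documentation.

```python
import csv

# Hypothetical column names; replace them with the fields of the
# metadata tables that are actually used for the project.
FIELDS = ["survey_block", "grid_size_m", "line_separation_m", "reading_interval_m"]


def write_metadata_table(records, stream):
    """Write one row per survey block to an open text stream.

    records is a list of dictionaries keyed by the names in FIELDS.
    """
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(record)
```

The stream would typically be a file in the metadata folder, opened with open(path, "w", newline=""). The resulting file can be read by any spreadsheet or text editor.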
Information on the data processing history is stored in some geophysical processing packages but is not always easy to export. In this case a screen capture of the software’s display of processing steps can help to document these metadata. Otherwise, and for GIS and CAD packages which do not usually track processing, notes on the most important steps should be kept in text documents and saved as additional metadata for the project.
As discussed in Geophysics georeferencing, georeferencing information is required on three aspects:
- the geophysics coordinate system (procedure used to lay out the grid; where is the coordinate origin; estimates of accuracies for re-establishing the grid),
- coregistration to site grid (coordinates of control points, both in the geophysics coordinate system and the site grid, together with their respective accuracies; geophysics or site coordinates of map-features, if coregistration with a map is required; geophysics or site coordinates of ground features used for georeferencing), and
- georeferencing (description and approximate location of ground features for reference; distance to ground features for key points along the baselines; compass bearings of baselines).
All this can be captured in short text documents with some sketches, either created in a digital drafting package (e.g. Inkscape) or as neatly drawn paper documents that are scanned to supplement the textual description. A suitable storage location would be a folder called \Project\Georeferencing, or alternatively the folder hierarchy for each respective survey block.
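To show why the coordinates of control points in both systems are worth recording, the Python sketch below derives a transformation from geophysics coordinates to site grid coordinates from exactly two control points. It is a minimal illustration under stated assumptions, not a prescribed method: it assumes a similarity transformation (shift, rotation and uniform scale), and the function name is hypothetical. Points are treated as complex numbers so that rotation and scale become a single multiplication.

```python
def similarity_from_control_points(geo_a, site_a, geo_b, site_b):
    """Derive a geophysics-to-site-grid transformation from two control points.

    Each argument is an (x, y) pair. Returns (to_site, scale), where
    to_site converts a geophysics (x, y) pair to site grid coordinates
    and scale is the length ratio between the two systems.
    """
    z1, z2 = complex(*geo_a), complex(*geo_b)
    w1, w2 = complex(*site_a), complex(*site_b)
    if z1 == z2:
        raise ValueError("control points must differ")
    a = (w2 - w1) / (z2 - z1)  # rotation and scale
    b = w1 - a * z1            # shift

    def to_site(point):
        w = a * complex(*point) + b
        return (w.real, w.imag)

    return to_site, abs(a)
```

With only two control points there is no redundancy, so errors in the recorded coordinates cannot be detected from the fit itself. A scale that differs noticeably from 1 is a hint that one of the coordinates is wrong, which is one reason to record more control points together with their accuracies.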
It is not necessary (or even desirable) to create a list of all files that make up the Archive, which may number hundreds or even thousands (Code Snippet 2 shows how to easily create a text file with all file names, which can then be shortened). If a clear folder structure has been chosen, the logical relationship of the files is immediately clear. Code Snippet 3 (below) shows how a text file with the folder structure can easily be created. This can form the basis for a text document that describes how files are stored in the Archive.
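REM Option 1: flat list of files only (/A-D), sorted by name and extension (/ONE)
DIR /S /A-D /ONE /B >fdir.txt
REM Option 2: indented tree with file names (/F); overwrites fdir.txt, so keep only one option
TREE /A /F >fdir.txt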
Code Snippet 2: DOS batch file (e.g. fdir.bat) for creating a text file that contains all file names in folders and sub-folders of the current folder (i.e. where the batch file resides). The /B switch of the DIR command produces a bare listing (file names with their paths only) and can be omitted if more information is required.
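REM Option 1: flat list of folders only (/AD), without files
DIR /S /AD /B >ftree.txt
REM Option 2: indented folder tree drawn with ASCII characters (/A); overwrites ftree.txt, so keep only one option
TREE /A >ftree.txt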
Code Snippet 3: DOS batch file (e.g. ftree.bat) for creating a text file that contains the folder tree starting from the current folder (i.e. where the batch file resides).
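On systems where the DOS commands are not available, comparable listings can be produced with a short script. The following Python sketch is an assumed equivalent rather than a translation of the batch files: the function names are hypothetical, and the output uses paths relative to the starting folder, with forward slashes, instead of the full paths written by DIR.

```python
from pathlib import Path


def archive_listings(root):
    """Return (files, folders) below root as sorted relative paths."""
    root = Path(root)
    entries = list(root.rglob("*"))
    files = sorted(p.relative_to(root).as_posix() for p in entries if p.is_file())
    folders = sorted(p.relative_to(root).as_posix() for p in entries if p.is_dir())
    return files, folders


def write_listings(root, file_list="fdir.txt", folder_list="ftree.txt"):
    """Write the file and folder listings as text files inside root."""
    # Both listings are collected before anything is written, so the
    # two output files do not appear in their own listing.
    files, folders = archive_listings(root)
    root = Path(root)
    (root / file_list).write_text("\n".join(files) + "\n")
    (root / folder_list).write_text("\n".join(folders) + "\n")
```

Calling write_listings(".") from the top folder of the Archive creates both text files there. As with the batch files, the folder listing is the more useful starting point for the descriptive document.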
In addition to a listing of the folder structure with a brief description of folder names, some of the file names may need to be explained where they convey meaning. An example already introduced in File description is a composite with raw data for a survey block that is named with a three-letter site code and a two-digit number for the survey block (e.g. HCT04), with any data improvement or processing expressed by a processing label and a running number (e.g. HCT04P08). Any such naming convention needs to be clearly described. It must also be stated which files hold the raw data and which hold the final processed data. The latter could, for example, be the files with the highest processing number, or they might carry a label indicating this (e.g. HCT04F). The location of sites and survey blocks in relation to the georeferencing information also needs to be made clear.
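A naming convention that is described clearly can also be checked by a script. The Python sketch below assumes exactly the example convention given above (three capital letters, two digits, then optionally P with a two-digit running number, or F for the final result). The pattern and function names are hypothetical and would need to be adapted to the convention actually used.

```python
import re

# Assumed convention, taken from the example names HCT04, HCT04P08 and HCT04F.
NAME_PATTERN = re.compile(
    r"^(?P<site>[A-Z]{3})(?P<block>\d{2})(?:P(?P<step>\d{2})|(?P<final>F))?$"
)


def parse_composite_name(name):
    """Split a composite name into its parts, or return None if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    step = match.group("step")
    return {
        "site": match.group("site"),
        "block": int(match.group("block")),
        "step": int(step) if step is not None else None,  # None for raw data
        "final": match.group("final") is not None,
    }


def latest_processed(names):
    """Return, per (site, block), the name with the highest processing number."""
    best = {}
    for name in names:
        info = parse_composite_name(name)
        if info is None or info["step"] is None:
            continue
        key = (info["site"], info["block"])
        if key not in best or info["step"] > best[key][0]:
            best[key] = (info["step"], name)
    return {key: name for key, (step, name) in best.items()}
```

Names for which parse_composite_name returns None do not follow the convention and are candidates for an explicit explanation in the File Description Document.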
A frequent source of frustration when reusing data is that files may have well-known file extensions (e.g. dat, txt) without it being obvious exactly how the information inside them is laid out. The format should hence be explained (more details can be found in File description).
The guiding principle for including information in the File Description Document should be that a person wishing to re-use the data can make sense of the files and folders without having to open and inspect them all.