File description

Armin Schmidt and Eileen Ernenwein, 2nd edition, Archaeology Data Service / Digital Antiquity, Guides to Good Practice

All the files that make up the Archive should be listed in the File Description Document, which outlines what the files are, explains how they are arranged in the Archive (e.g. naming conventions, or that each Geoplot composite consists of a cmp, cmd and cms file) and states what they mean. This can take the form of a text document or spreadsheet that is clearly labelled as the File Description Document.
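As a starting point for such a listing, the following minimal sketch walks an archive folder and produces one spreadsheet row per file, with an empty description column to be filled in by hand. The function names `list_archive_files` and `rows_to_csv` and the column names are illustrative assumptions, not part of any prescribed format.

```python
import csv
import io
from pathlib import Path

# Column names are an assumption made for this sketch.
FIELDNAMES = ["folder", "file", "description"]


def list_archive_files(root):
    """Return one row per file below `root`, with folders relative to `root`."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append(
                {
                    "folder": path.parent.relative_to(root).as_posix(),
                    "file": path.name,
                    "description": "",  # to be completed manually
                }
            )
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text that can be opened as a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The generated table only records which files exist; the explanations of what they are and what they mean still have to be written by someone who knows the project.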

It is essential to organise the project files well so that relationships are clear and relevant files can be found easily. Although every project has its own storage strategy, a hierarchical folder structure is recommended, for example in the form

<Project> \ <Site> \ <Survey_Block> \ <Technique> \ <Format>.

The <Survey_Block> would be the smallest survey area for which all measurement parameters are the same (e.g. same traverse interval, same dates and hence same moisture). For each <Survey_Block> the different survey techniques have their own <Technique> folder, which in turn holds data of different formats in separate <Format> folders (e.g. ArchaeoSurveyor, Surfer, XYZ). Those formats that are considered preservation formats should be clearly indicated, for example by a prefix ‘PRESERVE’. The survey blocks may be allocated to separate <Site> folders, which could in turn be grouped under individual <Project> folders (e.g. A1 Widening, 2010 Surveys). For data from the monastic grange at High Cayton, for example, the folder structure of Table 2 could be useful.


Table 2: Folder structure for data from High Cayton

The relationship between different files can be captured well in such a hierarchical structure, which then requires only a few explanatory notes in the File Description Document (e.g. what is the ‘North Field’ and where are the ‘Kilns’).
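The recommended hierarchy can be sketched as a small helper that assembles the folder for one format of one technique in one survey block. The function name `block_folder`, the example folder names and the exact form of the prefix (`PRESERVE_` with an underscore) are assumptions made for illustration; they are not taken from Table 2.

```python
from pathlib import PurePosixPath

# Assumed form of the marker for preservation formats.
PRESERVE_PREFIX = "PRESERVE_"


def block_folder(project, site, survey_block, technique, fmt, preservation=False):
    """Build <Project>/<Site>/<Survey_Block>/<Technique>/<Format> as a relative path."""
    format_folder = f"{PRESERVE_PREFIX}{fmt}" if preservation else fmt
    return PurePosixPath(project, site, survey_block, technique, format_folder)


# Hypothetical names, used only to show the shape of the result.
working = block_folder("HighCayton", "NorthField", "Block04", "Magnetometer", "Geoplot")
preserved = block_folder(
    "HighCayton", "NorthField", "Block04", "Magnetometer", "XYZ", preservation=True
)
```

Building every path through one function keeps the order of the levels consistent across a project, which is what makes the structure self-explanatory.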

Some extra information is, however, needed to describe the individual data sets in these folders. For proprietary data it is essential to provide information about the software with which they were created (e.g. “all files in folders \GPR\Raw were collected and stored with PulseEKKO software”). Many programs produce dat files, but their contents all differ, so additional information is needed to make use of them. It is also desirable to specify the groups of files that belong together. For example, a single ‘shape file’ consists of at least three actual computer files (shp, shx and dbf), and Geoplot composites require three computer files as well (cmp, cmd and cms; note that with certain settings in Microsoft Windows cmd files are not shown in a folder listing and may hence be forgotten when compiling the Archive). Although such information may be difficult to obtain for proprietary data formats, it only needs to be compiled once and can then be reused for all other projects.
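Because members of such file groups are easily overlooked, a simple completeness check can help when compiling the Archive. The sketch below uses only the extension groups named above; the function name `missing_members` and the example file names are assumptions, and the check treats all names as belonging to a single folder.

```python
from collections import defaultdict
from pathlib import PurePath

# File groups named in the text: members that must be archived together.
REQUIRED_EXTENSIONS = {
    "shapefile": {".shp", ".shx", ".dbf"},
    "geoplot_composite": {".cmp", ".cmd", ".cms"},
}


def missing_members(filenames, required):
    """Map each base name to the required extensions that are absent for it."""
    found = defaultdict(set)
    for name in filenames:
        path = PurePath(name)
        extension = path.suffix.lower()
        if extension in required:
            found[path.stem].add(extension)
    return {
        stem: required - extensions
        for stem, extensions in found.items()
        if required - extensions
    }


# Hypothetical folder listing in which the cmd file has been forgotten.
listing = ["HCT04P01.cmp", "HCT04P01.cms", "kilns.shp", "kilns.shx", "kilns.dbf"]
incomplete = missing_members(listing, REQUIRED_EXTENSIONS["geoplot_composite"])
```

Running the check on a listing produced by a script, rather than on what a file browser displays, avoids the problem of hidden cmd files.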

Working Files included in the Archive should be explained as comprehensively as possible in the File Description Document. For example, what do the names of the composites mean (e.g. “HCT04P01 contains High Cayton Survey Block 04, processing step 1”)? What is the naming convention of the GPR transects (e.g. “line01 to line22 for Site 1, line23 to line26 for Site 2, with line24 being repeated and superseded by line25”)?
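Once a naming convention has been written down, it can also be checked mechanically. The following sketch decodes names of the form given in the “HCT04P01” example; the pattern (a site code of capital letters, a two-digit survey block, the letter P and a two-digit processing step) is an assumption generalised from that single example, and `parse_composite_name` is a hypothetical helper.

```python
import re

# Assumed convention: <site code><2-digit block>P<2-digit processing step>.
COMPOSITE_NAME = re.compile(r"^(?P<site>[A-Z]+)(?P<block>\d{2})P(?P<step>\d{2})$")


def parse_composite_name(name):
    """Return site code, survey block and processing step, or None if no match."""
    match = COMPOSITE_NAME.match(name)
    if match is None:
        return None
    return {
        "site": match.group("site"),
        "block": int(match.group("block")),
        "step": int(match.group("step")),
    }
```

Names that return `None` do not follow the documented convention and are candidates for an extra explanatory note in the File Description Document.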

Similarly, additional information should be provided for preservation and image files. It is, for example, important to know how the composite data were exported to XYZ text files with respect to the exact coordinates (e.g. measured from the centre of a raster cell or from a corner). This information can be included in the File Description Document or in the form of a screen capture (Figure 8) taken during the conversion process. For image files it is essential to record how they were created, the range of values they represent and their size (see Image files).
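The difference between the two coordinate references amounts to half a raster cell in each direction, as the following minimal sketch shows. It assumes a regular grid, zero-based column and row indices, and that “corner” means the corner of the cell nearest the grid origin; the function name `cell_coordinates` and its parameters are illustrative only.

```python
def cell_coordinates(col, row, cell_size_x, cell_size_y,
                     origin=(0.0, 0.0), reference="centre"):
    """Return the (x, y) coordinate written for a raster cell.

    `reference` selects whether the coordinate refers to the cell centre
    or to the cell corner nearest the grid origin (an assumption).
    """
    if reference == "centre":
        offset = 0.5
    elif reference == "corner":
        offset = 0.0
    else:
        raise ValueError("reference must be 'centre' or 'corner'")
    x = origin[0] + (col + offset) * cell_size_x
    y = origin[1] + (row + offset) * cell_size_y
    return x, y


# Same cell, two conventions: the results differ by half a cell size.
by_centre = cell_coordinates(2, 3, 0.25, 0.25, reference="centre")
by_corner = cell_coordinates(2, 3, 0.25, 0.25, reference="corner")
```

If the convention is not recorded, a later user cannot tell which of the two positions an exported value belongs to, and overlays with other data may be shifted by half a cell.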

Figure 8: Screen capture of XYZ data export