Issues when archiving

Armin Schmidt and Eileen Ernenwein, 2nd edition, Archaeology Data Service / Digital Antiquity, Guides to Good Practice

A well-known issue in data preservation is the perishable nature of archiving media. An external backup hard drive may easily fail, CDs and DVDs fade and have only a limited life span, and fire and other hazards can destroy most storage media. Archiving therefore resorts to ‘refreshing’ the binary data to new media at regular intervals and storing duplicate copies in secure places (for further information on this, please also consult Archival Strategies).
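The ‘refreshing’ step can be illustrated with a small sketch. It is only one possible approach, not a procedure prescribed by this guide: it copies a file to new media and compares SHA-256 checksums to confirm that the copy is bit-identical. The function names `sha256_of` and `refresh_copy` are hypothetical, chosen for this example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source, destination):
    """Copy a file to new media and check that the copy is bit-identical.

    Hypothetical helper: returns the checksum so it can be recorded
    alongside the archived file and re-checked at the next refresh.
    """
    source, destination = Path(source), Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    before, after = sha256_of(source), sha256_of(destination)
    if before != after:
        raise OSError(f"Checksum mismatch after copying {source} to {destination}")
    return after
```

Recording the returned checksum with each copy would allow later refresh cycles to detect silent corruption before the damaged file is propagated to new media.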

A problem that many have already encountered is the relatively short life span of many specialist software products. Proprietary data formats have become obsolete (e.g. InSite data files), or require licensed specialist software to read (e.g. georeferencing information in ArcGIS saved in specific aux files). It is hence desirable to export data from a specialist software package into a preservation file format so that they can be re-used more easily. The process of moving data to such preservation formats is usually referred to as ‘migration’ and also includes updating to newer archiving formats as they become available. A good example is the migration of AutoCAD DXF and DWG files through successive versions, but even old Microsoft Word files are becoming difficult to open in the latest versions and need to be migrated to newer formats.

Nevertheless, there are good reasons for using proprietary data formats while working with the data. They usually allow for efficient and comprehensive storage of all the information needed for a project (e.g. ca. 100 individual Geoplot files for a 1 ha magnetometer survey). The export to a preservation format will lose some of this information (otherwise the proprietary format would probably not have been introduced) and may also be less storage-efficient (e.g. binary grid data, as used in Geoplot composites, converted to XYZ text files).
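The trade-off between binary grids and XYZ text can be sketched as follows. This is a minimal illustration under stated assumptions and does not read or write any real Geoplot format: the grid is a plain list of rows, the first row is taken to lie at y = 0, the sample intervals and the dummy value are supplied by the caller, and the function names are hypothetical.

```python
import struct


def grid_to_xyz_lines(grid, x_interval=1.0, y_interval=1.0, dummy=None):
    """Convert a 2D grid of readings into 'x y value' text lines.

    Assumptions (not taken from any specific software): row 0 lies at
    y = 0, column 0 at x = 0, and readings equal to `dummy` mark
    positions without data, which are omitted from the output.
    """
    lines = []
    for row_index, row in enumerate(grid):
        for col_index, value in enumerate(row):
            if value is None or value == dummy:
                continue
            x = col_index * x_interval
            y = row_index * y_interval
            lines.append(f"{x:.2f} {y:.2f} {value:.2f}")
    return lines


def binary_size(grid):
    """Bytes needed to pack every reading as a 4-byte float (illustrative)."""
    values = [value for row in grid for value in row]
    return len(struct.pack(f"<{len(values)}f", *values))


def text_size(lines):
    """Bytes needed to store the XYZ lines as newline-terminated ASCII."""
    return sum(len(line) + 1 for line in lines)
```

In the binary form the position of each reading is implied by its place in the grid, whereas the XYZ text repeats both coordinates on every line. That repetition is what makes the text file larger, but also what makes it readable without the original software.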

Using data for more than pretty pictures requires accurate location information, and the geophysical data therefore need to be georeferenced. Many specialist geophysical software packages process the data in their own geophysics coordinate system (see The life of geophysical data) and additional information is required to put them into the right location on a map. Different methods are available for this (see Appendix 2), including location information for grid corners (e.g. tape measurements from fixed landmarks) or GNSS/GPS georeferenced polygon GIS files for the grid layout. While such georeferencing information forms part of a project’s Archive (see Archive files), it can also be used to export the measured geophysical data with their correct georeferenced map coordinates as an XYZ text file using a specified datum (e.g. WGS84 degrees or UTM metres instead of the geophysics coordinates).
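As an illustration of how grid-corner information can place geophysics coordinates on a map, the sketch below builds a rotation-plus-translation transform from two surveyed points. It rests on several assumptions that are not part of this guide: map coordinates are projected and metric (e.g. UTM metres), the grid is right-handed with no scale change, and the two known points are the grid origin and a second point on the grid's x-axis. The name `make_grid_to_map` is hypothetical.

```python
import math


def make_grid_to_map(origin_map, x_axis_point_map):
    """Build a function mapping grid (x, y) to map (easting, northing).

    origin_map:        map coordinates of the grid origin (0, 0)
    x_axis_point_map:  map coordinates of any point on the grid x-axis
    Assumes projected metric coordinates and an unscaled, right-handed
    grid; real surveys may need a fuller transformation.
    """
    e0, n0 = origin_map
    e1, n1 = x_axis_point_map
    # Angle of the grid x-axis measured anticlockwise from map east.
    angle = math.atan2(n1 - n0, e1 - e0)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def transform(x, y):
        easting = e0 + x * cos_a - y * sin_a
        northing = n0 + x * sin_a + y * cos_a
        return easting, northing

    return transform
```

Applying the returned function to every grid position before export would yield an XYZ file in map coordinates. Converting to geographic coordinates such as WGS84 degrees would need a proper projection library and is outside the scope of this sketch.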

While many of these issues can be solved through technical approaches, the question of who should have access to the data is far more difficult to resolve. Looters could use geophysical data published with full map coordinates to illicitly excavate and destroy valuable archaeological remains. In the USA, site location information is confidential and protected under the Archaeological Resources Protection Act of 1979 (16 U.S.C. 470hh[1]). If a geophysical survey is undertaken in advance of a planning application, the results may also be deemed classified information. As shown in the previous section, archiving has many benefits, and an Archive’s access policy should hence be tailored to allow for embargoes on data release where this is considered necessary. It would also be possible to introduce different levels of access to an Archive. For example, the ‘general public’ might be able to see only an un-rectified picture of a data plot while bona fide users could access the full data set. However, the vetting of users is a thorny issue, and once a dataset has been released to one user it can easily be passed on to many others with little possibility of tracking such pathways. The media industry has developed complex rights management procedures, but these are unlikely to be applied to archaeological geophysics data. These issues are currently being addressed by the various Archiving Bodies, which have robust policies in place that a simple file repository would not be able to set up.

[1] http://www.gpo.gov/fdsys/pkg/USCODE-2008-title16/html/USCODE-2008-title16-chap1B-sec470hh.htm