Onward to the Brave New World?

Preserving a Past for the Future

Paul Miller

The Brave New World

Journals such as assemblage and Internet Archaeology are far from the only scions of the archaeological world boldly to embrace the digital age, and both serve a vital role in spreading the gospel to the unconverted. They demonstrate that computer-assisted archaeology is far from the preserve of geeks, hackers, phreakers and nerds; rather, it is a powerful force in garnering, shaping and presenting evidence of the past as manifested in the recovered material record. Computers offer a liberating means of facilitating practically every aspect of modern archaeological endeavour, and the Luddites who spurn them succeed only in distancing themselves ever further from the cutting edge of archaeological discourse.

From their earliest adoption in the 1960s, computers have inveigled their way into the very heart of public, commercial, academic and voluntary archaeology to such an extent that most archaeologists today would be rendered helpless without them. In the public sector alone, computers drive many aspects of the statutory Development Control process, from facilitating the Sites and Monuments Record (SMR) at its heart to storing, analysing and presenting data captured under contract during the mitigation process.

Discord in Paradise

This increasing dependence upon the computer, although undeniably beneficial for us all, is creating a problem that our established practices are ill-suited to cope with. The problem shows no inclination to go away, grows worse every day and, if not resolved soon, threatens the backbone of the recovered archaeological resource.

The problem, already addressed for recovered material culture (Museums and Galleries Commission 1992) and traditional recording media (Ferguson and Murray 1997), is 'the archive'. Quite simply, existing archival procedures are wholly inappropriate for the long-term preservation of the physical media on which digital data are stored, and they fail fundamentally to provide the access to those data that the re-use and re-examination of earlier work requires.

The Archive Under Pressure...

The work of the archivist, it must be said, is often regarded as unbelievably boring, and yet without these hard-working professionals much of the material recovered archaeologically, and many of the paper, mylar or photographic records of the recovery process, would have been lost long ago. Traditional archival practice is concerned with creating catalogues that record the location and description of everything deposited, and with keeping those deposits in a humidity-, light- and temperature-controlled environment. The traditional archivist stereotypically inhabits a dark basement surrounded by filing cabinets and fiche readers, is wary of users placing dirty fingers upon precious 'treasures' and has a remarkable ability to locate minutiae within his/her domain.

Storage of media

In a recently completed survey of traditional museum archives (Swain, forthcoming), concern was expressed over how effectively existing, under-funded, localised archives could cope even with preserving archaeology's digital by-product, let alone with actively facilitating access to it. Unlike the majority of bulk finds from excavations and the accompanying hardcopy archival records, digital media require far more than cataloguing and placement in a temperature- and humidity-controlled room. The media themselves (8-track tapes, Exabyte tapes, 8" disks, 5¼" disks, 3½" disks, CD-ROMs, flopticals, ZIPs, etc.) require active curation, with estimates of the life span of supposedly 'archived' media ranging from three to 30 years. Even the high-tolerance, high-capacity on-line storage used with mainstream computer systems, from which you are doubtless accessing assemblage to read these words, has a surprisingly short life span; current best practice suggests that data on such filestores be 'cycled' from one disk to another every few years to prevent degradation, denudation and eventual loss.
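One common safeguard when cycling data is to verify each new copy against a checksum of the original before trusting it. The following is a minimal Python sketch, assuming the data are ordinary files reachable from the local filesystem. The function names and directory layout are illustrative only, and SHA-256 is simply one widely available checksum in Python's standard library, not a method prescribed by the survey or by any particular archive.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cycle_file(source: Path, target_dir: Path) -> Path:
    """Copy one archived file onto fresh storage and verify the copy.

    The source is never modified; a copy whose checksum does not match
    the source is deleted and an error is raised.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    before = sha256_of(source)
    shutil.copy2(source, target)  # copies file contents and timestamps
    after = sha256_of(target)
    if before != after:
        target.unlink()
        raise OSError(f"checksum mismatch while cycling {source}")
    return target
```

Recording each checksum alongside the archive would also let later cycles be checked against the value taken when the data were first deposited.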

Migration of data

More pressing still is the requirement to migrate data. Computer hardware and software continue to evolve rapidly, and much of the recent past's state-of-the-art technology is now obsolete or, at best, only partially supported. Thus, data created in the 1980s on a Digital Equipment Corporation (DEC) Rainbow running CP/M or on an Apple Lisa may well have been routinely copied from one disk to another in order to preserve them, but the fact that the data still exist does not mean that a new user will be able to open them on his/her Personal Computer (PC) today. Active migration of data from one version of software to the next, and from one computer platform to its replacement, is a crucial part of the digital archival process. It is far better, for example, to open data created in (an imaginary) SuperGIS 5.2 in your new copy of SuperGIS 6 and re-save them in version 6 format than to wait two years only to discover that SuperGIS 7 is incapable of reading version 5.2 files. By then SuperGIS 5.2 is long gone, and you have probably just deleted version 6 as well to make room on your hard disk for the latest whizzy and disk-guzzling features of version 7.
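The risk described here is a broken chain of conversions. The Python sketch below models each release of the imaginary SuperGIS as a hypothetical upgrade function that reads one version and writes the next; the record layout and the field changes between versions are invented purely for illustration. Migration succeeds only while every link in the chain is still available.

```python
from typing import Callable, Dict

Record = dict  # hypothetical SuperGIS file contents, with a "version" key


def upgrade_5_2_to_6(rec: Record) -> Record:
    # Invented change: version 6 renames "coords" to "geometry".
    new = {k: v for k, v in rec.items() if k != "coords"}
    new["geometry"] = rec["coords"]
    new["version"] = "6"
    return new


def upgrade_6_to_7(rec: Record) -> Record:
    # Invented change: version 7 requires an explicit coordinate system.
    new = dict(rec)
    new.setdefault("crs", "unknown")
    new["version"] = "7"
    return new


# Each entry reads one version and writes the next one up.
UPGRADES: Dict[str, Callable[[Record], Record]] = {
    "5.2": upgrade_5_2_to_6,
    "6": upgrade_6_to_7,
}


def migrate(rec: Record, target: str) -> Record:
    """Step a record forward one version at a time until it reaches target.

    The input record is left unchanged. Raises ValueError if a link in
    the chain is missing.
    """
    while rec["version"] != target:
        step = UPGRADES.get(rec["version"])
        if step is None:
            raise ValueError(f"no migration path from version {rec['version']}")
        rec = step(rec)
    return rec
```

Deleting the "5.2" entry (the equivalent of discarding SuperGIS 6 before re-saving old files) makes `migrate` raise `ValueError` for every version 5.2 file. Migrating each file as soon as a new release arrives keeps the chain short and avoids depending on converters that may later vanish.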

Similarly, hundreds of thousands of words about archaeology were written using Amstrad's PCW word processor, with its quirky operating system, green screen and strange 3" disks. As the growing availability of the PC challenged this machine's dominance, a large number of tools and services were available to help authors copy their sparkling prose from the Amstrad PCW into their favourite PC-based word processor. Now, though, the PCW has vanished almost without trace. The translation tools have largely disappeared and most of the commercial services have moved on to other things; translation of PCW disks is no longer big business. Even university computing services across the United Kingdom, many of which kept the capability to read data from these machines, have increasingly been disposing of them in the past year or so as spare parts become more elusive and ageing components wear out. Many authors did not bother to translate their data at the time: they were far too busy, there were plenty of people around who knew what to do, and they could always worry about it later. When the time comes to prepare second editions of books, or to publish ageing Ph.D. theses, these authors face the worrying realisation that there no longer seems to be any easy means of translating their data.[1]

Facilitating Access and Re-use

Digital data that are efficiently preserved (routinely moved across media and migrated between hardware platforms and software versions) offer truly enormous potential to the user of archival material, whether digital or otherwise. As well as being a research tool in their own right, in the same way as a box of pot sherds or a lever-arch file of context sheets, digital data can act as a key capable of unlocking the rest of an archive, whatever its medium and wherever it might be located. Tools such as the phenomenally successful World Wide Web (WWW) offer new ways of creating this key, ways which the Archaeology Data Service is now realising.

Making the key fit

The Archaeology Data Service (ADS) is a new service, funded by the United Kingdom's Higher Education Funding Councils as part of the Arts and Humanities Data Service (AHDS). The ADS seeks to encourage the preservation of existing digital data, to offer guidance on the collection, storage and preservation of future data and to facilitate the extensive re-use of all these data, both within and beyond the Higher Education community. Working closely with experts around the world, the ADS is shaping the best in current digital preservation theory and practice to fit the realities of British archaeology, with the aim of ensuring the survival of valuable digital data and the realisation of their true potential at the heart of any modern archive.

The Archaeology Data Service offers a long-term home to peer-reviewed digital data without a more obvious host, and is building mechanisms by which data may enter the ADS catalogue whilst remaining physically at another, possibly distant, location. The potential for 'archiving' major resources such as the National Monuments Records, or frequently updated data sets such as many Sites and Monuments Records, is therefore great, even if the data themselves never actually leave the NMR or SMR concerned. The scope for transforming the speed and thoroughness of archaeological research is correspondingly large.

Fundamental to digital data's role in opening up the whole archive to external scrutiny is the use of those data in creating 'metadata' about the archive. Translated almost literally, metadata is 'data about data'. Enhancing the definition somewhat, metadata may be seen as the means by which (often unintelligible) data are transformed into information of value to data creators, archivists and users alike. Simple metadata about an archaeological excavation archive, for example, include the name and location of the site, the organisation responsible for its excavation and the present whereabouts of the physical archive(s). More complex metadata (known as 'documentation' within the ADS, to minimise confusion between the two) might include information on sampling strategies, levels of post-excavation analysis, etc., and the distinctions between these most complex forms of metadata and the data themselves remain tenuous at best.

First approaches to archaeological archives have traditionally been daunting experiences. Even if interested only in a single excavation, the researcher is likely to face shelf after shelf of brown boxes filled with the site's physical remains. More than likely, the metalwork is elsewhere with a specialist or conservator, and the glass is being studied by a second specialist at the other end of the country. Depending upon local circumstance, paperwork pertaining to the site will be stored with the physical archive, stored elsewhere nearby, or even deposited with the National Monuments Record (NMR) many miles away. The would-be researcher, then, faces four obstacles before even beginning work:

  1. Discovering the location of the main archive.
  2. Travelling to that location.
  3. Finding out what's missing from the archive that might be important.
  4. Tracking down the present whereabouts of those missing elements.

A researcher interested only in the site's paper records, which in this case happen to be deposited with the relevant National Monuments Record, may waste time on steps one and two before discovering this, and must then travel to the NMR. Each of these problems multiplies for the researcher interested in more than one excavation, becomes more complex still for those studying different forms of data (such as excavations, artefact corpora and aerial photographic landscape surveys) in conjunction, and may become insurmountable for those wishing to cross local authority or national boundaries in pursuit of a research topic at anything other than the most superficial level.

Digital data in general and metadata in particular cannot claim to offer a panacea to these problems, yet judicious use of the digital information associated with an archaeological resource may temper the worst extremes of the current system's intractability. By providing certain basic pieces of information in a consistent manner about any archaeological resource, such as name, type of resource, where the archive(s) are located, etc., it becomes easier for the researcher to rapidly locate those resources likely to be of value. Most archives probably hold this level of detail about themselves in a paper catalogue.

By extending the catalogue to cover neighbouring archives and by maintaining consistency of expression between new archives (whilst by no means enforcing standardised recording or analytical processes upon their content), it becomes feasible for the researcher to begin asking questions that span the holdings of more than one archival resource. Local Authority Sites and Monuments Records, research corpora and the National Monuments Records all attempt this in different ways and for different reasons. By extending the catalogue still further, to encompass artefact corpora, Sites and Monuments Records, etc. across the British Isles as a whole, this enhanced research potential is taken towards its logical conclusion.

By constructing the catalogue digitally and by generating it, where possible, in a near-automatic fashion from existing digital records within and about extant archives, researchers are able to query and sort this catalogue based upon research-driven criteria, rather than by such simplistic attributes as period or shelf number. Made accessible via the World Wide Web, such a catalogue allows researchers to plan their work from home or place of work, identifying those resources likely to be of value and confirming their physical locations for a visit. Data held digitally either at the ADS or elsewhere may be queried on-line, or even downloaded for local manipulation, reducing still further the time and effort expended in visiting large numbers of small archives.
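As a rough illustration of querying by research criteria rather than by period or shelf number, the following Python sketch filters a small in-memory catalogue by subject keyword and area, then sorts the hits by holding archive so that visits can be planned. The `CatalogueEntry` fields, the `find` function and the sample entries are all hypothetical; a working catalogue would more likely sit in a database behind a Web interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CatalogueEntry:
    title: str
    resource_type: str       # e.g. "excavation archive", "artefact corpus"
    subjects: List[str]      # research keywords, not shelf numbers
    coverage: str            # area the resource relates to
    held_at: str             # archive currently holding the material
    online: bool = False     # True if the digital data can be queried on-line


def find(catalogue: List[CatalogueEntry], subject: str,
         coverage: Optional[str] = None) -> List[CatalogueEntry]:
    """Return entries tagged with `subject` (case-insensitive), optionally
    limited to one area, sorted by holding archive to help plan visits."""
    wanted = subject.lower()
    hits = [
        entry for entry in catalogue
        if wanted in (s.lower() for s in entry.subjects)
        and (coverage is None or entry.coverage == coverage)
    ]
    return sorted(hits, key=lambda entry: (entry.held_at, entry.title))


# Invented entries drawn from three different kinds of archive.
CATALOGUE = [
    CatalogueEntry("Excavations at Site A", "excavation archive",
                   ["Roman", "glass", "pottery"], "Area North",
                   "County Museum 1", online=True),
    CatalogueEntry("Glass from Site B", "artefact corpus",
                   ["glass"], "Area North", "Specialist 2"),
    CatalogueEntry("Aerial survey of Vale C", "aerial photographic survey",
                   ["cropmarks"], "Area North", "National Monuments Record"),
    CatalogueEntry("Excavations at Site D", "excavation archive",
                   ["glass"], "Area South", "County Museum 3"),
]

for entry in find(CATALOGUE, "Glass", coverage="Area North"):
    status = "query on-line" if entry.online else "visit"
    print(f"{entry.title}: {entry.held_at} ({status})")
```

The search crosses an excavation archive and an artefact corpus held in different places, which is the kind of multi-archive question described above; the researcher learns before travelling which material can be examined on-line and which requires a visit.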

In collaboration with other members of the Arts and Humanities Data Service and relevant bodies within the museum and archaeological world, the ADS is developing procedures to realise this vision. At the heart of the solution lies a requirement neither to modify existing archival data except when necessary, nor to enforce monolithic and standardised recording schemes upon the archaeological community. The Archaeology Data Service recognises the importance of standards, of which there are many, but also recognises that the best standards have evolved to meet a particular set of requirements; the requirements of a large urban excavation unit are unlikely to be the same as those of a long-term rural research excavation or a geophysical survey company. Rather than requiring the use of any one standard, the ADS advocates using the standards most suited to local circumstances, so long as the standard selected is clearly identified, consistently implemented and published in some form so that future users can refer to it.

Logically above these standardised data sets and their associated documentation sits the core of the system that provides access to them: the Archaeology Data Service's recommended metadata scheme for supplying the information needed to discover and evaluate resources, whatever their form.

The 'resource discovery metadata' system advocated by the Archaeology Data Service is the Dublin Core, named after Dublin, Ohio, where the workshop at which it was first defined took place (Weibel et al. 1995). Discussed more fully in Miller and Greenstein (1997), the Dublin Core comprises fifteen elements which together are felt to describe the essence of any resource, whether a book, an excavation archive, a Samian bowl or an ancient monument.

Each of the fifteen elements (Title, Creator, Subject, Description, Publisher, Other Contributors, Date, Resource Type, Format, Identifier, Source, Language, Relation, Coverage and Rights Management) may optionally be extended, allowing potentially detailed description both of the standards and recording schemes used to express a resource and of individual aspects of each element.

A Dublin Core record is created for each resource available via the ADS; when searched, these records allow the user to view important information about a large number of very different resources in a standardised form. One element, Identifier, records where the resource itself may be found: an electronic address pointing straight to files that may be downloaded, archive codes and shelf marks for physical resources in a specific archive, and contact details for archival staff who may be queried or with whom appointments may be made.
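To make the fifteen elements concrete, here is a minimal Python sketch of a record as a simple in-memory structure. It assumes that an element's optional extension can be modelled as a qualifier attached to each value (for example "URL" or "shelf mark" on an Identifier). This encoding, the function names and the sample values are all hypothetical; it is not the ADS's record format or any official Dublin Core syntax.

```python
from typing import Dict, List, Optional, Tuple

# The fifteen element names as listed in the text above.
DC_ELEMENTS = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Other Contributors", "Date", "Resource Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights Management",
)

# Hypothetical encoding: each value carries an optional qualifier
# standing in for the element's optional extension.
Value = Tuple[Optional[str], str]
Record = Dict[str, List[Value]]


def make_record(values: Dict[str, List[Value]]) -> Record:
    """Build a record, rejecting names that are not one of the fifteen.

    Every element is optional and may hold several values.
    """
    unknown = set(values) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return {name: list(values[name]) for name in DC_ELEMENTS if name in values}


def locations(record: Record, qualifier: Optional[str] = None) -> List[str]:
    """Return Identifier values, optionally only those with a given qualifier."""
    return [
        value
        for q, value in record.get("Identifier", [])
        if qualifier is None or q == qualifier
    ]


# Invented example: one excavation archive, partly on-line, partly on shelves.
record = make_record({
    "Title": [(None, "Excavations at an example site")],
    "Resource Type": [(None, "excavation archive")],
    "Identifier": [
        ("URL", "https://example.org/archive/site-files"),
        ("shelf mark", "Box 12, Bay 4"),
        ("contact", "Archivist, example county museum"),
    ],
})

print(locations(record, "URL"))  # prints ['https://example.org/archive/site-files']
```

Calling `locations(record)` with no qualifier returns all three Identifier values, mirroring how a single element can point both to downloadable files and to physical shelf marks and contacts.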

Conclusion

Digital data are increasingly commonplace within the archaeological world, yet existing preservation mechanisms are ill-suited to the care and maintenance of this important resource type. Initiatives such as the Archaeology Data Service are only now beginning to address the true implications of conserving digital resources, both for their own sake as an important element of the complete archaeological archive and as a powerful key which enables researchers to rapidly and effectively unlock information residing within other parts of any archive.

Procedures developed and tested now will become still more important in the near future, as a greater part of each archive becomes predominantly digital in nature. The Archaeology Data Service is well placed to predict and evaluate these growing requirements and to develop mechanisms to cope, to the benefit of the profession as a whole.

The Archaeology Data Service's core cataloguing system is currently under development and is due to be released for widespread testing from April 1998. Interested readers should keep an eye on the ADS web site or join the ads-all mailing list to be kept informed of developments.

Endnotes

[1] Having dealt with queries like these, the Archaeology Data Service now knows of a couple of old Amstrad machines which can be used to translate data onto PCs, so get in touch if you're stuck. We don't own the machines, though, and their owners may wish to charge for their services.

References

Ferguson, L.M. and D.M. Murray. 1997. Archaeological Documentary Archives. IFA Paper No. 1. Manchester: Institute of Field Archaeologists.

Miller, P. and D. Greenstein, eds. 1997. Discovering Online Resources Across the Humanities: a practical implementation of the Dublin Core. Bath: United Kingdom Office for Library and Information Networking.

Museums and Galleries Commission. 1992. Standards in the museum care of archaeological collections. London: Museums and Galleries Commission.

Swain, H. forthcoming. Survey of Museum Archives, funded by English Heritage and the Museums and Galleries Commission. Title to be confirmed.

Weibel, S., J. Godby, E. Miller and R. Daniel. 1995. OCLC/NCSA Metadata Workshop Report.

About the Author

Paul Miller is Collections Manager for the Archaeology Data Service, responsible for building, describing and maintaining a distributed collections database. He is also closely involved with the Dublin Core effort and works to encourage wider adoption of metadata and resource description (Dublin Core and otherwise) through membership of numerous committees inside and outside archaeology. His D.Phil. thesis looked at a different area entirely, examining the application of Geographic Information Systems (GIS) to the modelling and evaluation of archaeological deposits beneath urban areas.

© Paul Miller 1997

© assemblage 1997