

Archaeotools: Data mining, facetted classification and E-archaeology

In September 2007 the ADS and the Natural Language Processing Research Group at the University of Sheffield began work on the Archaeotools project, funded under the e-Science Research Grants Scheme, which is itself a collaboration between three major funding bodies: the AHRC, the EPSRC and the JISC.


This two-year project builds upon previous ADS work to develop tools (the Common Information Environment - Archaeobrowser project) and will allow archaeologists to discover, share and analyse datasets and legacy publications that have hitherto been very difficult to integrate into existing digital frameworks. The project has three interrelated objectives, each represented by a distinct workpackage.

The first aim is to index the ADS database of over one million metadata records describing sites and monuments in the UK according to three criteria: When, What and Where. The project will use the techniques of facetted classification, derived from information science and demonstrated in the Archaeobrowser project, to allow users to navigate easily and intuitively the 'three-dimensional space' created by the classification scheme. A map-based interface will be developed so that the spatial dimension can be explored to best effect.
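As a rough illustration of how a facetted index over When, What and Where might behave, the following minimal Python sketch builds an inverted index from facet values to record identifiers and narrows a selection by intersecting sets. The sample records, field names and function names are all hypothetical; this is not the ADS data model or the Archaeobrowser implementation.

```python
from collections import defaultdict

# Hypothetical sample records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "when": "Roman", "what": "villa", "where": "Yorkshire"},
    {"id": 2, "when": "Roman", "what": "fort", "where": "Cumbria"},
    {"id": 3, "when": "Medieval", "what": "castle", "where": "Yorkshire"},
    {"id": 4, "when": "Roman", "what": "villa", "where": "Kent"},
]

FACETS = ("when", "what", "where")


def build_index(records):
    """Map each (facet, value) pair to the set of record ids carrying it."""
    index = defaultdict(set)
    for record in records:
        for facet in FACETS:
            index[(facet, record[facet])].add(record["id"])
    return index


def select(index, all_ids, **chosen):
    """Return the ids matching every chosen facet value (set intersection)."""
    result = set(all_ids)
    for facet, value in chosen.items():
        result &= index.get((facet, value), set())
    return result


def facet_counts(records, ids, facet):
    """Count the values of one facet among the currently selected records."""
    counts = defaultdict(int)
    for record in records:
        if record["id"] in ids:
            counts[record[facet]] += 1
    return dict(counts)
```

Selecting `when="Roman"` and then asking for `facet_counts(..., "what")` gives the kind of "remaining choices" summary that a facetted browser typically shows beside each dimension.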

Secondly, the project will employ natural language processing (NLP) to allow automated tools to search within documents for terms that belong to known classification schemes, adding them to the facetted index and providing much deeper and richer access to unpublished archaeological literature. Although this literature forms the primary record of most archaeological investigation within the UK, the level of scholarly and public access is traditionally very limited, imposing a major constraint on archaeological research. Tools will also be explored which will allow users to impose their own classifications and index the documents according to their own criteria, adding further user-defined dimensions to the classification.
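The simplest form of searching documents for terms from a known classification scheme is dictionary lookup. The sketch below assumes a tiny, hypothetical vocabulary mapping terms to facets and uses plain regular-expression matching; it stands in for, and is far cruder than, a real NLP pipeline, and it does not represent the project's actual tools.

```python
import re

# Hypothetical controlled vocabulary: term -> facet it belongs to.
VOCABULARY = {
    "bronze age": "when",
    "iron age": "when",
    "round barrow": "what",
    "barrow": "what",
    "hillfort": "what",
}


def compile_matcher(vocabulary):
    """Build one case-insensitive pattern covering every vocabulary term."""
    # Longest terms first, so "round barrow" is preferred over "barrow".
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + pattern + r")\b", re.IGNORECASE)


def extract_terms(text, vocabulary):
    """Return the vocabulary terms found in text, grouped by facet."""
    matcher = compile_matcher(vocabulary)
    found = {}
    for match in matcher.finditer(text):
        term = match.group(0).lower()
        found.setdefault(vocabulary[term], set()).add(term)
    return found
```

Because matching is on whole words, a plural such as "barrows" is not picked up; handling inflection, spelling variants and context is where fuller language processing would be needed.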

Thirdly, these tools will be employed to investigate whether it is possible to identify and harvest index terms within older antiquarian literature, as represented by back runs of archaeological journals that are currently being digitised and made available online. As site reports in this older literature rarely give precise geospatial coordinates, it will be necessary to investigate whether natural language processing will allow the recognition and harvesting of place names. If this is achievable, the place names can be supplied to existing services (GeoCrossWalk), which can look up the names in an online gazetteer and return precise grid coordinates to be added to the index. This phase of the project will use the Proceedings of the Society of Antiquaries of Scotland, which are already hosted in digitised form by the ADS.
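The two steps described here, recognising candidate place names and then resolving them against a gazetteer, can be sketched as follows. Everything in the example is an assumption made for illustration: the in-memory dictionary merely stands in for a gazetteer service such as GeoCrossWalk (whose interface is not shown), the place names are fictional, the grid references are placeholders rather than real coordinates, and the capitalisation heuristic is a naive substitute for proper named-entity recognition.

```python
import re

# Hypothetical gazetteer: fictional names with placeholder grid references.
GAZETTEER = {
    "example law": "NT 000 000",
    "sample knowe": "NO 111 111",
}

# Naive cue: one or more capitalised words following "at", "near" or "in".
CANDIDATE = re.compile(
    r"\b(?:at|near|in)\s+((?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*)"
)


def candidate_place_names(text):
    """Return capitalised phrases that follow a locational preposition."""
    return [match.group(1) for match in CANDIDATE.finditer(text)]


def resolve(names, gazetteer):
    """Split names into those found in the gazetteer and those not found."""
    resolved, unresolved = {}, []
    for name in names:
        reference = gazetteer.get(name.lower())
        if reference is None:
            unresolved.append(name)
        else:
            resolved[name] = reference
    return resolved, unresolved
```

Keeping the unresolved names separate matters in practice, since historic spellings and obsolete place names in antiquarian reports may have no direct gazetteer entry and would need manual review.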

At the end of the project we intend to have created a major sustainable resource for archaeological research and made it available to all users via the ADS. It will also be possible to make recommendations for the future format and indexing of grey literature, and to draw lessons for the wider humanities e-Science community.
