
Archaeotools: Natural Language Processing and Faceted Classification

In September 2007 the ADS and the Natural Language Processing Research Group at the University of Sheffield began work on the Archaeotools project funded under the e-Science Research Grants Scheme which itself is a collaboration between three major funding bodies, the AHRC, the EPSRC and the JISC.

This two-year project builds upon previous ADS work to develop tools (the Common Information Environment – Archaeobrowser project) and will allow archaeologists to discover, share and analyse datasets and legacy publications which have hitherto been very difficult to integrate into existing digital frameworks. The project has three interrelated objectives, each represented by a distinct workpackage.

The first aim is to index the ADS database of over one million metadata records describing sites and monuments in the UK, according to three criteria: When, What and Where. The project will use the techniques of faceted classification, derived from information science and demonstrated in the Archaeobrowser project, to allow users to navigate easily and intuitively the ‘three-dimensional space’ created by the classification scheme. A map-based interface will be developed so that the spatial dimension can be explored to best effect.
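As an illustration of how a When, What and Where classification can be browsed, the following is a minimal sketch and not the project's implementation. The record fields, the sample values and the function names are all assumptions made for the example. It builds an index from each facet value to the records that carry it, then narrows a selection by intersecting the chosen values.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "when": "Roman", "what": "Villa", "where": "North Yorkshire"},
    {"id": 2, "when": "Roman", "what": "Fort", "where": "Cumbria"},
    {"id": 3, "when": "Medieval", "what": "Castle", "where": "North Yorkshire"},
]

FACETS = ("when", "what", "where")


def build_index(records):
    """Map each (facet, value) pair to the set of record ids that carry it."""
    index = defaultdict(set)
    for record in records:
        for facet in FACETS:
            index[(facet, record[facet])].add(record["id"])
    return index


def select(index, all_ids, **choices):
    """Intersect the id sets for every facet value the user has chosen."""
    result = set(all_ids)
    for facet, value in choices.items():
        result &= index.get((facet, value), set())
    return result


def remaining_counts(records, ids, facet):
    """Count the values still available in one facet for the current selection."""
    counts = defaultdict(int)
    for record in records:
        if record["id"] in ids:
            counts[record[facet]] += 1
    return dict(counts)
```

Choosing when="Roman" and then where="North Yorkshire" narrows the selection step by step, and remaining_counts gives the figures a browser could display beside each remaining facet value.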

Secondly, the project will employ natural language processing (NLP) to allow automated tools to search within documents for terms which are part of known classification schemes, adding them to the faceted index, and providing much deeper and richer access to unpublished archaeological literature. Although this literature forms the primary record of most archaeological investigation within the UK, the level of scholarly and public access is traditionally very limited, imposing a major constraint on archaeological research. Tools will also be explored which will allow users to impose their own classifications and index the documents according to their own criteria, adding further user-defined dimensions to the classification.
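The simplest form of searching a document for terms from a known classification scheme is a dictionary lookup, sketched below. This is an assumption-laden illustration only: the vocabulary, its facet labels and the function name are invented for the example, and plain string matching stands in for whatever NLP techniques the project itself applies.

```python
import re

# Hypothetical controlled vocabulary: term -> facet. Illustrative only.
VOCABULARY = {
    "bronze age": "when",
    "roman": "when",
    "round barrow": "what",
    "barrow": "what",
    "villa": "what",
}


def find_terms(text, vocabulary):
    """Return (term, facet) pairs found in text, in order of appearance."""
    # Longest terms first so that 'round barrow' wins over 'barrow'.
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        (match.group(1).lower(), vocabulary[match.group(1).lower()])
        for match in pattern.finditer(text)
    ]
```

Each pair returned could then be added to the faceted index for the document. A lookup of this kind misses spelling variants and ambiguous terms, which is part of what makes the problem one for NLP rather than simple search.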

Thirdly, these tools will be used to investigate whether index terms can also be identified and harvested from older antiquarian literature, as represented by back runs of archaeological journals currently being digitised and made available online. As site reports in this older literature rarely give precise geospatial coordinates, it will be necessary to investigate whether natural language processing will allow the recognition and harvesting of place names. If this is achievable, the place names can be supplied to existing services (GeoCrossWalk), which can look up the names in an online gazetteer and return precise grid coordinates to be added to the index. This phase of the project will use the Proceedings of the Society of Antiquaries of Scotland, which are already hosted in digitised form by the ADS.
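The two steps described here, recognising candidate place names and resolving them against a gazetteer, can be sketched as follows. Everything in the sketch is hypothetical: the in-memory gazetteer stands in for an online service and does not reflect the GeoCrossWalk interface, the place names and coordinates are invented placeholders, and the capitalisation rule is a deliberately naive substitute for real place name recognition.

```python
import re

# Hypothetical in-memory gazetteer: place name -> (easting, northing).
# Names and coordinates are invented placeholders, not real grid references.
GAZETTEER = {
    "exampleton": (123456, 654321),
    "sampleford": (234567, 765432),
}


def candidate_place_names(text):
    """Naive spotting: a capitalised word following 'at', 'near' or 'in'."""
    return re.findall(r"\b(?:at|near|in)\s+([A-Z][a-z]+)", text)


def geocode(text, gazetteer):
    """Return {place name: (easting, northing)} for candidates in the gazetteer."""
    located = {}
    for name in candidate_place_names(text):
        coords = gazetteer.get(name.lower())
        if coords is not None:
            located[name] = coords
    return located
```

Candidates that the gazetteer does not recognise are simply left out, so a report mentioning an unknown place would stay unlocated rather than receive a guessed coordinate.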

At the end of the project we intend to have created a major sustainable resource for archaeological research and made it available to all users via the ADS. It will also be possible to make recommendations for the future format and indexing of grey literature, and to draw lessons for the wider humanities e-Science community.

Partners

The ArchaeoTools project was a collaboration between the Archaeology Data Service, University of York and the Natural Language Processing Research Group at the University of Sheffield.

The following people were involved in the project:

Archaeology Data Service

  • Prof. Julian Richards – Director
  • Dr Stuart Jeffrey – Project manager
  • Tony Austin – Systems manager
  • Stewart Waller – Application developer

Natural Language Processing Research Group

  • Prof. Fabio Ciravegna – Project manager (Sheffield)
  • Sam Chapman – Research Associate
  • Ziqi Zhang – Research Associate

Funding Partners

The following bodies were funding partners in the e-Science Research Grants Scheme:

  • The Arts and Humanities Research Council (AHRC)
  • The Engineering and Physical Sciences Research Council (EPSRC)
  • The Joint Information Systems Committee (JISC)

Presentations

The first scheduled presentation for the Archaeotools project, titled ‘When ontology and reality collide: The Archaeotools project, faceted classification and natural language processing in an archaeological context’, was given by the ADS at the 36th Annual Conference on Computer Applications and Quantitative Methods in Archaeology (CAA 2008), held on 2nd-6th April 2008 in Budapest, Hungary. The PowerPoint presentation for this talk is available below.

May 9th 2008 saw a joint workshop between the STAR project and the Archaeotools project, held at the King’s Manor in York. This was attended by all project members plus invited colleagues working on similar projects in Scotland, the Netherlands and elsewhere. A general presentation from this day is available for download below.

On June 24th the ADS participants in the project gave a departmental seminar to the University of York Computer Science department on the techniques and tools being used in the project. The PowerPoint presentation for this talk is available below.

On September 9th an update of project progress was presented at the UK eScience All Hands Meeting in Edinburgh. The PowerPoint presentation for this talk is available below.

Additional presentations on the Archaeotools project were given by Julian Richards to the Science & Engineering Facilities Council, Daresbury, on 5th November 2007, and at the Society for American Archaeology annual meeting in Vancouver on 28th March 2008.

PowerPoint files:

  • ‘When ontology and reality collide’, Budapest, April 2008 – PPT (3.3 MB)
  • ‘Archaeotools/STAR workshop’, York, May 2008 – PPT (5.6 MB)
  • ‘UoY Dept. of Computer Science Seminar’, York, June 2008 – PPT (4.4 MB)
  • ‘eScience All Hands Meeting’, Edinburgh, September 2008 – PPT (3.1 MB)
  • CAA 2009, Williamsburg, USA, March 2009 – PPT (1.3 MB)
  • ‘eScience projects update meeting’, London, June 2008 – PPT (1.6 MB)
  • ‘eScience workshop, Mapping Information with and without Geography: Approaches to Data Visualization and Structure in the Arts, Humanities and Social Sciences’, Edinburgh, September 2009 – PPT (4.9 MB)

Publications

The Archaeotools project, faceted classification and natural language processing in an archaeological context.

Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S. & Zhang, Z. ‘The Archaeotools project, faceted classification and natural language processing in an archaeological context. UK e-Science All Hands Meeting 2008’, “Philosophical Transactions of the Royal Society A”, 2009 367, 2507-2519. doi: 10.1098/rsta.2009.0038

When ontology and reality collide: the Archaeotools project. The CAA 2009.

Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S. & Zhang, Z., 2009 ‘When ontology and reality collide: the Archaeotools project, faceted classification and natural language processing in an archaeological context’. In Jerem, E. & Szeverényi, V. (Eds.), 2009 “On the road to reconstructing the past”, Proc. 36th Int. Conf. on Computer Applications and Quantitative Methods in Archaeology (CAA), Budapest, Hungary, 2008. Forthcoming 2009.

Abstract:

As a direct result of a successful proof of concept demonstrator for a faceted classification browsing system for archaeological records, ‘Archaeobrowser’, the Archaeology Data Service (ADS) and the Natural Language Processing Research Group at the University of Sheffield have embarked on a further project, named ‘Archaeotools’. Archaeotools is funded by the UK’s e-Science Research Grants Scheme which itself is a collaboration between three major funding bodies, the AHRC, the EPSRC and the JISC. This project represents the first UK service implementation of a faceted classification system in an archaeological context, specifically to enhance the ADS’s ArchSearch facility moving the search paradigm away from the search box approach towards a more intuitive and informative faceted browser system. Archaeotools is also using natural language processing to tackle the problem of unstructured, but highly valuable, archaeological data by automatically extracting metadata from legacy datasets such as ‘grey literature’.

The Archaeology Data Service and the Archaeotools project: faceted classification and natural language processing. Published 2010.

Richards, J.D., Jeffrey, S., Ciravegna, F., Waller, S., Chapman, S. & Zhang, Z., 2010 ‘The Archaeology Data Service and the Archaeotools project: faceted classification and natural language processing’. In Whitcher-Kansa, S., Kansa, E.C. & Watrall, E. (Eds.), in press 2009 “Archaeology 2.0 and Beyond: New Tools for Collaboration and Communication”, Los Angeles: Cotsen Institute of Archaeology.

Integrating archaeological literature into resource discovery interfaces using natural language processing and name authority services. IEEE/All hands 2009.

Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S. & Zhang, Z., 2009 ‘Integrating archaeological literature into resource discovery interfaces using natural language processing and name authority services’. In “The Proceedings of the 5th IEEE International Conference on e-Science” (Oxford, UK, 9-11 December 2009). Forthcoming 2009.