Semantic Technologies Enhancing Links and Linked Data for Archaeological Resources (STELLAR) was a one-year project funded by the Arts & Humanities Research Council (AHRC).
STELLAR was a collaboration between the Archaeology Data Service (ADS) and co-investigators at the University of Glamorgan and English Heritage. It set out to enhance the discoverability, accessibility, impact and sustainability of ADS datasets and STAR project outcomes (services and data resources) by improving interoperability between resources using the latest integration technologies, and by developing semantic search facilities and associated user interfaces. STELLAR built on outcomes and tools from the previous AHRC-funded STAR project, which in turn extended semantic search techniques initially developed through the EPSRC-funded FACET project, a collaboration with the Science Museum.
Aims
Enhance the discoverability, accessibility, impact and sustainability of ADS datasets and STAR project outcomes (services and data resources) by improving interoperability between resources using the latest integration technologies, and by developing semantic search facilities and associated user interfaces.
Go to the Archaeology Data Service Linked Data repository: data.archaeologydataservice.ac.uk
Sample queries are available from the ADS Linked Data Info Page.
Objectives
- Develop best practice guidelines for mapping and extraction of archaeological datasets into RDF/XML representation conforming to the CIDOC CRM-EH standard ontology
- Develop an enhanced mapping tool for non-specialist users to map and extract archaeological datasets into RDF/XML representation conforming to CIDOC CRM-EH
- Map and extract archaeological datasets into RDF/XML representation conforming to CIDOC CRM-EH (by non-specialist users)
- Develop best practice guidelines and tools for generating Linked Data corresponding to extracted datasets
- Publish corresponding Linked Data
- Evaluate the mapping tool and the Linked Data provision
- Engage with the archaeological community to inform research and disseminate outcomes
Best practice guidelines and tools will be developed both for mapping/extracting archaeological data as RDF and for generating archaeological Linked Data. To this end, third party data providers will use the tools developed by the project to map and extract archaeological datasets into RDF/XML representation conforming to the CIDOC CRM-EH standard ontology. These datasets will be generated as Linked Data. Evaluation will consider both the mapping and linked data generation exercises, taking account of technical and pragmatic issues.
The project commenced in March 2010 and finished in March 2011.
Report on ADS Datasets ingested for STELLAR 2011
The initial phase of work was to select and download a number of sample datasets to use in the case study. The following datasets were downloaded from the Channel Tunnel Rail Link (CTRL) excavation archive (Stuart Foreman (2009) Channel Tunnel Rail Link Section 1 {data-set}. York: Archaeology Data Service {distributor} doi:10.5284/1000230) held by the Archaeology Data Service:
- Cobham Golf Course, Cobham, Kent
- Cuxton, Kent
- Eyhorne Street, Hollingbourne, Kent
- Saltwood Tunnel, Kent
- West of Sittingbourne Road, Boxley, Kent
- Thurnham Villa, Kent
- Tutt Hill, Westwell, Kent
These datasets were chosen as they represent good examples of ‘standardised’ databases produced by excavations undertaken by two of the largest commercial units in England (Oxford Archaeology and Wessex Archaeology). All of these databases included information typical of an excavation archive – stratigraphic (context/entity/group), small finds, event and environmental sampling tables – which it was hoped would make useful case studies for the STELLAR tool. In addition, several other datasets were downloaded:
- Hartshill (doi:10.5284/1000365): an excavation database that included details of the earliest ironworking yet known in Britain
- Wellington Quarry, Worcestershire (doi:10.5284/1000392): a database from extensive excavations of a multi-phase prehistoric/Romano-British settlement and associated cemetery
- St Peter’s Church, Barton-upon-Humber (doi:10.5284/1000389): a post-excavation database with extensive details of the excavation of a medieval/post-medieval cemetery
All files were downloaded as Comma Separated Values (CSV), the ADS preferred format for delivering tabular data, and stored separately.
The next phase of work was undertaken using the STELLAR console.
- Each dataset was imported into a local database (.db file)
- An SQL statement was produced to export data from each table, with each column aliased to the equivalent field in the STELLAR templates (CRM-EH), for example:
  SELECT DISTINCT
      small_find_no AS find_id,
      description AS find_type,
      material AS find_material,
      context AS within_context_id,
      note AS find_note,
      period AS production_period
  FROM
      Finds
- All SQL statements were saved as text files (for example “hartshill_crmeh_finds.sql.txt”) and then run against the local database using the STELLAR console. The resulting tables, conforming to the CRM-EH, were automatically exported to CSV (for example “hartshill_crmeh_finds.csv”).
- The CSV tables were then individually uploaded to the STELLAR web tool (http://reswin1.isd.glam.ac.uk/stellar/default.aspx) and mapped against the relevant template scheme (so, for example, “hartshill_crmeh_finds.csv” was mapped against CRMEH_FINDS).
- For each dataset, a unique namespace was created. In each case it was based on the ADS Linked Data page (http://data.archaeologydataservice.ac.uk/) and given a unique suffix derived from the dataset’s DOI. For example, Hartshill was given the namespace: http://data.archaeologydataservice.ac.uk/id/ADS/DOI/10.5284/1000365/
- The CTRL datasets shared the same DOI, so each was made unique by extending the string to include the individual site, for example: http://data.archaeologydataservice.ac.uk/id/ADS/DOI/10.5284/1000230/cuxton/
- Once these steps had been completed, each individual CSV was exported to RDF.
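The import, query and export steps above can be approximated outside the STELLAR console. The following is a minimal sketch only, assuming Python's standard sqlite3 and csv modules as a stand-in for the console; it is not the tool the project used. The helper names (load_csv, export_query, dataset_namespace) and the two sample rows are invented for illustration, while the column aliases and namespace pattern follow the examples given above.

```python
import csv
import io
import sqlite3

# Made-up sample rows standing in for a downloaded "Finds" table (illustrative only).
SAMPLE = """small_find_no,description,material,context,note,period
1,nail,iron,1001,corroded,Roman
1,nail,iron,1001,corroded,Roman
2,sherd,ceramic,1002,,Iron Age
"""

# The export statement from the text: source columns aliased to template field names.
FINDS_SQL = """
SELECT DISTINCT
    small_find_no AS find_id,
    description AS find_type,
    material AS find_material,
    context AS within_context_id,
    note AS find_note,
    period AS production_period
FROM Finds
"""

def load_csv(conn, table, csv_text):
    """Create `table` from CSV text, storing every column as TEXT."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)

def export_query(conn, sql):
    """Run `sql` and return the result as CSV text with a header row."""
    cur = conn.execute(sql)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # aliased column names
    writer.writerows(cur)
    return out.getvalue()

def dataset_namespace(doi, site=None):
    """Build a per-dataset namespace from a DOI, optionally extended by a site name."""
    base = "http://data.archaeologydataservice.ac.uk/id/ADS/DOI/"
    return f"{base}{doi}/{site}/" if site else f"{base}{doi}/"

conn = sqlite3.connect(":memory:")  # the project used a local .db file instead
load_csv(conn, "Finds", SAMPLE)
result = export_query(conn, FINDS_SQL)
print(result)
print(dataset_namespace("10.5284/1000365"))
print(dataset_namespace("10.5284/1000230", "cuxton"))
```

The exported header row carries the template field names rather than the source column names, which is what allows the CSV to be mapped against a template such as CRMEH_FINDS. SELECT DISTINCT collapses the duplicated sample row but does not guarantee row order.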