The release of the SENESCHAL vocabularies as Linked Open Data is a very exciting development for practitioners of archaeological linked data. This is the first step in enabling the proper alignment of UK archaeological terms for our archive metadata. Before SENESCHAL, we had no authoritative vocabularies to align our Linked Open Data with, so string literals were used based on what was recorded in a Collection Management System (CMS). This is obviously less than ideal and leaves this data exposed to the pitfalls of a pre-Linked Open Data world, such as spelling mistakes and unreferenceable terms, which makes true interoperability much more difficult.
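To illustrate the difference between free-text literals and aligned terms, here is a minimal sketch in Python. The labels and the example.org URIs are made up for illustration and are not actual SENESCHAL identifiers; the point is only that a term copied from a CMS either resolves to a referenceable concept or is flagged for review.

```python
# Hypothetical lookup table: normalised label -> concept URI.
# The URIs below are placeholders, not real SENESCHAL identifiers.
VOCABULARY = {
    "round barrow": "http://example.org/vocab/monument-type/round-barrow",
    "hillfort": "http://example.org/vocab/monument-type/hillfort",
}


def normalise(label):
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def align(cms_term):
    """Return the concept URI for a CMS term, or None if it does not match."""
    return VOCABULARY.get(normalise(cms_term))


# Terms as they might appear in a CMS export, including a spelling mistake.
records = ["Round Barrow", "hillfort ", "Hilfort"]
aligned = {term: align(term) for term in records}

# Terms that could not be aligned and need a human to look at them.
unmatched = [term for term, uri in aligned.items() if uri is None]
```

In this sketch the misspelt "Hilfort" ends up in `unmatched`, which is exactly the kind of problem a string literal would otherwise hide.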
This is the first of a two-part blog post reporting on the progress of my work in preparing the digital data from the English Heritage Silbury Hill Conservation Project for deposition. For an introduction to my work, please see the ADS Spring 2013 newsletter.
A bit of background
Silbury Hill is “the largest man-made mound in Europe” (English Heritage), roughly 4,500 years old, and a mystery that many antiquarians and archaeologists have, in their time, tried to solve through extensive survey and excavation.
To summarise: the Silbury Hill Conservation Project began after a hole appeared on the summit in May 2000, after which the hill continued to be monitored through a series of surveys, assessments and evaluations. These showed that the hill was suffering from various collapses because previous excavations had been inadequately backfilled, and the subsidence of material had therefore created voids.
The Silbury Hill digital archive: a monumental task
As mentioned in the ADS newsletter, the digital data generated from the Silbury Hill Conservation Project represents all of the site visits, surveys, evaluations, excavations, photogrammetric recording, finds retrieval and environmental sampling undertaken over the span of 9 years as well as the consequent research, assessment and analysis of the site data.
At the beginning of 2012, the dataset comprised over 30,000 files, and I was employed by English Heritage for three months to undertake the daunting task of selecting which files should be retained, and renaming and reformatting files where appropriate. As it transpired, that three-month period was not enough even to sort out which files needed to be kept for archiving and which should be discarded; consequently, I was employed for a further year to continue preparing the digital data for deposition.
As my former colleague Jen Mitcham discovered, a SPRUCE Mash-up is a very productive thing to be involved in. This time I took along a collection of our PDF and PDF/A files to test a tool that is being developed. The idea is that the tool will be able to identify PDF files whose content poses a preservation risk. This is not necessarily the same question as whether a PDF/A file presents itself as valid PDF/A according to the various PDF/A validators out there (or at least it might not be; the jury is still out on this point). The validator being used by the tool is Apache PDFBox Preflight, but we also used PDFTron PDF/A Manager and Adobe Acrobat Preflight, all of which give different results! The hope is that this tool, when further developed, will provide a customisable traffic-light system for identifying preservation risks in PDFs, and that it will be possible to embed it in repository software. Good luck with the future development!
Other than that, there was lots of great work done on file identification, and although it was not possible to get on to my other issue of matching equivalent files of different formats, I’m hoping to put in a bid for a SPRUCE follow-up grant for this.
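To give a flavour of what a customisable traffic-light rating might look like, here is a rough sketch in Python. It is not the logic of the tool under development: the function name, the thresholds and the dictionary of validator results are all assumptions made for illustration, and no validator is actually called.

```python
def traffic_light(results, amber_at=1, red_at=2):
    """Rate a PDF from per-validator outcomes.

    `results` maps a validator name to True (passed) or False (failed).
    `amber_at` and `red_at` are the numbers of failures at which the
    rating changes; both are hypothetical, adjustable thresholds.
    """
    failures = sum(1 for passed in results.values() if not passed)
    if failures >= red_at:
        return "red"
    if failures >= amber_at:
        return "amber"
    return "green"


# Made-up outcomes for one file, keyed by illustrative validator names.
example = {"validator_a": True, "validator_b": False, "validator_c": True}
rating = traffic_light(example)
```

Because validators can disagree, counting failures rather than trusting a single validator lets a repository decide for itself how cautious to be, for instance by setting `red_at=1` so that any failure is treated as a red light.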
More information on the issues and solutions is available from the event website.