While rationalising old and orphaned files on the ADS servers, I stumbled upon an old index.html file from a previous version of the website. Like discovering a long-forgotten photograph in the attic, this led me down the meandering path of memory lane. However, unlike a photograph, reconstructing the look and feel of a web page requires some fiddling to correctly associate the style sheets and any server side includes. After a few cut-and-paste commands replacing the server side includes with actual HTML, and a directory search for the missing stylesheet, the old homepage was back up again in all its glory.
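That hand inlining of server side includes could equally be scripted. A minimal sketch of the same substitution in Python, assuming Apache-style `#include virtual` directives (the file names here are hypothetical):

```python
import re
from pathlib import Path

# Pattern for Apache-style SSI directives, e.g. <!--#include virtual="nav.html" -->
SSI_PATTERN = re.compile(r'<!--#include virtual="([^"]+)"\s*-->')

def inline_includes(html: str, root: Path) -> str:
    """Replace each SSI directive with the contents of the referenced file."""
    def replace(match: re.Match) -> str:
        include_path = root / match.group(1).lstrip("/")
        return include_path.read_text()
    return SSI_PATTERN.sub(replace, html)
```

Given a directory containing the old include fragments, running the saved index.html through `inline_includes` reproduces the flattened page in one go.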
With the SENESCHAL project finally wrapped up, we thought it would be good to do a final post on how we integrated the SENESCHAL vocabularies into our systems. This follows on from my previous post, SENESCHAL Vocabularies: value to the ADS, which gives more background on the project in general. That post also lays out the original vision and a mock-up for the integration of the SENESCHAL vocabularies into our Collections Management System (CMS), which we can now safely say has been fully realised.
The CMS integration was the primary objective for the ADS within the SENESCHAL project, as it would have the biggest and most immediate impact. We wanted a simple way to accurately align our archive metadata to the official vocabularies used within Britain. This not only helps us manage our archives better, but also ensures our users can discover them in a more consistent and accurate manner. Before the SENESCHAL project, our “alignment” with the Monuments Thesaurus was done via a free text field. Even with the extreme fastidiousness of our digital archivists, errors had the potential to creep in. Entering data this way also uncoupled the term from its related concepts and hierarchy when it was recorded within our CMS. This could be recreated, but the onus was on the user of the data (us in most cases, but also the users of the archive) to manually “realign” the term.
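The practical difference is that recording a concept reference, rather than a free-text string, keeps the term attached to its hierarchy. A toy sketch of the idea (the identifiers and labels below are illustrative, not the real Monuments Thesaurus URIs):

```python
# A toy in-memory thesaurus: concept id -> (preferred label, broader concept id).
# Illustrative entries only -- not real Heritage Data identifiers.
THESAURUS = {
    "concept/1": ("RELIGIOUS, RITUAL AND FUNERARY", None),
    "concept/2": ("PLACE OF WORSHIP", "concept/1"),
    "concept/3": ("CHURCH", "concept/2"),
}

def broader_chain(concept_id: str) -> list:
    """Walk up the hierarchy, returning preferred labels from the concept to the top term."""
    labels = []
    while concept_id is not None:
        label, concept_id = THESAURUS[concept_id]
        labels.append(label)
    return labels
```

Storing "concept/3" against a record recovers “CHURCH” plus every broader term above it for free, whereas a free-text “Church ” (with a stray space or typo) carries none of that context.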
An online presence is just the tip of the ADS iceberg, but we still rely on search engines to direct traffic to our archived datasets and web content. Search engines are not the only way people find their way to the ADS: we provide metadata to many aggregators and portals via OAI-PMH and SOAP web services, such as the Heritage Gateway, Europeana, the Thomson Reuters Data Citation Index, the Keepers Registry, the NERC Data Catalogue Service and the MEDIN Data Discovery Portal, to name a few. Even with all of those outlets for discovering ADS content, a fair share of users still arrive via a good ol’ Google type-and-pray search. We are referrer agnostic at the ADS and don’t really care how people discover our resources… so long as it doesn’t circumvent our Terms & Conditions (T&Cs).
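OAI-PMH harvesting, for what it’s worth, is just plain HTTP with a handful of standard verbs. A sketch of building a harvest request (the base URL here is a placeholder, not the real ADS endpoint):

```python
from urllib.parse import urlencode

# Hypothetical endpoint -- the real ADS OAI-PMH base URL will differ.
BASE_URL = "https://example.org/oai"

def list_records_url(metadata_prefix: str = "oai_dc", set_spec: str = "") -> str:
    """Build an OAI-PMH ListRecords request URL using the protocol's standard verbs."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional selective harvesting by set
    return BASE_URL + "?" + urlencode(params)
```

An aggregator then pages through the results using the `resumptionToken` returned in each response, so the provider never has to ship the whole catalogue at once.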
A side note on the ADS T&Cs: they were drafted in the early days of the web, well before the advent of Creative Commons or other similar licensing models, and were designed to credit the creators of the data while protecting their intellectual property from uncompensated resale. It doesn’t cost anything to access or use data from the ADS as long as you agree to the T&Cs, which is usually done by clicking an “Agree” button when accessing data for the first time. While at times mildly annoying, we are all lucky that the original proposal from the AHDS lawyers (requiring a signed letter from each user wanting to access ADS data… really!) was dismissed and the infinitely more convenient click-agreement prevailed. Ultimately our T&Cs were born out of a belief in Open Access, which has been a core tenet of the ADS philosophy since its beginnings in 1996.
About a year ago the ADS was approached by the British Library (BL) about developing a mobile app together. A good relationship had evolved out of the ADS involvement with DataCite at the BL, so this seemed like a good opportunity to work together on something other than DOIs. Another reason the BL approached the ADS was that we hold a large amount of open data with widespread appeal.
A year and many lessons later, the app has been available to download for 6 weeks and has notched up a respectable 650+ downloads. This blog post is an attempt to document and explain many of the decisions that were made during the development of the app. Some things in this blog may make more sense if you’ve already seen the app, which can be downloaded from the App Store. If you don’t have an iPad (or don’t want to download it), you can see screenshots on the ADS website to get an idea of what the app looks like.
ADS staff had bounced around the idea of developing a mobile app in the past, but until the BL approached us we didn’t have the time or resources to build one. If the BL hadn’t approached the ADS to collaborate (and lead on the development), it is unlikely the ADS would have undertaken developing an app at this time. Given the widespread appeal of archaeology and the rich archaeological content held by the BL, an archaeologically themed app in collaboration with the ADS made sense.

What kind of archaeological app to develop proved a more difficult question to answer than expected. Aware that a low curatorial overhead was desirable, initial thoughts focused on existing ADS collections or projects, such as a mobile version of Archsearch, the Defence of Britain (DoB) archive or the England’s Rock Art (ERA) project. An Archsearch mobile app was dismissed due to the scale (1.2 million records) and the broad nature of the Archsearch data. The more compact datasets of ERA and DoB were more appealing because they were focused on a distinct theme and had already been effectively curated by the depositors. DoB is also one of our most popular resources, but like ERA, its audience is rather specialist. While it may have been easier to create an ERA or DoB app, we wanted to develop an app with the widest appeal possible. We also wanted an app whose code and structure could easily be reused by us and others, so instead we decided to develop an app focused on the archaeology of a select group of key British heritage sites. It was also obvious that general archaeology would be better suited to the BL and its collections, which hold some of the rarest and most remarkable content in the world. After some initial indecision, a general British archaeology app straightforwardly called “Archaeology Britain” was settled upon.
At the 7th World Archaeological Congress in Jordan, Martin Doerr raised a concern about the Linked Open Data world being advocated in our session. In particular he questioned the assumption that all of this Linked Open Data would be persistently and indefinitely accessible, and he suggested that people keep RDF or other serialisations of the Linked Open Data they were using, particularly vocabularies and thesauri. This seemed like a good idea to us, given the fragility of the web, and we have been informally promoting it at conferences and workshops.
Reason to heed Martin’s advice has just presented itself in the form of the recent US Government shutdown, which has brought down the Library of Congress website, including the id.loc.gov domain that hosts their linked data records.
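Martin’s suggestion boils down to keeping a local copy of any serialisation you depend on, and falling back to it when the remote host goes dark. A minimal sketch of that pattern (the fetch function is injectable here purely so the fallback behaviour is easy to exercise; the URL and file name are hypothetical):

```python
from pathlib import Path
from urllib.request import urlopen

def fetch_with_cache(url: str, cache_file: Path, fetcher=None) -> str:
    """Return the serialisation at `url`, refreshing a local cache on success
    and falling back to the cached copy if the remote host is unreachable."""
    fetcher = fetcher or (lambda u: urlopen(u).read().decode("utf-8"))
    try:
        data = fetcher(url)
        cache_file.write_text(data)  # keep the last good copy on disk
        return data
    except OSError:
        # Remote unavailable (a government shutdown, say) -- use the cached copy.
        return cache_file.read_text()
```

Applications built on a vocabulary then keep working through an outage, at the cost of possibly serving a slightly stale copy.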
The release of the SENESCHAL vocabularies as Linked Open Data is a very exciting development for practitioners of archaeological linked data. This is the first step in enabling the proper alignment of UK archaeological terms for our archive metadata. Before SENESCHAL, we had no authoritative vocabularies to align our Linked Open Data with, so string literals were used based on what was recorded in our Collections Management System (CMS). This is obviously less than ideal and leaves the data exposed to the pitfalls of a pre-Linked Open Data world, such as spelling mistakes and unreferenceable terms, which make true interoperability much more difficult.
The other week I had the opportunity to participate in the SPRUCE Hackathon hosted by Leeds University. Hackathons are an opportunity for developers to get together and work on (or hack) common problems. Typically hackathons in the USA are fuelled by Mountain Dew and pizza, but as this was a British hackathon it was mostly fuelled by tea and cakes (and mighty fine cakes thanks to Becky). The hackathon was specifically focused on issues around file characterisation: precisely identifying and describing the technical characteristics of a file, as well as its metadata. This is an ongoing challenge for practitioners in digital preservation, since there are many file formats, many versions of those many file formats, and little consistency in the way these many file formats and their many versions internally identify themselves. Digital archivists need to know more than just a file’s extension or a format’s name, which Gary McGath sums up nicely in his recent Code4Lib article:
Just knowing the format’s generic name isn’t enough. If you have a “Microsoft Word” file, that doesn’t tell you whether it’s a version from the early eighties, a recent document in Microsoft’s proprietary format, or an Office Open XML document. The three have practically nothing in common but the name.
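This is why characterisation tools inspect a file’s leading bytes (its “magic number”) rather than trusting the extension. A minimal sketch distinguishing the two Word container formats mentioned above; real tools such as DROID, JHOVE or libmagic go much further than this:

```python
# Leading-byte signatures for the containers a ".doc"/".docx" might actually use.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound file (Word 97-2003)
ZIP_MAGIC = b"PK\x03\x04"                         # ZIP container (Office Open XML)

def sniff_container(header: bytes) -> str:
    """Identify a file's container format from its first bytes."""
    if header.startswith(OLE2_MAGIC):
        return "OLE2 compound document"
    if header.startswith(ZIP_MAGIC):
        return "ZIP container"
    return "unknown"
```

Even this crude check already separates a Word 97-2003 binary from an Office Open XML document, something the shared “Microsoft Word” label and “.doc” extension never could; identifying the specific format version inside the container is the genuinely hard part.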