‘Accessioning Arch Camb’: Gwynedd Archaeological Trust Volunteer Engagement Project

Gwynedd Archaeological Trust volunteers have been researching digitised versions of Archaeologia Cambrensis, the Journal of the Cambrian Archaeological Association, as part of the ‘Accessioning Arch Camb’ project. Using journal volumes hosted on ADS and the National Library of Wales websites, the project is helping enhance the regional Historic Environment Record (HER) for north-west Wales.

Summer Internship With the ADS: Heritage Open Days

The following is a blog written by Chloe Rushworth, who has recently completed a 4-week Voluntary Placement with the ADS. Chloe has been working with the Curatorial and Technical Team to investigate some new approaches to how we interact with data within the Archive. Below, she gives a run-through of her huge contribution to creating a ‘Curated Collection’ collating data that relates to sites participating in Heritage Open Days. The aims of this project are for the collection to work as an educational tool, both to increase awareness and knowledge of the archaeological and historical importance of the sites taking part in the Heritage Open Days, and to show how the Archive can add to the experience of the Heritage Open Days themselves.

If you want to see the results, the Collection is now live. Over to Chloe!

We all know that time flies, but I could have never imagined how true this statement really was until we arrived here, the last day of my internship.

My name is Chloe and I am an Archaeology and Heritage student at the University of York going into my 3rd year. This summer I have channelled my love for digital archaeology into a placement with the ADS (Archaeology Data Service), which is based in the Department of Archaeology at King’s Manor.

For a University student used to leisurely starts, the prospect of starting work at 9:00am on a Monday morning was rather alien. However, due to the current circumstances surrounding Covid-19, I never set foot in the office. Instead, the combination of a strong coffee, my bed and a warm Zoom greeting from all of the ADS team soon perked me up and I set about the day’s work with enthusiasm.

During my first week I was assigned the task of finding out if the ADS Library archives contained any documents about the sites taking part in the Heritage Open Days (HODs) festival in September. This seemed like a daunting task as when I first checked the website there were over 300 results, and this just kept on growing over the week.

Eventually I managed to make my way through them, creating a spreadsheet as I went which included all of the site names (both those the ADS had records for and, for future reference, those it didn’t). I listed all of the documents related to HODs sites along with the ADS DOI and all of the edits that I made to their listings. After this I classified the records, splitting them down into smaller sheets by site type: Abbeys and Churches, Houses and Halls, Parks and Gardens, Museums, Nature Reserves, Monuments, Mills and Factories, Miscellaneous Historic Buildings, Burials and Cemeteries, and Trails.

My next task was to create a Google Map with pinpoints of all of the sites, including a brief description of the site, the references of the related ADS Library records, and their DOI links to the Library. The aim of this was for it to be a colourful and informative educational resource to accompany my data and I am really pleased with how it has turned out. There are a few gaps in places such as Birmingham and Peterborough, but there are still HODs events taking place in these areas. Perhaps it is something that the ADS will look into in the future to acquire some records from those areas for a more even distribution.

Screenshot of the Google map of HODs sites in the ADS Library

Throughout week 2, I spent a lot of time extracting images from the documents and converting them into TIF files. This was a longer process than I anticipated because of the laptop I was using. As I wasn’t in the office, I didn’t have the software that the team would usually use, and the only application I had that supported TIF was Paint! Sadly, I couldn’t include an image for every document since some didn’t contain images, and others were not really suitable as illustrative examples. In the end I chose 10 images, one per site type, to represent my data. So that it was clear which document each image came from, I also had to rename the images with the name of the paper followed by the figure number.

Week 3 is where things got incredibly exciting! I was told that the data and map that I had created were going to be the basis of the first ADS ‘Curated Special Collection’… and that I was going to be involved in the making of this.

For the first few days I was doing admin tasks and tying up loose ends. I made sure that my map and spreadsheet were totally finished and then created the metadata for the images and documents so that they could be easily uploaded into the ADS Object Management database.

I was then given a Zoom tutorial with my supervisor Jenny O’Brien who walked me through how to add all of the details into the ‘behind the scenes’ parts of the collection, including adding myself as an author which was a highlight of course.

Once I had written the introduction and other pieces of text I wanted for the various pages of the collection, I had to learn the basics of HTML coding to add it onto the page and add paragraphs, the front page image and the interactive map that I made. The rest of the coding that needed to be done in order for the ‘downloads’ page to be set up, to display the categories in a table format with images and to show the various report links, was deeply out of my league, so was done by Teagan Zoldoske (another Archivist) and Jenny.

Teagan was incredibly helpful during this process, and not only allowed me to watch her code, but also walked me through what it all meant and all of the different types of software used in order for the ADS to run. She also showed me the entire archiving process including creating dissemination and preservation files (which I then had to do myself for the images).

The coding for the collection will be finished after my placement is over, so I spent my last day doing tasks within the ADS Library to get a feel for another area of the archiving process. I merged a couple of authors: where the same author was in the database twice, all of their papers have now been moved under one name. I also managed to eliminate the elusive author ‘-ZZZ-’ and reassign the papers listed under it to the correct author, which was very satisfying once completed.

All in all, it has been an incredibly rewarding experience for me, and also for the ADS I hope. I have learnt so many new things about the ADS as an organisation and become familiar with six new pieces of software in 4 short weeks. The skills and knowledge that I have gained from this internship are invaluable and will definitely be transferable to the jobs that I apply for once I graduate. I couldn’t recommend taking on volunteer work with the ADS more, and I sincerely hope that this opportunity is offered again in the future.

Linking the virtuous circles: Citation and Tracking Re-use.

The ADS has (for nearly 25 years!) been providing free access to resources deposited with us. We put them online in open/accessible formats, people use them, and people cite them. We know people use them because we have data on page views and downloads. Some things are used a great deal; these are often high-profile research resources that gain a lot of mentions in literature and social media. Others have more of a cult following, but are still used sporadically.

All these access statistics make a good basic demonstration of impact; we can pass them on to project funders and stakeholders to demonstrate quantitative success. However, the follow-up questions normally ask “who” is using this data, and for what purposes. The ADS has many ambitions with regard to its (meta)data, but facilitating and demonstrating this re-use is a high priority. Over the last year I’ve had a chance to think more about what we could and should be doing, and how it can help us, our users, and depositors make more of the situation…

The key to this is the Digital Object Identifiers (DOIs) we use. For those unaware, the ADS uses DataCite DOIs through our membership of a consortium led by the British Library. We create DOIs for:

  • All our deposited collections
  • Upon request, distinct entities within a collection
  • All unpublished reports
  • Journal articles

These DOIs are registered with DataCite, and in doing so we also pass on key metadata for the Object (who created it, when it was created, where it relates to etc). This metadata is then searchable in the DataCite interface, alongside records from other repositories that are part of the DataCite community such as Zenodo or Dryad.

When users use ADS resources they should cite the DOI. Any use of material from the ever-popular Roman Rural Settlement project, for example, should follow our citation guidelines:

Martyn Allen, Nathan Blick, Tom Brindle, Tim Evans, Michael Fulford, Neil Holbrook, Lisa Lodwick, Julian D Richards, Alex Smith (2018) The Rural Settlement of Roman Britain: an online resource [data-set]. York: Archaeology Data Service [distributor] https://doi.org/10.5284/1030449

Or for a Journal article:

Sparey-Green, C. (2002). Excavations on the SE defences and extramural settlement of Little Chester, 1971-2. Introduction. The Derbyshire Archaeological Journal 122. Vol 122, pp. 1-10. https://doi.org/10.5284/1066616

There are tools available from DataCite to reformat these into nearly all forms of Bibliographic reference, but it’s important to emphasise that any citation or reference should include the DOI and not the URL that appears in a web browser. For example it should be https://doi.org/10.5284/1066616 and never https://archaeologydataservice.ac.uk/library/browse/details.xhtml?recordId=3202768

Why? Primarily, the DOI is persistent. No matter what happens to ADS applications in the future (for example, an update to the Library may lead to us not using details.xhtml any more), a reference to the DOI will always take you to where the content is. Secondly, and most importantly in this case, it allows us, via a range of tools, to identify where our DOIs are being used.
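To make the DOI-versus-URL distinction concrete, here is a minimal sketch in Python of how a reference could be checked for a DOI and turned into its persistent https://doi.org/ link. The function names and the regular expression are illustrative assumptions (the pattern is a common heuristic for spotting DOIs, not an official validator), and none of this is ADS code.

```python
import re
from typing import Optional

# Heuristic DOI pattern: "10." + registrant code + "/" + suffix.
# This is an assumption for illustration, not a complete DOI validator.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def extract_doi(reference: str) -> Optional[str]:
    """Return the first DOI found in a reference string, or None."""
    match = DOI_PATTERN.search(reference)
    if match is None:
        return None
    # Drop punctuation that often trails a DOI at the end of a sentence.
    return match.group(0).rstrip(".,;)")


def persistent_link(reference: str) -> Optional[str]:
    """Return the https://doi.org/ link for a reference, or None if it has no DOI."""
    doi = extract_doi(reference)
    return "https://doi.org/" + doi if doi else None


good = "Sparey-Green, C. (2002). Introduction. https://doi.org/10.5284/1066616"
bad = "https://archaeologydataservice.ac.uk/library/browse/details.xhtml?recordId=3202768"

print(persistent_link(good))  # the DOI link, which should keep resolving
print(persistent_link(bad))   # None: a browser URL carries no DOI to track
```

A reference that returns None here is exactly the kind that cannot be picked up by the citation-tracking tools described next.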

One such tool is the DataCite Event API, a prototype developed in collaboration with Crossref to track citations of DataCite DOIs quoted as sources in academic papers. A quick search of this for ADS DOIs shows for example:

Image of JSON from the DataCite Event API which shows the citation of https://doi.org/10.5284/1007741 by a paper in  the Journal of World Prehistory

In this case the paper ‘Approaches to Interpreting Mesolithic Mobility and Settlement in Britain and Ireland’ published in the Journal of World Prehistory cited Wessex Archaeology (2006). England’s Historic Seascapes Final Report https://doi.org/10.5284/1007741.
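As a rough illustration of how a record like this could be read by a script, the sketch below parses a small sample response and pulls out (citing, cited) pairs. The field names ("data", "attributes", "subj-id", "obj-id", "relation-type-id") are my assumption of how the event records are laid out and should be checked against the DataCite documentation. The citing DOI is a placeholder, not the real DOI of the Journal of World Prehistory paper.

```python
import json
from typing import List, Tuple

# Sample response for illustration only. The layout is assumed and the
# citing DOI is a placeholder; only the cited ADS DOI comes from the text.
SAMPLE_RESPONSE = """
{
  "data": [
    {
      "attributes": {
        "subj-id": "https://doi.org/10.0000/placeholder-citing-paper",
        "obj-id": "https://doi.org/10.5284/1007741",
        "relation-type-id": "references",
        "source-id": "crossref"
      }
    }
  ]
}
"""


def citations_from_events(payload: dict) -> List[Tuple[str, str]]:
    """Return (citing, cited) pairs from an event payload."""
    pairs = []
    for event in payload.get("data", []):
        attrs = event.get("attributes", {})
        subj, obj = attrs.get("subj-id"), attrs.get("obj-id")
        relation = attrs.get("relation-type-id")
        if not subj or not obj:
            continue  # skip incomplete records
        if relation == "references":
            pairs.append((subj, obj))  # subject cites object
        elif relation == "is-referenced-by":
            pairs.append((obj, subj))  # object cites subject
    return pairs


for citing, cited in citations_from_events(json.loads(SAMPLE_RESPONSE)):
    print(citing, "cites", cited)
```

Handling both relation directions is a design choice: the same citation can in principle be reported from either side, so the function normalises everything to "who cites what".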

In addition, there’s also the incredibly powerful CrossRef Event Data, a set of APIs that captures and records events that occur all over the web. This includes not only published articles but also Twitter and Wikipedia (including WikiData). So, for example, I can see:

Image of JSON from the CrossRef Event Data API which shows the citation of the DOI https://doi.org/10.5284/1000266 by a wikipedia article

In this case, the Wikipedia article on the Sutton Hoo helmet cites Martin Carver’s data from the Sutton Hoo Research Project.

Capturing this sort of reuse, and mentions of resources in Twitter conversations (919 and counting) is to my mind a useful indicator not only of reuse, but a glimpse into the sort of conversations people may be having about our digital Objects.
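A count like the Twitter figure above could be produced by tallying events by their source. The sketch below does this for a tiny hand-made payload. The structure ("message", "events", "subj_id", "obj_id", "source_id") is my assumption about the shape of CrossRef Event Data responses, and the two tweet entries are placeholders rather than real events.

```python
from collections import Counter
from typing import Dict

# Hand-made payload for illustration. The Wikipedia entry mirrors the
# Sutton Hoo example in the text; the tweet entries are placeholders.
SAMPLE_PAYLOAD = {
    "message": {
        "events": [
            {
                "subj_id": "https://en.wikipedia.org/wiki/Sutton_Hoo_helmet",
                "obj_id": "https://doi.org/10.5284/1000266",
                "source_id": "wikipedia",
            },
            {
                "subj_id": "placeholder-tweet-1",
                "obj_id": "https://doi.org/10.5284/1000266",
                "source_id": "twitter",
            },
            {
                "subj_id": "placeholder-tweet-2",
                "obj_id": "https://doi.org/10.5284/1030449",
                "source_id": "twitter",
            },
        ]
    }
}


def mentions_by_source(payload: dict) -> Dict[str, int]:
    """Count events per source (e.g. wikipedia, twitter)."""
    events = payload.get("message", {}).get("events", [])
    return dict(Counter(e.get("source_id", "unknown") for e in events))


print(mentions_by_source(SAMPLE_PAYLOAD))
```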

The next step is for us to build a method to pull data from these APIs and incorporate it back into our metadata as a dynamic process. This would mean that this page (for example) https://archaeologydataservice.ac.uk/archives/view/romangl/metadata.cfm is refreshed with information where we can demonstrate that https://doi.org/10.5284/1030449 is ‘Cited By’ XXX. Who knows, this could even be extended with an option to email depositors when their data has been cited, so that they know it is being actively used.
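One possible shape for that ‘Cited By’ step is sketched below: take (citing, cited) pairs harvested from the APIs, group them by the cited DOI, and produce a line of text for the metadata page. This is only a sketch under my own assumptions. The function names are hypothetical, the citing identifiers are placeholders, and nothing here reflects how the ADS will actually implement it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_cited_by(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group citing identifiers under the DOI they cite, removing duplicates."""
    index = defaultdict(set)
    for citing, cited in pairs:
        # DOIs are case-insensitive, so compare them in lower case.
        index[cited.lower()].add(citing.lower())
    return {doi: sorted(sources) for doi, sources in index.items()}


def cited_by_line(doi: str, index: Dict[str, List[str]]) -> str:
    """Return a one-line 'Cited By' summary for a metadata page."""
    sources = index.get(doi.lower(), [])
    if not sources:
        return doi + " has no recorded citations yet"
    return doi + " is cited by " + str(len(sources)) + " source(s): " + ", ".join(sources)


# Placeholder citing identifiers; only the ADS DOIs come from the text.
harvested = [
    ("https://doi.org/10.0000/placeholder-a", "https://doi.org/10.5284/1030449"),
    ("https://doi.org/10.0000/placeholder-b", "https://doi.org/10.5284/1030449"),
    ("https://doi.org/10.0000/placeholder-a", "https://doi.org/10.5284/1030449"),
]

index = build_cited_by(harvested)
print(cited_by_line("https://doi.org/10.5284/1030449", index))
print(cited_by_line("https://doi.org/10.5284/1007741", index))
```

Grouping by the cited DOI also gives an obvious hook for the email idea: each key in the index maps to a single deposit.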

Which brings me back to the title of this blog. The idea of a virtuous academic circle lies at the heart of what it is to publish – you publish your words/data, someone else uses it and cites it, you know they’ve used it (however this may be), this encourages you to publish more as you know your work must have some value. It also taps into what is at the core of what the ADS was set up to do: the archive/record is there to be used and maybe (hopefully?) reinterpreted and re-purposed. The archive needs to be used, otherwise there is arguably no point in having the archive.

However, without wanting to mangle my shapes, I think this model is more complex and more in line with the sort of graph theory / social network analysis that is now de rigueur. It’s good to know where our resources are being cited, but there’s a whole bigger world of possible study. What sort of Journals are ADS resources cited in, what sort of ADS resources are cited (e.g. is anyone citing the raw data?), what topics do these represent, who is citing, and so on. There’s material there for a new wave of study about citation habits and biases, or at the very least a PhD…

Anyway, for this to happen please remember to cite the DOI!

Our Tweeted Times for #FestivalOfArchaeology

As part of the CBA’s #FestivalOfArchaeology in 2020, I spent a light-hearted day revisiting some of Internet Archaeology’s and ADS’s milestones. I also asked those whose paths intersected and crossed ours to join in and share memories.

We’ve made a compilation for your enjoyment.

Cartoon showing the way to a data repository.

Guidelines for Depositors, a reintroduction

The first half of 2020 has been an interesting one for sure. We’ve been working from home with our partners, children, and kettles as coworkers and we’ve begun to look at how information is presented on our website.

You may or may not have come to our site to find out guidance on depositing data. In that quest, you may have found a document/guide that was spread across several webpages, with no images, an over eager table of contents, and a reminder it was written in 2015. Well, you’ll be happy to know, that it’s gotten a bit of a face lift.

So without further ado, allow me to reintroduce you to our Guidelines for Depositors.

We passed! Great result from CoreTrustSeal accreditation

We have obtained the Core Trust Seal.

As many of you will have seen on social media last month, it is with great pleasure that the ADS can announce that it has been awarded CoreTrustSeal (CTS) certification. This is a massive achievement for a small digital repository, based out of four small rooms in the ‘tumbledown’ King’s Manor in York (well, at least under ‘normal’ circumstances), and represents the culmination of many hours, weeks and months of hard work by all repository staff.

I hit save so it’s preserved right?

No preservation format is perfect. While physical media such as paper can last centuries under proper conditions, it is that qualifier that is key to their longevity. Everyone has seen what can happen to paper when it gets wet. Similarly, there are many horror stories of corrupted files that have made people sceptical about choosing digital preservation over physical preservation.

We have had 4000+ years to develop strategies to conserve the ‘written’ word and fewer than 50 to develop methodologies to preserve digital data. However, as long as digital data is properly cared for, there is no reason that it too cannot last just as long.

There are two types of digital data: born-digital data, which has never been in a physical format, and digitised data, which was originally physical before being converted. Both types face similar problems, and today I’m going to talk about one of the more hidden killers of digital data: data degradation.
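To show why degradation is so well hidden, here is a small sketch in Python: a single flipped bit leaves a file the same size and often looking the same at a glance, but a stored checksum reveals the change. Comparing checksums is a widely used way of detecting this sort of damage. The helper names are my own and the byte string stands in for a real file; this is an illustration, not a description of the ADS preservation workflow.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 checksum of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit flipped (simulated damage)."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def is_intact(data: bytes, expected: str) -> bool:
    """Compare the current checksum with the one recorded earlier."""
    return checksum(data) == expected


original = b"Excavation report, trench 1"  # stands in for a stored file
recorded = checksum(original)              # recorded when the file is archived

damaged = flip_bit(original, 3)            # one bit changes some time later

print(len(damaged) == len(original))  # True: the size gives nothing away
print(is_intact(original, recorded))  # True
print(is_intact(damaged, recorded))   # False: the checksum exposes the change
```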

A comic strip that talks about how great digital data is and how it never degrades, while the quality of the image becomes more degraded in each panel.
©xkcd, Digital Data

Changes to the ADS Library

The scholar, Periander in his library with printed text. Reproduction after a woodcut, 1488-89. Credit: Wellcome Collection

Since a Beta release back in March 2017 we’ve received a great deal of feedback on the ADS Library application. We know it’s used intensively, with over 120,000 downloads in 2019, but as with any IT application there are places it can be improved!

For the uninitiated, the ADS Library was the outcome of a Historic England funded project to ensure the longevity of the British and Irish Archaeological Bibliography (BIAB). BIAB had traditionally been maintained by the CBA, with records added into the database by hand from extant sources (see Heyworth 1992). As this approach became less sustainable in the digital age, it was also deemed advisable to combine this dataset with the growing number of digital unpublished reports and journals and monographs held by the ADS, the former mainly derived through material uploaded to the OASIS system. This was also an opportunity for the ADS to align its records with BIAB, and to have a single interface to cross-search all written works it held (traditionally, files from unpublished and published works sat in different databases). Having a unified database, with access to free copies of published and unpublished reports has also been in line with Historic England’s HIAS Principle 4: ‘Investigative research data or knowledge should be readily uploaded, validated and accessed online’.

