Archiving Day of Archaeology (2011-2017)

This case study describes the background and behind-the-scenes work that has gone into archiving the Day of Archaeology project. The final digital archive for Day of Archaeology is now live on the ADS.

Photograph of Siddhi Laxmi Temple in Bhaktapur
Siddhi Laxmi Temple in Bhaktapur. Digital Archaeology Foundation (2016): A day saving the temples of Nepal with Digital Archaeology https://doi.org/10.5284/1080729

The Day of Archaeology (DoA) project aimed to provide a window into the daily lives of archaeologists. The project asked people working, studying or volunteering in the archaeological world to participate in a “Day of Archaeology” each year by recording whatever their actual activity was on a specific day and sharing it through text, images or video on the Day of Archaeology blog. By choosing a single day, readers could experience a real cross-section of archaeological work, whether exotic or mundane, that reflected the reality of the profession.

The project was conceived and developed by a group of archaeologists with expertise in digital methods, communication and analysis: Lorna-Jane Richardson, Matt Law, J. Andrew Dufton, Kate Ellenberger, Stuart Eve, Tom Goskar, Jessica Ogden, Daniel Pett and Andrew Reinhard (see Richardson et al 2018). The first ever Day of Archaeology was held in 2011, and the project saw steady growth thereafter, with participation from thousands of archaeologists, from those working in the field through to specialists working in laboratories and behind computers. The last Day of Archaeology in this form was on Friday 28th July 2017.

Photograph of some finds displayed in a laboratory
Lab photo. Archaeological Research Associates Ltd (2016): Where Do Artifacts Go After Excavation? https://doi.org/10.5284/1080737

The project ‘website’ was originally a WordPress instance (a free and open-source content management system) paired with a MySQL database, and hosted by the Portable Antiquities Scheme (PAS). At its inception, Day of Archaeology was conceived as a community partnership, ideally with crowdsourced funds to cover maintenance of the WordPress site and domain name fees (see Richardson et al 2018). However, in 2015 NEARCH, an EU Culture-funded project that aimed to study the different dimensions of public participation in archaeology, agreed to support DoA by widening participation across Europe, by supporting contributions in a wider range of languages, and by hosting the site during a period of transition. As ADS was a NEARCH partner, the WordPress site and MySQL database were transferred to ADS. NEARCH provided the resource to keep the DoA website running for the remaining length of the project.

As NEARCH came to a close in 2017, discussions were undertaken between ADS and the DoA partners about the future of the project. The partners agreed that after seven successful years, the project had accomplished much of what it had set out to do, and should be drawn to a close. This opened the opportunity for NEARCH to fund the long-term preservation of the project, ensuring this important resource for understanding archaeological practice during a particular period of time would continue to be available for future researchers.

Keeping the WordPress instance running indefinitely was deemed an unsustainable option. Instead, we looked at the data itself (i.e. the text, images and videos uploaded by the original creators). The ADS Curatorial and Technical staff (CATs) considered a number of options, such as exporting directly from MySQL as XML or JSON. However, in this workflow the “look and feel” of a blog post was lost as it was stripped down to its component parts. In our view, the “look and feel” was a significant property that needed to be preserved.
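To illustrate what being “stripped down to its component parts” means in practice, here is a minimal sketch in Python. It is not the workflow the CATs tested: the row, its field names and the flatten_post helper are all invented for illustration. It simply shows that once a post body is reduced to plain text and a list of image files for a JSON export, the headings, layout and position of images within the post are gone.

```python
import json
from html.parser import HTMLParser


class PostFlattener(HTMLParser):
    """Collects the plain text and image sources found in a post's HTML body."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Record where each image came from, but not where it sat in the layout.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def flatten_post(row):
    """Turn one hypothetical database row into a JSON-ready dictionary."""
    parser = PostFlattener()
    parser.feed(row["post_content"])
    parser.close()
    return {
        "title": row["post_title"],
        "author": row["author"],
        "year": row["year"],
        "text": " ".join(parser.text_parts),
        "images": parser.images,
    }


# A made-up row; the field names are illustrative only.
example_row = {
    "post_title": "A day in the lab",
    "author": "A. N. Archaeologist",
    "year": 2016,
    "post_content": '<h2>Morning</h2><p>Washing finds.</p><img src="finds.jpg">',
}

flattened = flatten_post(example_row)
print(json.dumps(flattened, indent=2))
```

The exported record keeps the words and the file names, but nothing tells a future reader that “Morning” was a heading or that the photograph followed the paragraph, which is the loss described above.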

Photograph of two archaeologists on kneelers using their trowels to reveal a mosaic at a Roman villa.
Two archaeologists on kneelers use their trowels to reveal a mosaic at a Roman villa. Oxford Archaeology (2016): Stephen Macaulay: A Roman Villa in Somerset https://doi.org/10.5284/1080807

The next option was WARC, which ADS have been monitoring for some time as a preservation solution for websites, including our very own Internet Archaeology. When assessing WARC, the main obstacle we found was that creating individual WARC files from Day of Archaeology posts was time-consuming and would not preserve embedded audio-visual content. There was also a concern that end users would face an extra step in needing to use the WARC files within the externally-hosted Wayback Machine. As we know from our Helpdesk, many of our users like the quick and simple option for accessing data, so would WARC achieve this when all users want to do is look at (static) text and pictures? Perhaps a simple PDF would do?

This put us into something of a quandary! We’ve always been reluctant to rely on PDF as a long-term preservation solution. Sometimes we’re forced to do this as it is all a creator can give us. As many know, attempts at any sort of normalisation or migration strategies for PDF are difficult at best, and fundamentally flawed at worst. However, the CATs came up with the idea of ensuring the significant properties of the blog posts were preserved in an AIP containing text, images and audio-visual content in suitable individual long-term preservation formats. We also keep a complete copy of the original Day of Archaeology database in its raw form, should we ever want to return to a WARC option (and for the record, we are interested in WARC, we just need time to thoroughly investigate how to integrate it into our workflows). However, what would be wrong in providing a combined PDF for end users? In our view, nothing at all. The CATs also thought that providing additional access to the images within each post would be beneficial, as they are useful digital resources in and of themselves.

Photograph of Tanum rock art from the Bradshaw Foundation Scandinavia Rock Art Archive.
Tanum rock art from the Bradshaw Foundation Scandinavia Rock Art Archive. Bradshaw Foundation (2016): The Tanum Petroglyphs of Sweden: Following the Path of the Sun https://doi.org/10.5284/1080811

Thus began a fairly mammoth undertaking led by Katie Green and Jenny O’Brien, assisted by Teagan Zoldoske, Alfie Talks and Hayden Strawbridge (on University of York MSc workplace secondments), to convert the Day of Archaeology posts into easily accessible, yet sustainable, digital documents. The work involved creating a PDF of the original page, ensuring the original data underneath was kept as a separate Object in our Object Management System (OMS), and then building the relationships between the post itself as an intellectual entity (something for PREMIS fans) and the AIP and SIP versions of the content.

Within the data being reassembled, the CATs also felt it was important to retain the original DoA tags (essentially a folksonomy) as metadata directly associated with the preserved objects, as a significant property; i.e. here’s what the original author thought their blog should be tagged as. Of course this also had the benefit of giving us a structure upon which to build a user interface, but more of that later.
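For readers who think in data structures, the relationships described above might be pictured along the following lines. This is a rough sketch only and does not reflect the actual OMS schema: the class names, fields, identifiers and formats are all hypothetical. It shows a post as an intellectual entity that carries its original tags and points to the files held for it in the SIP and AIP.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One file held in a (hypothetical) object store."""
    object_id: str
    package: str      # "SIP" (as deposited) or "AIP" (preservation copy)
    file_format: str


@dataclass
class BlogPost:
    """A post treated as an intellectual entity, in the PREMIS sense."""
    title: str
    author: str
    year: int
    tags: list = field(default_factory=list)      # tags chosen by the original author
    objects: list = field(default_factory=list)   # related StoredObject items

    def objects_in(self, package):
        """Return the objects that belong to one package type."""
        return [o for o in self.objects if o.package == package]


# Invented example data, for illustration only.
post = BlogPost(
    title="A day in the lab",
    author="A. N. Archaeologist",
    year=2016,
    tags=["finds", "laboratory"],
)
post.objects.append(StoredObject("obj-1", "SIP", "text/html"))
post.objects.append(StoredObject("obj-2", "AIP", "text/plain"))
post.objects.append(StoredObject("obj-3", "AIP", "image/tiff"))

aip_ids = [o.object_id for o in post.objects_in("AIP")]
print(aip_ids)
```

Keeping the tags on the post itself, rather than on individual files, mirrors the idea that they record what the author thought the whole post was about.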

Photograph of a ceramic vessel
Gail Boyle. (2016): Pots speak to me of the past…and other reponses to archaeology https://doi.org/10.5284/1080815

Another task was assessing each post for text or images which run against our Sensitive Data Policy, particularly images of minors and personal data, as much of this content was created prior to the introduction of more stringent GDPR legislation. This ran to thousands of files that needed checking. At this point we needed to stop, as other projects required attention and staffing capacity could not be spared for the extra checking that was legally required before we could proceed. However, with the appointment of Teagan as a Trainee Digital Archivist we soon had the capacity to get things rolling again, although the task at hand was still significant. Things moved forward slowly where time and priorities allowed, and then in the Spring of 2020 the UK went into a national lockdown in the wake of COVID-19. Although we all remained busy, we thought that having a collaborative team exercise to finish the job together would be good for morale, and thus DoA suddenly roared forward. Nearly every member of staff, including our Director, Administrator and the Editor of Internet Archaeology, lent a hand. We helped each other with case studies (“should I keep this?”), and Teagan and Jenny helped collate files and metadata into a coherent archive.

An interface framework created by Jenny, completely underpinned by the OMS and all that metadata, was then used by Teagan to load in the final files. A simple query interface was also put in place to allow basic searching by year, author and, of course, DoA tag. As a final embellishment, each post (except where the post could not be displayed for sensitivity reasons) was also minted a DataCite DOI. According to international definitions, blogs are grey literature, so why shouldn’t we assign a persistent identifier to each, just as we do with fieldwork reports? Many of the blog posts genuinely represent a snapshot of working life, capturing processes, thoughts and ideas. For example there are fieldwork snapshots, space archaeology, zooarchaeology and ruminations on the actual activities of an archaeologist. Perhaps not this one though*. They should be as findable and citable as the other forms of literature we hold.
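As a rough idea of what a basic search on year, author and tag involves, here is a small Python sketch. It is an illustration under assumptions, not the code behind the ADS interface: the search_posts function, the record layout and the sample posts are all made up.

```python
def search_posts(posts, year=None, author=None, tag=None):
    """Return the posts that match every criterion that was supplied."""
    results = []
    for post in posts:
        if year is not None and post["year"] != year:
            continue
        # Author matching is a case-insensitive substring test.
        if author is not None and author.lower() not in post["author"].lower():
            continue
        # Tag matching is a case-insensitive exact match against the post's tags.
        if tag is not None and tag.lower() not in [t.lower() for t in post["tags"]]:
            continue
        results.append(post)
    return results


# Invented sample records, for illustration only.
posts = [
    {"title": "A day in the lab", "author": "A. N. Archaeologist",
     "year": 2016, "tags": ["Finds", "Laboratory"]},
    {"title": "Digging in the rain", "author": "B. Fieldworker",
     "year": 2016, "tags": ["Fieldwork"]},
    {"title": "Pots and plans", "author": "A. N. Archaeologist",
     "year": 2013, "tags": ["Finds"]},
]

matches = search_posts(posts, year=2016, tag="finds")
print([p["title"] for p in matches])
```

Leaving a criterion as None means it is ignored, so the same function covers a search on year alone, on tag alone, or on any combination of the three.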

Photograph of a ceramic vessel with the word cricket
This is an incredible link to the wider Cobham Hall estate, as one of the owners captained the first Ashes winning cricket team in the 1880’s…could this be a piece of memorabilia depicting this event…celebrated on the estate by the estate workers? Andrew Mayfield (2016): Cobham Landscape Detectives and a Cottage Dig in Kent https://doi.org/10.5284/1080904

After all that effort, we feel that work done on the DoA archive and interface has stayed true to the original project, and not only facilitated the preservation of these posts but also built something which showcases them. Of course there’s more we want to do: integrating the blog posts into the ADS Library, integrating the images into a single application which cross-searches all our images (some of the DoA images are stunning, and need to be shown off more). I hope with time we’ll be able to achieve this, and continue the DoA legacy.

Readers may already know that the project has been reimagined as the Day in Archaeology, run by the Council for British Archaeology (CBA), so do please have a look at that fantastic resource if you’re interested.

* For the record England won by 239 runs, Moeen Ali took a hat-trick spread over two overs to finish off the tail. Could do with this now to counteract the ‘competitive’ pitches they’re struggling with on the current tour.

The future for England’s Rock Art

Several users have been in touch concerned over the future of England’s Rock Art website. Suffice to say that users should rest easy that no data is being lost, and public access to data is being retained.

Here’s the important background as to why this is happening:

England’s Rock Art website was originally launched in Summer 2008, as the culmination of a Historic England (then English Heritage) project to catalogue carvings in the Northumberland region. Since then it has been added to, principally with records from the Beckensall archive that were previously stored with Newcastle University. The website itself is actually a fairly complex application, with an underlying spatial database and Java framework that allows the user to interrogate the database.

Since its launch, the ADS have continued to perform a wide range of updates, patches and migrations on the application to ensure its longevity. These have involved major rebuilds in 2011, 2015, and 2018. Despite this additional work, undertaken with no additional funding, some features have begun to creak and latterly break (such as the map interface). More recently, the framework as a whole has become outdated, having been deemed at risk for the last 18 months, and is now at a point where a major rebuild/application migration is required. This is not only to retain functionality, but also for security.

We take security very seriously here and, in consultation with our IT services, have agreed that the application is now at its end of life and sadly needs to be replaced. We don’t take such decisions lightly. We’re aware from access stats that Rock Art has on average 30 unique visits every month and has a core interest group that needs to access the data, so we’re currently taking steps to make sure the data in the Rock Art database is maintained and made publicly accessible in perpetuity.

What’s happening?

The data itself (i.e. the text and images used in the database) is being turned into a standard ADS public archive. This means the individual records (CSV), images (JPG/TIF) and VRML files will be available to access and download. This includes all the later additions such as the Beckensall archive.

This means, for example, that all the information on the page for an ERA record such as this one, will still be there, just not in the website format and perhaps not as aesthetically pleasing.

We’re hoping to have this done as soon as possible, and when ready the ERA URL will resolve to the new archive.

Further ahead, there are some advantages to bringing the ERA data into a standard archive. The metadata can be incorporated within our Collections Management System (CMS) and Object Management System (OMS), the latter of which is forming the basis of our plans to centralise and implement cross-searching of Objects (i.e. files), and also to benefit from technical developments for external sharing such as IIIF. Overall, the data will be better curated, and access widened, by bringing it ‘in-house’.

In addition, we have plans to devote staff time to build on the raw data to develop the archive into an ADS Special Collection which replicates the database and map-based experiences we know a lot of our users enjoy (for example Roman Amphora or Roman Rural Settlement of Britain). This is being done as a staff training exercise, so the timescale for completion is less certain, but I would hope we have an Advanced interface ready in 2021.

We hope all our users will understand that this work is being undertaken as a practical response to a tricky problem that impacts all public ICT applications at some point. In this case, because the resource was already held by ADS, we’re happy the data are secure and will be made publicly accessible as soon as possible, and that where we can (and remember the ADS has no core funding) we will continue to enhance access to the data so that the legacy of the original project is continued.

ADS, xyzviewer and an open future

The following is a Guest Blog authored by Professor Stephen Todd, currently visiting Professor in the Dept of Computing at Goldsmiths, University of London. We’re always interested in how people use our data, and indeed how they want to use or access our data. After preliminary discussions about enabling Cross-origin resource sharing (CORS) to provide direct access to ADS archived files for an xyzviewer, Stephen has been kind enough to write up his current work and wider thoughts for us as a case study.

This note discusses how xyzviewer permits exploration/visualization of a subset of Star Carr data, and makes some points that arise on collaborative data and the relationship to the Archaeology Data Service (ADS). It is in two parts: the first outlines the capabilities of xyzviewer, and the second sets out some more diffuse thoughts arising from the work.


Archaeology on Furlough: attitudes and expectations to online resources

Image of Mark Zuckerberg in a room full of people using Augmented Reality (AR) glasses
“‘ #Cyberspace . A consensual hallucination experienced daily by billions of legitimate operators, in every nation, by children being taught mathematical concepts…’ #Neuromancer , #WilliamGibson” by cyborglenin is licensed under CC BY 2.0

This week my colleague (Teagan Zoldoske) flagged up the following report:

Wiseman, R., and Ronn, P. (2020). Archaeology on Furlough: Accessing Archaeological Information Online: A Survey of Volunteers’ Experiences. https://doi.org/10.17863/CAM.54876

For those unfamiliar with the initiative, Archaeology on Furlough provides professional archaeologists in the UK with access to volunteer projects that can be done from home. This excellent report summarises the expectations and realities of using online resources for specific research needs. The ADS is cited frequently within, and I’m glad to see the overall positive response (see figures 2 + 3). The heavy use of the ADS Library, particularly unpublished reports, over Spring and early Summer 2020 is now partially explained!

Line chart of Access Statistics from the ADS Library
Export of Access statistics from the ADS Library (as of 31 October 2020) showing 103,464 downloads of articles and monographs, and 55,091 downloads of unpublished reports.

Defining ‘Usefulness’

Guest post by Jamie Geddes

Recently, I have been on a work placement with the Archaeology Data Service, otherwise known as the ADS, situated within the Department of Archaeology at the University of York. This placement, which is a requirement for my MSc in Digital Archaeology in the Department, came along just as the world pandemic decided to force itself upon us. This meant I was unable to go into the work environment and physically work alongside members of staff. Lockdown meant we all had to stay at home. So, I hear you ask, what have you been doing since you can only work at home? Thanks to staff members Dr Tim Evans and Jenny O’Brien I have been given plenty of interesting and fun tasks to complete.

The main tasks that I have been asked to help with include cataloguing data, adding subject terms, and amending and adding data to the ADS main website for the Berkshire Archaeological Journals and School of Archaeology Monograph Series by Oxford University. The cataloguing task involved tagging the collections with keywords and topics, as well as listing potential user groups I think will find each collection useful or interesting, and giving each collection a rating as to how useful it is. This collection review is aimed at being the first step towards a Cataloguing Policy, where cataloguing projects and processes can be prioritised based on specific collections assessment criteria.


The exciting world of Metadata

Metadata.

Something extremely important to the long-term health and reuse of data and yet the mere mention of it can cause people to shut off and run away. So, what is it and how is it different from data?

Metadata is the data about data. I think that sums it up quite nicely, don’t you? Ok, let’s phrase it a different way. It’s essentially the documentation needed to make the data findable, understandable, and useable. It allows for verification of claims, reuse for future projects, and more.

Perhaps some visuals would help. Below is some data, 5 trench raster images in this case. In which English region was each photo taken?


More ‘exam’ success! Certification and membership of the ISC- World Data System (WDS)

Earlier in the year we reported on a successful outcome from our CoreTrustSeal (CTS) certification application, becoming the fifth repository in the UK to achieve this important standard. As an organisation, we are always pushing hard to ensure that our activities meet with good practice within the archaeological and heritage sectors, but also within the wider digital data communities. With this in mind, we are excited to announce our acceptance as a regular member of the World Data System (WDS) and as a certified Trusted Scientific Data Service.


Data Mining with past publications from the ADS: The search for Neolithic crannogs

As part of a broader focus on the recently discovered Neolithic ‘crannogs’ – artificially-constructed islands – in Scotland, the Islands of Stone project has been conducting data mining on 148 volumes of the Proceedings of the Society of Antiquaries of Scotland, 5 volumes of Archaeologia Scotica and 71 volumes of Discovery and Excavation Scotland, which were kindly provided as a single download by the ADS.

Islands of Stone is an AHRC-funded collaboration between the University of Southampton, the University of Reading and Historic Environment Scotland investigating Neolithic ‘crannogs’ in the Outer Hebrides. The construction of crannogs, or artificial islands, in Scotland was generally thought to have emerged during the Early Iron Age (c. 800 BC); however, one artificial island in the Outer Hebrides known as Eilean Dòmhnuill, or Donald’s Island, has demonstrated much earlier origins. Although the site was originally believed to be of Iron Age date, excavations by Ian Armit soon revealed large quantities of decorated Neolithic pottery which would ‘change the direction of the entire research programme’ (Armit 1991: 444-45).


‘Accessioning Arch Camb’: Gwynedd Archaeological Trust Volunteer Engagement Project

Gwynedd Archaeological Trust volunteers have been researching digitised versions of Archaeologia Cambrensis, the Journal of the Cambrian Archaeological Association, as part of the ‘Accessioning Arch Camb’ project. Using journal volumes hosted on ADS and the National Library of Wales websites, the project is helping enhance the regional Historic Environment Record (HER) for north-west Wales.


Summer Internship With the ADS: Heritage Open Days

The following is a blog written by Chloe Rushworth, who has recently completed a 4-week Voluntary Placement with the ADS. Chloe has been working with the Curatorial and Technical Team to investigate some new approaches to how we interact with data within the Archive. Below, she gives a run through on her huge contribution to creating a ‘Curated Collection’ collating data that relates to sites participating in Heritage Open Days. The aims of this project are for this collection to work as an educational tool, to both increase awareness and knowledge of the archaeological and historical importance of the sites that are taking part in the Heritage Open Days, and to show how the Archive can add to the experience of the Heritage Open Days themselves.

If you want to see the results, the Collection is now live. Over to Chloe!
