The Archaeology Data Service would like to wish everyone a very happy World Digital Preservation Day. We’re excited to be raising awareness of Digital Preservation and celebrating the work that we do.
We’re looking forward to reading the DPC’s new edition of the ‘Bit List’ of Digitally Endangered Species, released today, and hearing about how our fellow archivists and the #DigiPres community are participating.
We thought we’d address a few of the ‘endangered species’ of file formats on this year’s list and see how they relate to the data that we receive as an archive for archaeological data.
The Archaeology Data Service is currently in its 23rd year of archiving heritage data and supporting research, learning and teaching with open access and high-quality digital resources. Our collections have developed in volume, scope and complexity and this year the ADS reached an impressive milestone of 50,000 grey literature reports archived in our Grey Literature Library.
The ADS currently holds over three million files and 22 terabytes of data, which have been preserved and made accessible in our digital archive.
A more important figure, perhaps, is the number of preservation processes that have been carried out (over 33,000!), with 3287 of these carried out in 2019 alone. These processes include creating, editing or enriching metadata, migrating files to more stable preservation formats and migrating files to more accessible and commonly used file formats for dissemination and reuse by the archaeological community. Keeping apprised of the accessibility of file formats is certainly a significant digital preservation challenge that we face, and there are currently over 800 distinct file formats in our archive.
Resources such as the DPC’s ‘Bit List’ of Digitally Endangered Species are therefore invaluable, helping archivists to identify at-risk formats and be proactive in our approach to preserving data. The ‘Bit List’ is a crowd-sourcing exercise to discover which digital materials our community thinks are most at risk, as well as those which are relatively safe thanks to digital preservation. This year’s version of the ‘Bit List’ has been launched today and we’re excited to see what’s new for 2019.
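As a rough illustration of how a list like this might be put to work, the minimal sketch below tallies files by extension and flags those matching an at-risk classification. Everything in it is an assumption for illustration rather than a description of ADS tooling: the function names `count_formats` and `flag_at_risk` are hypothetical, the sample file names are made up, and the mapping only includes the two classifications mentioned later in this post. File extensions are also a crude proxy for format, and cannot, for example, tell PDF/A apart from other PDFs.

```python
from collections import Counter
from pathlib import Path

# Illustrative mapping only (an assumption): 'Bit List' classifications as
# described in this post, keyed by file extension.
BIT_LIST_FLAGS = {
    ".pdf": "endangered",  # 'PDF other than PDF/A'; the extension alone cannot identify PDF/A
    ".mp4": "endangered",
}


def count_formats(paths):
    """Count files per lower-cased extension."""
    return Counter(Path(p).suffix.lower() for p in paths)


def flag_at_risk(counts, flags=BIT_LIST_FLAGS):
    """Return {extension: (file count, classification)} for flagged extensions."""
    return {ext: (n, flags[ext]) for ext, n in counts.items() if ext in flags}


if __name__ == "__main__":
    # Hypothetical file names, not taken from any real collection.
    sample = ["report_01.PDF", "survey.csv", "gameplay.mp4", "report_02.pdf", "notes.docx"]
    for ext, (n, status) in sorted(flag_at_risk(count_formats(sample)).items()):
        print(f"{ext}: {n} file(s), listed as {status}")
```

In practice, a format identification tool that inspects file contents would be more reliable than extensions; the sketch is only meant to show the general idea of comparing holdings against a list of at-risk formats.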
Outputs from archaeological fieldwork and research are often limited to a few standard formats (e.g. text reports and images), but projects can also produce some incredibly varied and niche data types. GIS, CAD, Lidar, photogrammetry, laser scanning, RTI and 3D data can often present unique preservation challenges for archivists.
We thought we’d have a quick review of several of this year’s ‘Bit List’ entries and see how they relate to our own collections and the data produced by the archaeological community.
The ADS faced a novel challenge this year, when we received our first-ever dataset of a survey of human-inhabited digital space. The No Man’s Sky Archaeological Survey was conducted by University of York PhD candidate Andrew Reinhard and investigated the ‘archaeological remains’ left by players of the No Man’s Sky multiplayer video game. You can read more about the project in Andrew’s Blog post for the ADS.
Old or non-current video games are listed as ‘Critically endangered’ in the ‘Bit List’, meaning they ‘face material technical challenges to preservation, there are no agencies responsible for them or those agencies are unwilling or unable to meet preservation needs’.
Although it does not preserve the video game itself, recording data within it and ‘surveying’ its digital spaces offers a potential alternative when preserving the game is not possible. Data types submitted for the No Man’s Sky project consisted of site reports, survey spreadsheets and gameplay images and videos.
These file formats are relatively standard outputs of archaeological work and although audio-visual data is fairly uncommon, it can be found in several of our collections, for example as the output of Oral History Projects (e.g. The Urban Landscapes of Ancient Merv, Turkmenistan). The audiovisual data was archived in mp4 format (MPEG-4), and MP4 and ‘Video files’ have been listed as ‘endangered’, so this is certainly something for the ADS to keep an eye on in the future.
We hope that we will receive more data like this in the future and are looking forward to seeing how data from rapidly developing and innovative fields such as virtual reality and archaeogaming is managed and preserved.
Another ‘critically endangered’ entry is grey literature, which is certainly something that the ADS has an abundance of. OASIS is a dedicated data capture form for archaeological and heritage data, with reports submitted and validated by HERs before being archived in the ADS Grey Literature Library. This hopefully means that data loss due to grey literature reports not being archived is less of a problem in the archaeology sector than it may be in others. This is, of course, dependent on archaeological contractors and researchers submitting data to OASIS in the first place and, as the DPC notes, the loss of grey literature would have an impact on people and sectors around the world.
The DPC proposes the development of new preservation tools or techniques as a solution to preserving grey literature, and for the archaeological sector, this is already underway with the HERALD project. HERALD is the redevelopment of the existing OASIS system, currently being undertaken by the ADS with support from Historic England. This should provide a more streamlined and efficient system for submitting and archiving archaeological grey literature. The ADS hopes that this will help prevent the loss of grey literature from the heritage and archaeology sector in the future.
PDF/A & PDF
The vast majority of OASIS grey literature reports are submitted and archived in PDF format, and PDF is one of our most frequently submitted file types. The entry of PDF/A and ‘PDF other than PDF/A’ to the ‘Bit List’ as ‘vulnerable’ and ‘endangered’ respectively is therefore of great significance to the ADS and archaeological data.
Although PDF is not one of our preferred preservation formats, a significant amount of literature is submitted to the ADS as PDFs. There are various reasons why PDF (and even PDF/A) is not ideal for data storage and preservation, which have been discussed in more detail previously on the ADS blog. One of the most significant problems with PDFs is how content (e.g. images) is embedded in the PDF format: if that content is not also preserved in its original form, there may be a significant loss of data quality when it is retrieved at a later date. It is considered best practice to instead preserve data in its original format (e.g. as a text file and separate original image files) to ensure its safeguarding in the long term.
The ADS therefore continues to encourage the submission of text in formats such as DOCX or ODT, and we are pleased to see that these formats have not made an appearance on this year’s list. The ADS continues to disseminate documents as PDF in the majority of cases, as it is clear that this is an accessible format that is used extensively by the archaeological community, and indeed all communities.
It is reassuring to hear that the DPC suggests that only a small effort may be required to preserve materials in this group, although PDF’s inclusion in the list at all perhaps suggests that as archivists we should be advocating more strongly for the submission of text data in formats that are more easily preserved.
Hopefully, we’ve provided a quick rundown of how and where this year’s ‘Bit List’ is relevant to archaeological data. The above is by no means exhaustive, and we’re looking forward to having a more thorough look at the full text of the BitList 2019 report over the next few days.
We hope you’ve enjoyed this year’s World Digital Preservation Day as much as we have. We’re always pleased to have an opportunity to discuss and raise awareness of Digital Preservation and the challenges we face as digital archivists.
More information about World Digital Preservation Day and how you can participate can be found on the DPC Website.