Archaeology on Furlough: attitudes and expectations to online resources

Image of Mark Zuckerberg in a room full of people using Augmented Reality (AR) glasses
“‘ #Cyberspace . A consensual hallucination experienced daily by billions of legitimate operators, in every nation, by children being taught mathematical concepts…’ #Neuromancer , #WilliamGibson” by cyborglenin is licensed under CC BY 2.0

This week my colleague (Teagan Zoldoske) flagged up the following report:

Wiseman, R., and Ronn, P. (2020). Archaeology on Furlough: Accessing Archaeological Information Online: A Survey of Volunteers’ Experiences.

For those unfamiliar with the initiative, Archaeology on Furlough provides professional archaeologists in the UK with access to volunteer projects that can be done from home. This excellent report summarises the expectations and realities of using online resources for specific research needs. The ADS is cited frequently within, and I’m glad to see the overall positive response (see figures 2 + 3). The heavy use of the ADS Library, particularly unpublished reports, over Spring and early Summer 2020 is now partially explained!

Line chart of Access Statistics from the ADS Library
Export of Access statistics from the ADS Library (as of 31 October 2020) showing 103,464 downloads of articles and monographs, and 55,091 downloads of unpublished reports.

I’m always interested in the opinions and expectations around re-use, so it’s good to see these being documented as a single case study of a particular project, and one that’s reported back to the sector so quickly. As the report heavily references the ADS, it’s also valuable in helping us understand “what the public wants”! I’ll leave the arguments over how much this is influenced by the Google experience and big social media ‘platforms’ for another time…

As keen advocates of the FAIR data principles, we at the ADS know there’s a lot more we need to do to increase the findability of resources we hold, and so reports like this do genuinely offer up ideas to take forward as practical solutions. There are also a few points of discussion worth highlighting as relevant both to ourselves and to the wider sector, and to how it perceives online resources and formats.

Issue 1: Users not knowing how to use the ADS website. On page 10 of the report a responder states: “I put in ‘Roman’ and ‘Cambridgeshire’ into ADS search engine and it came up with [just] 6 entries”. For me the same keyword search of Archsearch shows 3372 results, principally from the NRHE, but also the Archives and reports we hold. Individual searches of ADS Archives bring back 56 collections, and of the ADS Library 1246 resources. So which ADS search engine is being used in this user case study? Do we need to create more resources to explain the data sources? How much is enough? Are users aware of the awesome AriadnePlus data portal, which aggregates all our metadata alongside European partners?

Issue 2: Bibliographic records. The British and Irish Archaeological Bibliography (BIAB) housed in the ADS Library was singled out repeatedly for a lack of links to original source material. This indicates a general misunderstanding of the original BIAB dataset, which has a long history, but is no longer formally added to or updated. However, do users want to update these records themselves when they find copies online? If users could provide a list of records and hyperlinks (preferably DOIs), we can always develop this into a formal plan of action. If you’re reading this and would like to do this, please get in touch.

Issue 3: “Google actually found the report in ADS for you, rather than trying to search in ADS and finding nothing” (page 18). As the authors note, this is principally down to a lack of metadata provided by depositors, which in turn is used in the ADS catalogues. Google’s bots harvest and index the words they find in the actual PDF files, and return results from them. We’ve experimented with this approach ourselves in the past, but as a not-for-profit organisation our capacity to build a series of algorithms which classify the results and build a useful results set is limited. Indeed, our focus is more on working with research partners to further investigate NLP techniques, and in turn how these can work with structured vocabularies to return results which mean something to an archaeologist, both in terms of literal content and of significance. Leaving aside for now the debate about structured metadata versus search algorithms (time-consuming but specialised user classification versus big business), to me the bigger question for an archaeologist is “are the searches returning useful results”? Is Google saving time, or are you still having to do an assessment of significance/relevance for each PDF returned?
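For anyone curious why the two approaches behave so differently, here is a minimal sketch in Python. To be clear, it is a toy illustration and not how the ADS catalogues or Google actually work: the two records, their field names (`metadata`, `full_text`) and the helper functions are all invented for the example. It builds one index from depositor-supplied metadata only, and another that also includes the words inside the report itself, then runs the same two-keyword query against both.

```python
from collections import defaultdict

# Hypothetical records (invented for illustration): the catalogue metadata
# supplied by a depositor versus the words inside the PDF itself.
REPORTS = [
    {
        "id": "report-1",
        "metadata": "Land off High Street: archaeological evaluation",
        "full_text": "Trenching revealed a Roman ditch and pottery near Ely, Cambridgeshire.",
    },
    {
        "id": "report-2",
        "metadata": "Roman settlement in Cambridgeshire: excavation report",
        "full_text": "Excavation recorded a Roman farmstead with associated field system.",
    },
]


def tokenize(text):
    """Lower-case the text and strip simple punctuation from each word."""
    words = (w.strip(".,:;()").lower() for w in text.split())
    return [w for w in words if w]


def build_index(records, fields):
    """Map each word to the set of record ids whose chosen fields contain it."""
    index = defaultdict(set)
    for record in records:
        for field in fields:
            for token in tokenize(record[field]):
                index[token].add(record["id"])
    return index


def search(index, query):
    """Return ids of records containing every word in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


metadata_index = build_index(REPORTS, ["metadata"])
full_text_index = build_index(REPORTS, ["metadata", "full_text"])

print(sorted(search(metadata_index, "Roman Cambridgeshire")))
print(sorted(search(full_text_index, "Roman Cambridgeshire")))
```

In this toy case the metadata-only search misses the first report, because its title never mentions ‘Roman’ or ‘Cambridgeshire’, while the full-text search finds both. Note that neither index says anything about which result is the more significant, which is exactly the question above.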

Issue 4: All hail PDF! It is interesting to note (page 13 onwards) the responses on what makes resources easier or harder to use. The benefits and pitfalls of PDFs are of course well known to some connoisseurs, and I would always refer readers to this classic article which presents an alternative vision for how we technically produce site reports. However, it seems that the pragmatic ‘cheap and easy’ solution (a machine-readable PDF) is an unstoppable force… It is however good to see some responses acknowledge that reports in PDF format only tell part of the story that is present in the original specialist data, something clearly identified by recent research projects (see for example Lodwick 2019). What was concerning in the Archaeology on Furlough responses was the scant mention of the original data itself. Do users not want the original vector plans, shapefiles and spreadsheets of data to analyse and reformat?

Issue 5: The quality of site reports! On page 17 the authors report back on the problems of grey literature. There is a truth here, previously acknowledged by other studies in the area (Donnelly 2016; Evans 2015; Fulford and Holbrook 2018). In this case perhaps an assessment of the relative age of the reports would be more conducive to understanding ‘usefulness’. For example, the ADS Library holds reports from the mid-1980s onwards, with the quality of earlier reports well known to be somewhat ‘mixed’…

Anyway, much to think about…