Category Archives: Big Data

Linking the virtuous circles: Citation and Tracking Re-use.


The ADS has (for nearly 25 years!) been providing free access to the resources deposited with us. We put them online in open/accessible formats, people use them, and people cite them. We know people use them because we have data on page views and downloads. Some things are used a great deal, often high-profile research resources that always gain a lot of mentions in the literature and on social media. Others have more of a cult following but are still used sporadically.

All these access statistics make a good basic demonstration of impact; we can pass them on to project funders and stakeholders as evidence of quantitative success. However, the follow-up questions normally ask “who” is using this data, and for what purposes. The ADS has many ambitions with regard to its (meta)data, but facilitating and demonstrating this re-use is a high priority. Over the last year I’ve had a chance to think more about what we could and should be doing, and how it can help us, our users and our depositors make more of the situation…

The key to this is the Digital Object Identifiers (DOIs) we use. For those unaware, the ADS uses DataCite DOIs through our membership of a consortium led by the British Library. We create DOIs for:

  • All our deposited collections
  • Upon request, distinct entities within a collection
  • All unpublished reports
  • Journal articles

These DOIs are registered with DataCite, and in doing so we also pass on key metadata for the object (who created it, when it was created, where it relates to, etc.). This metadata is then searchable in the DataCite interface, alongside records from other repositories in the DataCite community such as Zenodo or Dryad.
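As a rough illustration of the metadata that travels with a DOI, the sketch below assembles a minimal registration body in the JSON:API shape used by the DataCite REST API, filled in from the Roman Rural Settlement citation. The field names (`creators`, `titles`, `publisher`, `publicationYear`, `types`, `geoLocations`, `url`) reflect our understanding of the DataCite schema, not the ADS's actual deposit workflow; the helper `datacite_payload`, the landing URL and the place name are assumptions for illustration only.

```python
from __future__ import annotations

import json


def datacite_payload(doi: str, creators: list[str], title: str, publisher: str,
                     year: int, landing_url: str, place: str | None = None) -> dict:
    """Assemble a minimal DataCite REST API body for a dataset DOI (sketch).

    Field names are our reading of the DataCite schema; verify before use.
    """
    attributes = {
        "doi": doi,
        "creators": [{"name": name} for name in creators],  # who created it
        "titles": [{"title": title}],
        "publisher": publisher,
        "publicationYear": year,  # when it was published
        "types": {"resourceTypeGeneral": "Dataset"},
        "url": landing_url,  # where the DOI should resolve to
    }
    if place is not None:
        # Where it relates to: a free-text place name (illustrative only).
        attributes["geoLocations"] = [{"geoLocationPlace": place}]
    return {"data": {"type": "dois", "attributes": attributes}}


if __name__ == "__main__":
    payload = datacite_payload(
        doi="10.5284/1030449",
        creators=["Martyn Allen", "Nathan Blick", "Tom Brindle", "Tim Evans",
                  "Michael Fulford", "Neil Holbrook", "Lisa Lodwick",
                  "Julian D Richards", "Alex Smith"],
        title="The Rural Settlement of Roman Britain: an online resource",
        publisher="Archaeology Data Service",
        year=2018,
        # Assumed landing page, for illustration.
        landing_url="https://archaeologydataservice.ac.uk/archives/view/romangl/metadata.cfm",
        place="Britain",  # assumed value, for illustration
    )
    print(json.dumps(payload, indent=2))
```

Keeping the place optional mirrors the fact that not every deposit has a meaningful location.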

When people use ADS resources they should cite the DOI. For example, any use of material from the ever-popular Roman Rural Settlement project should follow our citation guidelines:

Martyn Allen, Nathan Blick, Tom Brindle, Tim Evans, Michael Fulford, Neil Holbrook, Lisa Lodwick, Julian D Richards, Alex Smith (2018) The Rural Settlement of Roman Britain: an online resource [data-set]. York: Archaeology Data Service [distributor] https://doi.org/10.5284/1030449

Or for a Journal article:

Sparey-Green, C. (2002). Excavations on the SE defences and extramural settlement of Little Chester, 1971-2. Introduction. The Derbyshire Archaeological Journal 122, pp. 1-10. https://doi.org/10.5284/1066616

There are tools available from DataCite to reformat these into nearly any form of bibliographic reference, but it’s important to emphasise that any citation or reference should include the DOI and not the URL that appears in a web browser. For example, it should be https://doi.org/10.5284/1066616 and never https://archaeologydataservice.ac.uk/library/browse/details.xhtml?recordId=3202768
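One way to get a formatted reference is DOI content negotiation: asking https://doi.org for a DOI with an `Accept` header of `text/x-bibliography` returns a formatted citation rather than the landing page. A minimal standard-library sketch follows; the helper names are hypothetical, `apa` is just one example of a Citation Style Language style name, and which styles are available is up to the DataCite service.

```python
import urllib.request

RESOLVER = "https://doi.org/"


def citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build a content-negotiation request asking for a formatted reference."""
    return urllib.request.Request(
        RESOLVER + doi,
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def fetch_citation(doi: str, style: str = "apa") -> str:
    """Fetch the formatted reference (needs network access)."""
    # urllib follows the doi.org redirect and keeps the Accept header.
    with urllib.request.urlopen(citation_request(doi, style), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_citation("10.5284/1066616")` needs network access. Note that the request is built from the DOI, never from the browser URL, which is exactly why the DOI belongs in the reference.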

Why? First, the DOI is persistent. No matter what happens to ADS applications in the future (for example, an update to the Library may mean we no longer use details.xhtml), a reference to the DOI will always take you to where the content is. Second, and most importantly in this case, it allows us, via a range of tools, to identify where our DOIs are being used.
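To see why the browser URL is a problem, consider how a tool might spot ADS DOIs in a reference list. The sketch below assumes that ADS DOIs take the form `10.5284/` followed by digits, as in every example in this post; a reference to the `details.xhtml?recordId=` address simply never matches. The pattern and the function name `find_ads_dois` are illustrative, not taken from any DataCite or Crossref tool.

```python
import re

# Assumption: ADS DOIs are the 10.5284 prefix followed by a numeric suffix.
ADS_DOI = re.compile(r"\b10\.5284/\d+")


def find_ads_dois(text: str) -> list:
    """Return the distinct ADS DOIs mentioned anywhere in `text`, sorted."""
    return sorted(set(ADS_DOI.findall(text)))


if __name__ == "__main__":
    references = (
        "Cited via DOI: https://doi.org/10.5284/1066616\n"
        "Cited via browser URL: https://archaeologydataservice.ac.uk/"
        "library/browse/details.xhtml?recordId=3202768"
    )
    print(find_ads_dois(references))  # ['10.5284/1066616']
```

The `doi:` and `https://doi.org/` forms both match, while the record ID in the browser URL carries no trace of the DOI at all.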

One such tool is the DataCite Event API, a prototype developed in collaboration with Crossref to track citations of DataCite DOIs quoted as sources in academic papers. A quick search of this for ADS DOIs shows for example:

Image of JSON from the DataCite Event API which shows the citation of https://doi.org/10.5284/1007741 by a paper in the Journal of World Prehistory

In this case the paper ‘Approaches to Interpreting Mesolithic Mobility and Settlement in Britain and Ireland’, published in the Journal of World Prehistory, cited Wessex Archaeology (2006). England’s Historic Seascapes Final Report https://doi.org/10.5284/1007741.
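A query like the one behind that screenshot might look roughly like this. The sketch assumes the events endpoint at `https://api.datacite.org/events` accepts `doi` and `page[size]` filters and returns JSON:API records whose `attributes` carry hyphenated keys such as `subj-id`, `relation-type-id` and `obj-id`; those names are our reading of the API and worth checking against DataCite's documentation. The function names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

EVENTS_ENDPOINT = "https://api.datacite.org/events"  # assumed endpoint


def events_url(doi: str, page_size: int = 100) -> str:
    """Build an Event API query for one DOI (filter names are assumptions)."""
    query = urllib.parse.urlencode({"doi": doi, "page[size]": page_size})
    return f"{EVENTS_ENDPOINT}?{query}"


def parse_events(body: dict) -> list[tuple]:
    """Turn a JSON:API response into (subject, relation, object) triples."""
    triples = []
    for record in body.get("data", []):
        attrs = record.get("attributes", {})
        triples.append(
            (attrs.get("subj-id"), attrs.get("relation-type-id"), attrs.get("obj-id"))
        )
    return triples


def fetch_events(doi: str) -> list[tuple]:
    """Query the live API (needs network access) and parse the first page."""
    with urllib.request.urlopen(events_url(doi), timeout=30) as resp:
        return parse_events(json.load(resp))
```

For 10.5284/1007741 one would expect the Journal of World Prehistory paper to appear as a subject. Only the first page is read here; a fuller version would follow the response's pagination links.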

In addition, there’s the incredibly powerful Crossref Event Data, a set of APIs that captures and records events occurring all over the web. This includes not only published articles but also Twitter and Wikipedia (including Wikidata). So, for example, I can see:

Image of JSON from the Crossref Event Data API which shows the citation of the DOI https://doi.org/10.5284/1000266 by a Wikipedia article

In this case, the Wikipedia article on the Sutton Hoo helmet cites Martin Carver’s data from the Sutton Hoo Research Project.
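Counting mentions per platform is a small step from there. This sketch assumes the Event Data query endpoint `https://api.eventdata.crossref.org/v1/events` with an `obj-id` filter and a `mailto` contact parameter, and a response holding `message.events`, each event having a `source_id` such as `wikipedia` or `twitter`. Crossref has reorganised Event Data since it launched, so treat the URL as an assumption; the function names are hypothetical and `you@example.org` is a placeholder address.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from collections import Counter

EVENT_DATA_ENDPOINT = "https://api.eventdata.crossref.org/v1/events"  # assumed


def event_data_url(doi: str, mailto: str) -> str:
    """Build a query for events whose object is `doi`."""
    # Crossref asks polite clients to identify themselves with `mailto`.
    query = urllib.parse.urlencode({"obj-id": doi, "mailto": mailto})
    return f"{EVENT_DATA_ENDPOINT}?{query}"


def events_by_source(body: dict) -> Counter:
    """Count events per source (e.g. 'wikipedia', 'twitter')."""
    events = body.get("message", {}).get("events", [])
    return Counter(event.get("source_id") for event in events)


def fetch_events_by_source(doi: str, mailto: str) -> Counter:
    """Query the live API (needs network access) and tally by source."""
    with urllib.request.urlopen(event_data_url(doi, mailto), timeout=30) as resp:
        return events_by_source(json.load(resp))
```

Something like `fetch_events_by_source("10.5284/1000266", "you@example.org")` would then give a per-platform tally for the Sutton Hoo data.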

Capturing this sort of reuse, and mentions of resources in Twitter conversations (919 and counting), is to my mind not only a useful indicator of reuse but also a glimpse into the sort of conversations people may be having about our digital objects.

The next step is for us to build a method to pull data from these APIs and incorporate it back into our metadata as a dynamic process. This would mean that a page such as https://archaeologydataservice.ac.uk/archives/view/romangl/metadata.cfm is refreshed with information showing that https://doi.org/10.5284/1030449 is ‘Cited By’ XXX. Who knows, this could even be extended to an option to email depositors when their data has been cited, so that they know it is being actively used.
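The ‘Cited By’ step could be as simple as merging (citing, cited) pairs from both APIs into one index keyed on a normalised DOI, so that `https://doi.org/10.5284/1030449` and `10.5284/1030449` count as the same resource. Everything below is hypothetical (function names, the resolver prefixes handled, the output wording); `new_citations` shows how a depositor notification could be triggered by diffing two snapshots of the index.

```python
from __future__ import annotations

from collections import defaultdict

RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def normalise_doi(value: str) -> str:
    """Reduce a DOI or DOI URL to a bare, lower-case DOI for matching."""
    value = value.strip()
    for prefix in RESOLVER_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    return value.lower()


def cited_by(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each cited DOI to a sorted list of distinct citing identifiers."""
    index = defaultdict(set)
    for citing, cited in pairs:
        index[normalise_doi(cited)].add(citing)
    return {doi: sorted(citers) for doi, citers in index.items()}


def cited_by_line(doi: str, index: dict[str, list[str]]) -> str:
    """Render the text a metadata page might show for one DOI."""
    bare = normalise_doi(doi)
    return f"https://doi.org/{bare} is cited by {len(index.get(bare, []))} item(s)"


def new_citations(previous: dict[str, list[str]],
                  current: dict[str, list[str]]) -> dict[str, list[str]]:
    """Citing items present in `current` but not in `previous`, per DOI."""
    fresh = {}
    for doi, citers in current.items():
        added = sorted(set(citers) - set(previous.get(doi, [])))
        if added:
            fresh[doi] = added
    return fresh
```

Running the diff on each refresh means a depositor would be emailed once per new citing item rather than every time the page is rebuilt.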

Which brings me back to the title of this blog. The idea of a virtuous academic circle lies at the heart of what it is to publish: you publish your words/data, someone else uses and cites them, you find out they’ve used them (however that may be), and this encourages you to publish more because you know your work must have some value. It also taps into the core of what the ADS was set up to do: the archive/record is there to be used and maybe (hopefully?) reinterpreted and re-purposed. The archive needs to be used; otherwise there is arguably no point in having it.

However, without wanting to mangle my shapes, I think this model is more complex, and more in line with the sort of graph theory / social network analysis that is now de rigueur. It’s good to know where our resources are being cited, but there’s a whole bigger world of possible study: what sort of journals ADS resources are cited in, what sort of ADS resources are cited (e.g. is anyone citing the raw data?), what topics these represent, who is citing, and so on. There’s material there for a new wave of study about citation habits and biases, or at the very least a PhD…
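Questions like these amount to counting edges in a bipartite graph of citing works and ADS resources. A minimal sketch, assuming each citation has already been enriched with the citing venue and the type of ADS resource (the `Citation` record, its fields and `degree_summary` are all hypothetical), might be:

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    citing_id: str     # the citing paper, tweet or wiki page
    citing_venue: str  # e.g. a journal title or "wikipedia"
    cited_doi: str     # the ADS DOI being cited
    cited_type: str    # e.g. "collection", "report", "article", "raw data"


def degree_summary(citations: list[Citation]) -> dict[str, Counter]:
    """Summarise the bipartite citing-work -> ADS-resource graph."""
    edges = set(citations)  # drop exact duplicate edges
    return {
        "by_venue": Counter(c.citing_venue for c in edges),
        "by_resource_type": Counter(c.cited_type for c in edges),
        "in_degree": Counter(c.cited_doi for c in edges),   # citers per DOI
        "out_degree": Counter(c.citing_id for c in edges),  # DOIs per citer
    }
```

For anything beyond counting (centrality, communities of co-cited resources), a dedicated graph library such as NetworkX would be the natural next step.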

Anyway, for this to happen please remember to cite the DOI!

Rural Settlement of Roman Britain

Tim Evans

In June 2013 I wrote the first in what I planned to be a two-part blog describing my work on the Rural Settlement of Roman Britain Project (henceforth RRS). A little later than planned, here is the second.

Drawing of a columnar Roman milestone found c.1772 on the Fosse Way two miles from Leicester, bearing the name Ratae (the unofficial logo for the project Web Mapping). Image from the Society of Antiquaries of London Catalogue of Drawings and Museum Objects doi:10.5284/1000409

Background

The RRS project arose from a two-stage pilot project undertaken by Cotswold Archaeology and funded by English Heritage (now Historic England), Assessing The Research Potential of Grey Literature in the study of Roman England. This pilot identified the large volume of grey literature (the colloquial term for unpublished reports, produced primarily through the planning process) containing significant information about the Roman period.

The RRS project is being undertaken by the University of Reading and Cotswold Archaeology, funded by a grant from the Leverhulme Trust with additional backing from Historic England. Building on the pilot, the project has reviewed all sources (traditional published journals/monographs and grey literature) for the excavated evidence for the rural settlement of Roman Britain, with the overarching aim of informing a comprehensive reassessment of the countryside of Roman Britain.

DADAISM Project


The DADAISM project brings together researchers from the diverse fields of archaeology, human computer interaction, image processing, image search and retrieval, and text mining to create a rich interactive system to address the problems of researchers finding images relevant to their research.

In the age of digital photography, thousands of images are taken of archaeological artefacts. If these images could be related to one another effectively, they could help archaeologists enormously with classification and identification, and yield many new insights into a range of archaeological problems. However, they are currently greatly underutilised for two key reasons. First, the current paradigm for interacting with image collections is basic keyword search or, at best, simple faceted search. Second, even where such interactions are possible, the metadata for most images of archaeological artefacts contains little information about the content of the image or the nature of the artefact, and is time-consuming to enter manually.

The ADS goes to the Houses of Parliament

The Parliamentary Office of Science and Technology (POST) hosted an exhibition in the Members’ Dining Room in the House of Commons on Tuesday (15th July), in which the ADS was very pleased to be invited to participate.

POST is Parliament’s in-house source of independent, balanced and accessible analysis of public policy issues related to science and technology. POST publishes 20-30 POSTnotes each year, along with occasional longer reports and short POSTboxes. They focus on current science and technology issues and aim to anticipate policy implications for parliamentarians.

This exhibition, which focussed on ‘Big Data’, was arranged in collaboration with Research Councils UK, which represents the AHRC, NERC and the five other leading public sector bodies that fund research in the UK. Readers of this blog will already know of the ADS’ close relationship with the AHRC, and that we are the smallest of NERC’s data centres, with a remit for science-based archaeology.