All posts by Tim Evans

Linking the virtuous circles: Citation and Tracking Re-use.


The ADS has (for nearly 25 years!) been providing free access to resources deposited with us. We put them online in open/accessible formats, people use them, and people cite them. We know people use them because we have data on page views and downloads. Some things are used a great deal: often high-profile research resources that always gain a lot of mentions in literature and social media. Others have more of a cult following, but are still used sporadically.

All these access statistics make a good basic demonstration of impact; we can pass them on to project funders and stakeholders to demonstrate quantitative success. However, the follow-up questions normally ask “who” is using this data, and for what purposes. The ADS has many ambitions with regard to its (meta)data, but facilitating and demonstrating this re-use is a high priority. Over the last year I’ve had a chance to think more about what we could and should be doing, and how it can help us, our users, and depositors make more of the situation…

The key to this is the Digital Object Identifiers (DOIs) we use. For those unaware, the ADS uses DataCite DOIs through our membership of a consortium led by the British Library. We create DOIs for:

  • All our deposited collections
  • Upon request, distinct entities within a collection
  • All unpublished reports
  • Journal articles

These DOIs are registered with DataCite, and in doing so we also pass on key metadata for the Object (who created it, when it was created, where it relates to, etc.). This metadata is then searchable in the DataCite interface, alongside records from other repositories that are part of the DataCite community such as Zenodo or Dryad.

When users use ADS resources they should be citing the DOI. For example, any use of material from the ever-popular Roman Rural Settlement project should follow our guidelines:

Martyn Allen, Nathan Blick, Tom Brindle, Tim Evans, Michael Fulford, Neil Holbrook, Lisa Lodwick, Julian D Richards, Alex Smith (2018) The Rural Settlement of Roman Britain: an online resource [data-set]. York: Archaeology Data Service [distributor] https://doi.org/10.5284/1030449

Or for a Journal article:

Sparey-Green, C. (2002). Excavations on the SE defences and extramural settlement of Little Chester, 1971-2. Introduction. The Derbyshire Archaeological Journal 122. Vol 122, pp. 1-10. https://doi.org/10.5284/1066616

There are tools available from DataCite to reformat these into nearly all forms of Bibliographic reference, but it’s important to emphasise that any citation or reference should include the DOI and not the URL that appears in a web browser. For example it should be https://doi.org/10.5284/1066616 and never https://archaeologydataservice.ac.uk/library/browse/details.xhtml?recordId=3202768
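As a rough illustration of the point above, the sketch below turns the different ways a DOI gets written (a bare DOI, a `doi:` prefix, the older `dx.doi.org` form) into the single `https://doi.org/` form that should be cited. The function name `doi_to_url` and the regular expression are assumptions made for this example, not part of any ADS or DataCite tool, and the pattern is deliberately simple.

```python
import re

# Assumption: a DOI looks like "10.", a numeric registrant prefix, "/", then a
# suffix with no whitespace. Real DOIs can be more varied than this.
DOI_PATTERN = re.compile(r"(10\.\d{4,9}/\S+)")


def doi_to_url(reference):
    """Return the https://doi.org/ form of the first DOI found in `reference`."""
    match = DOI_PATTERN.search(reference)
    if match is None:
        raise ValueError(f"no DOI found in {reference!r}")
    # Strip sentence punctuation that often trails a DOI in running text.
    doi = match.group(1).rstrip(".,;)")
    return f"https://doi.org/{doi}"


print(doi_to_url("doi:10.5284/1000409"))
print(doi_to_url("http://dx.doi.org/10.5284/1036099"))
print(doi_to_url("10.5284/1066616"))
```

A browser URL such as the details.xhtml address above contains no DOI at all, so the function raises a `ValueError` rather than guessing; that is the practical reason the browser URL is no substitute for the DOI.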

Why? Primarily, the DOI is persistent. No matter what happens to ADS applications in the future (for example an update to the Library may lead to us not using details.xhtml any more), a reference to the DOI will always take you to where the content is. Secondly, and most importantly in this case, it allows us, via a range of tools, to identify where our DOIs are being used.

One such tool is the DataCite Event API, a prototype developed in collaboration with Crossref to track citations of DataCite DOIs quoted as sources in academic papers. A quick search of this for ADS DOIs shows for example:

Image of JSON from the DataCite Event API which shows the citation of https://doi.org/10.5284/1007741 by a paper in the Journal of World Prehistory

In this case the paper ‘Approaches to Interpreting Mesolithic Mobility and Settlement in Britain and Ireland’, published in the Journal of World Prehistory, cited Wessex Archaeology (2006). England’s Historic Seascapes Final Report https://doi.org/10.5284/1007741.
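For anyone wanting to try this kind of search themselves, here is a minimal sketch in Python. It assumes the events endpoint lives at `https://api.datacite.org/events`, accepts a `doi` query parameter, and returns a list under `data` whose items carry `subj-id` and `obj-id` attributes; all of that should be checked against the current DataCite documentation. The sample payload is hand-written and its citing DOI is a placeholder, not the real DOI of the paper mentioned above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: endpoint and parameter names as described in the lead-in.
DATACITE_EVENTS = "https://api.datacite.org/events"


def datacite_events_url(doi, page_size=25):
    """Build a query URL for events that involve the given DOI."""
    query = urlencode({"doi": doi, "page[size]": page_size})
    return f"{DATACITE_EVENTS}?{query}"


def fetch_events(doi):
    """Fetch and decode one page of events (needs network access)."""
    with urlopen(datacite_events_url(doi), timeout=30) as response:
        return json.load(response)


def linked_identifiers(payload, doi):
    """List the identifiers on the other side of each event involving `doi`."""
    target = f"https://doi.org/{doi}".lower()
    linked = []
    for event in payload.get("data", []):
        attributes = event.get("attributes", {})
        subj = attributes.get("subj-id", "")
        obj = attributes.get("obj-id", "")
        if obj.lower() == target:
            linked.append(subj)
        elif subj.lower() == target:
            linked.append(obj)
    return linked


# Hand-written sample in the assumed response shape (placeholder citing DOI).
sample = {
    "data": [
        {
            "attributes": {
                "subj-id": "https://doi.org/10.1234/placeholder-paper",
                "obj-id": "https://doi.org/10.5284/1007741",
                "relation-type-id": "references",
                "source-id": "crossref",
            }
        }
    ]
}

print(datacite_events_url("10.5284/1007741"))
print(linked_identifiers(sample, "10.5284/1007741"))
```

Keeping the URL building and the parsing separate from `fetch_events` means the logic can be tried against a saved response without touching the network.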

In addition, there’s also the incredibly powerful CrossRef Event Data, a set of APIs that captures and records events that occur all over the web. This includes not only published articles but also Twitter and Wikipedia (including WikiData). So, for example, I can see:

Image of JSON from the CrossRef Event Data API which shows the citation of the DOI https://doi.org/10.5284/1000266 by a wikipedia article

In this case, the Wikipedia article on the Sutton Hoo helmet cites Martin Carver’s data from the Sutton Hoo Research Project.
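A similar sketch for CrossRef Event Data, this time tallying events by where they came from. It assumes the query endpoint is `https://api.eventdata.crossref.org/v1/events`, that it filters on `obj-id` and `source`, and that events come back under `message.events` with a `source_id` field; treat each of those as an assumption to verify. The sample events are hand-written for illustration and the Twitter identifier is a placeholder.

```python
from collections import Counter
from urllib.parse import urlencode

# Assumption: endpoint and parameter names as described in the lead-in.
CROSSREF_EVENTS = "https://api.eventdata.crossref.org/v1/events"


def crossref_events_url(doi, source=None, mailto=None):
    """Build a query URL for events whose object is the given DOI."""
    params = {"obj-id": doi}
    if source:
        params["source"] = source
    if mailto:
        # A contact address, so the service can reach heavy users of the API.
        params["mailto"] = mailto
    return f"{CROSSREF_EVENTS}?{urlencode(params)}"


def count_by_source(payload):
    """Count events per source (e.g. wikipedia, twitter) in a response."""
    events = payload.get("message", {}).get("events", [])
    return Counter(event.get("source_id", "unknown") for event in events)


# Hand-written sample in the assumed response shape.
sample = {
    "message": {
        "events": [
            {
                "source_id": "wikipedia",
                "subj_id": "https://en.wikipedia.org/wiki/Sutton_Hoo_helmet",
                "obj_id": "https://doi.org/10.5284/1000266",
            },
            {
                "source_id": "twitter",
                "subj_id": "twitter://status?id=0",  # placeholder
                "obj_id": "https://doi.org/10.5284/1000266",
            },
        ]
    }
}

print(crossref_events_url("10.5284/1000266", source="wikipedia"))
print(count_by_source(sample))
```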

Capturing this sort of reuse, and mentions of resources in Twitter conversations (919 and counting) is to my mind a useful indicator not only of reuse, but a glimpse into the sort of conversations people may be having about our digital Objects.

The next step is for us to build a method to pull data from these APIs and incorporate it back into our metadata as a dynamic process. This would mean that this page (for example) https://archaeologydataservice.ac.uk/archives/view/romangl/metadata.cfm is refreshed with information where we can demonstrate that https://doi.org/10.5284/1030449 is ‘Cited By’ XXX. Who knows, this could even be extended as an option to email a depositor when their data has been cited so that they know their data is being actively used.
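One way such a refresh might look, as a sketch only: the record layout (`doi`, `title`, `cited_by`) and the function name are hypothetical and do not describe how ADS metadata is actually stored. The function merges freshly harvested citing identifiers into a record and reports which ones are new, which is the hook an email to the depositor could hang from.

```python
def merge_cited_by(record, citing_ids):
    """Return (updated_record, newly_added) without mutating `record`."""
    known = list(record.get("cited_by", []))
    seen = set(known)
    newly_added = []
    for citing_id in citing_ids:
        if citing_id not in seen:
            seen.add(citing_id)
            known.append(citing_id)
            newly_added.append(citing_id)
    updated = dict(record)
    updated["cited_by"] = known
    return updated, newly_added


# Hypothetical metadata record; the citing identifiers are placeholders.
record = {
    "doi": "10.5284/1030449",
    "title": "The Rural Settlement of Roman Britain: an online resource",
    "cited_by": ["https://doi.org/10.1234/placeholder-a"],
}
harvested = [
    "https://doi.org/10.1234/placeholder-a",
    "https://doi.org/10.1234/placeholder-b",
]

updated, newly_added = merge_cited_by(record, harvested)
print(updated["cited_by"])
if newly_added:
    print(f"{len(newly_added)} new citation(s) for {record['doi']}: notify the depositor")
```

Because the merge ignores identifiers it has already seen, the harvest can be re-run on a schedule without producing duplicates or repeat notifications.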

Which brings me back to the title of this blog. The idea of a virtuous academic circle lies at the heart of what it is to publish – you publish your words/data, someone else uses it and cites it, you know they’ve used it (however this may be), this encourages you to publish more as you know your work must have some value. It also taps into what is at the core of what the ADS was set up to do: the archive/record is there to be used and maybe (hopefully?) reinterpreted and re-purposed. The archive needs to be used, otherwise there is arguably no point in having the archive.

However, without wanting to mangle my shapes, I think this model is more complex and more in line with the sort of graph theory / social network analysis that is now de rigueur. It’s good to know where our resources are being cited, but there’s a whole bigger world of possible study. What sort of journals are ADS resources cited in, what sort of ADS resources are cited (e.g. is anyone citing the raw data?), what topics do these represent, who is citing, etc. There’s material there for a new wave of study about citation habits and biases, or at the very least a PhD…
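To make the graph idea a little more concrete, here is a toy sketch that treats citations as a two-sided network: journals on one side, ADS DOIs on the other, with an edge for each citation. The journal names and the edges themselves are entirely made up for illustration; only the DOIs are ones mentioned in this post.

```python
from collections import defaultdict

# Hypothetical (citing journal, cited ADS DOI) pairs, for illustration only.
edges = [
    ("Journal A", "10.5284/1030449"),
    ("Journal A", "10.5284/1007741"),
    ("Journal B", "10.5284/1030449"),
]


def build_graph(pairs):
    """Return two adjacency maps: journal -> DOIs and DOI -> journals."""
    cites = defaultdict(set)
    cited_in = defaultdict(set)
    for journal, doi in pairs:
        cites[journal].add(doi)
        cited_in[doi].add(journal)
    return cites, cited_in


def degree(adjacency):
    """Number of distinct neighbours for every node in an adjacency map."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


cites, cited_in = build_graph(edges)
print(degree(cited_in))  # how many journals cite each resource
print(degree(cites))     # how many resources each journal cites
```

Even simple degree counts like these start to answer "what is cited, and where"; questions about topics or citation bias would need richer metadata attached to each node.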

Anyway, for this to happen please remember to cite the DOI!

Changes to the ADS Library

The scholar, Periander in his library with printed text. Reproduction after a woodcut, 1488-89. Credit: Wellcome Collection. CC BY.

Since a Beta release back in March 2017 we’ve received a great deal of feedback on the ADS Library application. We know it’s used intensively, with over 120,000 downloads in 2019, but as with any IT application there are places it can be improved!

For the uninitiated, the ADS Library was the outcome of a Historic England funded project to ensure the longevity of the British and Irish Archaeological Bibliography (BIAB). BIAB had traditionally been maintained by the CBA, with records added into the database by hand from extant sources (see Heyworth 1992). As this approach became less sustainable in the digital age, it was also deemed advisable to combine this dataset with the growing number of digital unpublished reports and journals and monographs held by the ADS, the former mainly derived through material uploaded to the OASIS system. This was also an opportunity for the ADS to align its records with BIAB, and to have a single interface to cross-search all written works it held (traditionally, files from unpublished and published works sat in different databases). Having a unified database, with access to free copies of published and unpublished reports has also been in line with Historic England’s HIAS Principle 4: ‘Investigative research data or knowledge should be readily uploaded, validated and accessed online’.


IWD2020

The strength of the ADS has always been the people who work here. As a team, we accomplish a lot. Out of the existing cohort of 13 staff, eight are female. Individually, and as a group, these women bring an array of knowledge, skills, and commitment without which we would be diminished. To coincide with International Women’s Day 2020, and in mind of its mission “To celebrate digital advancement and champion the women forging innovation through technology”, it is an opportune moment to celebrate our female staff. Even those who think they know the ADS should read on to discover the vast array of expertise at hand (listed in alphabetical order)…


OASIS and Archives


Saint Lawrence, by Bartolomeo Cesi [CC0]. Image from https://commons.wikimedia.org/wiki/File:Saint_Lawrence_MET_2000.495.jpg

Over the last few weeks (either side of Christmas), as part of the HERALD project, we’ve been making some progress on the part of the new OASIS which records the archive. As an archival body ourselves we’re keen – along with everyone else I’ve spoken to – that the new system improves on:

  • Recording what has been found/produced for archive
  • Allowing an archival body to produce in-form guidance on what it expects from a deposition
  • Making the archival body aware of events happening within their area/remit
  • Allowing the archival body and data producer to correspond at an early stage
  • Recording the deposition stage
  • Reflecting the differences in archive workflows in England + Scotland.
  • Signposting between physical and digital archives

in the dark near the Tannhäuser Gate

Blade Runner 1982, by Bill Lile. Image shared under a CC BY-NC-ND 2.0 licence

As it’s World Digital Preservation Day I thought I’d finish the following blog about our work with managing the digital objects within our collection. Like most of my blogs (including the much-awaited sequel to Space is the Place) these often languish for a while awaiting a final burst of input. To celebrate WDPD 2018, here we go…


Space is the Place (part I)

“server-racks-clouds_blue_circuit” by Kin Lane. CC BY-SA 2.0

This is the first part of a (much delayed) series of blogs investigating the storage requirements of the ADS. This began way back in late 2016/early 2017 as we began to think about refreshing our off-site storage, and I asked myself the very simple question of “how much space do we need?”. As I write it’s evolving into a much wider study of historic trends in data deposition, and the effects of our current procedure + strategy on the size of our digital holdings. Aware that blogs are supposed to be accessible, I thought I’d break it into smaller and more digestible chunks of commentary, and a lot of time spent at Dusseldorf airport recently for ArchAIDE has meant I’ve been able to finish this piece.

Rural Settlement of Roman Britain: Salute!

A bronze figure of a boy on a chimera, found in Colchester in 1804. Image from Society of Antiquaries of London Catalogue of Drawings and Museum Objects (doi:10.5284/1000409). Not technically from a rural settlement but I like the picture!

In December of last year (2016), I completed the final stage of the digital archive and dissemination for The Rural Settlement of Roman Britain project. The first publication and (revised) online resource were launched at a meeting of the Society for the Promotion of Roman Studies at Senate House of the University of London.

I’ve written previous blogs on the project, so won’t repeat myself here too much. Suffice to say that the final phase publishes the complete settlement evidence from Roman England and Wales, together with the related finds, environmental and burial data. These are produced alongside a series of integrative studies on rural settlement, economy, and people and ritual, published by the Society for the Promotion of Roman Studies as Britannia Monographs. The first volume, on rural settlement, has now been published, while the two remaining volumes will be released in 2017 and 2018.

The existing online resource has been updated both in content and functionality: the project database is available to download in CSV format, and most key elements of the finds, environmental and burial evidence have been added into the search and map interface. Hopefully the dissemination of the data in these forms allows re-use of this fantastic dataset in a variety of ways and, I hope, by a variety of users.

Example of the online map, showing weighted distribution of inhumation (black) and cremation (orange) burials

As with previous posts on this project, I’d like to say how much I’ve enjoyed working with the team at Reading and Cotswold. Producing an online archive and formal publication in tandem and in such a short time is no mean undertaking. I’m particularly happy/impressed with the determination by the researchers to make their data openly available at the earliest opportunity. Hopefully this is a benchmark that others will aspire to reach. A debt of thanks is also due to all those organisations that assisted the project, particularly the HERs of England and Wales who provided exports from their systems and aided the team at Cotswold with access to fieldwork reports. Finally, I’d have been lost without the awesome Digital Atlas of the Roman Empire created by Johan Åhlfeldt. At an early stage it became clear that creating any kind of ‘baseline mapping’ of Roman archaeology (combining NMP + HER data for example) would be problematic – both in terms of technical overheads and copyright. To do something on the scale of the EngLaId project’s ArcGIS WebApp simply wasn’t in the scope of the project! Johan’s work was thus timely and extremely useful in providing a broad backdrop of Roman Britain in which to compare the project results.

The rationale behind much of the interface work was to act as a data publication of an academic synthesis and not to get tied down in building something akin to a Roman portal. Throughout the project we’ve been at pains to point out that this is very much a synthesis and interpretation of the excavated evidence in relation to a research question, not a complete inventory or atlas of every Roman site. Indeed, it became clear that as soon as the data collation had been completed (31st December 2014 for sites in England and March 2015 for sites in Wales), it was effectively missing all the discoveries made in the following years. Thus although providing broad context was necessary in this case, if someone wanted to know everything about the Roman period (including sites not excavated) from a particular area they’d be best off consulting the relevant HER.

This in turn leads onto the $64,000 Question which I was asked at every event around England and Wales (including the final one in London): “What plans are there to keep this database updated?” Without wishing to appear pessimistic, I would always answer “None”. Aside from the logistics and finances of keeping a database as large as this constantly updated, there’s also the fact that this is a very subjective synthesis of a much larger resource. To my mind, the key question is how we make it easier for other researchers to build on this and have academic synthesis of a period or theme happen on a more regular basis. One of the answers to this is surely access to data, especially the published and non-published written sources. This isn’t really radical, and indeed increased access to data is being explored and recommended by the Historic England Heritage Information Access Strategy. The work of the Roman Rural Settlement project has many lessons to inform these strategies, some of which will form future papers by the project team. Out of curiosity I’ve undertaken my own analysis of the project database and ‘grey literature’ sources (a term I don’t like!) and the OASIS system but will save that for a separate blog post…

At the post-launch meal I did end up asking the team the rather cheesy question of “which is your favourite record?”. The responses were often based around the level of finds, or on the relative level of information the site could add to a regional picture. My answer(s) were perhaps a little more prosaic: for example, I really like records such as Swinford Wind Farm (Leicestershire), which has fieldwork reports disseminated via OASIS, and a Museum Accession ID. However, my heart veers towards 42 London Road, Bagshot (Surrey): the site of my very first experience of archaeology as a somewhat geeky 16-year-old. The site was never published, and thus it’s great to see it live on in this resource and with a link to the corresponding HER record to (hopefully) allow users to go and explore the wider area, perhaps even to undertake their own research project. To my mind, stimulating further work, large and small, would be a great legacy of the project.

Tim

Reflections on CHNT 2016

Back in November (16th-18th), I was lucky enough to be invited to participate in the Cultural Heritage and New Technologies (CHNT) conference in Vienna. As detailed in my excitable post, written in advance of the event, my involvement was to represent the ADS at the session and subsequent round tables hosted by the ARIADNE project on the subject of Digital Preservation. One of the reasons I was so excited was that it was one of the few occasions on which the focus of such sessions was solely on the issues surrounding Digital Preservation: how it’s undertaken, problems and the challenge of ensuring re-use. It was also the first time, in public at least, that individuals representing organisations undertaking Digital Preservation from across Europe came together to present as a united front to the wider heritage community. In addition, the event took place at the beautiful Vienna town hall (see below), a fantastic venue.

Just a normal staircase at Vienna town hall. Not intimidating in the least

It was incredibly heartening to hear from European colleagues on their experiences, successes and challenges. I also felt that all the papers in the session – no doubt due to the diligence of co-chairs from DANS, DAI IANUS and the Saxonian State Department for Archaeology – meshed together really well. Although there were common themes, each was unique and presented a different tale to tell. Although somewhat biased, at the end of the formal session I came away thinking that I had not only contributed, but had learnt in equal measure. For those interested, IANUS have agreed to host the abstracts and presentations from the session on their website. I’d recommend these to everyone interested in a European-wide approach to the issues of digital archiving.

The first round table followed the formal session, and was listed as an open invitation for delegates to query the archivists in the room about where/when/what/how to archive. Surprisingly, considering the high profile parallel sessions, the room was packed with an array of people from a variety of backgrounds and countries across Europe. As such, the conversation veered between the extreme poles of the subject matter – for example the basic need for metadata versus adherence to the CIDOC-CRM. Reading between the lines here, what I thought the attendance and diverse topics showed was that this type of event was not only useful, but actually essential for archivists and non-archivists alike. Not only to correct misconceptions and to genuinely try and help, but also to alert us to the issues as perceived from the virtual work-face.

After a well-earned rest, and a quick visit to the Christmas markets for a small apfelwein, the next day was a chance for all the archivists to get together for an informal round table on issues affecting their long-term and shorter-term objectives. Issues ranged from the need for accreditation – one of the ADS’ goals in this regard is to learn from DANS’ experience of achieving NESTOR – to file identification and persistent identifiers. In this setting the ADS is perceived as very much the elder statesperson (!) in the room, having been in the business for 20 years now, and it’s a good feeling to be able to pass on advice and lessons from our own undertakings to colleagues. I think it’s important that we continue to do this, not only to be nice (and I like to think we’ve always been approachable!), but also to achieve a longer-term strategic strength. Although we (the ADS) are winning many of the challenges at home in terms of championing the need for consideration of digital archives, there’s always more to be done. When we can also point to equivalents in continental Europe, I feel we only make our cause stronger.

However, I’m also conscious that this isn’t just a one-way street and that we’ve still a great deal to learn from our European colleagues, not only in things like accreditation, but also in shared experiences of tools, file formats, metadata standards and internal infrastructure. We often say that Digital Preservation never stands still, so in this regard it’s good to look at what others are doing and reflect on what we could do better. Events such as this – and the international community of archaeologists doing Digital Preservation built in its wake – serve to make us richer in knowledge, and renewed in purpose. Looking forward to the next one!

Tim

Looking forward to CHNT Vienna

Next month, the Archaeology Data Service (ADS) are contributing to an exciting session at the CHNT conference in Vienna: Preservation and re-use of digital archaeological research data with open archival information systems. The session is being organised by partners within the ARIADNE consortium, and chaired by members of the Data Archiving and Networked Services (DANS – Netherlands), the Research Data Centre Archaeology and Ancient Studies (DAI IANUS – Germany), the Saxonian State Department for Archaeology (Germany) and the ADS (UK).

The original rationale behind organizing the session was the need to ensure preservation and re-use of the ever-growing corpus of digital data produced through archaeological activity. Put simply, what we are creating must be available for future generations to consult, but also feed back into current research and practice. Accordingly, the focus of the session is on the services and duties of existing repositories and archives, including case studies and experiences of technical considerations such as formats, authenticity/validity and metadata. Participants will also offer wider perspectives on the rationale for curation, how it can be achieved, lessons learned, the relevance of the OAIS-standard and future challenges. Believing that there is no true preservation without re-use, the session also concentrates on dissemination; discussing accessibility, publicity (getting people to re-use data), and novel and creative methods of data publication as demonstrated through case studies.

Example of the 3D viewer created for dissemination of the archive from the Las Cuevas Project. http://dx.doi.org/10.5284/1036099

The speakers are drawn from a range of cultural heritage institutions, representing a mix of established digital archives and current research projects that are investigating archival solutions, thus offering a range of international perspectives on the Session themes. From an ADS point of view, it will be great to meet up with familiar faces but also hear from (and get to know) new projects. In this vein the Session is followed by a Round Table which will allow for further discussion on topics, as well as allow those new to digital curation to discover more about the subject.

This inclusive participation, and learning from the experiences of international partners is a key theme of the ARIADNE project, and personally I’m excited to not only offer a UK perspective but also to learn from my colleagues and to feed back into my day-to-day role at the ADS.

Tim

A decade in data

It’s hard to believe, but next week will mark my 10 year anniversary at the ADS. I originally started on a one-year contract to oversee the archiving of key digital outputs produced by English Heritage ALSF projects (with the job title of ALSF Curatorial Officer), but have since stayed on in the role of Digital Archivist, more recently taking over responsibility as the ADS’ Preservation Lead.

The realisation that I’d spent a decade in one organisation initially triggered a Proustian flashback of projects, archives and even files I’d worked on, and thus the idea of a blog was born. I was tempted to call this blog something like “In Search of Lost Time” (time being a portmanteau of my first name and initial of my surname), but that was perhaps a little floral as well as erroneous: here at the ADS we never lose anything…

Curious as to what I’d achieved over this period (apart from a sense of satisfaction in safeguarding humanity’s digital heritage), I returned to the ADS Collections Management System (CMS) to query what it was I had worked on. In short, I’ve been responsible for:

  • 1018 accessions (the act of receiving and ingesting data from a depositor)
  • Arriving on:
    • 1 x 3.5 inch floppy disc
    • 298 x CD-ROMs
    • 46 x DVDs
    • 208 x Emails
    • 337 x FTP downloads
    • 12 x HTTP downloads
    • 87 x USB hard drives
    • 30 x USB memory sticks
    • 5523 x Web uploads (via OASIS)
  • Archiving 377 collections
  • Updating/adding to a further 169 collections (Journals, collections of OASIS reports etc)
  • Curating 323,050 accessioned files (in 800,000+ files on our AIPs and DIPs)
  • Undertaking 4094 processes (e.g. migrations)
    • Of which 966 processes related to the creation of Preservation PDF/A (12,592 files if you’re curious)
  • Drinking at least 11300 cups of tea (a slightly spurious figure based on an average of 5 cups a day x (10 x (annual working days – holiday))).
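For the curious, the tea figure can be run backwards to see what it implies. The post does not give the number of working days or holiday, so the sketch below only divides the stated total by the stated rate and the ten years; the variable names are just for this example.

```python
CUPS_PER_DAY = 5      # stated average
YEARS = 10            # a decade at the ADS
TOTAL_CUPS = 11300    # stated total

# total = cups_per_day x (years x (annual working days - holiday)),
# so the bracketed net days per year fall out by division.
net_days_per_year = TOTAL_CUPS / (CUPS_PER_DAY * YEARS)
print(net_days_per_year)  # 226.0
```

In other words, the estimate assumes 226 days at work (working days minus holiday) in each of the ten years.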

Over that time, and all those cups of tea, there are definitely some projects that stick in my mind as being memorable. So, to commemorate my decade in data, here are my top 10 covering every year I’ve been at the ADS:

2006: Wearmouth and Jarrow Monastic sites. Volume 2 Appendix C

My first archive! Notable for using Tab delimited text, which was soon to be replaced as a dissemination format by Comma separated values.

2007: West Stow, Lackford Bridge, Suffolk

One of the first sizeable projects to come through as part of my ALSF work, this was instrumental in building up a strong start to the project. It’s also a useful dataset arising from a modern appraisal of an old rescue excavation.

2008: Land south-west of Ripple, Worcestershire

Although tempted to opt for Gwithian (check out the photos!), I went for this project which was completed in 2008. It’s a nice mixture of reports, data and photos from (to my mind) quite an important site, especially if you’re interested in the dating of pit alignments.

2009: Fieldwalking the cropmark landscape on the Sherwood Sandstone of Nottinghamshire

The first of a series of big projects I started to work on incorporating map and/or database interfaces. This one was built in ArcGIS Server.

2010: The evolution of Rome’s maritime facade: archaeology and geomorphology at Castelporziano

Primarily because I worked on the fieldwork project (look carefully for pictures of a youthful Tim), but also as it was, at the time, the largest archive we held. A detailed archive for a very interesting site.

2011: The Deanery, Chapel Road, Southampton (OASIS ID wessexar1-92410)

Although at first appearance this is a somewhat modest archive, it represents a great leap forward. This was the first archive from an agreement between ADS and Southampton Arts and Heritage, whereby digital archives arising from development-led work in the City of Southampton would be passed onto the ADS. We now have several agreements with Local Authorities to perform this role (for example see Worcestershire), and it all started here. As an aside, I often use this archive as an example to show to students as it comprises a compact, well-documented dataset including reports, images and a plan – essential material for anyone working in/researching the city.

2012: A Long Way from Home: Diaspora Communities in Roman Britain

A great example of the archiving of an important research dataset, although I’m also swayed by the similarity of the man in the image on the introduction page and the ex-Everton manager David Moyes.

2013: Quarry Farm, Ingleby Barwick

The site is the most northerly known Roman villa surviving in the Empire, and the dataset is a useful companion to the published CBA Research Report.

2014: Palaeolithic and Mesolithic Lithic Artefact (PaMELA) database

The PaMELA database consists of two main parts: a literal digital transcription of Jacobi’s card index (the Jacobi Archive); and a searchable database with typological and chronological keys (the Colonisation of Britain database). I could spend hours browsing this archive!

2015: The Prehistoric Stones of Greece: A Resource from Field Survey

I expected to put down the Roman Rural Settlement of Britain project, but I won’t consider that finished until the final interface (with access to all data) is finished later this year. So I’ve gone for this project, a rescue of a dataset that had been available on another website, but subsequently removed. The interface has a strong spatial element, and after some thought I moved away from Google Maps and ESRI products (such as ArcGIS Server) to embrace OpenLayers. In the end the hard-learnt lessons (e.g. how to close a polygon?) reaped dividends in my work on the large map for the Roman project.

2016: Birmingham Archaeology (BUFAU) Digital Archives

Before working for the ADS, I’d spent most of my professional life working for Birmingham Archaeology (previously known as BUFAU). That organisation closed in 2012, and subsequently a project was undertaken to ensure that all key physical and digital materials are transferred to a suitable archive. We’re only halfway through the project, but already we have the majority of the c.2000 reports written over the years, and a selection of digital materials. It’s been good to go back to where I started, and even to archive some of my own (not very good!) reports!

I’ll end the blog there. Who knows, I may update this in another 10 years!

Tim