Journey into the archive with our new online gallery.
The Wonders of the ADS is a digital exhibition dedicated to highlighting the outstanding digital data held in the ADS archive.
The Wonders of the ADS digital exhibition developed out of a collaborative project with Carlotta Cammelli, a Leeds University MA Art Gallery and Museum Studies student, as part of her Masters dissertation. The project, entitled Unearthing the Archive: Exploring new methods for disseminating archaeological digital data, aimed to develop an innovative online approach to presenting specific digital objects (such as photographs, drawings, documents, videos and 3D data files) from the ADS collections in order to increase public engagement with the data in our archive.
Traditionally the ADS is used by researchers with specific interests in mind. The structure of the ADS into individual archives also means that sometimes interesting material can be buried within the vast quantity of data held by the ADS.
This is the first part of a (much delayed) series of blogs investigating the storage requirements of the ADS. This began way back in late 2016/early 2017 as we began to think about refreshing our off-site storage, and I asked myself the very simple question of “how much space do we need?”. As I write, it’s evolving into a much wider study of historic trends in data deposition, and the effects of our current procedure and strategy on the size of our digital holdings. Aware that blogs are supposed to be accessible, I thought I’d break it into smaller and more digestible chunks of commentary, and a lot of time spent at Dusseldorf airport recently for ArchAIDE has meant I’ve been able to finish this piece.
Here at the ADS we take the long-term integrity and resilience of our data very seriously. Although what most people in archaeology know us for is the website and access to data, it’s the long-term preservation of that data that underpins everything we do. The ADS endeavour to work within a framework conforming to the ISO (14721:2003) specification of a reference model for an Open Archival Information System (OAIS). As you can see in the much-used schematic reproduced below, under the terminologies and concepts used in the OAIS model ‘Archival Storage’ is right at the heart of the operation.
How we achieve this is actually a pretty complicated process, documented in our Preservation Policy; suffice to say it’s far more than simply copying files to a server! However, we shouldn’t discount storage space entirely. Even in the ‘Zettabyte Era’, where cloud-based storage is commonplace and people are used to streaming or downloading files that – 10 years ago – would have been viewed as prohibitive, we still need some sort of space on which to keep our archive.
At the moment we maintain multiple copies of data in order to facilitate disaster recovery – a common/necessary strategy for any organisation that wants to be seen as a Digital Archive rather than simply a place to keep files. Initially, all data is held on the main ADS production server, maintained by the ITS at the University of York, which is backed up via daily snapshot, with these snapshots stored for a month and further backed up onto tape for 3 months.
In addition to this, all our preservation data is synchronised once a week from the local copy in the University of York to a dedicated off-site store, currently maintained in the machine room of the UK Data Archive at the University of Essex. This repository takes the form of a standalone server behind the University of Essex firewall. In the interests of security, outside access to this server is via an encrypted SSH tunnel from nominated IP addresses. Data is further backed up to tape by the UKDA. Quite simply, if something disastrous happened here in York, our data would still be recoverable.
This system has served us well; however, recently a very large archive (laser scanning) was deposited with us. Just in its original form it was just under a quarter of the size of all our other archives combined, and almost filled up the available server space at York and Essex. In the short term, getting access to more space is not a problem as we’re lucky to be working with very helpful colleagues within both organisations. Longer-term, however, I think it’s unrealistic to simply keep on asking for more space at ad-hoc intervals, and this leads into a wider debate over the merits of cloud-based solutions (such as Amazon) versus procuring traditional physical storage space (i.e. servers) with a third party. However, I’ll save that dilemma for another blog!
Regardless of which strategy we use in the future, for business reasons (i.e. any storage with a third party will cost money) it would be good to be able to begin to predict or understand:
how much data we may receive in the future;
how size varies according to the contents of the deposit;
the impact of our collections policy (i.e. how we store the data);
the effect of our normalisation and migration strategy.
Thus was the genesis of this blog….
We haven’t always had the capacity to ask these questions. Traditionally we never held information about the files themselves in any kind of database, and any kind of overview was produced via home brew scripts or command-line tools. In 2008 an abortive attempt to launch an “ADS Big Table”, which held basic details on file type, location and size, was scuppered by the difficulties in importing data by hand (my entry of “Comma Seperated Values” [sic] was a culprit). However, we took a great leap forward with the 3rd iteration of our Collections Management System, which incorporated a schema to record technical file-level metadata for every file we hold, and an application to generate and import this information automatically. As an aside, reaching this point required a great deal of work (thanks Paul!).
As well as aiding management of files (e.g. “where are all our DXF files?”), this means we can run some pretty gnarly queries against the database. For starters, I wanted to see how many deposits of data (Accessions) we received every year, and how big these were:
As the graph above shows, over the years we’ve seen an ever-increasing number of Accessions, that is, the single act of giving us a batch of files for archiving (note: many collections contain more than one accession). Despite a noticeable dip in 2016, the trend has clearly been for people to give us more stuff, and for the combined size of this to increase. A notable statistic is that we’ve accessioned over 15 Tb in the last 5 years. In total last year (2017), we received just over 3 Terabytes of data, courtesy of over 1400 individual events; compared with 2007 (a year after I started work here), when we received c. 700 Gb in 176 events. That’s an increase of 364% and 713% respectively over 10 years, and it’s interesting to note the disparity between those two values, which I’ll talk about later. However, at this point the clear message is that we’re working harder than ever in terms of throughput, both in number and size.
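For anyone wanting to check the arithmetic behind those two percentages, here is a minimal sketch. Only the 2007 event count (176) comes from the text above; the 2017 event count and both size figures are assumptions chosen to sit within "over 1400" and "just over 3 Terabytes", and the 2007 size is treated as roughly 700 GB, the reading that is consistent with the quoted 364%.

```python
def percent_increase(old, new):
    """Percentage increase from old to new (100 means the value doubled)."""
    if old <= 0:
        raise ValueError("old must be positive")
    return (new - old) / old * 100


# Number of accession events per year.
events_2007 = 176    # stated in the post
events_2017 = 1431   # ASSUMED: the post only says "over 1400"

# Total size accessioned per year, in GB.
size_gb_2007 = 700   # ASSUMED: read as roughly 700 GB
size_gb_2017 = 3248  # ASSUMED: the post only says "just over 3 Terabytes"

print("size increase (%):", round(percent_increase(size_gb_2007, size_gb_2017)))
print("event increase (%):", round(percent_increase(events_2007, events_2017)))
```

With these assumed inputs the number of events grows roughly twice as fast as the total size, which is the disparity noted above.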
Is this to do with the type of Accessions we’re dealing with? Over the years our Collections Policy has changed to reflect a much wider appreciation of data, and community. A breakdown of the Accessions by broad type adds more detail to the picture:
Aside from showing an interesting (to me at least) historical change in what the ADS takes (the years 1998-2004 are really a few academic research archives and inventory loads for Archsearch), this data also shows how we’ve had to handle the explosion of ‘grey literature’ coming from the OASIS system, and a marked increase in the number of Project Archives since we started taking more development-led work around 2014. The number of Project Archives should however come with a caveat, as in recent years these have been inflated by a number of ‘backlog’ type projects that have included a lot of individual accessions under one much larger project, for example:
This isn’t to entirely discount these, just that they could be viewed as exceptional to the main flow of archives coming in through research and development-led work. So without these, the number of archives looks like:
So, we can see the ALSF was having an impact 2006-2011, and that in 2014-2016 Jenny’s work on Ipswich and Exeter, and Ray’s reorganisation of CTRL, were inflating the figures somewhat. What is genuinely startling is that in 2017 this ceases to be the case: we really are taking 400+ ‘live’ Accessions from Project Archives now. How are these getting sent to us? Time for another graph!
The numbers clearly show that post-2014 we are seeing a lot more smaller archives being delivered semi-automatically via ADS-easy (limit of 300 files) and OASIS images (currently limited to 150 raster images). When I originally ran this query back in early 2017 it looked like ‘Normal’ deposits (*not that there’s anything that we could really call normal, a study of that is yet more blogs and graphs!) were dropping off, but 2017 has blown this hypothesis out of the water. What’s behind this? Undoubtedly the influence of Crossrail, which has seen nearly 30 Accessions, but also HLCs, ACCORD, big research projects, and a lot of development-led work sent on physical media or via FTP sites (so perhaps bigger or more complex than could be handled by ADS-easy). Put simply, we really are getting a lot more stuff!
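For the curious, the breakdowns in this post all boil down to grouping accessions by year and by some category, then counting them and summing their sizes. The sketch below shows the general shape of such a query using Python's built-in sqlite3 module. The table name, the column names and every row of data are made up for illustration; the real Collections Management System schema isn't described here and will certainly differ.

```python
import sqlite3

# HYPOTHETICAL rows: (year, delivery method, size in bytes). Not real ADS data.
ROWS = [
    (2016, "ADS-easy", 120_000_000),
    (2016, "Normal", 900_000_000),
    (2017, "ADS-easy", 150_000_000),
    (2017, "OASIS images", 40_000_000),
    (2017, "Normal", 2_000_000_000),
    (2017, "Normal", 500_000_000),
]


def accessions_by_year_and_method(rows):
    """Return (year, method, accession_count, total_bytes) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        # HYPOTHETICAL schema, invented for this example.
        conn.execute(
            "CREATE TABLE accession (year INTEGER, method TEXT, size_bytes INTEGER)"
        )
        conn.executemany("INSERT INTO accession VALUES (?, ?, ?)", rows)
        return conn.execute(
            """
            SELECT year, method, COUNT(*) AS accession_count,
                   SUM(size_bytes) AS total_bytes
            FROM accession
            GROUP BY year, method
            ORDER BY year, method
            """
        ).fetchall()
    finally:
        conn.close()


summary = accessions_by_year_and_method(ROWS)
for year, method, count, total_bytes in summary:
    print(year, method, count, total_bytes)
```

Swapping `method` for another column (broad archive type, say) would give the other breakdowns in the same way.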
There is one final thing I want to ask myself before signing off: how is this increase in Accessions affecting size? We’ve seen that total size is increasing (3 Tb accessioned in 2017), but is this just a few big archives distorting the picture? Cue final graphs…
I’m surprised somewhat by the first graph, as I hadn’t expected the OASIS Grey Literature to be so high (1.5 Tb), although anecdotes from Jenny and Leontien attest to the size of files increasing as processing packages enable more content to be embedded (another blog to model this?). Aside from this, although the impact of large deposits of journal scans (uncompressed tiff) can be seen in most years, particularly 2015, it does seem as though we’re averaging around 1.5 Tb per year for archives. Remember, this is just what we’re being given, before any normalisation for our AIP (what we rely on for migration) and DIP (what we disseminate on the website). And, interestingly enough, the large amount of work we are getting through ADS-easy and OASIS images isn’t having a massive size impact: just under 400Gb combined for the last 3 years of these figures.
Final thoughts. First off, I’m going to need another blog or two (and more time at airports!) to go deeper into these figures, as I do want to look at average sizes of files according to type, and the impact of our preservation strategy on the size of what we store. However, I’m happy at this stage to reach the following conclusions:
Over the last 5 years we’ve Accessioned 15 Tb of data.
Even discounting singular backlog/rescue projects and big deposits of journal scans, this does seem to represent a longer term trend in growth.
OASIS reports account for a significant proportion of this amount: at over a Tb a year.
ADS-easy and OASIS images are having a big impact on how many Accessions we’re getting, but not an equal impact on size.
After threatening to fall away, non-automated archives are back! And these account for at least 1.5Tb per year, even disregarding anomalies.
Right, I’ll finish there. If anyone has read this far, I’m amazed, but thanks!
ps. Still here? Want to see another graph? I’ve got lots…
Following the closure of Birmingham Archaeology (BUFAU), a project was initiated to identify and secure important born-digital archival material, and latterly to arrange transfer to the ADS. I’ve had the pleasure of archiving this digital material, including images, CAD files, databases and GIS over the last few months. The archives and reports of Birmingham Archaeology can now be accessed from the overview page: http://archaeologydataservice.ac.uk/archives/view/1959/
A total of 68 BUFAU archives have been released. Below I will highlight some of my favourite archives that I have worked on over the last couple of months.
Ahead of the redevelopment of Derby Inner Ring Road, Birmingham Archaeology was commissioned to undertake archaeological fieldwork. This site consisted of several different archaeological investigations, including a watching brief, an evaluation, an excavation and an historic building recording. Stratified archaeological deposits spanned a period from the 11th to the 20th centuries. This archive includes an extensive image gallery, reports, CAD files and GIS.
The ADS are pleased to announce that the ADS Library will be moving out of its Beta phase and going live on Tuesday 16th January. Concurrently, the ADS will also be launching a newly designed website. The main aim of the new website design is to make it easier for our users to access our searchable resources. With the launch of the ADS Library the ADS now provides three main heritage environment search tools:
Each of these tools should be used to search for different types of information held by the ADS. Archsearch is for searching metadata records about monuments and historic environment events in the UK. The ADS Archives is the place to search for historic environment research data (such as images, plans, databases) and contains international and UK data. The ADS Library is a bibliographic tool for searching for written records on the historic environment of Britain and Ireland. Where possible, the record will provide a direct link to the original publication or report.
In order to make the differences between these search tools clear to users, and to make all three tools easy to find from our main website, we will be introducing a new website menu with drop-down links that enable a user to go straight to each of our search resources. This new drop-down menu can be seen in the image on the right.
Users will also be given the option to access a main search page that will explain the differences between each of the available search options. This page will then allow you to choose which search facility to send your chosen keywords to.
The ADS has also taken this opportunity to redesign the layout of our website, creating a bold new home page, designed to better highlight our featured collections and news items, while providing links to our new search and deposit pages.
Our new Deposit page will also provide clearer links to the different types of data deposit options available to researchers wishing to archive data with the ADS.
Our new About page provides clear links to our operations policies and details of our governance.
The new design will include a help tab on our menu with links to frequently asked questions and our contact details, allowing users to troubleshoot problems faster and get the right help quicker.
The new design will reduce the number of main tabs in the menu. This means that some of our resources have moved location. For example our Teaching and Learning page will now be found under the Advice tab. However, despite the reduction in the number of main options on the menu, the introduction of the drop-down feature will mean that, in practice, more pages will be directly accessible from the menu than previously. Overall the new design will surface the most important pages of our website better and make our key resources accessible via fewer clicks.
Although the design and structure of the website have changed, and some things may now be found in a different location, very few URLs have changed. Only out-of-date pages have been removed, so bookmarks to specific pages should still work, and Archsearch, the ADS Archives and the ADS Library are still navigated in exactly the same way. If you have any trouble finding resources please contact firstname.lastname@example.org.
It’s been another busy year for Internet Archaeology. One of the reasons I manage to just about stay on top of things is the help of a small number of volunteers who have given up their time to work on a whole range of aspects of the journal production, promotion and management. So I gladly namecheck Erica Cooke, Lesley Collett and Hayden Strawbridge.
This lovely infographic was created by Lesley and sums up the 2017 visitors and page views of the journal very nicely. It’s good to know that all that content we work on actually gets read…a lot. And if page loading takes just a few seconds longer on a Tuesday, now you know why!
On 30th November 2017 the first ever International Digital Preservation Day will draw together individuals and institutions from across the world to celebrate the collections preserved, the access maintained and the understanding fostered by preserving digital materials.
The aim of the day is to create greater awareness of digital preservation that will translate into a wider understanding which permeates all aspects of society – business, policy making and personal good practice.
To celebrate International Digital Preservation Day ADS staff members will be tweeting about what they are doing, as they do it, for one hour each before passing on to the next staff member. Each staff member will be focusing on a different aspect of our digital preservation work to give as wide an insight into our work as possible. So tune in live with the hashtags #ADSLive and #idpd17 on Twitter or follow our Facebook page for hourly updates. Here is a sneak preview of what to expect and when:
To mark the 2017 Open Access week, we thought it would be a good time to introduce the winner of our first Open Access Archaeology fund award (see our original announcement here), decided on after much deliberation and consideration by the panel of 3 independent judges. So…
Chris Whittaker carried out a survey at Breedon on the Hill, a multi-period hilltop site, as part of his undergraduate dissertation at Newcastle University, supervised by Dr Caron Newman. After graduating he worked outside archaeology in the technology sector. However, conscious that his data was potentially at risk, he applied to the fund to help preserve the data and publish his findings. He has since started to study for a research master’s in settlement archaeology at Newcastle University.
The judges felt that Chris’ proposal – Breedon Hill, Leicestershire: an archaeological investigation at the multi-period hilltop site – was “an important site and methodically-collected dataset, which made good use of both Internet Archaeology and ADS, with the data having considerable potential for re-use to inform future fieldwork”.
About Breedon Hill
Breedon Hill, Leicestershire, is a scheduled ancient monument. The hilltop was the site of a univallate hillfort present from the Early-Middle Iron Age. From the 7th century AD, a minster church was founded within the hillfort enclosure. Today, approximately two-thirds of the Iron Age rampart, and much of the hillfort interior, have been irretrievably lost due to quarrying (Figure 2). The investigation combined magnetometry and resistivity geophysical surveys, alongside digital terrain models (processed LIDAR data), to contribute to the understanding of the character and development of the hillfort interior and its immediate environment. Very little is known about the different phases of occupation at the hilltop, as previous excavations have primarily focussed on the ramparts, and so Chris’ investigation sought to address this issue.
The results of Chris’ geophysical survey reveal several phases of roundhouses and post-hole built structures, as well as several potential associated enclosures, in the south-eastern part of the hillfort interior. These will be published as part of a future open access article in Internet Archaeology and will link to a related digital archive deposited with the Archaeology Data Service. We are looking forward to working with Chris in the coming months.
Chris said “The work was undertaken while I was an undergraduate student, firstly as part of an independent summer research programme (processing the LIDAR data), and secondly as part of an undergraduate dissertation (undertaking the geophysical survey). Publisher or institutional paywalls are often barriers for local researchers to study the world around them. And I know from personal experience that projects such as the digitisation of volumes of the Derbyshire Archaeological Journal, preserved with the ADS, are of great benefit to local and school-level research alike. From a research perspective [open access] offers many opportunities for colleagues from different backgrounds to build on and potentially refine the resources preserved.”
And now, we start all over again…
As you know, the Open Access Archaeology fund is made up of donations, set aside to support the digital archiving and publication costs of those researchers for whom funding is simply not available despite research quality and whose digital data is potentially at greater risk.
Thank you to everyone for your support for our #OAFund which is now being used to support the open access dissemination of Chris’ work. Of course, in making the first award, we now need to start all over again to raise sufficient funds for the next round to help more early career and independent researchers like him. So please consider donating today and help to reduce the barriers to open archaeological research and advance knowledge of our shared human past.
Nine months ago, we launched our Open Access Archaeology Fund. We have sent our little USB trowels all over the globe by way of a ‘thank you’ and we have been thrilled with everyone’s generosity, not least in such austere times.
So, it makes us even happier to say that sufficient funds have now been accrued and we are in a position to make our first award to cover costs of an unfunded proposed archive or article. (Full details of eligibility can be found here)
So if you, or someone you know, have already submitted an article proposal or approached ADS about an archive for which you have no funding, then you can apply to the fund today.
Have you donated yet?
The successful application will likely deplete the fund substantially but we did not want to delay making the first award – it is infinitely preferable that the benefits of the fund can be fast and tangible. However, we need more donations to do it all again in 6 months’ time!
Every donation you make helps to ensure that more archaeological research is open and accessible.
Internet Archaeology and the Archaeology Data Service are working together on a project concerning the current and ongoing impact of our activities on publication policy and practice (which we are calling PUBLICAN for short). We’re especially interested in the impact digital archiving and publication has had on the commercial sector.
Can you help us to compile a national picture of how digital has changed and affected professional practice?
The ADS, Historic England and the Council for British Archaeology are pleased to announce the beta release of ADS Library.
Weaving a web of references.
The ADS Library is the fusion of existing datasets. These include journal and series backruns archived with the ADS; the Library of unpublished fieldwork reports (aka the Grey literature library), which is mostly populated with reports from OASIS; and, last but not least, the British and Irish Archaeological Bibliography (BIAB), which is in itself a collection of different datasets which have been collected over the last hundred years.
The project to get these references online as a single resource has involved cleaning, mapping and enhancing the data from the different datasets, allowing them to share the same data structure and hopefully giving users consistent information about each item listed in the library. Some records simply show the existence of a report or publication and others link out to the publication itself where available. There was some overlap in the combined datasets and we have endeavoured to merge records where appropriate in order to limit the existence of duplicates in the lists of results.