Felix Schäfer from IANUS visits ADS

Over the past two weeks the ADS has been extremely pleased to have hosted Felix Schäfer from IANUS for a training placement as part of the ARIADNE project. IANUS is a project to establish a National Research Data Centre for Archaeology and Ancient History in Germany. One of the reasons for Felix’s visit to ADS is to provide IANUS with a behind-the-scenes insight into the workings of a well-established and successful digital repository. Here is what Felix had to say about his time at ADS.

 

For two weeks I had the wonderful chance to stay and work at King’s Manor in York and look behind the scenes of the Archaeology Data Service. As IANUS is still a relatively young project to build up a similar discipline-specific research data centre for the German archaeological and historical community, IANUS is very happy to see other successful institutions and learn from their experiences (and failures). And what better place to go than the ADS and look over the shoulders of the staff members, asking them numerous questions, inspecting their present and future systems, discussing issues about standards and guidelines and even processing some of my own German-type project collections according to the ADS’s workflows and checklists. All this has proven to be very inspiring and informative for me and I hope I can remember most of the insights when I’m back in Germany.

Hopefully, the learning and teaching from ADS will have twofold benefits. The first benefit comes directly from the information provided by ADS, e.g. what workflow steps are necessary to create AIPs (Archival Information Packages) and DIPs (Dissemination Information Packages) from SIPs (Submission Information Packages). This will speed up the construction of IANUS, which is still in its conceptual planning phase and strives to offer “real” services to users as soon as possible. The second benefit is a long-term goal: to apply the same or similar criteria for documentation as the ADS, e.g. the type of metadata required for different file types. This will ease the exchange of data across international repositories, as ADS has already successfully undertaken with tDAR through the TAG project.

So what did I actually do during my hours sitting within the medieval walls of King’s Manor? Well, in the first week I had several talks with different ADS staff members about different aspects of ADS. I was introduced to the internal Collections Management System, had the charging and licence policies explained to me, got an overview of all the social media channels being fed with information, and much more. By the end I had gained enough knowledge to process a small dataset from an archaeozoological project in Jordan, conducted by the German Archaeological Institute in Berlin. So, IANUS now has its very first archive-compliant collection, which in theory could go online if the data owners wished it to do so …

In my second week my focus was more on the Guides to Good Practice. One reason for this is that IANUS is now preparing similar recommendations for the German community, which we aim to publish online on the IANUS homepage in a first version by the end of this year. Already at the time of writing, IANUS has profited heavily from the enormous efforts ADS has put into the digital preservation field over recent years. Maybe there will be some passages at the end where the German “IT-Empfehlungen” can complement the English “G2GP”, who knows?

Another motivation for my visit to ADS was to write a case study about a specific aspect, namely the selection and retention of files in big data collections, and to exemplify the documentation of files which are part of a longer digital process, as is the case with laser-scanning or photogrammetry. For this purpose, I decided to use the digital documentation of excavation trenches in Pergamon/Turkey, a long-term project of the Istanbul department of the German Archaeological Institute. I just need to write this up during my last day here at King’s Manor and hopefully it will appear in the near future somewhere in the wonderful digital world of the ADS.

To end, it just remains to say a very big thank you to Julian Richards, who made my training placement possible, and to all the members of the ADS staff, who always had more than open ears and minds for my concerns, questions and discussions. The mental openness and willingness to share ideas and insights is impressive. I have enjoyed and profited enormously from my stay here and it might well be the case that it was not the last visit of a crew member of IANUS!

Digital Romans

This is the first of a two-part blog (the second will be a more detailed overview of the technologies involved in the digital dissemination) on the ADS’s work on what is colloquially known as the Roman Grey Literature project, but more officially as The Roman Rural Settlement of Britain. The project is funded by English Heritage and the Leverhulme Trust and is a collaboration between ourselves, the University of Reading and Cotswold Archaeology, which aims to produce a new synthesis of the rural landscape through the analysis of developer-funded fieldwork. Some may be familiar with an earlier associated project, which has been archived by the ADS (doi:10.5284/1000418). If you haven’t already seen it, it’s well worth a look.

Funerary pottery

3rd century AD Pakenham colour-coat beaker (left) and 4th century AD Nene Valley colour-coat pentice moulded beaker from excavations at The Babraham Institute, Cambridgeshire. Cambridge Archaeological Unit doi:10.5284/1001160

As with nearly all major research projects, the main outputs for this consist of the usual hard-copy publications including a monograph and various journal articles but in addition to these there will also be a project archive held with the ADS. As we’ve been involved in the project from the beginning the archive will be a great deal more than the usual ‘downloads’ interface. Hopefully by the end of the project (at the time of writing summer 2015) we’ll have in place a well-formed and meticulous archive to allow sophisticated reuse of the data and grey literature sources collected by the team, thus facilitating and encouraging further analyses by the archaeological community.

What data is being produced?

The data collection is split into two parts; the first is the identification and collection of grey literature relating to Romano-British archaeology and is being undertaken by Cotswold Archaeology. I’ve been helping out here by providing the team with a list of reports already held by the ADS (we have a lot, over 20,000 and counting!), but also with making sure that the ADS hold the relevant permissions from the producers of the reports to archive. Cotswold has been making sure that digital reports are of a suitable standard (for example with OCR) and are catalogued with basic bibliographic and spatial metadata.

The second part of data collection is the analysis of the grey literature (and published sources) by the team at Reading – to do this I’ve built them a pretty large, but simple database to record fine details such as site type, presence of Early Medieval activity, numbers and types of coins, faunal remains, human remains and so on.

A snapshot of the working GIS: sites in the East of England

I’ve also set up a fairly detailed desk-based GIS for the researchers at Reading. This incorporates Web Mapping Services (WMS) such as geological maps from the BGS and a wide range of heritage data, including exports from English Heritage’s AMIE database and National Mapping Programme (NMP) data. The latter is a fantastic resource for any ‘landscape’ study, but there are well-known issues with legacy data. Fortunately Chris Green on the EngLaID project has very kindly shared the processes he built to deal with this. At this stage, the Access database can simply be added as a connection and queried in relation to this baseline data, for (a rather random) example “show me every record with 4th century inhumations and cattle bones within 100m of a villa”. Early signs are that the Reading team are producing detailed and informative results. At the moment the result sets are being saved as comma-separated values and ESRI shapefiles.
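To make the example query concrete, here is a minimal sketch of the same logic in plain Python. It is not how the project GIS works internally: the record fields (`site_type`, `inhumation_century`, `cattle_bones`) and the sample sites are hypothetical, and coordinates are assumed to be eastings and northings in metres so that straight-line distance is meaningful.

```python
from math import hypot

# Hypothetical site records; coordinates are assumed to be metres on a
# projected grid (e.g. British National Grid eastings/northings).
SITES = [
    {"id": "V1", "site_type": "villa", "easting": 545000, "northing": 260000,
     "inhumation_century": None, "cattle_bones": False},
    {"id": "S1", "site_type": "farmstead", "easting": 545030, "northing": 260040,
     "inhumation_century": 4, "cattle_bones": True},   # 50 m from V1
    {"id": "S2", "site_type": "farmstead", "easting": 545300, "northing": 260000,
     "inhumation_century": 4, "cattle_bones": True},   # 300 m from V1
    {"id": "S3", "site_type": "farmstead", "easting": 545010, "northing": 260010,
     "inhumation_century": 4, "cattle_bones": False},  # close, but no cattle
]


def sites_near_villa(records, max_distance_m=100.0):
    """Return ids of records with 4th century inhumations and cattle bones
    that lie within max_distance_m of any villa record."""
    villas = [r for r in records if r["site_type"] == "villa"]
    matches = []
    for r in records:
        # Attribute filter first: both finds must be present.
        if r["inhumation_century"] != 4 or not r["cattle_bones"]:
            continue
        # Spatial filter: straight-line distance to the nearest villa.
        if any(
            hypot(r["easting"] - v["easting"], r["northing"] - v["northing"])
            <= max_distance_m
            for v in villas
        ):
            matches.append(r["id"])
    return matches


print(sites_near_villa(SITES))
```

A desktop GIS would normally answer this with a buffer or spatial join rather than a loop, but the two-stage shape (filter on attributes, then on distance) is the same.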

What are the ADS going to do with it?

In the first instance we’re actively working towards the wider reuse of the grey literature which has been collected, scanned and documented by Cotswold Archaeology. At the moment we’ve already been given all the reports from the East of England which I’m working through as I write; the Southeast and East Midlands are expected soon. Using the metadata created by the project team, the reports will be added to the Grey Literature Library and assigned a digital object identifier (DOI) where they can be cross-searched along with the rest of the 20,000+ corpus. An additional facet of this process is the addition of grey literature records to the Archsearch index, allowing grey literature to be discoverable alongside inventory records and archives. Another aspect we’re happy about is the potential of making our grey literature records available to other organisations, particularly the HERs. This is achievable as the Cotswold and Reading team have been carefully recording HER Monument and Event ids during data collation. Thus we’re more than happy to produce an export of our grey literature database (with the DOI) for those records that have had a HER id listed. This can then be incorporated into the HER and thus the Heritage Gateway. A neat example of this can be seen for the Romano-British site at Waterbeach, Cambridgeshire on the Heritage Gateway; scroll down to the bottom and there are links to those reports on the ADS. I feel that this is a major positive feature to come out of the project – we’re not only collecting and reusing information from the HERs, but also giving them something back in return.

The web pages for the project archive (to be released upon completion of the entire project) will aim to replicate the searches that one can perform in the various desktop software packages. In order to do this the database will be rebuilt in all its Oracle glory and made available to query online, with no software needed except your web browser. Another advantage of this will be the capability to enter incredibly detailed searches. For example, in the Grey Literature Library interface you’re currently limited to thesaurus terms such as ‘COIN’; in the project database, however, you’ll be able to specifically pick out coins minted between, for example, AD 348 and 364. Of course, once you’ve found the site records you’re interested in, you’ll be able to link straight through to the digital grey literature to investigate further.
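The difference between a thesaurus term and a date-range search can be sketched as follows. This is an illustration only: it uses Python’s built-in `sqlite3` module in place of Oracle, and the `coin` table with its `site_id`, `mint_start` and `mint_end` columns is a hypothetical schema, not the project’s actual one.

```python
import sqlite3


def sites_with_coins_minted_between(conn, start_ad, end_ad):
    """Return site ids holding at least one coin whose minting range
    falls entirely within start_ad..end_ad (inclusive)."""
    rows = conn.execute(
        "SELECT DISTINCT site_id FROM coin "
        "WHERE mint_start >= ? AND mint_end <= ? "
        "ORDER BY site_id",
        (start_ad, end_ad),
    ).fetchall()
    return [row[0] for row in rows]


# Build a small in-memory database with made-up records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE coin (site_id TEXT, mint_start INTEGER, mint_end INTEGER)"
)
conn.executemany(
    "INSERT INTO coin VALUES (?, ?, ?)",
    [
        ("S1", 348, 350),  # inside the range
        ("S1", 260, 268),  # same site, earlier coin
        ("S2", 330, 335),  # too early
        ("S3", 355, 364),  # inside the range
    ],
)

print(sites_with_coins_minted_between(conn, 348, 364))
```

A search on the thesaurus term ‘COIN’ alone would return all three sites; the range query narrows the result to the two that hold coins of the period.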

In addition to all this, the desktop GIS will be replicated as a Web GIS, although technically speaking this will be Web mapping (see my next post). As with the desktop version we’ll be able to utilise WMS from external organisations to provide context. For the most part, the user will be able to explore the data via the predefined queries produced by the project team, but in addition these queries can be broken down further on (predefined) facets. As mentioned above, more information (for the technically minded) will follow in the next blog. If you’re curious as to what this may look like, an example of this type of interface can be seen at doi:10.5284/1000151. All this, and with no need to download any software! Of course for the keen researcher, all original files will be available to download and reuse under the standard ADS Terms and Conditions.

Of course there’s the potential for a great deal more that we can do with this incredibly rich resource, but that can wait until another day…

Two new print publications of the Guides to Good Practice are out now!

The Archaeology Data Service and Digital Antiquity are proud to announce the print publication of two new Guides to Good Practice, Caring for Digital Data in Archaeology and Geophysical Data in Archaeology. These two new print publications are the culmination of three years’ work to update the online Guides to Good Practice (http://guides.archaeologydataservice.ac.uk/) to cover a wider range of archaeological data and to refresh the content with up-to-date information. 

A wide variety of organisations are both creating and retaining digital data from archaeological projects. While current methods for preservation and access to data vary widely, nearly all of these organisations agree that careful management of digital archaeological resources is an important aspect of responsible archaeological stewardship.

Caring for Digital Data in Archaeology 

This Guide to Good Practice aims to improve the practice of depositing and preserving digital information safely within an archive for future use, by providing information on the best way to create, manage, and document digital data files produced during the course of an archaeological project. To do this Caring for Digital Data in Archaeology: A Guide to Good Practice is separated into three primary sections:

1.    Digital Archiving: An Introduction to this guide focuses on the need for digital archiving through the use of two case studies as well as how to best use the guides.

2.    Planning for the Creation of Digital Data outlines issues surrounding data creation and capture, selecting data for digital archiving, documentation and metadata, as well as issues surrounding copyright and intellectual property rights.

3.    Common Digital Objects, the final section, outlines best practices specific to documents, data sets, and images.  Each section covers which formats are archival, and specific issues related to each file format or type.

Copies can be ordered online at: http://www.oxbowbooks.com/dbbc/caring-for-digital-data-in-archaeology.html

Geophysical Data in Archaeology

This 2nd edition of Geophysical Data in Archaeology: A Guide to Good Practice systematically explores what should be included in an Archive, illustrated with relevant examples. A conceptual framework is developed that allows assembling data and meta-data so that they can be deposited with an Archiving Body. This framework is also mapped onto typical database structures, including OASIS and the English Heritage Geophysics Database. Examples show step-by-step how an Archive can be compiled for deposition so that readers will be able to enhance their own archiving practice.

Geophysical data are sometimes the only remaining record of buried archaeological features when these are destroyed during commercial developments (e.g. road schemes). To preserve them in an Archive can therefore be essential. However, it is important that data are made available in formats that can still be read in years to come, accompanied by documentation that gives meaningful archaeological context. This Guide covers the creation of the necessary metadata and data documentation. There is no point preserving data if they cannot be used again.

Copies can be ordered online at: http://www.oxbowbooks.com/oxbow/geophysical-data-in-archaeology.html

These print publications are intended to be used in concert with the comprehensive online Guides to Good Practice site, which will be maintained with up-to-date information and provide more depth of content.

SPRUCE Hackathon – File Characterisation

The other week I had the opportunity to participate in the SPRUCE Hackathon hosted by Leeds University.  Hackathons are an opportunity for developers to get together and work on (or hack) common problems.  Typically hackathons in the USA are fuelled by Mountain Dew and pizza, but as this was a British hackathon it was mostly fuelled by tea and cakes (and mighty fine cakes thanks to Becky).  The hackathon was specifically focused on issues around file characterisation, which is precisely identifying and describing the technical characteristics of a file as well as its metadata.  This is an ongoing challenge for practitioners in the digital preservation realm since there are many file formats, many versions of those many file formats, and little consistency in the way these many file formats and their many versions internally identify themselves.  Digital archivists need to know more than just the file extension or format’s name, which Gary McGath sums up nicely in his recent Code4Lib article:

Just knowing the format’s generic name isn’t enough. If you have a “Microsoft Word” file, that doesn’t tell you whether it’s a version from the early eighties, a recent document in Microsoft’s proprietary format, or an Office Open XML document. The three have practically nothing in common but the name.

Thankfully there are a number of characterisation tools to help digital archivists with this, and among the attendees at the hackathon were some of the key developers behind the major tools such as JHOVE, JHOVE2, FITS, DROID and C3PO.  This provided an exciting opportunity to work alongside them on their tools and learn more about how the tools work.
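As a rough illustration of why the file name alone is not enough, the sketch below looks at the first bytes of a file instead of its extension. It is a deliberately tiny stand-in for what the tools above do far more thoroughly, and the function name and return labels are invented for this example. The two signatures themselves are well known: legacy binary Word documents are stored in an OLE2 compound file, while Office Open XML documents are ZIP archives.

```python
# Leading "magic" bytes of two container formats used by Word documents.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (.doc)
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header (.docx)


def sniff_container(header: bytes) -> str:
    """Classify a file by its leading bytes rather than its extension.

    Returns "ole2", "zip" or "unknown". This identifies only the
    container; telling a .docx from any other ZIP-based format would
    require inspecting the archive's contents as well.
    """
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of the file at path and classify them."""
    with open(path, "rb") as handle:
        return sniff_container(handle.read(8))
```

Two files that both present themselves as “Microsoft Word” would come back as `ole2` and `zip` respectively, which is exactly the distinction the quotation above is making.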

ADS wins DPC Decennial Award

As part of its tenth anniversary celebrations, the Digital Preservation Coalition (DPC) awarded its Decennial Award, for an outstanding contribution to digital preservation, to the Archaeology Data Service.

We beat off intense competition from the Library of Congress, the National Archives, and the International Internet Preservation Consortium, to take the award at a ceremony at the Wellcome Collection in London on December the 3rd.

The Decennial Prize – the DPC’s most prestigious – is awarded specially to mark the tenth anniversary of the founding of the DPC. It recognises the most outstanding work over the decade that the DPC has existed. After a painstaking assessment, an expert panel selected finalists from New York, Washington and London as well as York.

The ADS receives the DPC Decennial Award

Our Director, Professor Julian Richards, who accepted the award from Dame Lynne Brindley, said: “Winning this award is an outstanding achievement for the ADS and it is extremely gratifying to have the last decade’s effort and hard work recognised by our peers. The ADS was up against some stiff competition to win this first decennial award, so we are particularly thrilled to have received this tremendous accolade.”

William Kilbride, Executive Director of the DPC said: “These awards are important in showcasing the creative solutions that have been developed towards digital preservation. Digital preservation is critical. We know that significant parts of the economy, industry, research, government and the public life depend on the opportunities information technology creates, but the rapid churn in technology means data is also surprisingly fragile. We are the first generation that’s had to think about handing on a digital legacy, so we need to act quickly to develop the skills and techniques that will ensure our legacy is protected.”

In July, ADS also received the British Archaeological Award for Best Archaeological Innovation of 2012 in recognition of technical innovations it developed which allowed thousands of hitherto unpublished fieldwork reports to be made freely available online to any user.

Impact Project Update

The Impact of the Archaeology Data Service: a study and methods for enhancing sustainability

We are now just over half-way through the project that commenced in February 2012 and will conclude in July 2013. We have successfully completed desk research and two surveys of ADS Users and Depositors respectively.

In November we held our community focus group and presentation of interim results at a workshop in York. The aims of the workshop were to seek stakeholder feedback on the emerging results, establish any change of perception of the ADS amongst participants as a result of the study, and seek their views on how the study results might be presented to the archaeological community and its funders.

Invitations were sent to a range of sector representatives and eleven delegates attended the workshop, of which four were from the Local Authority sector, three from National Authorities, one from Universities, one from the Commercial sector, one shared university/commercial sectors, and one from Publishing. It was an extremely valuable day and the feedback will help shape our final phase of dissemination of the study results and contribute to our final report.

We have recently made our project workshop presentation of interim/provisional findings from the study and our post-dissemination activity value perception report (a report of workshop participant feedback) available on the project webpage.

We are now working on the final weighting of the economic analysis with the aim of incorporating the latest results in presentations, posters and leaflets that can be presented and distributed at forthcoming events during 2013 including the International Digital Curation Conference, The World Archaeological Congress, and Computer Applications in Archaeology.

ReACTing to digital archive requirements.

In April this year our former colleague, Jen Mitcham, attended the inaugural SPRUCE digital preservation mash-up in Glasgow (16th-18th April 2012), an event organised by Leeds University Library as part of the JISC-funded Sustainable PReservation Using Community Engagement (SPRUCE) project, which intends to foster wider communication within the digital archiving sector. During discussions at the event it was identified that one of the practical problems affecting the management of digital files within archives was the ability to compare and monitor the migration of files during various stages of the archive process (Millard 2012). At the outset it was identified that any solution to this problem needed to be easy to use and deployable directly from the desktop in order to appeal to users of varying computing ability. During the event Andrew Amato (London School of Economics and Political Science) developed a series of tools, based around Microsoft Excel and VBA macros, which assisted in the audit of collections (Amato 2012). Once a proof of concept had been developed, it was found that the uniqueness of repository infrastructures made the application of the tools problematic outside the specific organisations for which they were initially developed. As a result it was considered that a more generic version of the tool would have broader appeal and potential use value within the wider digital preservation community. With this in mind a successful application was made to the SPRUCE award scheme, allowing Andrew Amato and Ray Moore time to develop what was christened ReACT (Resource Audit and Comparison Tool) further, with a period of testing of the resultant tool on archives from ADS’ collections.
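The underlying audit problem can be sketched in a few lines. The example below is not ReACT, which was built around Excel and VBA; it is a hypothetical Python illustration that assumes a migrated file keeps the original’s folder and base name and changes only its extension (for instance `reports/site1.doc` becoming `reports/site1.pdf`).

```python
from pathlib import PurePosixPath


def stem_key(path: str) -> str:
    """Return the path with its final extension removed, so that an
    original and its migrated counterpart share the same key."""
    return str(PurePosixPath(path).with_suffix(""))


def audit_migration(originals, migrated):
    """Compare two lists of relative file paths.

    Returns a pair (missing, unexpected):
      missing    - originals with no counterpart in the migrated set
      unexpected - migrated files with no matching original
    """
    original_keys = {stem_key(p) for p in originals}
    migrated_keys = {stem_key(p) for p in migrated}
    missing = sorted(p for p in originals if stem_key(p) not in migrated_keys)
    unexpected = sorted(p for p in migrated if stem_key(p) not in original_keys)
    return missing, unexpected


missing, unexpected = audit_migration(
    ["reports/site1.doc", "plans/trench2.dwg"],
    ["reports/site1.pdf", "images/extra.tif"],
)
print("Not yet migrated:", missing)
print("No matching original:", unexpected)
```

Real repositories rarely follow one naming rule throughout, which is the same obstacle described above: the matching step in `stem_key` is the part that would need adapting for each archive.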

DPC Decadal Award Nomination: ADS short listed amongst esteemed company.

The announcement of the Digital Preservation Coalition (DPC) awards shortlist is always greeted with some excitement in the digital community, but this year’s list was particularly well received here at the ADS due to our shortlisting in the ‘outstanding contribution to digital preservation in the last decade’ category. To be listed in such esteemed company as the International Internet Preservation Consortium, the PREMIS Metadata Working Group and The National Archives is an honour which reflects the hard work carried out here at the ADS over the last 15 years. At the same time the nomination of a subject-specific data centre, the only one on the 2012 list, should be considered a tribute to the forward-thinking attitude in archaeology and heritage management generally, which places the discipline at the forefront of digital technology.

JISC-British Library data citation workshop

A summary of the July 6th JISC-British Library workshop on “metadata for effective data citation” by Caroline Wilkinson of the British Library is now up on the DataCite blog. It includes a summary of the presentation by the ADS’s own Michael Charno and a link to his slides. There is also a Mendeley reading list with links to articles and resources that are relevant to the workshop themes and to research data citation issues in general. All workshop presentations are available in full on the BL website.

PDF, or PDF/A: that is the question

The Portable Document Format (PDF) remains the most popular and de facto format for the sharing of printable documents across the web. As such the PDF has become deeply embedded within personal, institutional and governmental workflows since its inception in 1993; indeed its pervasiveness is highlighted by the 100,000 or so PDFs within the ADS’ collections, making it by far our most common file type. As a result we thought it might be useful to provide some insight into the PDF, and its archival equivalent PDF/A, so that you can benefit from our (very!) long discussions and sleepless nights.