Why does it take so long for a report uploaded to OASIS to make its way into the library?

The Grey Literature Library is one of the ADS’ most popular resources and, as shown by projects such as the Roman Rural Landscape, one of massive research value. The library is constantly growing, with most reports coming from the OASIS system. In 2013 alone, 3891 reports were submitted. Feedback from all levels of the archaeological community makes it clear that the hosting of openly accessible digital grey literature is a boon. However, one of the questions we are most commonly asked is “why does it take so long for a report uploaded to OASIS to make its way into the library?”. This is perfectly understandable; people who have completed an OASIS record to share the results of their fieldwork want to make sure this effort is not in vain. Rest assured it isn’t. Here’s a small insight into what goes on beneath the surface of the library.

To enter the library the associated OASIS record has first to be completed and validated by the HER and relevant NMR. Once signed off, the report enters a list of all reports that need to be ‘transferred’. What we don’t often make public is the detailed and technical nature of the next stage of this process, but it’s the difference between simply putting files on a server, and managing a resource as part of the duties of an accredited digital archive. The archiving of every single grey literature report is something we’re very proud of here at the ADS: it’s no use having this fantastic research resource if it’s not held in perpetuity. The task of archiving these reports falls to an ADS digital archivist and is assigned on a roughly bi-monthly basis. Experience has shown that it is simpler to archive reports in bulk rather than individually. As detailed in the ADS repository operations, each file we receive has to be dealt with as any other digital object:

  • Ingested (or accessioned) in its original form
  • Migrated to a suitable preservation format
  • Migrated to a suitable dissemination format
  • Documented, with all stages, processes and technical details of each object recorded within the ADS Collection Management System (CMS)

The digital archivist assigned to the task works in batches according to the individual contractor; for example, the reports for AC Archaeology or Wessex Archaeology are distinct collections. All the individual files for each report (note that a single report can exist as numerous files; the record at the moment is 58 for a single watching brief!) are moved from the OASIS system and stored on the ADS preservation server as a unique accession within such a collection. A long-standing contributor to the library can have numerous accessions under their collection; for example, Suffolk County Council Archaeology Service has 41 accessions representing 1478 files. This accession process, including the names and types of these files, is logged in our CMS; thankfully, a great deal of hard work from ADS’s Developer Paul Young has meant that most of the database side of this process is now semi-automated.
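For readers who think in data structures, the collection, accession and file hierarchy described above can be pictured roughly as follows. This is only an illustrative sketch: the class names, fields and the example contractor are assumptions made for this post and do not reflect the actual schema of the ADS CMS.

```python
from dataclasses import dataclass, field


@dataclass
class Accession:
    """One batch of files deposited into a collection (hypothetical model)."""
    accession_id: int
    files: list = field(default_factory=list)  # (filename, file type) pairs


@dataclass
class Collection:
    """All accessions belonging to a single contractor (hypothetical model)."""
    contractor: str
    accessions: list = field(default_factory=list)

    def file_count(self):
        # Total number of files across every accession in the collection.
        return sum(len(a.files) for a in self.accessions)


# Invented example data: one contractor with two accessions.
collection = Collection("Example Archaeology Unit")
collection.accessions.append(
    Accession(1, [("report.pdf", "PDF 1.4"), ("plan.dwg", "DWG")])
)
collection.accessions.append(Accession(2, [("report2.pdf", "PDF 1.6")]))

print(len(collection.accessions), "accessions,", collection.file_count(), "files")
```

In this picture, a long-standing contributor is simply a collection whose list of accessions keeps growing with each transfer.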

The digital archivist then has to convert all the files in an accession to a suitable preservation and dissemination format; in the case of PDFs this is currently PDF/A. The trials and tribulations of this process are documented elsewhere in a recent article in ‘Information Standards Quarterly’ by Digital Archivist Ray Moore (2013). Suffice to say, ensuring that a report is adequately preserved, with no loss of information or formatting, is often a significant undertaking. The current record is one day for a single report, which, although a rare occurrence, illustrates the degree of intervention that is often involved in digital preservation.

Once migrated, the preservation and dissemination files are moved to appropriate locations on ADS servers; the technical details of the process, such as the hardware and software used for any migrations, then have to be recorded within the CMS. We’re fortunate that we can record this at a batch level, so for example we could record the migration of 10 PDF 1.4 files to PDF/A 1B as a single event. However, quite often an accession will consist of a multitude of different file types, so the documentation of the archive process for each collection can involve a large number of processes.
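To illustrate what recording at a batch level means, here is a minimal sketch that groups the files in an accession by source format and produces one migration event per format. The names `MigrationEvent` and `batch_events`, the file names and the format-to-target mapping are all hypothetical; the real CMS will record more detail (hardware, software and so on) and is not structured this way as far as this post describes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationEvent:
    """One batch-level entry: a number of files migrated between two formats."""
    source_format: str
    target_format: str
    file_count: int


def batch_events(files, target_for):
    """Group (filename, format) pairs into one event per source format.

    `target_for` maps each source format to its migration target.
    """
    counts = Counter(fmt for _name, fmt in files)
    return [
        MigrationEvent(fmt, target_for[fmt], n)
        for fmt, n in sorted(counts.items())
    ]


# Invented accession: ten PDF 1.4 files and three PDF 1.6 files.
files = [(f"report_{i}.pdf", "PDF 1.4") for i in range(10)]
files += [(f"figure_{i}.pdf", "PDF 1.6") for i in range(3)]
targets = {"PDF 1.4": "PDF/A 1B", "PDF 1.6": "PDF/A 1B"}

for event in batch_events(files, targets):
    print(f"{event.file_count} x {event.source_format} -> {event.target_format}")
```

Thirteen files become two logged events here, which is why an accession with many different file types generates many more processes than one made up of a single PDF version.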

Only once all the files within the archive package are fully documented can we move towards transferring the files to the library, the most satisfying part of the job. Within the internal pages of the OASIS system are scripts to facilitate the transfer of a subset of OASIS metadata into the Grey Literature Library database. As a final step, and thanks to the expertise of our Applications Developer Michael Charno, the script also mints and registers a DOI for each report via the DataCite API, and stores it within the OASIS system. It’s then a case of checking that everything has run smoothly. If we look at what is becoming a typical grey literature task, in January 2014, 1011 reports were moved from OASIS into the library. This consisted of:

  • 2249 files, split into 99 accessions. The files consisted of:
      • 52 PDF/A 1A
      • 371 PDF 1.7
      • 601 PDF 1.6
      • 920 PDF 1.4
      • 284 PDF 1.3
      • 6 DWG
      • 10 Adobe Illustrator files
  • These accessions belonged to 87 existing collections and 12 new collections.
  • 338 processes logged within the CMS.
  • The replacement of 13 files that were unreadable, or had been mistakenly uploaded.
  • The drinking of approximately 79 cups of tea.

So, in answer to the original question of “where is my report?”, it’s being looked after carefully by a committed team at the ADS!
