The Portable Document Format (PDF) remains the de facto format for sharing printable documents across the web. Since its inception in 1993 the PDF has become deeply embedded within personal, institutional and governmental workflows; indeed its pervasiveness is highlighted by the 100,000 or so PDFs within the ADS’ collections, making it by far our most common file type. As a result we thought it might be useful to provide some insight into the PDF, and its archival equivalent PDF/A, so that you can benefit from our (very!) long discussions and sleepless nights.
So what is PDF/A?
Essentially it is a constrained form of PDF version 1.4 that makes it more suitable for archiving and long-term preservation (the A standing for Archive). As defined by the ISO standard (ISO 19005-1:2005), a PDF/A file is discrete: it does not require external programs or information in order to be displayed. As a result certain content is prohibited (e.g. audio, video, JavaScript or other executable files, and LZW compression) and any encryption is forbidden. Similarly, unlike a regular PDF file, which may substitute fonts that are unavailable, a PDF/A will store all fonts within the file structure. Certain metadata is also mandated. Within the PDF/A standard there are two levels of compliance:
- PDF/A 1a – meets all the requirements of the standard.
- PDF/A 1b – meets a much lower level of compliance, which allows for the retention of the visual appearance of the file, but does not secure the structural or semantic properties of the file.
PDF/A: Why bother?
Unfortunately, while PDF is an open standard, it is still essentially a proprietary format and is generally regarded as unsuitable for preservation (see Guides to Good Practice for a fuller discussion). Consequently PDF/A has become widely accepted as a viable alternative for preserving PDF content (e.g. Library of Congress). However, PDF/A remains in essence a proprietary format and is therefore far from ideal for preservation. As a result, at the ADS we suggest that a better alternative for the long-term preservation of files is the retention of those ‘original’ files that were used to create the PDF (Word, ODT, etc). Unfortunately this is not always possible, in which case PDF/A is the next best alternative.
Creating a PDF/A?
The creation of PDF/A compliant files is becoming increasingly easy, with a wide range of commercial and freeware products available that can convert existing PDF files, whilst Microsoft Office 2007 and OpenOffice can handle direct conversions to PDF/A from their own formats. It should be noted that the PDF/A files produced in this manner are generally only PDF/A 1b compliant. The high volume of PDFs within our workflow means that the only practical option for large PDF collections, such as the Grey Literature Library, is a batch process. Initial experiments with Adobe Acrobat proved unsatisfactory, but a higher degree of success was reached with PDFTron’s PDF/A Manager, which not only allowed batch conversions but also included a validation tool.
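For readers scripting their own batch process, the general shape might look something like the Python sketch below. It is only an illustration and not the ADS workflow: the `batch_convert` function and its `convert` argument are hypothetical names, and `convert` stands in for whichever conversion tool you actually use. The sketch walks a source directory, mirrors its structure into a separate target directory, and records failures rather than stopping at the first problem file.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def batch_convert(
    source_dir: Path,
    target_dir: Path,
    convert: Callable[[Path, Path], None],
) -> List[Tuple[Path, str]]:
    """Apply `convert(src, dst)` to every .pdf file under source_dir.

    `convert` is a placeholder for a real PDF-to-PDF/A converter (assumption).
    Returns a list of (source file, error message) pairs for files that failed.
    """
    failures: List[Tuple[Path, str]] = []
    # Note: the "*.pdf" pattern is case-sensitive on most non-Windows systems.
    for src in sorted(source_dir.rglob("*.pdf")):
        dst = target_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            convert(src, dst)
        except Exception as exc:
            # Keep going so that one bad file does not halt the whole batch.
            failures.append((src, str(exc)))
    return failures
```

Collecting the failures in a list makes it straightforward to review the problem files by hand once the batch has finished. The target directory should sit outside the source directory so that converted files are not picked up again.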
When is a PDF/A not a PDF/A?
Just to muddy the waters a little further, a note of caution is needed when creating PDF/A files. You may have noticed the helpful blue banner (beneath the menu bar) that appears in Adobe Reader or Acrobat to notify you that the PDF you are reading, or have just created, is a PDF/A file. Unfortunately this message only indicates that the file claims conformance to the PDF/A standard; it does not mean that the file actually adheres to the standard. As a result we always find it worth checking that the outcomes of conversions really are PDF/A using a validation programme (here at the ADS we use Adobe Acrobat, but there are others available).
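To make the difference between a claim and actual conformance a little more concrete: a PDF/A file declares itself through two properties in its embedded XMP metadata, `pdfaid:part` and `pdfaid:conformance`. The Python sketch below simply looks for those two properties in the raw bytes of a file. The function name `claimed_pdfa_level` is our own invention, and the approach assumes the metadata is stored uncompressed as UTF-8 text, so treat it as a rough illustration only. Crucially, it reads the claim and nothing more; it is not a validator.

```python
import re
from typing import Optional

# XMP properties may be written as elements (<pdfaid:part>1</pdfaid:part>)
# or as attributes (pdfaid:part="1"); these patterns accept both forms.
_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\']|>)\s*(\d)')
_CONFORMANCE = re.compile(rb'pdfaid:conformance\s*(?:=\s*["\']|>)\s*([A-Za-z])')


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return the PDF/A level a file *claims* (e.g. "1b"), or None.

    Assumes the XMP packet is uncompressed UTF-8 (an assumption of this
    sketch). A non-None result says nothing about real conformance.
    """
    part = _PART.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conformance = _CONFORMANCE.search(data)
    if conformance is not None:
        level += conformance.group(1).decode("ascii").lower()
    return level
```

It could be used along the lines of `claimed_pdfa_level(open("report.pdf", "rb").read())`, where `report.pdf` is a placeholder file name. A file that passes this check still needs to go through a proper validation tool, for exactly the reason given above.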
Complicating matters and future directions
In the last twelve months a second ISO standard (ISO 19005-2:2011), based on PDF version 1.7 and called PDF/A 2, has been published. Unlike PDF/A 1 it allows JPEG2000 compression, supports transparency effects and layers, embedding of OpenType fonts, and digital signatures. It also allows sets of documents to be archived as individual documents within one file (PDF/A Competence Center, 2011).
As with PDF/A 1 there are levels of compliance, and those files which are PDF/A 1 compliant will meet the PDF/A 2 standard. A full appraisal of these developments is pending, but we will keep you up to date with any developments.
Further information on the use of PDF/A as a preservation format can be found in the ADS’ Guides to Good Practice.