Kieron Niven, Archaeology Data Service / Digital Antiquity, Guides to Good Practice

File formats

Specific file formats are discussed individually below but, in general terms, two areas of concern for archives apply to text documents as a whole. The first, which is shared with many other file types, is the continually developing nature of the formats used by word processing packages. Aside from the possibility of receiving files produced by now-defunct software (e.g. WordStar), the continual development and enhancement of the formats used by currently popular word processing packages often results in incompatibility between files saved by older versions and the current version of the software. As mentioned above, the current move towards XML-based open standard formats such as .docx and .odt has been an attempt both to standardise these formats and to allow different software packages to read non-native formats. To some extent a similar problem has also been apparent with the PDF format and, again, the recent move towards an open standard (PDF/A [1]) has been an attempt to address these long-term access issues.
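
One practical consequence of the XML-based formats being openly documented is that general-purpose tools can inspect them without the originating word processor. The sketch below is illustrative only and is not part of the guide's recommended workflow: it assumes that both .docx and .odt files are ZIP packages, that an .odt package carries a `mimetype` entry, and that a .docx package carries `[Content_Types].xml` together with the conventional `word/document.xml` part. The function name `identify_text_format` is hypothetical.

```python
import zipfile

# MIME type string stored in the "mimetype" entry of an OpenDocument Text package.
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def identify_text_format(path):
    """Guess whether a file is .docx or .odt from its contents, not its extension.

    Returns "odt", "docx" or "unknown". This is a heuristic sketch, not a
    substitute for a dedicated format identification tool.
    """
    # Both formats are ZIP containers; anything else is out of scope here.
    if not zipfile.is_zipfile(path):
        return "unknown"

    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())

        # OpenDocument packages declare their type in a "mimetype" entry.
        if "mimetype" in names:
            declared = package.read("mimetype").decode("ascii", "replace").strip()
            if declared == ODT_MIMETYPE:
                return "odt"

        # Assumption: a word processing .docx keeps its main part at the
        # conventional location word/document.xml.
        if "[Content_Types].xml" in names and "word/document.xml" in names:
            return "docx"

    return "unknown"
```

A check of this kind may help an archive confirm that a deposited file matches its stated format; older binary formats would fall through to "unknown" and need other tools.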

Embedded objects

In addition to the file formats themselves, there are general concerns about the ability to embed content within text documents and what this means for preserving that content in the long term within the original document format. The most common type of embedded content is arguably images, although certain formats, most notably Microsoft Word and PDF, also allow more complex content such as spreadsheets and video to be stored within the text document itself, often in a format that should be deposited and archived separately. It is generally recommended that, in addition to being embedded, such content is stored and archived separately, thereby retaining its original qualities (e.g. image resolution) and allowing it to follow a separate archival strategy from the textual content.
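
As an illustration of how embedded content can be located in an XML-based document, the following minimal sketch lists and copies out the embedded parts of a .docx file. It assumes the package layout that Microsoft Word conventionally produces, with images under `word/media/` and embedded objects such as spreadsheets under `word/embeddings/`; other software may use different locations. The function names are hypothetical.

```python
import zipfile
from pathlib import Path

# Assumed locations of embedded content inside a Word-produced .docx package.
EMBEDDED_PREFIXES = ("word/media/", "word/embeddings/")


def list_embedded_parts(docx_path):
    """Return the package names of embedded parts found in a .docx file."""
    with zipfile.ZipFile(docx_path) as package:
        return [
            name
            for name in package.namelist()
            if name.startswith(EMBEDDED_PREFIXES) and not name.endswith("/")
        ]


def extract_embedded_parts(docx_path, out_dir):
    """Copy each embedded part, byte for byte, into out_dir.

    Only the base file name is kept, so parts with the same name in
    different folders would overwrite one another.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = list_embedded_parts(docx_path)
    written = []
    with zipfile.ZipFile(docx_path) as package:
        for name in names:
            target = out_dir / Path(name).name
            target.write_bytes(package.read(name))
            written.append(target)
    return written
```

Note that the extracted copies are whatever the word processor chose to store, which may have been resampled or recompressed on insertion. Extraction of this kind is therefore best seen as an audit of what a document contains, not a replacement for depositing the original files separately.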