Preservation and Management Strategies for Exceptionally Large Data Formats: 'Big Data'

The key outcomes of the project will be raised awareness and recommendations on preservation and management strategies for exceptionally large data formats. In particular:

  • Preservation issues and practices for 'Big Data' archives.
  • Dissemination issues and policy practices for 'Big Data' archives.
  • Alternative approaches to data archiving strategies, especially concerning 'Big Data' archives, including the likely re-use potential of different types of data as an indicator of archive suitability.
  • Users and uses of 'Big Data'.
  • Cost implications of different approaches to 'Big Data' preservation and dissemination.

In achieving the above outcomes nine specific deliverables will be generated:

Interdisciplinary review of literature and good practice relevant to digital archivists, fieldworkers and researchers working with 'Big Data' in archaeology and cultural heritage management (outcome of the literature review is represented by footnotes in the documents below).

Representative directory and interest list of users and uses for 'Big Data' in the UK (73% of respondents to the 'big data' online questionnaire agreed to join such a list).

A user survey (results from the online questionnaire).

Documented review and debate regarding preservation and dissemination / reuse options for 'Big Data' (a successful workshop was held in York in November 2005).

Case studies (pilot studies) of current projects demonstrating the cost implications of the preservation and reuse options identified above (once procedures were established for a particular format, it was found that Big Data fitted into the current ADS lifecycle costing model).

Three completed 'Big Data' archives, resulting in appropriately preserved data sets and access to data (dissemination) according to best practice as defined by the project, and covering data generated by maritime archaeology, laser scanning and LiDAR technologies. These archives will be stored for a five-year period at suitable repositories on a distributed model determined by cost and suitability for access.

Characterisation and preservation / reuse / dissemination recommendations on other data formats highlighted during the activity of the project (the formats review).

Dissemination of the main contents of the 'Big Data' report through conference papers or sessions (following an interim presentation at the 2006 IFA conference, the final outcomes were disseminated at CAA 2007).

Completion of the 'Big Data' Report: bigdata_final_report_1.3.pdf (845KB)

