The Arch-I-Scan Project, 2019-2023

Penelope Allison, Daan van Helden, 2026. https://doi.org/10.5284/1138130.

Overview

This overview provides further information on the three datasets that comprise the Arch-I-Scan Project’s data archive: photographs of sherds; photographs of near-complete vessels; and 3D models of terra sigillata forms. This information is intended to help the user navigate the material in this archive.

Data resource of photographs of sherds from the collections of MOLA, ULAS, Vindolanda Trust and CIMS

Fig. 1. Photograph of a terra sigillata rim sherd in the MOLA collection.

This section of the Arch-I-Scan Project’s data archive comprises a dataset of 197,260 photographs of 28,158 sherds, mostly terra sigillata with some other Roman fineware fabrics (particularly London Ware), from the collections of the project’s partners – MOLA, ULAS, Vindolanda Trust and CIMS. These photographs were taken by the Arch-I-Scan Project between 2020 and 2023.

Fig. 2. Photograph of the profile of a terra sigillata sherd at 20x magnification, showing detail of the fabric.

The higher-level folders of this dataset of sherd photographs are organised by vessel form, as much as possible, irrespective of the particular collection from which the sherds originate. To view the photographs of all the sherds from a particular collection a user would need to select and download the folders with the site codes of the relevant collection (see site codes below). Photographs of sherds that could not be securely identified to a particular terra sigillata form type, and some sherds in different fabrics, can be found in the folder named ‘Unassigned to form’. In most cases, vessel forms were provided in the partners’ datasheets supplied to the Arch-I-Scan Project, as discussed below, while in others they were identified (or re-identified) by the ceramic specialists working with the Arch-I-Scan Project. Within these higher-level folders are subfolders, each of which contains the photographs of a single sherd, there being usually seven photographs of each sherd. Six of these photographs are of the sherd and one is a magnified profile of the sherd’s fabric. The naming conventions for these folders and files are outlined below.

This dataset of sherd photographs is accompanied by a spreadsheet compiled by the Arch-I-Scan Project from information in the original datasheets that were provided to the project by its partners, and from information resulting from examination of the material by ceramics specialists on the project. The format and contents of the datasheets provided by the four different project partners varied, so not all columns in the project spreadsheet include information from every datasheet. That is, in this spreadsheet we attempt to preserve most of the sherd information in the original datasheets that our partners provided us with, irrespective of whether the project photographed the actual sherds. This also means there are entries in the spreadsheet that do not have an identification number, as discussed below. We have included these records for material that has not been photographed for analytical purposes. Information on context or information related solely to the excavation (such as information on land use) has not been included in this spreadsheet. For sherds that had not been suitably catalogued by our partners for the project’s purposes, we have created further rows in our spreadsheet, but only for those sherds that were (re-)identified by the Arch-I-Scan Project’s ceramic specialists and that we actually photographed (see van Helden and Allison n.d.).

Within the folder ‘Unassigned to form’, some sherd photographs are traceable in the spreadsheet and some are not. The reason why some are not traceable is that the file naming convention had to be changed during the course of the photographing programme. Rather than renaming the photographs, it was more expedient to re-photograph these sherds under a new naming convention (see van Helden and Allison n.d. Section 3.2). In practice, it is these re-photographed sherds that are traceable in the spreadsheet.

Because this photograph dataset was compiled for the purposes of training an AI classifier rather than for comprehensive artefact description, the information in this spreadsheet is less comprehensive than might be expected for a full post-excavation analysis of these ceramic remains. As such, an empty cell in a given spreadsheet column can only be taken as absence of information for that particular record if other cells in that column have been used for information on other sherds with the same project site code.
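The rule for reading empty cells can be sketched in a few lines of Python. This is only an illustration of the reasoning in the paragraph above, not code from the project: it assumes the spreadsheet has been loaded as a list of dictionaries keyed by column header, and the rows and the ‘Decoration’ column are invented for the example.

```python
from collections import defaultdict

SITE_COLUMN = "Site code (area code)"  # header of column B as quoted in this overview


def is_blank(value):
    """Treat None and whitespace-only strings as empty cells."""
    return value is None or not str(value).strip()


def columns_in_use(rows, site_column=SITE_COLUMN):
    """Map each site code to the columns filled in for at least one of its rows."""
    used = defaultdict(set)
    for row in rows:
        site = row.get(site_column, "")
        for column, value in row.items():
            if column != site_column and not is_blank(value):
                used[site].add(column)
    return used


def empty_cell_means_absence(row, column, used, site_column=SITE_COLUMN):
    """True only if the cell is empty AND the column is used for the same site code."""
    site = row.get(site_column, "")
    return is_blank(row.get(column)) and column in used.get(site, set())


# Invented rows for illustration only; "Decoration" is a hypothetical column name.
rows = [
    {SITE_COLUMN: "CL", "Form": "Drag 37", "Decoration": ""},
    {SITE_COLUMN: "CL", "Form": "", "Decoration": ""},
]
used = columns_in_use(rows)
print(empty_cell_means_absence(rows[1], "Form", used))        # True: column used for CL
print(empty_cell_means_absence(rows[1], "Decoration", used))  # False: never used for CL
```

A result of False does not mean the information exists; it only means the empty cell cannot be interpreted either way.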

The link between the sherd photographs and the entries in the data spreadsheet is via the name of the sherd folder in the sherd photograph dataset, except in the cases discussed above. The name of the sherd folder leads one to the relevant row in the spreadsheet, but not as smoothly in the other direction because rows can contain information on more than one sherd, as outlined below. These names (e.g. ‘CL_780_AISID1341a’) consist of three elements, separated by underscores. The information in the spreadsheet that is relevant for tying file names to spreadsheet rows is found in the columns whose headers are marked in grey. The first element in the photograph file names consists of the Arch-I-Scan Project’s site code identifying the excavated site which, for the MOLA, ULAS and Vindolanda material, is found in column B, ‘Site code (area code)’. Because the Vindolanda Trust divides its material by excavation year, our project’s site codes for most of the Vindolanda material comprise ‘V’ followed by the year in which the material was excavated, found in column C in the spreadsheet, ‘Excavation year’ (e.g. ‘V16’ for excavation year 2016). The code ‘Vx’ is used for material for which the excavation year could not be identified. For some material from Vindolanda, the first element of the file name also includes a ‘B’, in addition to the excavation year (e.g. ‘V07B’). This is part of the Vindolanda Trust’s context identifier, which does not play a significant role in tying the photographs to the information in the spreadsheet.
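As a rough illustration of the Vindolanda site code pattern described above, the following Python sketch recognises ‘V’ plus a year, ‘Vx’, and the optional ‘B’. It assumes the year is always written with two digits, as in the examples ‘V16’ and ‘V07B’; the function name is hypothetical, and the pattern alone cannot tell which collection a code belongs to.

```python
import re

# Assumption: the year is written with two digits, as in 'V16' and 'V07B'.
_VINDOLANDA_CODE = re.compile(r"^V(?:(?P<year>\d{2})|x)(?P<b>B?)$", re.IGNORECASE)


def parse_vindolanda_site_code(code):
    """Return the year digits and the 'B' flag, or None if the code does not fit."""
    match = _VINDOLANDA_CODE.match(code)
    if match is None:
        return None
    return {
        "year_digits": match.group("year"),  # None for 'Vx' (year not identified)
        "has_b": bool(match.group("b")),
    }


print(parse_vindolanda_site_code("V16"))
print(parse_vindolanda_site_code("V07B"))
print(parse_vindolanda_site_code("Vx"))
```

The two digits are deliberately not expanded to a four-digit year here; column C, ‘Excavation year’, is the place to confirm the year for a given row.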

The second element in the file name, for the material from these three collections, is the context number, found in column D, ‘Context’, in the spreadsheet. Because of the manner in which the material from the CIMS collection is stored, as outlined below, these initial elements were not always consistently applied during the photographing sessions. To deal with these inconsistencies we have put all initial elements for the file names of this CIMS material in the context column, column D, in the spreadsheet. The third, or final, element is an identifier, which was developed using different protocols during the project’s photographing processes. If the third element is simply a number, this is the partner’s accession number or small finds number, found in column E in the spreadsheet, ‘Accession No/SF No’. If this third element in the folder name is made up of ‘AISID’ (for Arch-I-Scan IDentifier) followed by a number, this results from a file name developed by the Arch-I-Scan Project and can be linked to the relevant row in the spreadsheet using column F, ‘AISID no.’. Note that the numbers in column F are not necessarily unique; they are only unique in combination with the first two elements of the file name. The final naming element in the photograph file name sometimes has a lower-case letter, or a combination of such letters, appended to it. This is to differentiate between different sherds described in a single spreadsheet row. In the above example, tracing the three naming elements in the spreadsheet leads to row 12541, where the information on this sherd can be found. Photograph file name ‘CL_780_AISID1341b’ also leads to this spreadsheet row, which contains the information on both sherds.
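The naming convention above can be unpacked programmatically. The Python sketch below is one possible reading of it, not the project’s own tooling. It assumes that names are split from the right, so that a site code which itself contains an underscore (e.g. ‘A12_2015’) stays intact, and that the ‘p#u#’ stopgap identifiers use digits in place of ‘#’. The function name and the returned keys are hypothetical.

```python
import re

# Identifier: 'AISID' + number, a stopgap 'p#u#' string, or a plain number,
# optionally followed by lower-case letters that distinguish sherds in one row.
_IDENTIFIER = re.compile(
    r"^(?:AISID(?P<aisid>\d+)|(?P<stopgap>p\d+u\d+)|(?P<accession>\d+))(?P<suffix>[a-z]*)$"
)


def parse_sherd_folder_name(name):
    """Split a sherd folder name into site code, context and identifier parts."""
    parts = name.rsplit("_", 2)  # assumption: the last two underscores delimit elements
    if len(parts) != 3:
        raise ValueError(f"expected three elements in {name!r}")
    site_code, context, identifier = parts
    result = {"site_code": site_code, "context": context, "identifier": identifier,
              "kind": "unrecognised", "number": None, "suffix": "", "lookup_column": None}
    match = _IDENTIFIER.match(identifier)
    if match is None:
        return result
    result["suffix"] = match.group("suffix")
    if match.group("aisid"):
        result.update(kind="aisid", number=match.group("aisid"), lookup_column="F")
    elif match.group("accession"):
        result.update(kind="accession", number=match.group("accession"), lookup_column="E")
    else:
        result.update(kind="stopgap")  # traceable only via site, context and form
    return result


print(parse_sherd_folder_name("CL_780_AISID1341a"))
```

Because AISID numbers are only unique in combination with the first two elements, a lookup in the spreadsheet should match on site code, context and number together, never on the number alone.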

The material from the Colchester and Ipswich museums was not stored by excavation project. As a result, it was not always straightforward to ascertain with certainty the site code and context, or to differentiate between them. Consequently, the initial elements in the photograph file names for this material do not always follow our convention as closely as for those from the other collections. All initial elements are found in the context column, column D, in the spreadsheet. In addition, some site abbreviations seem to have been wrongly entered as file names but can usually be worked out (e.g. ‘LCW’ instead of ‘LWC’). Because the documentation was limited and the sources of the material varied, it was not always possible to know whether, and where, an error was made during the project’s recording sessions. So, we have left site abbreviations in the file names that could not be unequivocally matched to those in the Colchester Archaeological Report (Symonds and Wade 1999, 8-11). These file names can still be used unambiguously to identify the relevant spreadsheet rows, but this process is less clear cut than it is for photographs taken from other collections as the site codes and context numbers are not always distinct in the spreadsheet columns. For the same reason, identifying the photograph file names from the spreadsheet rows involves a little more detective work, but will always be unambiguous, especially in combination with the form identifications found in column I.

There are also photographs that have an alphanumeric string as the third element in their file name, in the form ‘p#u#’. These alphanumeric strings result from a stopgap solution to the file naming convention during the project’s photographing sessions, which was later changed (see van Helden and Allison n.d. Section 3.2). Not many of these photographs were allocated to a form type, and most are therefore not sorted in the folders by typological form. Those that have been allocated to a form type are traceable through the combination of the first two name elements and their form identification, the latter being found in column I in the spreadsheet.

In the spreadsheet in column I ‘Form’ some of the form identifications use the conventional ‘/’ for merged forms. In the names of the vessel form folders this has been replaced with ‘-’, since folder names cannot contain ‘/’.

Project site codes:

For the file names which identify individual sherds we have used project site codes as the first element. These codes are the site codes, sometimes modified by the Arch-I-Scan Project, by which our partners subdivided their excavations and the material resulting from this or, in the case of Vindolanda, they are generated from the year in which the material was excavated. These codes are listed below (see Tables 1-3) and are mainly in upper case lettering but some have been entered in lower case in the photograph file names. The different cases are not meaningful.

| Site Code | Site name |
| --- | --- |
| TEQ10 | Three Quays |
| FES15 | Fenchurch Street |
| MOQ10 | Moorgate |
| BZY10 | Bloomberg |

Table 1. Site codes for the MOLA material used by the Arch-I-Scan Project.

| Site Code | Site name |
| --- | --- |
| CL | Causeway Lane |
| FLA8 (or FLA) | Freeschool Lane |
| A12_2015 | Southgates |
| A12_2016 | Stibbe |
| SMA2 | St Margaret’s Baths-Vaughan Way |
| VSA22A24 | Vine Street site |

Table 2. Site codes for the ULAS material used by the Arch-I-Scan Project.

The formatting for the project site codes for the Vindolanda material is discussed above.

The material that we photographed at the CIMS stores was not organised by excavation project. As a result, the naming convention for the sherd photograph files was applied as best as possible using the information available about the sherds. Below is a table of the CIMS site codes and the associated sites as we could reconstruct them. The information on which these reconstructions are based can be found in the Colchester Archaeological Report (Symonds and Wade 1999: 7-11). Sometimes these site codes have been entered incorrectly in the image file names (e.g. BK75, BKC_75 or BKL75 for BKC75; and BK7b or bkc_76 for BKC76).

| Site Code | Site name | Comments |
| --- | --- | --- |
| 236.84.LW.41 | Lion’s Walk United Reformed Church (1984-85) | |
| BH65_BG_R16 | (not listed in Symonds and Wade 1999) | |
| BKC | Balkerne Lane | |
| BKC73 | Balkerne Lane (1973) | |
| BKC74 | Balkerne Lane (1974) | |
| BKC75 | Balkerne Lane (1975) | |
| BKC76 | Balkerne Lane (1976) | |
| BKCA | Balkerne Lane, subsite A | |
| BUC | Butt Road (1976-79) | |
| BUC76 | Butt Road (1976) | |
| BUC77 | Butt Road (1977) | |
| BUC79 | Butt Road (1979?) | |
| CGC78 | Castle Gardens (1978) | |
| COC79 | Long Wyre Street (1979) | |
| CPS | The Cups Hotel (1973-74) | |
| CPS73 | The Cups Hotel (1973) | |
| cps74 | The Cups Hotel (1974) | |
| DTC75 | Dutch Quarter (1975) | |
| E5587 | | Should probably be ‘ESS87’ |
| ESS | East Stockwell Street (1988) | |
| GBS | The Gilberd School | |
| GBS84 | The Gilberd School (1984) | |
| GBS85 | The Gilberd School (1985) | |
| G85 | The Gilberd School (1985)? | |
| GBSA | The Gilberd School Area A | |
| GRS84 | The Gilberd School (1984)? | Should probably be ‘GBS84’ |
| HEC76 | St Helena’s School | |
| I81 | Culver Street (1981-82 & 1984-85) | |
| I81B | Culver Street Area B | |
| IRA72 | Inner Relief Road Site A (1972) | |
| IRC73 | Inner Relief Road Site C (1973) | |
| LHC71 | | Should probably be ‘LWC71’ |
| LCW76 | | Should probably be ‘LWC76’ |
| LWC | Lion Walk | |
| LWC71 | Lion Walk (1971) | |
| LWC72 | Lion Walk (1972) | |
| LWC73 | Lion Walk (1973) | |
| LWC74 | Lion Walk (1974?) | |
| LWC76 | Lion Walk (1976?) | |
| LW49 | | Maybe should be LWC - but surely not 1949? |
| MID | Middleborough | |
| MID78 | Middleborough (1978?) | |
| NH | | Not in Symonds and Wade 1999 |
| NH65 | | Not in Symonds and Wade 1999 |
| Sheepen | Sheepen | |
| spt83 | Spendrite, 61-62 High Street (1983) | |
| st.gc75.6 | St Giles’ Masonic Centre 1975 | |
| V81 | | Not in Symonds and Wade 1999 |
| X37 | Small site/watching brief | |
| X67 | Small site watching brief | |
| X119 | Watching brief | |
| X151 | Small site watching brief | |
| X223 | Small site or watching brief | |
| X401 | Small site or watching brief | |
| X9999 | Small site or watching brief | |

Table 3. Site codes for CIMS material in the project spreadsheet, not all of which has been used by the Arch-I-Scan Project. Sites from which material was not used by the project, or which are not clearly identified from the abbreviation, are noted in the ‘Comments’ column.

Abbreviations

The sherd spreadsheet contains many abbreviations used by the Arch-I-Scan Project’s partners who supplied their datasheets of the material the project photographed. Most of these abbreviations are self-explanatory but, for reference, this archive includes an adapted version of the London Roman pottery codes that are used by MOLA and incorporated into our project spreadsheet. This includes the form codes that MOLA uses, which are replicated as a number in column I of our spreadsheet, before the Dragendorff and similar form codes. The Vindolanda ceramics recording system is based on the Oxford Archaeological Unit system, which can be found here: https://knowledge.oxfordarchaeology.com/library/448. This document can therefore be used as a reference for abbreviations in the Vindolanda data in our spreadsheet. The Vindolanda period codes (column T) and their date ranges are outlined in Allison and van Helden (n.d. Section 5, Table 3).

Data resource of photographs of near-complete vessels from the London Museum’s collection

Fig. 3. Photograph of a near-complete Drag 29 terra sigillata bowl in the London Museum.

This section of the Arch-I-Scan Project’s data archive comprises a dataset of 13,504 photographs of 384 near-complete vessels, mainly terra sigillata but also some other fabrics (e.g. London ware and mica-dusted ware), from the London Museum’s collection of Roman ceramics. These photographs were taken by the Arch-I-Scan Project in December 2019 and used in the project’s pilot study (Núñez Jareño et al. 2021; van Helden et al. 2022).

The higher-level folders in this dataset are organised by vessel form, according to the form identifications Fiona Seeley made for the Arch-I-Scan Project from the photographs. Within these main folders the photographs are further grouped in folders according to individual vessels. We took multiple photographs of each individual vessel to capture information from different angles (see van Helden and Allison n.d. Section 3.2 for details on the protocol).

Photograph file names for this dataset start with a letter code which indicates what view of the vessel the photograph comprises: ‘s’ for standard photographs which portray the side of the vessel, ‘t’ for top photographs taken from directly overhead, ‘f’ for flipped photographs in which the vessel is resting on its rim, or ‘h’ for a close-up photograph of any numbers and lettering inked onto the vessel. Where more than half of the vessel’s rotational shape was missing or had breaks, a lower-case ‘d’ (damaged) is included before the letter code, resulting in two letters being used.
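The letter codes can be read off a file name with a short helper. The Python sketch below looks only at the first one or two characters, because the remainder of the file name is not described in this overview; the function name and the example file names are hypothetical.

```python
VIEW_CODES = {
    "s": "standard (side of the vessel)",
    "t": "top (directly overhead)",
    "f": "flipped (vessel resting on its rim)",
    "h": "close-up of inked numbers and lettering",
}


def parse_view_prefix(file_name):
    """Read the view code, and the optional 'd' (damaged) flag, from a file name."""
    # A leading 'd' counts as the damaged flag only if a valid view code follows it.
    damaged = len(file_name) >= 2 and file_name[0] == "d" and file_name[1] in VIEW_CODES
    code = file_name[1] if damaged else file_name[:1]
    if code not in VIEW_CODES:
        raise ValueError(f"no recognised view code at the start of {file_name!r}")
    return {"view": code, "description": VIEW_CODES[code], "damaged": damaged}


# Hypothetical file names, for illustration only.
print(parse_view_prefix("ds_example.jpg"))
print(parse_view_prefix("t_example.jpg"))
```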

The accompanying spreadsheet includes information regarding these vessels from the London Museum’s datasheet that was provided to the Arch-I-Scan Project. For each vessel this information comprises: the museum’s accession or ‘ID Number’; the ‘Use label’ given to the vessel by the museum; a description of the vessel; and its measurements. The further columns in this spreadsheet have been compiled by the Arch-I-Scan Project by extracting the form type from the description where possible and from the further form identifications and comments that Fiona Seeley provided on these vessels for the project. This spreadsheet includes the MOLA form code abbreviations referred to in the above section.

The vessel photographs in the dataset and the entries in the spreadsheet are linked by the museum’s vessel ID Numbers. In both the dataset folder names and the spreadsheet, ‘/’ has been replaced with ‘-’ in the form identifications, since file and folder names cannot contain ‘/’. For the same reason, in the photograph file names the ‘/’ in some of the museum’s ID numbers has been replaced with ‘+’.
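For users matching museum records to files, the two substitutions can be written as small Python helpers. This is a sketch under stated assumptions: the function names and the example ID number are hypothetical, and the reverse conversion assumes that no original ID number contains a ‘+’ of its own.

```python
def form_to_folder_label(form):
    """Form identification as used in folder names: '/' becomes '-'."""
    return form.replace("/", "-")


def id_number_to_file_fragment(id_number):
    """Museum ID number as it appears in photograph file names: '/' becomes '+'."""
    return id_number.replace("/", "+")


def file_fragment_to_id_number(fragment):
    """Reverse of the above; assumes original ID numbers never contain '+'."""
    return fragment.replace("+", "/")


print(form_to_folder_label("Drag 18/31"))    # Drag 18-31
print(id_number_to_file_fragment("A123/4"))  # A123+4 (hypothetical ID number)
```

No reverse helper is given for form labels, since a ‘-’ in a folder name may or may not stand for an original ‘/’.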

The photographs for this pilot study are also available through the project’s GitHub repository (Mirkes et al. 2025).

Data resource of 3D models of terra sigillata vessel forms

Fig. 4. Diagram of the vessel simulation process.

This dataset consists of 64 3D models of terra sigillata forms generated from Peter Webster’s drawings of these forms (1996). These models were used to generate simulated vessels and sherds for the AI training process. Starting from a digitised profile drawing, the profile was extracted as an ordered list of points. With that profile we generated simulated photographs using the package Matplotlib or the 3D modelling tool Blender (see Núñez Jareño et al. 2021, 2.3; van Helden et al. 2022, 84-87). For the code for this process see Mirkes et al. (2025).
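The general idea of turning a profile into a rotationally symmetric vessel can be illustrated with a surface of revolution. The Python sketch below is not the project’s code, which is in the repository cited above. It assumes that each profile point is a (radius, height) pair measured from the vessel’s vertical axis, and the function name and sample profile are invented for the example.

```python
import math


def revolve_profile(profile, n_angles=36):
    """Rotate (radius, height) profile points around the vertical axis.

    Returns one ring of (x, y, z) points per profile point.
    """
    rings = []
    for radius, height in profile:
        ring = []
        for k in range(n_angles):
            angle = 2.0 * math.pi * k / n_angles
            ring.append((radius * math.cos(angle), radius * math.sin(angle), height))
        rings.append(ring)
    return rings


# Invented three-point profile, ordered from base to rim.
rings = revolve_profile([(1.0, 0.0), (2.0, 1.0), (2.5, 3.0)], n_angles=8)
print(len(rings), len(rings[0]))  # 3 rings of 8 points each
```

Joining neighbouring points on adjacent rings would give the faces of a mesh, which a tool such as Matplotlib or Blender could then render.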

References

Allison, P. M. and van Helden, D. P. n.d. The Arch-I-Scan Project’s dataset of photographs of Roman finewares, in Allison et al. n.d. The Arch-I-Scan Project: Artificial Intelligence and other approaches to ceramic identification and analyses in the Greek and Roman worlds, Internet Archaeology special volume.

van Helden, D., Mirkes, E., Tyukin, I. and Allison, P. 2022. The Arch-I-Scan Project: Artificial Intelligence and 3D Simulation for Developing New Approaches to Roman Foodways. Journal of Computer Applications in Archaeology, 5(1), 78–95. https://doi.org/10.5334/jcaa.92

van Helden, D. P. and Allison, P. M. n.d. The Arch-I-Scan Project’s recording and image preparation processes: development and refinements, in Allison et al. n.d. The Arch-I-Scan Project: Artificial Intelligence and other approaches to ceramic identification and analyses in the Greek and Roman worlds, Internet Archaeology special volume.

Mirkes, E. M., van Helden, D. P., Zheng, Z., Tyukina, T. A., Tyukin, I. Y., Núñez Jareño, S. J. and Allison, P. 2025. The Arch-I-Scan Project repositories. https://github.com/ArchiScn/Access

Núñez Jareño, S. J., van Helden, D. P., Mirkes, E. M., Tyukin, I. Y. and Allison, P. M. 2021. Learning from Scarce Information: Using Synthetic Data to Classify Roman Fine Ware Pottery. Entropy, 23, 1140. https://doi.org/10.3390/e23091140

Symonds, R. P. and Wade, S. 1999. Colchester Archaeological Report 10: Roman pottery from excavations in Colchester, 1971-86. Colchester: Colchester Archaeological Trust Ltd.

Webster, P. 1996. Roman Samian Pottery in Britain. York: Council for British Archaeology.