Over the past few months Holly and I have been working on ArchAIDE, an EU-funded project that aims to create a new system for the automatic recognition of archaeological pottery from excavations. So far my main contribution to the project has been advising on, and writing documentation about, data management with a view to long-term preservation. I’ve also been helping to build the conceptual model of the database that will sit behind the ArchAIDE application.
To this end, there’s been a lot of work looking at existing digital catalogues, principally Roman Amphorae and CeramAlex (a database of Ptolemaic and Roman pottery in Egypt), and at using the extant data to facilitate filtering of results alongside the image-based recognition. Some of this is very simple, for example labelling the part of the sherd using common terms such as ‘handle’ or ‘rim’. Other aspects that at first seemed simple have turned out to be more difficult than I envisioned. For example, terms used to describe the appearance or form of a vessel or sherd may be consistent within a single schema, but not portable or applicable to other catalogues. In addition, how is the description of a term such as ‘beaded’ going to help an automated system?
Recently, we’ve also been considering the use of geographical terms and/or geometries to assist in filtering results. The broad concept is that an extremely localised type ‘xxx’ will not usually appear (or has not been documented as found) in area ‘xzy’. For example, if we look at the record for the Teliţa type of amphora, we can see that its distribution is limited to ‘Scythia’ (where it was also manufactured), which is mapped to the concept of ‘Black Sea’ in that particular system. Looking more closely, there appear to be 38 countries/areas used to classify distribution. These range from smaller entities such as Cyprus or Belgium to much broader terms such as ‘The Levant’ or ‘Mediterranean region’. When dealing with more common types such as Mid Roman Amphora 5, these broad regions become more understandable (see below).
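To make the filtering concept a little more concrete, here is a minimal Python sketch, assuming each type carries a set of free-text distribution terms and that the image recognition returns a list of scored candidates. Only the Teliţa / ‘Scythia’ / ‘Black Sea’ pairing comes from the record described above; ‘Type B’, the scores, the `penalty` factor and the `filter_by_distribution` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    type_name: str
    score: float  # hypothetical similarity score from the image recognition


# Hypothetical lookup: type name -> recorded distribution terms.
# 'Teliţa' follows the record described above; 'Type B' is invented.
DISTRIBUTION = {
    "Teliţa": {"Scythia", "Black Sea"},
    "Type B": {"Mediterranean region", "The Levant"},
}


def filter_by_distribution(candidates, findspot_region, penalty=0.5):
    """Down-weight (rather than drop) candidates whose recorded
    distribution does not include the findspot region."""
    ranked = []
    for c in candidates:
        terms = DISTRIBUTION.get(c.type_name)
        if terms is None or findspot_region in terms:
            ranked.append(c)  # no distribution recorded, or a match: keep score
        else:
            ranked.append(Candidate(c.type_name, c.score * penalty))
    return sorted(ranked, key=lambda c: c.score, reverse=True)


candidates = [Candidate("Teliţa", 0.9), Candidate("Type B", 0.8)]
print([c.type_name for c in filter_by_distribution(candidates, "The Levant")])
# ['Type B', 'Teliţa']
```

Candidates outside the recorded distribution are down-weighted rather than removed, because a type's absence from an area may only mean it has not been documented there.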
So, thinking ahead: how could we build on this to help the application filter by where the sherd was found? And not only for Roman amphorae, but also for collections from across archaeological periods and from continental Europe, the Middle East and North Africa? To my mind there are three issues: text versus geometry, scale, and consistency.
1) The initial proposal was to record countries or regions in a similar way to the Roman Amphorae catalogue. This is somewhat simplistic, and open to inconsistencies as the catalogue grows with the digitization of paper or museum collections. For example, from a British perspective, do I record “Great Britain”, “British Isles”, “United Kingdom”, “England” or “Yorkshire”? And although I may pick a term that suits me, someone else digitizing a catalogue may well choose a different term based purely on subjective reasoning. A simple database LIKE or equals query could then miss a positive match, or even return a false positive.
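The pitfall with matching free text can be sketched with Python’s built-in `sqlite3` module and a handful of invented records (the table, columns and rows are all hypothetical). An equals query for ‘England’ misses a find recorded as ‘Yorkshire’, while a LIKE query with wildcards also picks up ‘New England’.

```python
import sqlite3

# Hypothetical catalogue table holding free-text place terms.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE find (type TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO find (type, region) VALUES (?, ?)",
    [("Type A", "England"), ("Type D", "Yorkshire"), ("Type C", "New England")],
)


def equals_query(term):
    """Exact string match on the region column."""
    return conn.execute(
        "SELECT type, region FROM find WHERE region = ? ORDER BY rowid", (term,)
    ).fetchall()


def like_query(term):
    """Substring match using SQL LIKE with wildcards on both sides."""
    return conn.execute(
        "SELECT type, region FROM find WHERE region LIKE ? ORDER BY rowid",
        (f"%{term}%",),
    ).fetchall()


print(equals_query("England"))  # [('Type A', 'England')]: Type D (Yorkshire) is missed
print(like_query("England"))    # [('Type A', 'England'), ('Type C', 'New England')]: false positive
```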
To get around this, at the ADS we use the Getty Thesaurus of Geographic Names (TGN), a structured vocabulary that includes names, descriptions and other metadata for extant and historical cities, empires, archaeological sites and physical features important to research of art and architecture. Mapping terms to Getty subjects (see, for example, the entry for ‘England’ or even ‘Northern England’) allows not only greater consistency but also more flexibility in searching and in the results returned. For the recent ARIADNE project, Holly successfully mapped spatial terms (including those within Roman Amphorae) to Geonames. Although my personal preference is for TGN, purely because it records historical regions such as Scythia or Britannia, I think mapping to modern terms in Geonames would be more useful, especially as that system supports bounding boxes and polygons for higher-tier administrative regions.
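A rough illustration of why mapping to a structured vocabulary helps: once each record points at a concept rather than a string, a search can follow the broader/narrower relationships. The concept keys and the single-parent hierarchy below are hypothetical placeholders, not real TGN or Geonames identifiers, and real thesauri can be polyhierarchical (a place with more than one broader term).

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical fragment of a place hierarchy (narrower -> broader).
# Keys are illustrative placeholders, not real TGN or Geonames IDs.
PARENT: Dict[str, Optional[str]] = {
    "yorkshire": "northern-england",
    "northern-england": "england",
    "england": "united-kingdom",
    "scotland": "united-kingdom",
    "united-kingdom": None,
}

# Catalogue records whose free-text place terms have already been mapped to concepts.
RECORDS: List[Tuple[str, str]] = [
    ("Type A", "yorkshire"),
    ("Type B", "england"),
    ("Type C", "scotland"),
]


def ancestors(concept: Optional[str]) -> Iterator[str]:
    """Yield the concept itself, then each broader concept up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT[concept]


def search(concept: str) -> List[str]:
    """Return types whose mapped place is the concept or lies within it."""
    return [t for t, place in RECORDS if concept in ancestors(place)]


print(search("england"))         # ['Type A', 'Type B']
print(search("united-kingdom"))  # ['Type A', 'Type B', 'Type C']
```

Searching for the broader concept returns records mapped to narrower places too, such as ‘yorkshire’ under ‘england’, which a plain string comparison would miss.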
Although this helps the accuracy of a ‘spatial’ filter, we’re still restricted to modern administrative regions, which may bear little resemblance to archaeological distributions. If we did want to make use of the capacity of a spatial database, we’d have to think about the following options for using our own geometries:
- Option 1: We could create overarching zones like those used in Roman Amphorae (e.g. Balearic Islands or southern Britain), but with the spatial extent stored as a polygon. Types/classes would then be mapped to these where appropriate. The positives are that these are relatively simple to create and administer; the negatives are that we would have to decide on, and then create, these ‘zones’, which for most of Europe will be a hassle. Plus, how detailed do we go?
- Option 2: Each class/type has its own extent polygon(s). This is potentially a lot more accurate than option 1, but also time-consuming, and potentially inconsistent if the extents are drawn by different people (as with the use of text terms, these could vary between very detailed and very broad!).
- Option 3: Recording X/Y coordinates for the findspots of a particular class/type. This has the potential for a higher level of accuracy and the capacity to build more intuitive map searches. However, this is again potentially time-consuming for non-digital collections, as well as for catalogues (such as Roman Amphorae) that do not reference sites at all. There’s also the danger that biased coverage in some catalogues may unintentionally produce a skewed distribution that is not truly representative of the pottery type.
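Whichever geometry option is chosen, the core operation is testing whether a findspot falls within a stored extent. Below is a minimal pure-Python sketch of an even-odd (ray casting) point-in-polygon test with a bounding-box pre-check, applied to per-type extents in the style of the second option. The type names and polygons are invented, coordinates are treated as planar, and a point lying exactly on an edge may go either way; in practice a spatial database would provide this, for example PostGIS’s `ST_Contains`.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)
Polygon = List[Point]        # outer ring only; holes are ignored here


def point_in_polygon(pt: Point, polygon: Polygon) -> bool:
    """Even-odd (ray casting) test: count how many polygon edges a ray
    from pt towards +x crosses; an odd count means pt is inside."""
    x, y = pt
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    # Cheap bounding-box rejection before looping over the edges.
    if not (min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)):
        return False
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical per-type extents: each type may have several polygons.
EXTENTS: Dict[str, List[Polygon]] = {
    "Type A": [[(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]],  # L-shaped
    "Type B": [
        [(5, 5), (8, 5), (8, 8), (5, 8)],
        [(10, 0), (12, 0), (12, 2), (10, 2)],
    ],
}


def types_for_findspot(pt: Point) -> List[str]:
    """Return the types with at least one extent polygon containing pt."""
    return [t for t, polys in EXTENTS.items()
            if any(point_in_polygon(pt, p) for p in polys)]


print(types_for_findspot((0.5, 2)))  # ['Type A']
print(types_for_findspot((2, 2)))    # []: inside Type A's bounding box but outside the L
print(types_for_findspot((11, 1)))   # ['Type B']: its second polygon
```

The bounding-box check on its own would wrongly accept the second findspot, which is why it is only used as a fast pre-filter before the full test.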
After talking this through with the project team, we’re going to investigate option 2 (in addition to mapping any text terms to Geonames). Although this will require a certain amount of creation and curation, it will really help the application move beyond the restrictions of modern borders. The database will also aim to support individual sites as points where they already exist in the catalogues we’re using. There’s also the potential for results from the user application to feed back into this, enhancing the coverage of the reference dataset.
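For the site points, one possible approach is to store each type’s known findspots as (longitude, latitude) pairs and report the distance from a new findspot to the nearest recorded site, using the haversine formula on a spherical Earth. This is a hypothetical sketch: the `PotteryType` class and its methods are illustrative assumptions, not part of the ArchAIDE design, and `add_confirmed_find` simply stands in for user results feeding back into the reference dataset.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

LonLat = Tuple[float, float]  # (longitude, latitude) in degrees
EARTH_RADIUS_KM = 6371.0      # mean Earth radius (spherical approximation)


def haversine_km(a: LonLat, b: LonLat) -> float:
    """Great-circle distance in km between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


@dataclass
class PotteryType:  # hypothetical record structure
    name: str
    sites: List[LonLat] = field(default_factory=list)  # findspots taken from catalogues

    def nearest_site_km(self, findspot: LonLat) -> Optional[float]:
        """Distance to the closest recorded site, or None if none are recorded."""
        if not self.sites:
            return None  # e.g. catalogues that do not reference sites at all
        return min(haversine_km(findspot, s) for s in self.sites)

    def add_confirmed_find(self, findspot: LonLat) -> None:
        """Feedback loop: a user-confirmed identification adds a new point."""
        self.sites.append(findspot)


t = PotteryType("Type A", [(0.0, 0.0)])
print(round(t.nearest_site_km((1.0, 0.0)), 1))  # one degree along the equator, about 111.2 km
t.add_confirmed_find((1.0, 0.0))
print(t.nearest_site_km((1.0, 0.0)))            # 0.0
```

Returning `None` rather than a distance keeps types from catalogues without site references from being penalised, though any points fed back from users would inherit the coverage bias already mentioned for recorded findspots.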