Provenance Conceptual Architecture Breakout Group Report

Data Provenance and Annotation Workshop


1 December 12:30am - 3 December 12:00pm 2003
e-Science Institute, 15 South College Street, Edinburgh

Jim Myers (Chair), Syd Chapman, Monica Duke, Carol Goble, Dennis Groth, Robert Hutchison, Jessie Kennedy, Natalia Maltsev, Jun Zhao

The terms provenance and annotation have a wide range of potential meanings. Although the premise of the workshop is that there are strong commonalities that make it worthwhile to treat this range using common techniques, it is not yet clear how to compare and contrast the work of various research groups to prove or disprove this. The goals of the conceptual architecture group were to search for a level at which recording, managing, and making use of provenance and annotation become uniform across the range of definitions being discussed, and to attempt to identify the set of services required at this level. This conceptual architecture could then be used to frame discussions across projects. The architecture’s generality and power can then be used as a measure of the degree to which the workshop’s premise is true.

To work towards these goals, the breakout group participants attempted to build three lists during their discussion: provenance/annotation ‘types’, use cases for such metadata, and required services.

Types (Dimensions/Layers/Facets/Aspects)

Provenance/Annotation Use Cases

Conceptual Services

The group discussed several additional topics. Participants noted that different types of provenance could be written independently and were often write-once, while realistic use scenarios generally involve reading and using multiple types of provenance aggregated together, e.g. with trust being built on a combination of process metadata, evidential relationships, reputations of data creators, etc. This was seen as a potential working definition of when the proposed architecture would be a useful way to model information. The use of provenance/annotation (P/A) link depth and dimensionality (the number of types of P/A) as a data quality measure was also discussed, though it was noted that such a measure is easily manipulated. Meta-P/A, that is, P/A information applied to existing P/A rather than to the data itself (reification), was also discussed. The group concluded that the various types of P/A could be treated via a common conceptual architecture and that the services listed would be useful as a tool for comparing P/A implementations.
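
The depth and dimensionality measure mentioned above can be illustrated with a minimal sketch. It is not a model the group specified: the `Record` structure, the type names, and the reading of "link depth" as the longest chain of P/A records attached to an item (with meta-P/A extending the chain) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical write-once P/A record (structure assumed for illustration)."""
    target: str     # id of the data item, or of another record for meta-P/A
    kind: str       # P/A type, e.g. "process", "evidence", "reputation"
    record_id: str  # id of this record, so that meta-P/A can point at it


def dimensionality(records, target):
    """Number of distinct P/A types attached directly to target."""
    return len({r.kind for r in records if r.target == target})


def link_depth(records, target, _seen=frozenset()):
    """Length of the longest chain of records hanging off target."""
    if target in _seen:  # guard against accidental cycles
        return 0
    seen = _seen | {target}
    depths = [
        1 + link_depth(records, r.record_id, seen)
        for r in records
        if r.target == target
    ]
    return max(depths, default=0)


records = [
    Record("dataset-1", "process", "pa-1"),
    Record("dataset-1", "evidence", "pa-2"),
    Record("dataset-1", "process", "pa-3"),
    Record("pa-2", "reputation", "pa-4"),  # meta-P/A: P/A about a P/A record
]

print(dimensionality(records, "dataset-1"))  # 2
print(link_depth(records, "dataset-1"))      # 2
```

Under these assumptions, anyone able to write records can raise both numbers by attaching empty records of new types or by chaining meta-P/A, which is one way to read the caveat that such a measure is easily manipulated.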