Experiments with Reification

In preparation for discussions of tagging at the 10 April 2001 telecon of the Joint Committee, I decided to see how various RDF (DAML) tools handle tagging and quoting of statements.

Experiment

For each of the tools, I encoded four statements of the form subject predicate object:

taggedStatement:
    subject predicate1 object1
statement about taggedStatement:
    taggedStatement predicate2 object2
quotedStatement:
    subject predicate3 object3
statement about quotedStatement:
    quotedStatement predicate2 object2

All of the statements except quotedStatement were added to the model, which was then serialized into RDF.

I intentionally used predicate2 and object2 for both statements about statements. This would be the case, for example, if we were recording the source pedigree and both taggedStatement and quotedStatement came from the same source.
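The setup can be sketched in plain Python, independent of any of the tools. This is only an illustration: the strings are the placeholder names from the list above rather than real URIs, and the `asserted` helper is a hypothetical name introduced here.

```python
# Placeholder names taken from the experiment description; not real URIs.
tagged_statement = ("subject", "predicate1", "object1")
quoted_statement = ("subject", "predicate3", "object3")

# A statement about a statement uses that statement as its subject.
about_tagged = (tagged_statement, "predicate2", "object2")
about_quoted = (quoted_statement, "predicate2", "object2")

# quoted_statement itself is deliberately left out of the model.
model = {tagged_statement, about_tagged, about_quoted}


def asserted(model, statement):
    """Return True if the statement is asserted directly in the model."""
    return statement in model
```

In this sketch, `asserted(model, tagged_statement)` is true while `asserted(model, quoted_statement)` is false, even though the model still contains a statement about quotedStatement.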

Results

Each tool surveyed provides support for these statements. The source code, execution script, and results for each tool are shown in the table below.

Tool	Code	Script	Results
RDF API	rdfapi.java	rdfapi.bat	rdfapi.rdf
Jena	jena.java	jena.bat	jena.rdf
N3	n3.n3	n3.bat	n3.rdf

Analysis

The three tools serialize the same statements in quite different ways.

Only Jena uses the RDF M&S (Model and Syntax) reified statement construct. N3 uses its own parseType="Quote" construct.
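For reference, the RDF M&S construct describes a statement as a resource of type rdf:Statement with rdf:subject, rdf:predicate, and rdf:object properties. The following is a minimal sketch of that pattern in plain Python, not Jena's API; the `reify` function and the `#taggedStatement` identifier are hypothetical names chosen for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id, subject, predicate, obj):
    """Return the four triples that describe one statement as a resource."""
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]


# "#taggedStatement" is a hypothetical identifier chosen for illustration.
triples = reify("#taggedStatement", "subject", "predicate1", "object1")

# A statement about the statement uses the same identifier as its subject.
triples.append(("#taggedStatement", "predicate2", "object2"))
```

Note that the four reification triples only describe the statement; asserting it as well requires adding the original triple to the model separately.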

RDF API and Jena attempt to label taggedStatement and/or quotedStatement, but do so in different ways. Jena associates a (non-standard) ID with the predicate for taggedStatement. RDF API computes a hash of the subject/predicate/object for each statement and uses it as the statement URI. N3 instead refers to the quoted statements themselves, using what appears to be a non-standard variant of rdf:Description.

RDF API loses quotedStatement itself; only its hash URI survives.
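The hash-based labeling, and why a hash alone cannot bring back an unasserted statement, can be sketched as follows. This is an assumption-laden illustration: the SHA-1 digest, the newline separator, and the `urn:x-hash:` prefix are all choices made here, not the algorithm RDF API actually uses.

```python
import hashlib


def statement_uri(subject, predicate, obj):
    """Derive a deterministic URI from the three parts of a statement."""
    # Join with a newline so ("ab", "c", "d") and ("a", "bc", "d") differ.
    key = "\n".join((subject, predicate, obj)).encode("utf-8")
    # Hypothetical URI scheme; the real tool's format may look different.
    return "urn:x-hash:" + hashlib.sha1(key).hexdigest()


tagged_uri = statement_uri("subject", "predicate1", "object1")
quoted_uri = statement_uri("subject", "predicate3", "object3")

# Anyone who knows the triple can recompute the same URI later...
same_again = statement_uri("subject", "predicate3", "object3")
# ...but the triple cannot be recovered from the URI alone.
```

Because the URI is derived from the statement, no ID has to be stored with it, but a reader who sees only the URI has no way to learn the subject, predicate, or object.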

When reading its own results, each tool should be able to properly reconstruct its basic model (with some loss of information in RDF API). When reading the results of another tool, I believe none of the tools can properly reconstruct the model.

Opinion

I like the idea of associating an ID with each statement reified in the same model. I'd like to be able to specify my own ID (e.g. taggedStatement and quotedStatement) but couldn't find a way to do that in the Jena API.

Adding an ID only to each reified statement doesn't provide a mechanism for someone to come back later and add a statement about a statement that was not reified. Adding an ID to every statement seems excessive, so something like the RDF API hash or N3 quoting becomes necessary. A compact representation of quoting within a single URI often may not require much more space than the hash; it would preserve information and have zero probability of collisions.
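One way to picture quoting within a single URI is to percent-encode the three parts and join them with a separator that cannot survive the encoding. This is only a sketch of the idea: the `urn:x-quote:` prefix, the `;` separator, and both function names are hypothetical, not an existing scheme or tool feature.

```python
from urllib.parse import quote, unquote

PREFIX = "urn:x-quote:"  # hypothetical scheme, not a registered one


def quote_uri(subject, predicate, obj):
    """Pack a statement into one URI by percent-encoding each part."""
    # safe="" encodes ";" as well, so ";" can only appear as a separator.
    parts = (subject, predicate, obj)
    return PREFIX + ";".join(quote(part, safe="") for part in parts)


def unquote_uri(uri):
    """Recover the (subject, predicate, object) triple from a quote URI."""
    parts = uri[len(PREFIX):].split(";")
    return tuple(unquote(part) for part in parts)
```

Since the encoding is reversible, two different statements can never share a URI, and `unquote_uri(quote_uri(s, p, o))` returns the original triple. The cost is a URI whose length grows with the statement, unlike a fixed-size hash.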

I prefer the terseness of the RDF API generated namespace names (e.g. a) to Jena (e.g. RDFNsId0). I would also prefer that Jena's generated identifiers be shorter (e.g. #_anon1 vs #RDFAnonId1).

Author

Mike Dean

$Id: index.html,v 1.8 2001/04/10 17:24:25 mdean Exp $