Experiments with Reification

In preparation for discussions of tagging at the 10 April 2001 telecon of the Joint Committee, I decided to see how various RDF (DAML) tools handle tagging and quoting of statements.

Experiment

For each of the tools, I encoded 4 statements of the form subject predicate object:

All of the statements except quotedStatement were added to the model, which was then serialized into RDF.

I intentionally used predicate2 and object2 for both statements about statements. This would be the case, for example, when recording source pedigree for a taggedStatement and a quotedStatement that came from the same source.
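The setup described above can be sketched in a few lines of Python. This is only an illustration of the shape of the experiment, not the code used with any of the tools. The names taggedStatement, quotedStatement, predicate2, and object2 come from the text; subject1/predicate1/object1 and subject3/predicate3/object3 are hypothetical placeholders, and the Statement type is an assumption made for the sketch.

```python
from typing import NamedTuple, Union


class Statement(NamedTuple):
    """A subject/predicate/object triple; the subject may itself be a statement."""
    subject: Union[str, "Statement"]
    predicate: str
    obj: str


# Hypothetical base statements (placeholder names, not from the original page).
tagged_statement = Statement("subject1", "predicate1", "object1")
quoted_statement = Statement("subject3", "predicate3", "object3")

# Both statements about statements share predicate2 and object2.
about_tagged = Statement(tagged_statement, "predicate2", "object2")
about_quoted = Statement(quoted_statement, "predicate2", "object2")

# Everything except quoted_statement is asserted in the model.
model = {tagged_statement, about_tagged, about_quoted}
```

The distinction this makes visible: taggedStatement is both asserted and described, while quotedStatement is only described, so a serializer has to carry its content somewhere other than the asserted triples.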

Results

Each tool surveyed provides support for these statements. The source code, execution script, and results for each tool are shown in the table below.

Tool     Code         Script      Results
RDF API  rdfapi.java  rdfapi.bat  rdfapi.rdf
Jena     jena.java    jena.bat    jena.rdf
N3       n3.n3        n3.bat      n3.rdf

Analysis

The serializations produced by the three tools are quite different.

Only Jena uses the RDF M&S reified statement construct. N3 uses its own parseType="Quote" construct.
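For reference, the RDF M&S reified statement construct describes a statement with four triples: an rdf:type of rdf:Statement plus rdf:subject, rdf:predicate, and rdf:object. The following is a minimal sketch of that expansion using plain tuples; the reify helper and the example identifiers are assumptions for illustration and are not part of Jena or any other tool discussed here.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id, subject, predicate, obj):
    """Return the four triples that describe a statement per RDF M&S.

    statement_id is the resource standing for the statement itself
    (hypothetical helper, not a tool API).
    """
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]


# Placeholder names for the statement being described.
quad = reify("#quotedStatement", "subject3", "predicate3", "object3")
```

Note that the four triples describe the statement without asserting it, which is what makes the construct usable for a statement such as quotedStatement that is not itself in the model.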

RDF API and Jena attempt to label taggedStatement and/or quotedStatement, but do so in different ways. Jena associates a (non-standard) ID with the predicate for taggedStatement. RDF API computes a hash of the subject/predicate/object for each statement, and uses it as the statement URI. N3 refers to the quoted statements themselves, using a (non-standard?) variant of rdf:Description.
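The hash-based labeling can be illustrated roughly as follows. This is a sketch of the general idea only: the actual hash function and URI layout used by RDF API are not stated here, so SHA-1, the newline separator, and the urn:example:stmt: prefix are all assumptions.

```python
import hashlib


def statement_uri(subject, predicate, obj, prefix="urn:example:stmt:"):
    """Derive a statement URI from a digest of its three parts.

    SHA-1 and the prefix are illustrative assumptions, not RDF API's scheme.
    """
    # Join with a newline so ("ab", "c", ...) and ("a", "bc", ...) hash differently.
    joined = "\n".join((subject, predicate, obj))
    digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()
    return prefix + digest


uri = statement_uri("subject1", "predicate1", "object1")
```

The same triple always yields the same URI, so two parties can name a statement independently. The digest is one-way, however: the subject, predicate, and object cannot be recovered from the URI alone.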

RDF API retains only the hash URI for quotedStatement; the statement's subject, predicate, and object are lost.

When reading its own results, each of the tools should be able to properly reconstruct its basic model (with some loss of information in RDF API). When reading the results of other tools, I believe none of the tools can properly reconstruct the model.

Opinion

I like the idea of associating an ID with each statement reified in the same model. I'd like to be able to specify my own ID (e.g. taggedStatement and quotedStatement) but couldn't find a way to do that in the Jena API.

Adding an ID only to each reified statement doesn't provide a mechanism for someone to come back later and add a statement about one that was not reified. Adding an ID to every statement seems excessive, so something like the RDF API hash or N3 quoting becomes necessary. A compact representation of quoting within a single URI may often not require a lot more space than the hash; this would preserve information and have zero probability of collisions.
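One way to picture quoting within a single URI is a reversible encoding of the three parts. The sketch below is purely hypothetical: the urn:example:quote: prefix and the percent-encoding layout are assumptions made for illustration, not a proposal from any of the tools or specifications mentioned.

```python
from urllib.parse import quote, unquote

PREFIX = "urn:example:quote:"  # hypothetical scheme, for illustration only


def encode_statement(subject, predicate, obj):
    """Pack a triple into one URI; '/' separates the percent-encoded parts."""
    # safe="" forces '/' inside each part to be escaped as %2F.
    return PREFIX + "/".join(quote(part, safe="") for part in (subject, predicate, obj))


def decode_statement(uri):
    """Recover the (subject, predicate, object) triple from an encoded URI."""
    if not uri.startswith(PREFIX):
        raise ValueError("not a quoted-statement URI")
    subject, predicate, obj = (unquote(part) for part in uri[len(PREFIX):].split("/"))
    return subject, predicate, obj


triple = ("http://example.org/ns#subject3",
          "http://example.org/ns#predicate3",
          "http://example.org/ns#object3")
encoded = encode_statement(*triple)
```

Because decoding recovers the original triple exactly, distinct statements always map to distinct URIs, at the cost of a URI whose length grows with the statement.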

I prefer the terseness of the namespace names generated by RDF API (e.g. a) to those generated by Jena (e.g. RDFNsId0). I would also prefer that Jena's generated identifiers be shorter (e.g. #_anon1 vs #RDFAnonId1).

Author

Mike Dean
$Id: index.html,v 1.8 2001/04/10 17:24:25 mdean Exp $