Re: Tagging

From: Jeff Heflin ([email protected])
Date: 04/03/01


I wanted to comment on Jim's discussion of tagging, found at

http://www.cs.umd.edu/users/hendler/Tags.html

I think that the need for tagging (i.e., attaching additional
information to statements) is clear, and that it could have many
potential uses (e.g., providing source information, timestamps, degree
of belief, etc.). Jim suggests that we come up with a way to create
ontologies of these "tags" so that users can customize tag sets and
share the semantics of these tags in the same way that we currently use
ontologies for defining classes and properties.

There are two basic questions that need to be addressed:

1) What syntax should be used for this "tagging" scheme?

2) What logical constructs are needed to express the semantics of tags? 

I will make some preliminary remarks on both of these issues:

As for syntax, we should consider first if it can be done within the
realm of RDF. As much as I hate to say it, the infamous "reification"
mechanism may be suitable for Jim's purposes. You can use a resource
that represents a reified statement as the subject of any property, thus
the tagging ontology needs to specify a set of properties that have
domain rdf:Statement. If tag:timestamp is such a property, then I could
say:

<rdf:Description>
   <rdf:subject resource="http://www.cs.umd.edu/users/hendler/Tags.html" />
   <rdf:predicate resource="http://description.org/schema/Creator" />
   <rdf:object resource="http://www.cs.umd.edu/users/hendler/" />
   <rdf:type resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
   <tag:timestamp>Tuesday, 27-Mar-01 03:59:08 GMT</tag:timestamp>
</rdf:Description>

This essentially says "The timestamp of the statement 'Jim is the author
of the Ramblings on Tags document' is Tuesday, 27-Mar-01 03:59:08 GMT."

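However, there are two problems here. First, it is very verbose: each
reified statement takes four triples (subject, predicate, object, and
type) where the plain statement takes one, so RDF documents would be
roughly 4 times bigger than if the statements weren't tagged. More
importantly, though, this does not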
actually make the statement that "Jim is the author of the Ramblings on
Tags document." From the RDF Model and Syntax specification (RDFM&S):
"The RDF graph is said to contain the fact
given in the statement if and only if the statement is present in the
graph, irrespective of whether the corresponding reified statement is
present."
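
To make both problems concrete, here is a minimal Python sketch that models
a graph as a set of (subject, predicate, object) tuples. This is only an
illustration, not an RDF parser: the node name "_:s1" and the property name
"tag:timestamp" are made up for the example, and reify() is a hypothetical
helper.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The statement we would like to tag: "Jim is the creator of Tags.html".
STATEMENT = (
    "http://www.cs.umd.edu/users/hendler/Tags.html",
    "http://description.org/schema/Creator",
    "http://www.cs.umd.edu/users/hendler/",
)


def reify(node, statement):
    """Return the four triples that describe `statement` via `node`."""
    subject, predicate, obj = statement
    return {
        (node, RDF + "subject", subject),
        (node, RDF + "predicate", predicate),
        (node, RDF + "object", obj),
        (node, RDF + "type", RDF + "Statement"),
    }


# "_:s1" and "tag:timestamp" are illustrative names, not defined vocabulary.
graph = reify("_:s1", STATEMENT)
graph.add(("_:s1", "tag:timestamp", "Tuesday, 27-Mar-01 03:59:08 GMT"))

print(len(graph))          # 5 triples just to tag one statement
print(STATEMENT in graph)  # False: the statement itself was never asserted
```

The graph holds five triples, none of which is the original statement.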

However, RDF has a syntactic shorthand for reification that actually
asserts both the statement and its reification. This relies on the way
bags work in RDF, which is a bit of a kludge. Here's how it goes: You
assign a bagId to the description, which asserts the actual statement
itself and also makes the reified version of that statement a member of
a bag. Then, using the controversial "aboutEach" attribute,
you can make statements about every statement in the bag. Thus the
example above could be written as:

<rdf:Description about="http://www.cs.umd.edu/users/hendler/Tags.html"
                 bagId="S1">
  <dc:Creator resource="http://www.cs.umd.edu/users/hendler" />
</rdf:Description>

<rdf:Description aboutEach="#S1">
  <tag:timestamp>Tuesday, 27-Mar-01 03:59:08 GMT</tag:timestamp>
</rdf:Description>

An additional advantage of this approach is that you could apply a tag
to a set of statements in one fell swoop.
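
As a rough illustration of what the bagId/aboutEach shorthand expands to,
here is a small Python sketch over a set of tuples. It reflects my reading
of the mechanism described above, not the output of any real RDF parser;
the function names and the generated node names ("#S1-stmt1") are
assumptions made for the example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(node, statement):
    """Return the four triples that describe `statement` via `node`."""
    subject, predicate, obj = statement
    return {
        (node, RDF + "subject", subject),
        (node, RDF + "predicate", predicate),
        (node, RDF + "object", obj),
        (node, RDF + "type", RDF + "Statement"),
    }


def describe_with_bag(bag_id, statements):
    """Assert each statement and put its reification into the bag `bag_id`."""
    graph = set(statements)  # the statements themselves ARE asserted
    graph.add((bag_id, RDF + "type", RDF + "Bag"))
    for index, statement in enumerate(statements, start=1):
        node = f"{bag_id}-stmt{index}"  # hypothetical naming scheme
        graph |= reify(node, statement)
        graph.add((bag_id, f"{RDF}_{index}", node))  # bag membership
    return graph


def about_each(graph, bag_id, prop, value):
    """Attach (prop, value) to every member of the bag `bag_id`."""
    members = {
        o for (s, p, o) in graph
        if s == bag_id and p.startswith(RDF + "_")
    }
    return graph | {(member, prop, value) for member in members}


STATEMENTS = [(
    "http://www.cs.umd.edu/users/hendler/Tags.html",
    "http://description.org/schema/Creator",
    "http://www.cs.umd.edu/users/hendler",
)]

tagged = about_each(
    describe_with_bag("#S1", STATEMENTS),
    "#S1", "tag:timestamp", "Tuesday, 27-Mar-01 03:59:08 GMT",
)
print(STATEMENTS[0] in tagged)  # True: the original statement is asserted
```

Passing several statements to describe_with_bag() tags them all with a
single about_each() call.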

The details of this are provided in RDFM&S Section 4. Note, I am NOT
endorsing this method, just saying that RDF already has a (perhaps
imperfect) mechanism for doing tagging. How might we improve on this
mechanism? Well first, we might want to remove the dependence on bags.
It seems like a roundabout way of doing things. It would also be nice if
we could just make a simple addition (perhaps an attribute) to each
statement in order to "tag" it, but we need to be careful in doing so.
For example, if we tried saying:

<rdf:Description about="http://www.cs.umd.edu/users/hendler/Tags.html"
                 tag:timestamp="Tuesday, 27-Mar-01 03:59:08 GMT">
  <dc:Creator resource="http://www.cs.umd.edu/users/hendler" />
</rdf:Description>

then we run into a conflict with the meaning ascribed by RDF to its
first abbreviated syntax. As a result, we would actually be saying the
timestamp is about the Tags.html document and not the current statement
(BTW, this is a significant difference because we may want to have
statements with different timestamps in one document).

However, we may be able to say:

<rdf:Description about="http://www.cs.umd.edu/users/hendler/Tags.html">
  <dc:Creator resource="http://www.cs.umd.edu/users/hendler"
              tag:timestamp="Tuesday, 27-Mar-01 03:59:08 GMT" />
</rdf:Description>

Current RDF parsers may barf at this, but as far as I can tell, RDF does
not yet prescribe a meaning for it.
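
Just to show that such an attribute could be given a meaning, here is a
hypothetical reading of it, sketched in Python with the standard
xml.etree.ElementTree module. Nothing here is prescribed by RDF: the tag
namespace URI is a placeholder, the dc prefix is bound to the schema URI
from the first example purely for illustration, and the interpretation
(attributes in the tag namespace describe the statement, not the object)
is my own assumption.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TAG_NS = "http://example.org/tag#"  # placeholder namespace, not a real one

DOC = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://description.org/schema/"
         xmlns:tag="http://example.org/tag#">
  <rdf:Description about="http://www.cs.umd.edu/users/hendler/Tags.html">
    <dc:Creator resource="http://www.cs.umd.edu/users/hendler"
                tag:timestamp="Tuesday, 27-Mar-01 03:59:08 GMT" />
  </rdf:Description>
</rdf:RDF>
"""


def read_tagged_statements(xml_text):
    """Return a list of ((subject, predicate, object), tags) pairs."""
    root = ET.fromstring(xml_text)
    results = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get("about")
        for prop in desc:
            # ElementTree names look like "{namespace}local"; join them.
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get("resource")
            # Attributes in the tag namespace are read as tags on the
            # statement itself (an assumed interpretation).
            prefix = "{" + TAG_NS + "}"
            tags = {
                name[len(prefix):]: value
                for name, value in prop.attrib.items()
                if name.startswith(prefix)
            }
            results.append(((subject, predicate, obj), tags))
    return results


for statement, tags in read_tagged_statements(DOC):
    print(statement, tags)
```

Under this reading, each property element yields one asserted statement
plus a dictionary of tags attached to that statement alone, so two
statements in the same document can carry different timestamps.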

I'd be interested in hearing your comments or alternative solutions to
this problem.

Since this message is already pretty long, I'm only going to touch on
the problem of what logical constructs are needed to define the
semantics of ontologies of "tags." For something like tag:timestamp
there's probably not a whole lot of formal semantics we can give to it,
but other things, like claimants or belief factors, will require us to
specify how to determine the tags that would be associated with inferred
information. As a quick example, if A said P with a belief factor of 0.7, B
said Q with a belief factor of 0.9, and my ontology said P & Q -> R, then
can I conclude R? If so, to whom should I attribute R, and with what
belief? There are many possible answers to these questions, and this is
the sort of thing we would probably want to define in our ontology of
claims and belief factors (so that different people could have different
ontologies and use different systems). To do this, we would at a minimum
need primitives for describing an inference, and arithmetic operations
(to compute the inferred belief). What about other tagging systems?
Would they require additional primitives to describe their semantics?
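
To illustrate why the combination rule belongs in the ontology rather than
being hard-wired, here is a small Python sketch of the P & Q -> R example
with two interchangeable policies. Both policies (taking the minimum, and
multiplying as if the claims were independent) are just examples I picked,
not a proposal, and all class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A fact together with its belief factor and the set of claimants."""
    fact: str
    belief: float
    claimants: frozenset


def infer(premises, conclusion, combine):
    """Derive `conclusion` from `premises`, tagging it via `combine`."""
    belief = combine([p.belief for p in premises])
    # One possible attribution policy: everyone whose claim was used.
    claimants = frozenset().union(*(p.claimants for p in premises))
    return Tagged(conclusion, belief, claimants)


def weakest_link(beliefs):
    """The conclusion is only as believable as its weakest premise."""
    return min(beliefs)


def independent_product(beliefs):
    """Treat the premises as independent and multiply their beliefs."""
    return math.prod(beliefs)


p = Tagged("P", 0.7, frozenset({"A"}))
q = Tagged("Q", 0.9, frozenset({"B"}))

r_min = infer([p, q], "R", weakest_link)
r_prod = infer([p, q], "R", independent_product)

print(r_min.belief, sorted(r_min.claimants))
print(round(r_prod.belief, 2), sorted(r_prod.claimants))
```

The two policies give R different belief factors (0.7 versus about 0.63)
from the same premises, which is exactly the kind of choice a tag ontology
would have to be able to state.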

Okay, I think I've rambled enough. Hopefully this can serve as a basis
for further discussion on this difficult problem.

Jeff



Jim Hendler wrote:
> 
> Sorry not to respond sooner - saw that Pat and Dan were wondering
> about what "tags" were, and decided I'd better try to write a careful
> statement with lots of citations and examples, instead of just random
> meanderings.  Unfortunately, I didn't have time to do the thorough
> job, and ended up with something only a little better than a quick
> email -- still, it's now on the web at [1] and perhaps can open a
> dialog -- I hope it will explain what I mean by the general term
> "tag" and it (very) quickly reviews the difference between annotated
> logics, higher logics, and "tag" schemes
>   Not sure how much of the telecon I can make tomorrow - I'll be in
> Boston w/Tim, but may have to leave soon after 4 to catch the plane -
> we'll see.
>   -JH
> p.s. DanC - thanks for suggestion as to Claris for Mac web paging,
> really makes it a lot easier than the raw html I've been doing by
> hand -- now if someone would just do a DAML tool that easy to use...
> 
> [1] http://www.cs.umd.edu/users/hendler/Tags.html
> --
> Prof. James Hendler             Program Manager
> DARPA/ISO                       703-696-2238 (phone)
> 3701 N. Fairfax Dr.             703-696-2201 (Fax)
> Arlington, VA 22203             [email protected]

