XML/RDF model theory

From: Peter F. Patel-Schneider ([email protected])
Date: 10/16/01
Next message: Peter F. Patel-Schneider: "Fw: Re: DAML+OIL submission to W3C (copyright/patent)"
Previous message: Mike Dean: "Joint Committee telecon today 16 October"
In reply to: Mike Dean: "Joint Committee telecon today 16 October"
I sent out an updated XML/RDF model theory to www-rdf-interest yesterday.
Just in case you don't get that, here is a version with a couple of typos
fixed.

Where will this go?  Who knows?

Also, I have an approx. 250 line ML program that (probably) implements the
construction of canonical models and entailment for RDF (not RDFS as yet,
though) in this model theory, which I believe is consistent with RDF
entailment in Pat's model theory.  I need to merge this implementation with
the CAML implementation of the XQuery data model by Jerome Simeon.

peter


	A Radical Reinterpretation of RDF and RDF Schema plus Datatypes

		Peter F. Patel-Schneider
		Bell Labs Research


This is a radical rethink of how RDF and RDF Schema should work, but it
actually doesn't change very much!  Note that this is a draft of a
serious change to the way that RDF and RDF Schema are defined.  There are
likely to be problems that need to be worked out!

Over the last little while I've been looking at XML Infoset, XML Schema,
and the new RDF data model.  I put together a different way of looking at
RDF and RDF Schema that places all RDF and RDF Schema processing after the
creation of the XQuery data model.  It also moves interpretations closer to
the XML way of looking at the world.


Suppose we really believed that RDF should use other W3C standards.  How
could we do that?  Well, one way would be to have all initial processing of
RDF documents be done by other tools, and only do the RDF processing after
they are done.  (Note that DAML+OIL actually does a version of this, as its
input is a collection of RDF triples.)

Just what sort of processing should be handled by other standards?  There
are several potential answers to this, but the standard that does the most,
I think, is the XQuery Data Model.  This data model results in a tree, with
a considerable amount of processing having been done on the tree,
including XML Schema processing.  So the ``input'' to RDF will be (a slight
generalization of) the XQuery Data Model.

The next issue to be addressed is how differences between the XQuery Data
Model and RDF are to be handled.  There are several serious differences
that need to be addressed here.  First, the XQuery Data Model has an order
on the children of a node.  I propose that this be ignored.  Second, the
XQuery Data Model does not have edge labels.  I propose to move closer to
the XQuery Data Model by using two unlabeled edges with a ``label'' on the
middle node instead of a labeled edge.  This change means that there are
some interpretations that do not correspond with RDF interpretations.
Third, there is lots of information in the XQuery Data Model that is not in
the RDF model, such as comments and processing instructions.  I propose to
ignore almost all of this information.  Fourth, there are aspects of 
RDF that are not in the XQuery Data Model, such as node IDs.  I propose to
extract this information from the XQuery Data Model in much the same way as
it is proposed to be encoded in XML by the RDF M&S.
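
As a rough illustration of the second point only, here is a minimal Python
sketch that replaces each labeled edge with two unlabeled edges and a
labeled middle node.  It is not part of the proposal: the function name
and the "_m1", "_m2" naming of middle nodes are made up for this example.

```python
from itertools import count

def split_labeled_edges(triples):
    """Replace each labeled edge (s, p, o) by s -> m -> o, with m labeled p.

    The middle-node names "_m1", "_m2", ... are hypothetical; any fresh
    identifiers would do.
    """
    fresh = count(1)
    edges = set()    # unlabeled edges, as (source, target) pairs
    labels = {}      # middle node -> the label it carries
    for s, p, o in triples:
        mid = f"_m{next(fresh)}"
        edges.add((s, mid))
        edges.add((mid, o))
        labels[mid] = p
    return edges, labels

edges, labels = split_labeled_edges([("John", "friend", "Susan"),
                                     ("John", "age", 5)])
```

Each input triple yields exactly one middle node, so the original labeled
graph can be read back off the pair (edges, labels).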


1/ Input

A data set is a set of nodes, N, from the XQuery 1.0 Data Model 
that is well-formed in that if n is in N then the children of n are also in
N, but that need not form a tree or have a document node.  (Due to the
treatment of rdf:ID, etc., tree data sets would be fairly general, however,
missing only a completely general treatment of blank nodes.)  Reference
nodes are not currently considered, but should be.

L is the lexical space of strings.
U is the value space of QNames.
UTS is the XML Schema Datatypes map from L to U, given the
    namespace declarations in scope at the point where the mapping is
    performed.   [This may need a bit more care to get exactly right.]

Just what counts as an identifier is a serious problem for RDF if it wants
to be a member-in-good-standing of the XML community.  The above makes the
(strong) assumption that QNames are suitable for RDF identifiers.  This may
not be correct, and readers could read the document substituting RDF
identifier for QName.

1a/ Example

Consider the following piece of a data set, ED, where nodes are represented
as tuples containing the relevant bits of information prefixed with a node
identifier.

1:<Person,attributes=[2:<rdf:about,"John">],
	elements=[3:<friend,attributes=[4:<rdf:resource,"Susan">]>,
		  5:<age,attributes=[6:<xsi:type,"xsd:integer">],
		         elements=[7:<"05">]>]>
8:<rdf:Description,
    attributes=[9:<rdf:about,"Susan">,
                10:<age,"6",simple-type="xsd:integer">],
    elements=[11:<rdf:type,
                  elements=[12:<rdf:Description,
				 attributes=[13:<rdf:about,"Student">]>]>]>
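
The well-formedness condition from section 1 (if n is in N then the
children of n are also in N) can be checked mechanically on the node
identifiers of ED.  The following Python sketch assumes that attribute
nodes count as children, as the later sections do; the names ED_CHILDREN,
ED_NODES and is_well_formed are invented for the illustration.

```python
# Children (including attribute children) of each node in ED, by identifier.
ED_CHILDREN = {
    1: [2, 3, 5], 3: [4], 5: [6, 7],
    8: [9, 10, 11], 11: [12], 12: [13],
}

def is_well_formed(nodes, children):
    """True if every child of a node in `nodes` is itself in `nodes`."""
    return all(c in nodes for n in nodes for c in children.get(n, ()))

ED_NODES = set(range(1, 14))   # nodes 1 through 13
```

Under this check the full set of thirteen nodes is well-formed, and so is
the subset rooted at node 8 on its own, but dropping the text node 7 while
keeping its parent 5 is not.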


2/ Data Values and Datatypes

DV is the union of the value spaces of the XML Schema primitive datatypes
DT <= U are the QNames that reference XML Schema datatypes
	[This may need a bit more care to get exactly right.]
DTC : DT -> powerset ( DV ), maps XML Schema datatypes to their value spaces
DTS : DT -> ( L -> DV ), 
	contains the lexical to value maps for XML Schema datatypes
XTS : L -> powerset ( DV )
      v in XTS(l)  iff  v = DTS(dt)(l) for some XML Schema datatype dt

(If you didn't want to bother with datatypes, you could just work with
data sets where all text nodes are under nodes with string type.)
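
To make DTS and XTS concrete, here is a small Python sketch restricted to
two datatypes, xsd:string and xsd:integer.  It is an approximation under
stated assumptions: the real lexical-to-value maps are those of XML Schema
Datatypes, and the helper names parse_string and parse_integer are
invented here.

```python
import re

def parse_string(lexical):
    """Lexical-to-value map for xsd:string: every string denotes itself."""
    return lexical

def parse_integer(lexical):
    """Lexical-to-value map for xsd:integer: optional sign, then digits."""
    if re.fullmatch(r"[+-]?[0-9]+", lexical) is None:
        raise ValueError(f"not an integer lexical form: {lexical!r}")
    return int(lexical)

# DTS : DT -> ( L -> DV ), limited to two datatypes for the sketch.
DTS = {"xsd:string": parse_string, "xsd:integer": parse_integer}

def XTS(lexical):
    """All values v with v = DTS(dt)(lexical) for some datatype dt."""
    values = set()
    for parse in DTS.values():
        try:
            values.add(parse(lexical))
        except ValueError:
            pass   # lexical is outside this datatype's lexical space
    return values
```

With only these two datatypes, XTS("05") contains both the string "05" and
the integer 5, while XTS("abc") contains only the string "abc".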


3/ Interpretations

An interpretation I is a four-tuple  < IR, EXT, CEXT, IS >
where IR is a non-empty set, called resources
      EXT <= IR x (IR u DV)
      CEXT : IR -> powerset ( IR u DV )
      IS :(partial) U -> IR
and IS(rdf:type) in CEXT(IS(rdf:Property))
    CEXT(IS(rdf:Description)) = IR
    CEXT(IS(rdf:Property)) <= IR
    if d in DT, then CEXT(IS(d)) = DTC(d), if IS is defined on d
    if < x , y > in EXT, y in CEXT(IS(rdf:type)), and < y , z > in EXT 
       then x in CEXT(z)
    if x in CEXT(z) and x in IR
       then there is some y in IR such that 
       < x , y > in EXT, y in CEXT(IS(rdf:type)), and < y , z > in EXT 

Loosely speaking, CEXT serves for both property and class extensions.  Or,
considered another way, a property is presented as a type whose values
and related tuples identify arcs in the traditional RDF graph structure.
[Thanks to Graham Klyne for this wording.]

We say that <s, p, o> is in I   iff
there is some r in IR such that <s,r> and <r,o> in EXT and r in CEXT(p)
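
The definition above translates almost directly into code.  The following
Python sketch represents EXT as a set of pairs and CEXT as a dictionary
from resources to sets; the function name holds and the tiny
interpretation used to exercise it are made up for illustration.

```python
def holds(s, p, o, IR, EXT, CEXT):
    """True iff some r in IR has <s,r> and <r,o> in EXT and r in CEXT(p)."""
    return any((s, r) in EXT and (r, o) in EXT and r in CEXT.get(p, set())
               for r in IR)

# A tiny interpretation: fj is the middle node of a "friend" arc from j
# to s, and aj is the middle node of an "age" arc from j to the value 5.
IR = {"j", "s", "f", "a", "fj", "aj"}
EXT = {("j", "fj"), ("fj", "s"), ("j", "aj"), ("aj", 5)}
CEXT = {"f": {"fj"}, "a": {"aj"}}
```

Here <j, f, s> and <j, a, 5> are in the interpretation, but <s, f, j> is
not, because no middle node links s to j.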

Given an interpretation I = < IR, EXT, CEXT, IS > 
let P = { x : exists y such that x in CEXT(y) and y in CEXT(IS(rdf:Property)) }
and EXT' = EXT - { <y,IS(rdf:type)> } - { <x,y> | <y,IS(rdf:type)> in EXT }.
If P makes EXT' bipartite, i.e., all tuples in EXT' either originate or
terminate, but not both, in this set, and also each x in P has exactly
one incoming and one outgoing tuple in EXT', then I is an RDF interpretation.

An RDF interpretation can be turned into one of Pat Hayes's interpretations
by taking each pair of tuples <x,p> and <p,z> in EXT' where p is in P
and replacing them with <x,z> in IEXT(r) for each r such that p in CEXT(r)
then adding <x,c> in IEXT(IS(rdf:type)) for each x in CEXT(c) for x not in P.

Why use this more-complex notion of interpretation?  The big reason is to
be able to create a model-theoretic meaning for all XML documents and thus
to provide a foundation for the layer-cake view of the semantic web.

3a/ Example

Consider the following interpretation
	EI = < ER, EEXT, ECEXT, EIS>
where ER contains { j, s, P, S, f, a, tt,
		    tj, ts, fj, tfj, aj, taj, as, tas,
		    type, desc, prop }
      EEXT contains { <j, tj>, <tj, P>, <tj, tt>,
		 <s, ts>, <ts, S>, <ts, tt>,
		 <j, fj>, <fj, s>, <fj, tfj>, <tfj, f>, <tfj, tt>,
		 <j, aj>, <aj, 5>, <aj, taj>, <taj, a>, <taj, tt>,
		 <s, as>, <as, 6>, <as, tas>, <tas, a>, <tas, tt>,
		 <tt, type>, <tt, tt> }
      ECEXT(P) = { j }
      ECEXT(S) = { s }
      ECEXT(f) = { fj }
      ECEXT(a) = { aj, as }
      ECEXT(type) contains { tj, ts, tfj, taj, tas, tt }
      ECEXT(desc) = ER
      ECEXT(prop) = { f, a, type }
      EIS = { <"John",j>, <"Susan",s>, 
	      <"Person",P>, <"Student",S>, 
	      <"friend",f>, <"age",a>,
	      <"rdf:type",type>, <"rdf:Description",desc>,
	      <"rdf:Property",prop> }

The first line of EEXT makes John have type Person, the second line makes
Susan have type Student, the third line makes Susan a friend of John, the
fourth and fifth lines provide ages for John and Susan, and the last line
completes the typing information for the ``properties'' in a rather
circular, but well-defined, fashion.

To ``complete'' EI, ER has to contain elements that represent the
memberships in desc and prop, EEXT has to contain pairs that link these
elements up in the correct manner, and ECEXT has to be adjusted as well.

EI corresponds to data set ED, in a way that will be made formal in the
next section.

EI is an RDF interpretation, and corresponds to the following more-standard
interpretation  ES = < ESR, ESEXT, ECEXT, ESIS > 
where	ESR = { j, s, P, S, f, a, type, desc, prop }
	ESEXT = { < j, type, P>, < s, type, S>,
		  < j, f, s>, <j, a, 5>, <s, a, 6> }
	ECEXT(P) = { j }
	ECEXT(S) = { s }
	ECEXT(desc) = ER
	ECEXT(prop) = { f, a, type }
	ESIS = { <"John",j>, <"Susan",s>, 
		 <"Person",P>, <"Student",S>, 
		 <"friend",f>, <"age",a>,
	         <"rdf:type",type>, <"rdf:Description",desc>,
		 <"rdf:Property",prop> }


4/ Models and Entailment

An interpretation I = < IR, EXT, CEXT, IS > is a model for a data set N 
if IS is defined on all names in N and on all values for rdf:ID, rdf:about,
and rdf:resource, and there are mappings
      M : N -> IR u DV
      MA : N' -> DV, where N' is the attribute nodes in N
such that

     1.	for each n in N an element node, 
	    M(n) in IR  and  M(n) in CEXT(IS(name(n)))
	    if n has an attribute with name rdf:ID and string-value u
	       then M(n) = IS(UTS(u))
	    if n has an attribute with name rdf:about and string-value u
	       then M(n) = IS(UTS(u))
	    if n has an attribute with name rdf:resource and string-value u
	       < M(n), IS(UTS(u)) > in EXT
	    for each element, attribute, or text node child, n', of n
		     except for attribute nodes with name
		     rdf:ID, rdf:about, rdf:resource, or xsi:type
		< M(n) , M(n') > in EXT
	    if n has a simple type, d
	       then for each child, n', of n that is a text node
		    M(n') = DTS(d)(string-value(n'))

     2.	for each n in N a text node
	    M(n) in DV  and  M(n) in XTS(string-value(n))

     3. for each n in N an attribute node, except for those with name
		     rdf:ID, rdf:about, rdf:resource, or xsi:type
	    M(n) in IR   and  M(n) in CEXT(IS(name(n)))
	    MA(n) in DV  and  MA(n) in XTS(string-value(n))
	    < M(n), MA(n) > in EXT
	    if n has a simple type, d
	       MA(n) = DTS(d)(string-value(n))

This treats the ``structural'' RDF attributes by not placing them in the
model.  It would also be possible to uniformly add them where appropriate
and have semantic rules for them.

(This does not handle the second abbreviation style in RDF.  That
abbreviation style could be handled something like 
	if n has an attribute with name rdf:resource and string-value u
	   then for each attribute node child, n', of n
		< IS(UTS(u)) , M(n') > in EXT.
However, I think that this abbreviation should be removed.  I would
actually go even further and require that all RDF be written using the
third abbreviation style throughout.)

An RDF model I for N is an RDF interpretation I that is a model for N.

A data set N entails another data set N'  iff
every model of N is also a model of N'.

4a/ Example

Now EI is a model of ED under the following mappings:

	M(1) = j
	M(3) = fj
	M(5) = aj
	M(7) = 5
	M(8) = s
	M(10) = as
	MA(10) = 6
	M(11) = ts
	M(12) = S

The other nodes of ED are ``structural nodes'' and thus do not have a
mapping.  As XML Schema datatypes only show up in the ``structural'' nodes,
they don't need to be present in EI.
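
The John half of this example (nodes 1 through 7) can be checked against
condition 1 of section 4 mechanically.  The Python sketch below is only an
illustration under several assumptions: UTS is taken to be the identity on
strings, the simple type of node 5 is assumed to have been supplied by
schema processing, DTS is reduced to a stand-in for xsd:integer, and the
names Node, attr and element_ok are invented here.

```python
from dataclasses import dataclass, field

STRUCTURAL = {"rdf:ID", "rdf:about", "rdf:resource", "xsi:type"}
DTS = {"xsd:integer": int}    # stand-in lexical-to-value map (assumption)

@dataclass
class Node:
    ident: int
    kind: str                 # "element", "attribute", or "text"
    name: str = ""
    value: str = ""           # string-value of attribute and text nodes
    simple_type: str = ""     # assumed to come from schema processing
    children: list = field(default_factory=list)

def attr(node, name):
    """String-value of the attribute called `name`, or None if absent."""
    for c in node.children:
        if c.kind == "attribute" and c.name == name:
            return c.value
    return None

def element_ok(n, M, IR, EXT, CEXT, IS):
    """Check condition 1 for element node n (UTS taken as the identity)."""
    m = M[n.ident]
    if m not in IR or m not in CEXT[IS[n.name]]:
        return False
    for key in ("rdf:ID", "rdf:about"):
        u = attr(n, key)
        if u is not None and m != IS[u]:
            return False
    u = attr(n, "rdf:resource")
    if u is not None and (m, IS[u]) not in EXT:
        return False
    for c in n.children:
        if c.kind == "attribute" and c.name in STRUCTURAL:
            continue          # structural attributes are not in the model
        if (m, M[c.ident]) not in EXT:
            return False
        if c.kind == "text" and n.simple_type:
            if M[c.ident] != DTS[n.simple_type](c.value):
                return False
    return True

# Nodes 1 through 7 of ED.
n7 = Node(7, "text", value="05")
n5 = Node(5, "element", "age", simple_type="xsd:integer",
          children=[Node(6, "attribute", "xsi:type", "xsd:integer"), n7])
n3 = Node(3, "element", "friend",
          children=[Node(4, "attribute", "rdf:resource", "Susan")])
n1 = Node(1, "element", "Person",
          children=[Node(2, "attribute", "rdf:about", "John"), n3, n5])

# The fragment of EI, and of the mappings, that these nodes touch.
IR = {"j", "s", "P", "f", "a", "fj", "aj"}
EXT = {("j", "fj"), ("fj", "s"), ("j", "aj"), ("aj", 5)}
CEXT = {"P": {"j"}, "f": {"fj"}, "a": {"aj"}}
IS = {"John": "j", "Susan": "s", "Person": "P", "friend": "f", "age": "a"}
M = {1: "j", 3: "fj", 5: "aj", 7: 5}

all_ok = all(element_ok(n, M, IR, EXT, CEXT, IS) for n in (n1, n3, n5))
```

With these mappings all three element nodes pass; changing M(7) to 6 makes
node 5 fail, since <aj, 6> is not in this fragment of EEXT.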


5/ RDFS

An interpretation I is a frame interpretation if the following are in I:

  <IS(rdf:Description),    IS(rdf:type),        IS(rdfs:Class)>
  <IS(rdf:Description),    IS(rdfs:subClassOf), IS(rdfs:Resource)>
  <IS(rdfs:Resource),      IS(rdfs:subClassOf), IS(rdf:Description)>

  <IS(rdfs:Resource),      IS(rdf:type), IS(rdfs:Class)>
  <IS(rdf:Property),       IS(rdf:type), IS(rdfs:Class)>
  <IS(rdfs:Class),	   IS(rdf:type), IS(rdfs:Class)>	[redundant]
  <IS(rdfs:Literal),       IS(rdf:type), IS(rdfs:Class)>

  <IS(rdf:type),           IS(rdf:type), IS(rdf:Property)>	[redundant]
  <IS(rdfs:subClassOf),    IS(rdf:type), IS(rdf:Property)>
  <IS(rdfs:subPropertyOf), IS(rdf:type), IS(rdf:Property)>
  <IS(rdfs:seeAlso),       IS(rdf:type), IS(rdf:Property)>
  <IS(rdfs:isDefinedBy),   IS(rdf:type), IS(rdf:Property)>	[redundant]

  <IS(rdfs:range),         IS(rdf:type), IS(rdfs:ConstraintProperty)>
  <IS(rdfs:domain),        IS(rdf:type), IS(rdfs:ConstraintProperty)>

  <IS(rdfs:Class),              IS(rdfs:subClassOf), IS(rdfs:Resource)>
  <IS(rdfs:ConstraintResource), IS(rdfs:subClassOf), IS(rdfs:Resource)>
  <IS(rdfs:ConstraintProperty), IS(rdfs:subClassOf), IS(rdfs:Resource)>	
								[redundant]
  <IS(rdfs:ConstraintProperty), IS(rdfs:subClassOf),IS(rdfs:ConstraintResource)>

  <IS(rdfs:isDefinedBy),   IS(rdfs:subPropertyOf),   IS(rdfs:seeAlso)>

  <IS(rdf:type),           IS(rdfs:range),  IS(rdfs:Class)>
  <IS(rdfs:subClassOf),    IS(rdfs:domain), IS(rdfs:Class)>
  <IS(rdfs:subClassOf),    IS(rdfs:range),  IS(rdfs:Class)>
  <IS(rdfs:subPropertyOf), IS(rdfs:domain), IS(rdf:Property)>
  <IS(rdfs:subPropertyOf), IS(rdfs:range),  IS(rdf:Property)>
  <IS(rdfs:seeAlso),       IS(rdfs:range),  IS(rdfs:Resource)>
  <IS(rdfs:isDefinedBy),   IS(rdfs:range),  IS(rdfs:Resource)>	[redundant]
  <IS(rdfs:range),	   IS(rdfs:domain), IS(rdf:Property)>
  <IS(rdfs:range),	   IS(rdfs:range),  IS(rdfs:Class)>
  <IS(rdfs:domain),	   IS(rdfs:domain), IS(rdf:Property)>
  <IS(rdfs:domain),	   IS(rdfs:range),  IS(rdfs:Class)>
  <IS(rdfs:label),	   IS(rdfs:domain), IS(rdfs:Resource)>	[redundant]
  <IS(rdfs:label),	   IS(rdfs:range),  IS(rdfs:Literal)>
  <IS(rdfs:comment),       IS(rdfs:domain), IS(rdfs:Resource)>	[redundant]
  <IS(rdfs:comment),       IS(rdfs:range),  IS(rdfs:Literal)>

A frame model for a data set N is a frame interpretation I that is a model
for N and satisfies the following extra conditions:

  RS1. CEXT(IS(rdfs:Resource)) = IR				[redundant]
  RS2. CEXT(IS(rdfs:Literal)) = DV

  if x in CEXT(y) and <y,IS(rdfs:subClassOf),z> in I
    then x in CEXT(z)					[2.3.2]

  if <x,IS(rdfs:subClassOf),y> in I and <y,IS(rdfs:subClassOf),z> in I
    then <x,IS(rdfs:subClassOf),z> in I			[2.3.2]

  if <x,r,y> in I and <r,IS(rdfs:subPropertyOf),s> in I
    then <x,s,y> in I					[2.3.3]

  if <x,IS(rdfs:subPropertyOf),y> in I
  and <y,IS(rdfs:subPropertyOf),z> in I
    then <x,IS(rdfs:subPropertyOf),z> in I		[2.3.3?]

  x in CEXT(IS(rdf:Property)) 
  and x in CEXT(IS(rdfs:ConstraintResource))
    iff  x in CEXT(IS(rdfs:ConstraintProperty))		[3.1.2]

  if <x,p,y> in I and <p,IS(rdfs:range),c> in I
    then y in CEXT(c)					[3.1.3]

  if <x,p,y> in I and <p,IS(rdfs:domain),c> in I
    then x in CEXT(c)					[3.1.4]
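
For a finite candidate, some of the conditions above can be checked
directly.  The Python sketch below covers only four of them (subClassOf
membership, subClassOf transitivity, range and domain).  It assumes the
triples in I have already been collected into a set and that resources are
identified by their names, i.e., IS is treated as the identity; the
function name violations and the tags in its result are made up here.

```python
SUBCLASS = "rdfs:subClassOf"
RANGE = "rdfs:range"
DOMAIN = "rdfs:domain"

def violations(triples, CEXT):
    """Return the set of violated instances of four frame conditions."""
    def ext(c):
        return CEXT.get(c, set())

    found = set()
    for (y, p, z) in triples:
        if p != SUBCLASS:
            continue
        # if x in CEXT(y) and <y,subClassOf,z> in I then x in CEXT(z)
        for x in ext(y) - ext(z):
            found.add(("membership", x, y, z))
        # subClassOf must be transitive
        for (z2, p2, w) in triples:
            if p2 == SUBCLASS and z2 == z and (y, SUBCLASS, w) not in triples:
                found.add(("transitivity", y, z, w))
    for (x, p, y) in triples:
        for (p2, q, c) in triples:
            if p2 != p:
                continue
            if q == RANGE and y not in ext(c):
                found.add(("range", y, p, c))
            if q == DOMAIN and x not in ext(c):
                found.add(("domain", x, p, c))
    return found

triples = {("Student", SUBCLASS, "Person"),
           ("age", DOMAIN, "Person"),
           ("s", "age", 6)}
bad = violations(triples, {"Student": {"s"}, "Person": {"j"}})
good = violations(triples, {"Student": {"s"}, "Person": {"j", "s"}})
```

In the first call s is a Student but not a Person, which breaks both the
subClassOf condition and the domain condition for age; adding s to the
extension of Person repairs both.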

A data set N frame entails another data set N'	iff
every frame model of N is also a frame model of N'.