a new way of thinking about RDF and RDF Schema

From: Peter F. Patel-Schneider (pfps@research.bell-labs.com)
Date: 10/12/01


	A Radical Reinterpretation of RDF and RDF Schema plus Datatypes

		Peter F. Patel-Schneider
		Bell Labs Research


This is a radical rethink of how RDF and RDF Schema should work, but
actually doesn't change very much!

Over the last little while I've been looking at XML Infoset, XML Schema,
and the new RDF data model.  I put together a different way of looking at
RDF and RDF Schema that places all RDF and RDF Schema processing after the
creation of the XQuery data model.  It also moves interpretations closer to
the XML way of looking at the world.


1/ Syntax

There are some syntax issues that have not yet been addressed, but the
intent should be clear.

A data set is a set of nodes, N, from the XQuery 1.0 Data Model 
that is well-formed in that if n is in N then the children of n are also in N,
but that need not form a tree.  (Due to the treatment of rdf:ID, etc.,
tree data sets would be fairly general, however, missing only a completely
general treatment of blank nodes.)
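
As a rough sketch of this closure condition, one could check
well-formedness as follows.  The Node class and its fields are
hypothetical stand-ins for XQuery data model nodes, not a real API.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # equality and hashing by identity, mimicking node identity
class Node:
    kind: str                                   # "element", "attribute", or "text"
    name: str = ""
    children: list = field(default_factory=list)


def is_well_formed(nodes):
    """True if every child of a node in the set is also in the set."""
    return all(child in nodes for n in nodes for child in n.children)


# A text node shared by two elements: well-formed, but not a tree.
shared = Node("text")
a = Node("element", "ex:a", [shared])
b = Node("element", "ex:b", [shared])
```

Here is_well_formed({a, b, shared}) holds, while is_well_formed({a}) does
not, because the child of a is missing from the set.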

Let U be the value space of QNames.
Let U' be the canonical map from strings to the value space of QNames.
[This may need a bit more care to get exactly right.]

Let L be the lexical space of strings.


2/ Data Values and Datatypes

DV is the union of the value spaces of the XML Schema primitive datatypes.

DT <= U is the set of QNames that reference XML Schema datatypes.

DTC : DT -> powerset ( DV ), maps XML Schema datatypes to their value spaces

DTS : DT -> ( L -> DV ), 
	contains the lexical to value maps for XML Schema datatypes

XTS : L -> powerset ( DV )
      v in XTS(l)  iff  v = DTS(dt)(l) for some XML Schema datatype dt
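
To make DTS and XTS concrete, here is a minimal sketch covering only three
datatypes.  Everything in it is an assumption made for illustration: the
regular expression is a simplified lexical space for xsd:decimal,
whitespace handling is ignored, and values are tagged with their primitive
type so that the value spaces stay disjoint inside Python sets.

```python
import re
from decimal import Decimal

_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
_BOOLEAN = {"true": True, "1": True, "false": False, "0": False}


def _string(lexical):
    return ("string", lexical)


def _boolean(lexical):
    return ("boolean", _BOOLEAN[lexical]) if lexical in _BOOLEAN else None


def _decimal(lexical):
    return ("decimal", Decimal(lexical)) if _DECIMAL.fullmatch(lexical) else None


# DTS: datatype name -> lexical-to-value map (None when the string is
# outside that datatype's lexical space).
DTS = {"xsd:string": _string, "xsd:boolean": _boolean, "xsd:decimal": _decimal}


def XTS(lexical):
    """All values v such that v = DTS(dt)(lexical) for some datatype dt."""
    values = {to_value(lexical) for to_value in DTS.values()}
    values.discard(None)
    return values
```

With these three datatypes, XTS("abc") contains only the string value,
whereas XTS("1") contains a string, a boolean, and a decimal value.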

(If you didn't want to bother with datatypes, you could just work with
data sets where all text nodes are under nodes with string type.)


3/ Interpretations

An interpretation I is a four-tuple 
	< IR, IEXT, ICEXT, IS >  

where IR is a non-empty set, whose members are called resources
      IEXT <= powerset ( IR x (IR u DV) )
      ICEXT : IR -> powerset ( IR u DV )
      IS : U -> IR

and IS(rdf:type) in ICEXT(IS(rdf:Property))

    ICEXT(IS(rdf:Description)) = IR

    ICEXT(IS(rdf:Property)) <= IR

    if < x , y > in IEXT, y in ICEXT(IS(rdf:type)), and < y , z > in IEXT 
       then x in ICEXT ( z )

    if x in ICEXT ( z ) and x in IR
       then there is some y in IR such that 
       < x , y > in IEXT, y in ICEXT(IS(rdf:type)), and < y , z > in IEXT 

This last introduces an infinite ``regress'' but one that, I think, is well
behaved. 

We say that <s, p, o> is in I   iff
there is some r in IR such that <s,r> and <r,o> in IEXT and r in ICEXT(p)
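
The triple condition can be sketched over finite sets as follows.  The toy
data is hypothetical and exercises only this one definition; it does not
try to satisfy the other conditions on interpretations, and IS is taken to
be the identity on names.

```python
def in_interpretation(s, p, o, IR, IEXT, ICEXT):
    """<s, p, o> is in I iff some r in IR has <s, r> and <r, o> in IEXT
    and r in ICEXT(p)."""
    return any(
        (s, r) in IEXT and (r, o) in IEXT and r in ICEXT.get(p, set())
        for r in IR
    )


# Resource "e1" plays the role of the edge from "alice" to "bob",
# and it belongs to the class extension of "knows".
IR = {"alice", "bob", "knows", "e1"}
IEXT = {("alice", "e1"), ("e1", "bob")}
ICEXT = {"knows": {"e1"}}
```

So <alice, knows, bob> is in this toy interpretation, but
<bob, knows, alice> is not.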

An interpretation I = < IR, IEXT, ICEXT, IS > is an RDF interpretation if 
P = { x : there is some y such that x in ICEXT(y) and
          y in ICEXT(IS(rdf:Property)) }
makes IEXT' bipartite, i.e., all ``edges'' in IEXT' either originate or
terminate, but not both, in this set, and where each x in P has exactly
one incoming and one outgoing ``edge'' in IEXT', where 
  IEXT' = IEXT - { <y,IS(rdf:type)> } - { <x,y> | <y,IS(rdf:type)> in IEXT }
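
One literal reading of this condition, over finite sets, might look like
the sketch below.  It is only an illustration: the function name and
argument layout are hypothetical, IS is taken to be the identity on names,
and the toy data exercises the bipartite condition alone.

```python
def is_rdf_interpretation(IEXT, ICEXT, rdf_type, rdf_property):
    # P: members of the class extension of anything that is itself in the
    # class extension of rdf:Property.
    P = {x for y in ICEXT.get(rdf_property, set()) for x in ICEXT.get(y, set())}
    # IEXT': drop edges into rdf:type and edges into nodes that point at it.
    typed = {y for (y, z) in IEXT if z == rdf_type}
    reduced = {(x, y) for (x, y) in IEXT if y != rdf_type and y not in typed}
    # Every remaining edge has exactly one endpoint in P.
    bipartite = all((x in P) != (y in P) for (x, y) in reduced)
    # Every member of P has exactly one incoming and one outgoing edge.
    degrees_ok = all(
        sum(1 for (_, t) in reduced if t == x) == 1
        and sum(1 for (s, _) in reduced if s == x) == 1
        for x in P
    )
    return bipartite and degrees_ok


edges = {("alice", "e1"), ("e1", "bob")}
classes = {"rdf:Property": {"knows"}, "knows": {"e1"}}
```

With this data P is { e1 } and the check succeeds; adding a direct edge
<alice, bob> breaks it, since neither endpoint is in P.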


4/ Models and Entailment

An interpretation I = < IR, IEXT, ICEXT, IS> is a model for a data set N 
if there are mappings
      M : N -> IR u DV
      MA : N' -> DV, where N' is the set of attribute nodes in N
such that

     1.	for each n in N an element node, 
	    M(n) in IR  and  M(n) in ICEXT(U(name(n)))
	    if n has an attribute with name rdf:ID and string-value u
	       then M(n) = IS(U'(u))
	    if n has an attribute with name rdf:about and string-value u
	       then M(n) = IS(U'(u))
	    if n has an attribute with name rdf:resource and string-value u
	       then < M(n), IS(U'(u)) > in IEXT
	    for each element, attribute, or text node child, n', of n
		< M(n) , M(n') > in IEXT
	    if n has a simple type, d
	       then for each child, n', of n that is a text node
		    M(n') = DTS(d)(string-value(n'))

     2.	for each n in N a text node
	    M(n) in DV  and  M(n) in XTS(string-value(n))

     3. for each n in N an attribute node
	    M(n) in IR   and  M(n) in ICEXT(U(name(n)))
	    MA(n) in DV  and  MA(n) in XTS(string-value(n))
	    if n has a simple type, d
	       then MA(n) = DTS(d)(string-value(n))

(This does not handle the second abbreviation in RDF.  That abbreviation
style could be handled something like 
	if n has an attribute with name rdf:resource and string-value u
	   then for each attribute node child, n', of n
		< IS(U'(u)) , M(n') > in IEXT.
However, I think that this abbreviation should be removed.  I would
actually go even further and require that all RDF be written using the
third abbreviation throughout.)

An RDF model I for N is an RDF interpretation I that is a  model for N.

(There are issues hidden here to do with the possibility of RDF
interpretations.  The basic problem is that rdf:ID, rdf:about, and
rdf:resource have two mappings, and the ``normal'' one may preclude
RDF interpretations.)

A data set N entails another data set N'  iff
every model of N is also a model of N'.


5/ RDFS

(I have not yet incorporated rdf:Description here.)

An interpretation I is a frame interpretation if the following are in I:

  <IS(rdfs:Resource),      IS(rdf:type), IS(rdfs:Class)>
  <IS(rdf:Property),       IS(rdf:type), IS(rdfs:Class)>
  <IS(rdfs:Class),	   IS(rdf:type), IS(rdfs:Class)>	[redundant]
  <IS(rdfs:Literal),       IS(rdf:type), IS(rdfs:Class)>

  <IS(rdf:type),           IS(rdf:type), IS(rdf:Property)>	[redundant]
  <IS(rdfs:subClassOf),    IS(rdf:type), IS(rdf:Property)>
  <IS(rdfs:subPropertyOf), IS(rdf:type), IS(rdf:Property)>
  <IS(rdfs:seeAlso),       IS(rdf:type), IS(rdf:Property)>
  <IS(rdfs:isDefinedBy),   IS(rdf:type), IS(rdf:Property)>	[redundant]

  <IS(rdfs:range),         IS(rdf:type), IS(rdfs:ConstraintProperty)>
  <IS(rdfs:domain),        IS(rdf:type), IS(rdfs:ConstraintProperty)>

  <IS(rdfs:Class),              IS(rdfs:subClassOf), IS(rdfs:Resource)>
  <IS(rdfs:ConstraintResource), IS(rdfs:subClassOf), IS(rdfs:Resource)>
  <IS(rdfs:ConstraintProperty), IS(rdfs:subClassOf), IS(rdfs:Resource)>	
								[redundant]
  <IS(rdfs:ConstraintProperty), IS(rdfs:subClassOf),IS(rdfs:ConstraintResource)>

  <IS(rdfs:isDefinedBy),   IS(rdfs:subPropertyOf),   IS(rdfs:seeAlso)>

  <IS(rdf:type),           IS(rdfs:range),  IS(rdfs:Class)>
  <IS(rdfs:subClassOf),    IS(rdfs:domain), IS(rdfs:Class)>
  <IS(rdfs:subClassOf),    IS(rdfs:range),  IS(rdfs:Class)>
  <IS(rdfs:subPropertyOf), IS(rdfs:domain), IS(rdf:Property)>
  <IS(rdfs:subPropertyOf), IS(rdfs:range),  IS(rdf:Property)>
  <IS(rdfs:seeAlso),       IS(rdfs:range),  IS(rdfs:Resource)>
  <IS(rdfs:isDefinedBy),   IS(rdfs:range),  IS(rdfs:Resource)>	[redundant]
  <IS(rdfs:range),	   IS(rdfs:domain), IS(rdf:Property)>
  <IS(rdfs:range),	   IS(rdfs:range),  IS(rdfs:Class)>
  <IS(rdfs:domain),	   IS(rdfs:domain), IS(rdf:Property)>
  <IS(rdfs:domain),	   IS(rdfs:range),  IS(rdfs:Class)>
  <IS(rdfs:label),	   IS(rdfs:domain), IS(rdfs:Resource)>	[redundant]
  <IS(rdfs:label),	   IS(rdfs:range),  IS(rdfs:Literal)>
  <IS(rdfs:comment),       IS(rdfs:domain), IS(rdfs:Resource)>	[redundant]
  <IS(rdfs:comment),       IS(rdfs:range),  IS(rdfs:Literal)>

A frame model for a data set N is a frame interpretation I that is a model
for N and satisfies the following extra conditions:

  RS1. ICEXT(IS(rdfs:Resource)) = IR
  RS2. ICEXT(IS(rdfs:Literal)) = DV

  if x in ICEXT(y) and <y,IS(rdfs:subClassOf),z> in I
    then x in ICEXT(z)					[2.3.2]

  if <x,IS(rdfs:subClassOf),y> in I and <y,IS(rdfs:subClassOf),z> in I
    then <x,IS(rdfs:subClassOf),z> in I			[2.3.2]

  if <x,r,y> in I and <r,IS(rdfs:subPropertyOf),s> in I
    then <x,s,y> in I					[2.3.3]

  if <x,IS(rdfs:subPropertyOf),y> in I
  and <y,IS(rdfs:subPropertyOf),z> in I
    then <x,IS(rdfs:subPropertyOf),z> in I		[2.3.3?]

  x in ICEXT(IS(rdf:Property)) 
  and x in ICEXT(IS(rdfs:ConstraintResource))
    iff  x in ICEXT(IS(rdfs:ConstraintProperty))	[3.1.2]

  if <x,p,y> in I and <p,IS(rdfs:range),c> in I
    then y in ICEXT(c)					[3.1.3]

  if <x,p,y> in I and <p,IS(rdfs:domain),c> in I
    then x in ICEXT(c)					[3.1.4]
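
Read operationally, the conditional rules above behave like a
forward-chaining closure.  The sketch below is a simplification under
stated assumptions: triples are plain tuples rather than the IEXT/ICEXT
structure, membership "x in ICEXT(c)" is encoded as the triple
(x, rdf:type, c), and the iff condition on rdfs:ConstraintProperty is left
out.  The function name is hypothetical.

```python
def frame_closure(triples):
    """Apply the subclass, subproperty, range, and domain rules to a fixpoint."""
    SUBCLASS, SUBPROP = "rdfs:subClassOf", "rdfs:subPropertyOf"
    RANGE, DOMAIN, TYPE = "rdfs:range", "rdfs:domain", "rdf:type"
    closed = set(triples)
    while True:
        new = set()
        for (x, p, y) in closed:
            for (a, q, b) in closed:
                if p == SUBCLASS and q == SUBCLASS and y == a:
                    new.add((x, SUBCLASS, b))      # subClassOf is transitive
                if p == SUBPROP and q == SUBPROP and y == a:
                    new.add((x, SUBPROP, b))       # subPropertyOf is transitive
                if q == SUBPROP and a == p:
                    new.add((x, b, y))             # inherit along subPropertyOf
                if p == TYPE and q == SUBCLASS and y == a:
                    new.add((x, TYPE, b))          # instances move up subClassOf
                if q == RANGE and a == p:
                    new.add((y, TYPE, b))          # range constrains the object
                if q == DOMAIN and a == p:
                    new.add((x, TYPE, b))          # domain constrains the subject
        if new <= closed:
            return closed
        closed |= new


facts = {
    ("Student", "rdfs:subClassOf", "Person"),
    ("Person", "rdfs:subClassOf", "Agent"),
    ("advisor", "rdfs:range", "Person"),
    ("advisor", "rdfs:domain", "Student"),
    ("amy", "advisor", "bo"),
}
closure = frame_closure(facts)
```

From these facts the closure places amy in Student, Person, and Agent,
and bo in Person and Agent.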

A data set N frame entails another data set N'	iff
every frame model of N is also a frame model of N'.

