Re: new model theory for DAML+OIL

From: Pat Hayes (phayes@ai.uwf.edu)
Date: 10/10/01


>I didn't account for blank nodes in the model theory in a way that is easy
>to define a decent notion of entailment, so I put together a revision of
>the model theory.   The development of interpretations and models is now
>similar to yours.
>
>I believe that entailment has
>
>	G |= G'
>
>for G' isomorphic to G, which would have been somewhat difficult to achieve
>in the previous model theory.

? Not sure I follow you. If you mean (labelled) graph-isomorphism, 
that is trivial in the rdf MT, whereas I think it actually does not 
follow in yours, since the same literal may denote something 
different in G and in G' (?)

>
>peter
>
>
>	Alternative Model Theory for RDF and RDF Schema plus Datatypes
>
>Here is an alternative model theory for RDF and RDF Schema.   The attempt
>here is to have a model theory for RDF and RDF Schema that can be extended
>to datatypes, as I have proposed them.

For the record, I would also very much like to make that extension possible.

>  This is a VERY terse document,
>missing most of what would correspond with Pat Hayes's excellent commentary
>on his model theory for RDF and RDF Schema.
>
>Note that this model theory has not gone through any screening.  I have
>tried to make it as error-free as possible, but there are undoubtedly
>errors remaining.  In particular, there may be (little) pieces missing
>concerning some of the RDF and RDF Schema vocabulary.
>
>Note also that this is an unofficial and draft model theory. 
>
>Peter F. Patel-Schneider
>
>
>
>1/ RDF Graph Syntax
>
>   I'm sticking with graphs, even though they have some problems for RDF.
>   Triples have their own problems with respect to RDF.
>
>URI is a collection of URI names.
>
>   URI may be the collection of all URI names, but this is not required.
>   This model theory ignores all aspects of the structure of URIs.
>
>L is the collection of literals, disjoint from URI.
>
>   Literals form the lexical space, in XML Schema datatype terms, not the
>   value space!
>
>An untidy RDF graph, R, is a three-tuple (that can be considered to be a
>partially node labeled, directed triple-graph)
>		< N, E, LN >
>where N is the set of nodes in the graph
>       LN :(partial) N -> URI u L gives labels for nodes
>       E <= N' x N'' x N is the set of edges in the graph
>	where N' = { n : LN(n) is undefined or LN(n) in URI }
>	where N'' = { n : LN(n) is defined and LN(n) in URI }

? Why is an edge a triple of *nodes*? I would prefer to have edges as 
a distinct set with (total) mappings to their endpoints, and LN 
defined on N u E.
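
A minimal sketch in Python of the quoted definition as written (edges as triples of nodes, not the alternative with a separate edge set). The tagged-pair encoding of labels and the function names are assumptions made for illustration only.

```python
# Labels are modelled as tagged pairs: ("uri", name) or ("lit", lexical form).
# A graph is given by N (a set of nodes), E (a set of (s, p, o) node triples)
# and LN (a dict holding the partial labelling).

def is_uri(label):
    return label is not None and label[0] == "uri"

def is_literal(label):
    return label is not None and label[0] == "lit"

def is_untidy_graph(nodes, edges, ln):
    """Check E <= N' x N'' x N for the labelling LN."""
    for s, p, o in edges:
        if not {s, p, o} <= nodes:
            return False
        if is_literal(ln.get(s)):   # s must be in N': unlabelled or URI-labelled
            return False
        if not is_uri(ln.get(p)):   # p must be in N'': URI-labelled
            return False
    return True

nodes = {"b", "age", "v"}
ln = {"age": ("uri", "ex:age"), "v": ("lit", "05")}
print(is_untidy_graph(nodes, {("b", "age", "v")}, ln))  # True
print(is_untidy_graph(nodes, {("v", "age", "b")}, ln))  # False: literal subject
```

The check is purely syntactic: it says nothing about whether ex:age denotes a property.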

>   This accounts for literals not being allowed as ``labels'' of edges, nor
>   as the labels of nodes that are heads of edges, but does not account for
>   edge ``labels'' being properties.

Right. I don't think there is any need to make this a syntactic 
requirement. Any graph with an edge label that doesn't denote a 
property (or, denotes something with an empty extension, if you 
prefer) will be false.

>An untidy RDF graph is ground if LN is a total function on N.
>
>A tidy RDF graph (also called an RDF graph) is an untidy RDF graph where LN
>is injective on URI (but not necessarily total).
>
>   Tidy graphs here do not have to be tidy on literals, which is a change from
>   Pat Hayes's model theory.

Better, it fixes a bug in Pat Hayes' model theory, thanks. I will 
make this change in the next draft.

>   I have tried to keep as much of the terminology from Pat Hayes's model
>   theory as possible. 
>
>The union of two untidy RDF graphs,
>	R = < N , E , LN > and R' = < N', E', LN' >
>is the untidy RDF graph
>	R u R' = < N u N', E u E', LN u LN' >
>provided that LN and LN' agree where they are both defined.
>
>   Note that N and N' need not be disjoint.

Right.
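
A small Python sketch of the union, under the assumption that a graph is a tuple (N, E, LN) with LN held as a dict. Returning None for the undefined case is a choice made here, not something the definition prescribes.

```python
def union(g1, g2):
    """Return R u R', or None when LN and LN' disagree on a shared node."""
    n1, e1, ln1 = g1
    n2, e2, ln2 = g2
    for n in ln1.keys() & ln2.keys():
        if ln1[n] != ln2[n]:
            return None                      # union is undefined
    return (n1 | n2, e1 | e2, {**ln1, **ln2})

r1 = ({"x", "p"}, set(), {"p": ("uri", "ex:p")})
r2 = ({"p", "y"}, {("y", "p", "y")}, {"p": ("uri", "ex:p")})
r3 = ({"p"}, set(), {"p": ("uri", "ex:q")})

print(union(r1, r2) is not None)  # True: the shared node p has the same label
print(union(r1, r3))              # None: p is labelled ex:p in one, ex:q in the other
```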

>
>The tidy version of an untidy RDF graph R = < N, E, LN >
>is the graph t(R) = < N/eLN, E/eLN, LN/eLN >
>where eLN is the equivalence relation induced by LN range restricted to URI
>i.e., n = n'  iff  LN(n) and LN(n') both defined, LN(n) in URI,
>		   and LN(n) = LN(n')
>and / is point congruence.
>
>	[Is this the right terminology?]

If even the author has to ask, might be better to use a different 
terminology ;-)
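
Whatever it ends up being called, the construction can be sketched in Python as merging all nodes that carry the same URI label. The representation (tuple (N, E, LN), string node names, tagged-pair labels) and the choice of the first node in sorted order as the representative of each class are assumptions for illustration.

```python
def tidy(graph):
    """Merge nodes with equal URI labels; literal and blank nodes stay apart."""
    nodes, edges, ln = graph
    rep = {}     # URI label -> representative node
    canon = {}   # node -> representative of its equivalence class
    for n in sorted(nodes):
        lab = ln.get(n)
        if lab is not None and lab[0] == "uri":
            canon[n] = rep.setdefault(lab, n)
        else:
            canon[n] = n
    new_nodes = set(canon.values())
    new_edges = {(canon[s], canon[p], canon[o]) for s, p, o in edges}
    new_ln = {canon[n]: lab for n, lab in ln.items()}
    return new_nodes, new_edges, new_ln

g = (
    {"n1", "n2", "n3", "n4", "n5"},
    {("n5", "n1", "n3"), ("n5", "n2", "n4")},
    {"n1": ("uri", "ex:age"), "n2": ("uri", "ex:age"),
     "n3": ("lit", "05"), "n4": ("lit", "05")},
)
t_nodes, t_edges, t_ln = tidy(g)
print(sorted(t_nodes))  # ['n1', 'n3', 'n4', 'n5']
```

The two nodes labelled ex:age collapse into one, while the two nodes labelled with the literal 05 remain distinct.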

>
>
>2/ Literal Values
>
>LV is some collection of literal values.
>
>   Literal values form the value space, in XML Schema datatype terms, not
>   the lexical space!
>
>XLS : L -> powerset ( LV ), maps literals into the set of literal values
>			    that they might have.
>
>   Here is the first substantive difference from Pat Hayes's model theory.
>   The XLS mapping does not provide a definitive answer for the meaning of a
>   literal.  The reason for not pinning down the mapping for literals is to
>   allow different mappings for different datatypes.  For example, a node
>   with literal label 05 might be mapped into the integer 5 or the string "05".

See my recent message; could you change this to XLS: LL -> LV, where 
LL is the set of all *occurrences* of literal labels (or, maybe, of 
nodes which have literals as labels, given that we do not require 
tidiness on such nodes) ?  This would allow a node labelled with 05 
to be mapped into either 5 or "05", but not both. (I think this might 
also simplify the later treatment of DTs, since the extra conditions 
they introduce would indeed be semantic conditions in the usual 
sense, ie restrictions on the class of interpretations.)

>3/ Models and Entailment
>
>An interpretation I is a four-tuple
>	< IR, IP, IEXT, IS >
>where IR is a non-empty set, called resources
>       IP <= IR, called properties
>       IEXT : IP -> powerset ( IR x (IR u LV) )
>       IS : URI -> IR

If you want to keep this mathematically bare-boned, have IEXT total 
on IR (or even IR u LV) and define IP to be {x : IEXT(x) =/= {} }. One 
fewer primitive sets, the extension is always defined, and it yields 
the same truth-conditions. I think I might make this change in the 
official MT, at least in the appendix.

>An interpretation I = < IR, IP, IEXT, IS>
>is a model for an untidy RDF graph R = < N, E, LN >
>if there is a mapping
>       M : N -> IR u LV
>such that for n, s, p, o in N
>     1. if LN(n) defined and LN(n) in URI then M(n) = IS(LN(n))
>     2. if LN(n) defined and LN(n) in L then M(n) in XLS(LN(n))
>     3. if <s,p,o> is in E then M(p) in IP and <M(s),M(o)> in IEXT(M(p))
>
>   This works for both ground and non-ground untidy RDF graphs.
>
>   One minor difference between this model theory and Pat Hayes's is that
>   unnamed nodes can denote resources or literal values, unless they appear
>   in the subject position of an edge.  This could easily be changed to
>   require that unnamed nodes only denote resources.

I don't think this is really a change, except in nomenclature, since 
in the rdf MT, IR isn't required to be disjoint from LV.
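
For small finite cases the three conditions on M can be checked by brute force. The Python sketch below assumes finite IR and LV, graphs as tuples (N, E, LN) with tagged-pair labels, IEXT and IS as dicts, and XLS as a dict from lexical forms to sets of values. None of these encodings come from either model theory.

```python
from itertools import product

def is_model(interp, graph, lv, xls):
    """Search for a mapping M : N -> IR u LV satisfying conditions 1 to 3."""
    ir, ip, iext, is_ = interp
    nodes, edges, ln = graph
    order = sorted(nodes)
    choices = []
    for n in order:
        lab = ln.get(n)
        if lab is None:
            choices.append(list(ir | lv))        # unnamed node: anything
        elif lab[0] == "uri":
            choices.append([is_[lab[1]]])        # condition 1
        else:
            choices.append(list(xls[lab[1]]))    # condition 2
    for values in product(*choices):
        m = dict(zip(order, values))
        if all(m[p] in ip and (m[s], m[o]) in iext[m[p]]
               for s, p, o in edges):            # condition 3
            return True
    return False

interp = ({"alice", "age"}, {"age"}, {"age": {("alice", 5)}},
          {"ex:alice": "alice", "ex:age": "age"})
graph = ({"s", "p", "o"}, {("s", "p", "o")},
         {"s": ("uri", "ex:alice"), "p": ("uri", "ex:age"), "o": ("lit", "05")})
print(is_model(interp, graph, {5, "05"}, {"05": {5, "05"}}))  # True
```

Here the literal node is free to denote 5 or "05", and the search succeeds by picking 5.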

>An untidy RDF graph R entails another untidy RDF graph R' (R |= R')  iff
>R u R' is defined and every model for R is also a model for R u R'.
>
>   This definition of entailment admits the possibility that the two graphs
>   share nodes.  Will this have interesting consequences?

This is allowed at present, but it might be good to make the 
possibility more explicit.  (Later: I see that the careless wording 
'other graphs' in the section on entailment could easily be read as 
implying disjointness. I will change that.) This would certainly be 
needed with the proposed change in the definition of XL. It has the 
consequence that a graph entails itself, but maybe not a copy of 
itself, so the notion of 'copying' needs to be spelled out a bit more 
explicitly. We could define a 'faithful' copy to be one which 
preserves the denotations of literal labels, for example. Then any 
copy which used the same datatyping scheme would be guaranteed to be 
faithful.

>4/ Core RDF
>
>By core RDF I mean RDF without reification or containers.
>
>A core RDF graph is an untidy RDF graph that contains nodes with the
>following labels:
>
>	rdf:type
>	rdf:Property
>
>and an edge <t,t,P> where t is a node with label rdf:type
>		    and P is a node with label rdf:Property

For the record, I'd rather do this in terms of a reserved vocabulary. 
But that's just a stylistic difference.

>   Throughout this development untidy graphs are allowed.  Theorems about
>   entailment will probably have to use tidy graphs, however.

Yep, and we need to be very careful about untidiness when it comes to 
blank nodes. Right now, tidiness is a *syntactic* requirement in RDF, 
by the way.

>.....
>
>6/ Datatypes (general version)
>
>Datatypes add extra structure to literals and literal values.
>
>A datatype theory is a four-tuple <LV,DT,DTC,DTS>
>where LV is a collection of literal values
>       DT is a collection of URIs that are also datatypes
>       DTC : DT -> powerset ( LV )
>       DTS : DT -> ( L -> LV ), with DTS(d) potentially partial
>			       and DTS(d)(L) <= DTC(d) for all d
>
>DTC maps a datatype to its extension (or value space).
>DTS maps a datatype to a partial map from literals (or lexical space) to
>literal values (or value space).

This seems rather complicated. Couldn't you define DTC as a pullback from DTS?
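
One possible reading of that suggestion, sketched in Python: take DTC(d) to be the image of DTS(d), so that DTS(d)(L) <= DTC(d) holds by construction. Note that this is narrower than the quoted definition, which only requires inclusion. The two toy datatypes and their tables are assumptions for illustration.

```python
# DTS as a dict: datatype URI -> partial map from lexical forms to values.
DTS = {
    "xsd:integer": {"5": 5, "05": 5},
    "xsd:string": {"5": "5", "05": "05"},
}

def derived_dtc(dts):
    """Define DTC(d) as the set of values that DTS(d) actually reaches."""
    return {d: set(mapping.values()) for d, mapping in dts.items()}

def respects_value_space(dtc, dts):
    """Check DTS(d)(L) <= DTC(d) for every datatype d."""
    return all(set(dts[d].values()) <= dtc[d] for d in dts)

DTC = derived_dtc(DTS)
print(DTC["xsd:integer"])              # {5}
print(respects_value_space(DTC, DTS))  # True
```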

>    Each datatype provides at most one literal value for each literal via
>    the DTS mapping.

Quite. Now, Peter, what is wrong with my original MT in which 
'global' is understood as relative to a datatyping scheme, and XL is 
defined to be
(lambda x. DTS(datatype(x))(x)), where datatype is whatever function 
from literals to their type determines that unique literal value? 
Could we resolve this just by my altering the text to make this 
possibility clear?

>Given a datatype theory <LV,DT,DTC,DTS>
>define XLS(l) = { lv in LV : for some d in DT with DTS(d) defined on l
>			     lv = DTS(d)(l) }

Wait a minute. Surely once we have a datatyping scheme fixed, the 
denotation of any literal should be uniquely defined? When is this 
actually going to home in on a referent? It seems to me that this is 
getting out of hand. The semantics of '5' is more complicated than 
the entire semantic theory for all of RDF.
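
To make the contrast concrete, here is a Python sketch of the two options: XLS collects every value that some datatype assigns to a literal, while XL fixes one datatype per literal and returns a single value. The toy DTS table and the datatype_of argument are assumptions for illustration.

```python
# DTS as a dict: datatype URI -> partial map from lexical forms to values.
DTS = {
    "xsd:integer": {"5": 5, "05": 5},
    "xsd:string": {"5": "5", "05": "05", "abc": "abc"},
}

def xls(dts, lexical):
    """All values that some datatype assigns to the lexical form."""
    return {m[lexical] for m in dts.values() if lexical in m}

def xl(dts, datatype_of, lexical):
    """The single value obtained once a datatype has been chosen."""
    return dts[datatype_of(lexical)][lexical]

print(xls(DTS, "05") == {5, "05"})            # True: two candidate values
print(xl(DTS, lambda l: "xsd:integer", "05"))  # 5
```

With the datatype fixed, the literal has exactly one denotation, which is the behaviour asked for above.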

>Given a datatype theory <LV,DT,DTC,DTS>
>a datatype RDFS model for a core RDFS graph R is a core RDFS model I for R,
>with the following extra conditions:
>
>   if <s,p,l> is in E with LN(l) in L
>   and <M(p),c> in IEXT(IS(rdfs:range))
>     then for any node nc with LN(nc) in DT and M(nc) = c
>	M(l) = DTS(LN(nc))

so M(l) is a mapping, not a literal value? Or should that be 
DTS(c)(LN(nc)) (??)

>   for v in LV, if <x,v> in IEXT(p) and <p,c> in IEXT(IS(rdfs:range))
>     then for any node nc with LN(nc) in DT and M(nc) = c
>	v in DTC(LN(nc))
>
>
>   These conditions are rather complicated for semantic conditions,

Worse, seems to me: they aren't really semantic conditions. The first 
one should be phrased as a(n extra) truth-condition on E  in terms of 
<s, p, l>. Just having that triple in the graph shouldn't be a 
semantic constraint in itself; what if the graph is false in I?
Also, what set (of nodes) does nc range over?

>so some
>   explanation is in order.  The first condition says that literals (l) that
>   are objects of statements must denote according to any datatype range for
>   the predicate (M(p)) of the statement.

Even if the assertion that is made about them is false?

>  The second condition says that
>   literal values (v) that are in relationships must belong to the value
>   space (DTC(LN(nc))) of any range of the relationship.
>
>
>A core RDFS graph R datatype-entails another core RDFS graph R'  iff
>R u R' is defined and
>every datatype RDFS model for R is also a datatype RDFS model for R u R'.
>

With the same datatype, right(?)

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes

