a revised version of my datatyping message (agenda item 2)

From: Peter F. Patel-Schneider (pfps@research.bell-labs.com)
Date: 11/27/01

Here is a slight revision of my datatyping message of last week.  I've
cleaned up a few things, and added XML Schema methods as another

By the way, what I would hope comes out of RDF Core is at least something

1/ Use a locally specified lexical-to-value map, employing XML Schema
   mechanisms for compatibility reasons.

2/ Text nodes (i.e., those that don't have an XML Schema-specified
   lexical-to-value map) are underspecified and don't have a fixed
   lexical-to-value map, leaving the door open for better schemes.


Pat's proposal for datatypes (http://lists.w3.org/Archives/Public/
w3c-rdfcore-wg/2001Oct/0453.html) has the following peculiarity.
Suppose you have two datatypes, say integer and fraction, where integer is
interpreted in the usual way but fraction maps the numeral n to 1/(1+n).  Then it
is possible to interpret the following:

<age> <rdfs:range> <integer> .
<John> <age> "0" .

as having John's age be 1, by making the datatype of the node for the "0"
be fraction.

The problem is that the datatype of the node for "0" can be any datatype
that maps "0" into an integer, and not the intended datatype, integer.  
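
A minimal sketch of this peculiarity in Python, under assumptions of mine:
lexical forms are decimal numerals, fraction maps n to 1/(1+n), and the
range constraint is read as "the value lies in the integer value space".
The names DATATYPES, in_integer_value_space, and admissible_datatypes are
illustrative only and are not part of Pat's proposal.

```python
from fractions import Fraction

# Each datatype is modelled only by its lexical-to-value map (an assumption
# made for illustration; real datatypes also carry lexical and value spaces).
DATATYPES = {
    "integer": lambda lex: int(lex),
    "fraction": lambda lex: Fraction(1, 1 + int(lex)),
}

def in_integer_value_space(value):
    """True if value is a whole number, whatever its Python representation."""
    return Fraction(value).denominator == 1

def admissible_datatypes(lex):
    """Datatypes whose map sends lex into the integer value space."""
    return {name: lv(lex) for name, lv in DATATYPES.items()
            if in_integer_value_space(lv(lex))}

print(admissible_datatypes("0"))
```

Under these assumptions both integer and fraction are admitted for "0",
with values 0 and 1 respectively, so nothing forces the intended reading.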

One fix is to give the node for "0" multiple datatypes.  That is, if "0"
is mapped to 1, then the node has two datatypes, integer and fraction,
because 1 is in the value space of both.  This does not yield an
interpretation, because the integer mapping takes "0" to 0, not to 1.
This works, but it disallows some datatyping schemes, including full XML
Schema datatypes, because union types don't satisfy the requirement just
above.  (Think of string union integer versus integer union string.)  

The problem is, I think, that there are too few entities in the model
theory to support the typing involved.  In essence, 7 as an integer union
string has to be different from 7 as a string union integer.
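
The union example can be sketched in Python as follows.  This assumes an
ordered, first-match reading of union types (the first member datatype
that accepts the lexical form wins); the helper names integer_lv,
string_lv, and union_lv are hypothetical.

```python
def integer_lv(lex):
    """Lexical-to-value map for integer; raises ValueError on non-numerals."""
    return int(lex)

def string_lv(lex):
    """Lexical-to-value map for string: every lexical form denotes itself."""
    return lex

def union_lv(*members):
    """Ordered union: use the first member map that accepts the lexical form."""
    def lv(lex):
        for member in members:
            try:
                return member(lex)
            except ValueError:
                continue
        raise ValueError(f"no member datatype accepts {lex!r}")
    return lv

integer_union_string = union_lv(integer_lv, string_lv)
string_union_integer = union_lv(string_lv, integer_lv)

print(repr(integer_union_string("7")), repr(string_union_integer("7")))
```

Both unions have the same value space, yet under this reading they send
"7" to different values (the integer 7 versus the string "7"), which is
the disagreement described above.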

Note that the scheme that uses rdf:type and rdf:value to type literals
falls prey to this problem as well.  The scheme that uses xsi:integer links
doesn't, but only because it is a syntactic variant of the xsi:integer:7

There are three possibilities that I see:

0/ Make the lexical-to-value mapping local to the lexical information.  To
me, this boils down to using xsi:type or other XML Schema constructs
(because of compatibility concerns).  However, the other mechanisms for
local typing also don't exhibit the problem here.

1/ Use a nice datatyping scheme.  Basically the requirement is
that two datatypes whose value spaces overlap cannot disagree on mappings
that go into that overlap.  More formally,

	Let DT be the collection of datatypes
	A datatype, d in DT, is a triple <L,V,LV>
		where L is its literal space
		      V is its value space
		      LV is its mapping from literals to values
	DT is a nice datatype scheme if
	for d in DT, d' in DT
	for v in V(d) intersection V(d')
	for l in L(d) intersection L(d')
	if LV(d')(l) = v then LV(d)(l) = v 

This prevents any bad use of one datatype's (d') lexical-to-value mapping
when another datatype's (d) is meant.

Note, however, that XML Schema datatypes are not nice in this sense.
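
The condition in 1/ can be checked mechanically for small, finite
datatypes.  The following Python sketch is only an illustration: it
assumes each datatype is given as an explicit literal-to-value table and
takes V to be exactly the image of LV.  The class Datatype and the
function violations are hypothetical names.

```python
from fractions import Fraction
from itertools import permutations

class Datatype:
    """A datatype <L, V, LV> over finite, explicitly listed spaces."""
    def __init__(self, name, lv):
        self.name = name
        self.lv = dict(lv)                    # LV: literal -> value
        self.literals = set(self.lv)          # L
        self.values = set(self.lv.values())   # V (assumed: the image of LV)

def violations(datatypes):
    """Return (d, d', l, v) tuples that break the condition in 1/."""
    found = []
    for d, d2 in permutations(datatypes, 2):
        shared_values = d.values & d2.values
        for l in d.literals & d2.literals:
            v = d2.lv[l]
            # LV(d')(l) = v lies in the overlap, so LV(d)(l) must equal v.
            if v in shared_values and d.lv[l] != v:
                found.append((d.name, d2.name, l, v))
    return found

integer = Datatype("integer", {"0": 0, "1": 1})
fraction = Datatype("fraction", {"0": 1, "1": Fraction(1, 2)})

for violation in violations([integer, fraction]):
    print(violation)
```

With these two tables the value 1 is shared, and each datatype reaches it
from a literal that the other maps elsewhere, so the pair is reported as
not nice in both directions.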

2/ Don't use only the value of a literal node to determine whether the
lexical-to-value mapping is correct.

For example, you could do something like

	if < x , I(n) > in I(r)
	and < r , d > in I(rdfs:range)
	then I(n) = LV(d)(label(n))

Here all lexical-to-value mappings have to be obeyed, so there is no
possibility of using the wrong mappings.  Disagreements result in incoherency.
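
As a rough Python sketch of this condition (my own encoding, not part of
any proposal): triples are (subject, property, node) tuples, label gives
the lexical form of each literal node, ranges lists the datatypes
declared as the range of each property, and interp plays the role of I on
literal nodes.  All of these names are hypothetical.

```python
from fractions import Fraction

LV = {
    "integer": lambda lex: int(lex),
    "fraction": lambda lex: Fraction(1, 1 + int(lex)),
}

def satisfies_range_typing(triples, ranges, label, interp):
    """Check I(n) = LV(d)(label(n)) for each literal object n of r with range d."""
    for _subject, prop, node in triples:
        if node not in label:          # not a literal node, nothing to check
            continue
        for d in ranges.get(prop, ()):
            if interp[node] != LV[d](label[node]):
                return False
    return True

triples = [("John", "age", "n1")]
label = {"n1": "0"}

# With range integer, only the value 0 is acceptable for the node labelled "0".
print(satisfies_range_typing(triples, {"age": ["integer"]}, label, {"n1": 0}))
print(satisfies_range_typing(triples, {"age": ["integer"]}, label, {"n1": 1}))
```

If age is given both integer and fraction as ranges, no value for the
node can satisfy both mappings of "0", which is the incoherence mentioned
above.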

What is gained here is a way of determining which datatypes are actually
used for the literal so that one can ignore the other datatypes.  

This more-or-less requires that the type hierarchy for datatypes is not
determined by their extension, so that integer-union-string does not
subsume string.  (Otherwise, any use of a numeral on something typed as a
string would be incoherent.)

I suggest a combination of 0 and 2.  That is, allow local specification of
the lexical-to-value mapping using XML Schema constructs.  For untyped text
nodes use range information to specify which lexical-to-value mapping(s)
must be used.  As far as DAML+OIL is concerned, all datatypes would be
