Re: Datatyping Foibles

From: Pat Hayes (phayes@ai.uwf.edu)
Date: 11/29/01


>Pat's proposal for datatypes
>(http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0453.html, and
>other messages) has the following peculiarity.
>
>Suppose you have two datatypes, say integer and fraction, where integer is
>interpreted in the usual way but fraction is interpreted as 1/(1+n).  Then it
>is possible to interpret the following:
>
><age> <rdfs:type> <integer> .
><John> <age> "0" .

That isn't my proposal, and I don't think it makes sense in this form 
in anyone's proposal.  My proposal would allow the literal 
(occurrence) to be a subject and apply rdf:type to the literal 
itself. In Ntriples++ this would be:

<John> <age> _:x:"0" .
_:x <rdf:type>  <integer> .

or using ascii-art:

John --age-->0--rdf:type-->integer

(Or did you mean

<age> <rdfs:range> <integer> . ?? )

>as having John's age be 1, by making the datatype of the node for the "0"
>be fraction.
>
>
>The problem is that the datatype of the node for "0" can be any datatype
>that maps "0" into an integer, and not the intended datatype, integer.

This is a problem only if the datatype class (i.e. the class of 
datatype values) of one is a subclass of the other, which would not 
be true in this case, presumably, since for example 1/(1+1) is not an 
integer. But the general point is well taken, since an example that 
would break the datatyping proposal is easy to find, e.g. octal and 
decimal integers.

This is also a problem with using rdfs:range to assign datatypes, 
by the way, so it applies to Peter's suggestion as much as to mine. 
Any proposal to use class reasoning in RDFS to attach datatypes to 
literals is hostage to this objection.
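
The octal/decimal case mentioned above can be made concrete with a 
minimal Python sketch. The names Datatype, DECIMAL, OCTAL and readings 
are hypothetical and purely illustrative; they are not part of any 
proposal in this thread. Both datatypes take their values from the 
integers, yet they disagree about what the literal "10" denotes, so 
knowing only that the value is an integer does not identify which 
datatype was meant.

```python
from typing import Callable, NamedTuple


class Datatype(NamedTuple):
    """Illustrative stand-in for a datatype: a name plus a literal-to-value map."""
    name: str
    to_value: Callable[[str], int]


# Two datatypes whose value class is the integers in both cases.
DECIMAL = Datatype("decimal", lambda lit: int(lit, 10))
OCTAL = Datatype("octal", lambda lit: int(lit, 8))


def readings(literal, datatypes):
    """Map each datatype name to the value it assigns to the literal."""
    return {d.name: d.to_value(literal) for d in datatypes}


if __name__ == "__main__":
    # The two datatypes agree on "0" but disagree on "10".
    print(readings("0", [DECIMAL, OCTAL]))
    print(readings("10", [DECIMAL, OCTAL]))
```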

>One fix is to have multiple datatypes for the node for "0".  That is,
>as "0" maps to 1, then it has two datatypes, integer and fraction, because
>1 is in the range of both integer and fraction.  This does not result in an
>interpretation as the integer mapping for "0" is to 0, not to 1.

I do not follow this. Is your point that if one asserts two 
contradictory datatyping assertions then the resulting graph has no 
interpretations? If so that is true but hardly merits being called a 
'fix'. The following suggests that you mean something else, however, 
and I cannot figure out what it is.

>This works, but disallows some datatyping schemes, including full XML
>Schema datatypes, because union types don't satisfy the requirement just
>above.  (Think of string union integer versus integer union string.) 
>
>The problem is, I think, that there are too few entities in the model
>theory to support the typing involved.  In essence, 7 as an integer union
>string has to be different from 7 as a string union integer.

??? How did the ordering get involved? That order isn't even in the 
RDF *syntax*.

>Note that the scheme that uses rdf:type and rdf:value to type literals
>falls prey to this problem as well.  The scheme that uses xsi:integer links
>doesn't, but only because it is a syntactic variant of the xsi:integer:7
>one.
>
>
>There are two possibilities that I see:
>
>1/
>
>Place restrictions on the datatype scheme.  Basically the requirement is
>that two datatypes whose value spaces overlap cannot disagree on mappings
>that go into that overlap.

In my scheme, this is only required for the case where one of these 
datatypes is an rdfs:subClassOf the other. Recall that in the RDF MT, 
two classes may have identical extensions but still not be in the 
rdfs:subClassOf relation.  Classes have intensions as well as 
extensions, and this is crucial to the use of class reasoning for 
datatyping.

>More formally,
>
>	Let DT be the collection of datatypes
>	A datatype, d in DT, is a triple <L,V,LV>
>		where L is its literal space
>		      V is its value space
>		      LV is its mapping from literals to values
>	DT is a datatype scheme if
>	for d in DT, d' in DT
>	for v in V(d) intersection V(d')
>	for l in L(d) intersection L(d')
>	if LV(d')(l) = v then LV(d)(l) = v
>
>This prevents any bad use of one datatype's (d') mapping when another
>datatype (d) is meant.
>
>2/
>
>Don't only use the value of a literal node to determine if the datatyping
>is correct.
>
>For example, you could do something like
>
>	if < x , I(n) > in I(r)
>	and < r , d > in I(rdfs:range)
>	then I(n) = LV(d)(label(n))
>
>This means that if you have two datatypes, d and d', that are both the
>range of r, and that they disagree on label(n) above, then there is no
>possible model.
>
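
The two quoted options can be sketched in Python over small finite 
datatypes. This is only an illustration under stated assumptions: 
each datatype is modelled as an explicit literal-to-value table plus 
an explicit value set, and the names Datatype, conflicts and 
range_check are hypothetical. conflicts() reports where the condition 
in option 1 fails; range_check() encodes the rule in option 2 for a 
single literal node.

```python
from fractions import Fraction
from itertools import permutations


class Datatype:
    """Illustrative finite datatype <L, V, LV>."""

    def __init__(self, name, lv, value_space):
        self.name = name
        self.lv = dict(lv)               # LV: literal -> value
        self.values = set(value_space)   # V: value space

    @property
    def literals(self):                  # L: literal space
        return set(self.lv)


def conflicts(datatypes):
    """Return (d, d', literal) triples violating the option 1 condition.

    For v in V(d) & V(d') and l in L(d) & L(d'):
    if LV(d')(l) = v then LV(d)(l) must also be v.
    """
    found = []
    for d, d2 in permutations(datatypes, 2):
        shared_values = d.values & d2.values
        for lit in sorted(d.literals & d2.literals):
            v = d2.lv[lit]
            if v in shared_values and d.lv[lit] != v:
                found.append((d.name, d2.name, lit))
    return found


def range_check(label, node_value, range_datatypes):
    """Option 2: I(n) must equal LV(d)(label(n)) for every range datatype d."""
    return all(d.lv.get(label) == node_value for d in range_datatypes)


# The integer / fraction example from the start of the thread,
# restricted to three literals; fraction maps n to 1/(1+n).
labels = ["0", "1", "2"]
integer = Datatype("integer", {s: int(s) for s in labels}, {0, 1, 2})
fraction = Datatype(
    "fraction",
    {s: Fraction(1, 1 + int(s)) for s in labels},
    {Fraction(1, 1 + int(s)) for s in labels},
)

if __name__ == "__main__":
    print(conflicts([integer, fraction]))
    print(range_check("0", 0, [integer]))
    print(range_check("0", 1, [integer, fraction]))
```

With both integer and fraction given as ranges of the same property, 
range_check fails for every candidate value of the node labelled "0", 
which corresponds to the quoted remark that there is then no possible 
model.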

There is another idea altogether, which is to introduce a separate 
category of dataclass inheritance relationships, called for example 
rdfs:subDataTypeClassOf , which is a (proper) rdfs:subPropertyOf 
rdfs:subClassOf, and require datatypes to be fixed only by 
inheritance on  rdfs:subDataTypeClassOf. This would enable normal 
class reasoning to be done on datatype value classes without 
interfering with assignment of datatypes to literals.  I rather like 
this, in fact: crude but robust, like a Russian tank.

Pat


-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes

