RE: Cutting the Patrician datatype knot

From: Pat Hayes (
Date: 11/29/01

>Subject: RE: Cutting the Patrician datatype knot
>Date: Thu, 29 Nov 2001 14:09:09 +0200
>>  > For example, if you allow union XML Schema datatypes there is
>>  > a model of
>>  >
>>  >	<rdfs:range foo xsd:[integer union string]>
>>  >	<John foo 7>
>>  As I think I've said earlier, I don't consider
>>  [integer union string] to be a "valid" data type.
>And why not? 
>>  The definition of a data type that I subscribe to is
>>  that a data type defines a value space and (optionally)
>>  a lexical space, and a member of the lexical space maps
>>  to one and only one member of the value space.
>[integer union string] satisfies this definition.  In [integer union
>string] the lexical item "7" maps to the integer 7.
>>  In the above union "data type", the literal "7" maps to
>>  two members of the value space. Therefore, it is not a
>>  valid data type.
>Not correct.  Please read the XML Schema recommendation to see how union
>datatypes work.

OK, now I see what you have been getting at (and what was wrong with
my earlier replies on this thread; sorry if they created more heat
than light). Since the lexical space of integer is a subset of that
of string, and since the ordering of member types in a union is
significant, the union datatype [integer union string] has exactly
the same lexical-to-value mapping as integer on numerals: it takes
numerals to integers, while non-numerals are treated as strings.
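A minimal Python sketch of the ordered-union behavior just described. It assumes simplified lexical spaces (an optional sign followed by ASCII digits for integer, any string at all for string) and reduces a datatype to a lexical-space test plus a lexical-to-value map. The names `INTEGER`, `STRING` and `union_value` are hypothetical illustrations, not part of any XML Schema API.

```python
import re
from typing import Callable, List, Optional, Tuple

# A simplified datatype: (lexical-space test, lexical-to-value map).
# The lexical spaces here are illustrative stand-ins for the XML Schema ones.
Datatype = Tuple[Callable[[str], bool], Callable[[str], object]]

INTEGER: Datatype = (lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None, int)
STRING: Datatype = (lambda s: True, lambda s: s)


def union_value(members: List[Datatype], lexical: str) -> Optional[object]:
    """Map a lexical form with the first member type (in order) whose
    lexical space contains it; return None if no member accepts it."""
    for in_lexical_space, to_value in members:
        if in_lexical_space(lexical):
            return to_value(lexical)
    return None


print(repr(union_value([INTEGER, STRING], "7")))    # 7   (an integer)
print(repr(union_value([INTEGER, STRING], "abc")))  # 'abc'
print(repr(union_value([STRING, INTEGER], "7")))    # '7' (a string)
```

Swapping the member order changes the value of "7" from an integer to a string, which is why the order of a union matters here.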

OK, but now what is the problem with my MT extension, again? Of 
course, with THIS sense of "union", one cannot treat [integer union 
string] as in any sense the simple class-union of the classes 
[integer] and [string]. So as far as the RDF reasoner is concerned, 
[integer union string] is just another datatype, which might as well 
be called [foodle]. If someone were to assert that

[integer] rdfs:subClassOf [integer union string] .

that would be correct, but that would cause no problem in the MT, 
since [integer union string] agrees with [integer] on numerals; they 
have the same lexical-to-value mapping on anything that would map to 
an integer. Similarly, it would be correct to assert

[string] rdfs:subClassOf [string union integer] .

though in that case it seems rather pointless, since the two
datatypes are identical: even numerals are mapped to strings by that
union. But again, this poses no problems for the MT.
On the other hand, if someone were to assert

[string] rdfs:subClassOf [integer union string] .

then that would be simply false, as all numeral strings are in the 
former value space but not the latter.
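A rough way to check the three subClassOf claims above, offered as a sketch. Value spaces are written as Python membership tests that follow the analysis in this message (the value space of [integer union string] is the integers plus the non-numeral strings), and containment is tested only on a small hand-picked sample of values. That makes it a heuristic, not a proof. `VALUE_SPACES`, `subclass_holds_on` and `SAMPLE` are hypothetical names.

```python
import re


def is_numeral(s: str) -> bool:
    # Simplified xsd:integer lexical space: optional sign, then ASCII digits.
    return re.fullmatch(r"[+-]?[0-9]+", s) is not None


# Value-space membership tests, following the analysis in this message.
VALUE_SPACES = {
    "integer": lambda v: isinstance(v, int),
    "string": lambda v: isinstance(v, str),
    # Numerals are captured by the integer member first, so numeral
    # strings never become string values of this union.
    "integer union string": lambda v: (
        isinstance(v, int) or (isinstance(v, str) and not is_numeral(v))
    ),
    # Every lexical form is caught by the string member first.
    "string union integer": lambda v: isinstance(v, str),
}


def subclass_holds_on(sub: str, sup: str, sample) -> bool:
    """Test value-space containment of sub in sup on a finite sample only."""
    return all(VALUE_SPACES[sup](v) for v in sample if VALUE_SPACES[sub](v))


SAMPLE = [7, -3, "7", "abc", ""]

print(subclass_holds_on("integer", "integer union string", SAMPLE))  # True
print(subclass_holds_on("string", "string union integer", SAMPLE))   # True
print(subclass_holds_on("string", "integer union string", SAMPLE))   # False
```

The last check fails on the string value "7", which is in [string] but not in [integer union string].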

Here's an artificial example that would screw things up in the way I
think you intend. Suppose there were a datatype xsd:gnirts for
backward strings, so that if "abcd" were in that datatype then it
would denote the value "dcba". Then [gnirts union string] would have
the same value space as [string], but an incompatible lexical
mapping. (Similarly for octal and decimal integers, e.g.)

There are ways past this issue, if it really is an issue. Are there
any such cases in XML Schema, however? I can't find any, unless they
are somehow buried in the details of the Gregorian calendar.
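The gnirts example can be made concrete in a few lines. This is a sketch only: both the gnirts type and the octal integer type are hypothetical, as in the example itself, and `conflicting_forms` is an illustrative helper. It lists lexical forms that denote different values under [string] and [gnirts union string], even though both value spaces are just the strings.

```python
# Hypothetical datatypes from the example above; neither "gnirts" nor an
# octal integer type is a real XML Schema datatype.

def string_value(lexical: str) -> str:
    return lexical                  # ordinary string: denotes itself


def gnirts_value(lexical: str) -> str:
    return lexical[::-1]            # backward string: "abcd" denotes "dcba"


def gnirts_union_string_value(lexical: str) -> str:
    # gnirts is listed first and accepts every string, so it always wins.
    return gnirts_value(lexical)


def conflicting_forms(sample):
    """Lexical forms that denote different values under [string] and
    [gnirts union string], although both value spaces are all strings."""
    return [s for s in sample if string_value(s) != gnirts_union_string_value(s)]


print(conflicting_forms(["abcd", "abba", ""]))  # ['abcd']

# The octal/decimal case: the same numeral denotes two different integers.
print(int("10", 8), int("10", 10))  # 8 10
```

Palindromes such as "abba" happen to agree, so a conflict shows up only on some lexical forms, not all of them.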

>>  What you seem to be defining is just a union of lexical
>>  space. I.e., the union of the lexical space of integers with
>>  the lexical space of strings; which, however possible to do,
>>  is not particularly useful if you want to deal with the
>>  values themselves.
>No, XML Schema has a method for creating union datatypes that satisfies
>your requirements.  If you want to exclude such datatypes you have to
>provide a criterion other than ``usefulness''.
>>  XML Schema is not concerned with values the same way that
>>  an application would be. XML Schema only has to ensure
>>  the integrity of the lexical and structural space. Thus,
>>  a union such as above is reasonable, as XML Schema does
>>  not itself worry about the ambiguity that arises in the
>>  lexical to value mapping.
>Again, XML Schema does *not* have ambiguous lexical-to-value mappings.
>Although this is not explicitly stated in the XML Schema datatype document,
>it can be inferred from lots of places in section 2.  [Note to XML Schema
>people:  This property of datatypes should be explicitly stated.  Also,
>datatypes really should be four-tuples, one element being the
>lexical-to-value map!]

Amen to that last point. That is a serious blunder in the way that
XML Schema datatypes are stated.
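To make the four-tuple suggestion concrete, here is a minimal sketch, assuming XML Schema's three components (value space, lexical space, facets) plus the lexical-to-value map as an explicit fourth field. The class name `Datatype`, the method `value_of` and the simplified integer lexical space are hypothetical illustrations; facets are left empty because they play no role in this discussion.

```python
import re
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Datatype:
    """A datatype as a four-tuple: value space, lexical space and facets,
    plus an explicit lexical-to-value map."""
    value_space: Callable[[object], bool]      # membership test for values
    lexical_space: Callable[[str], bool]       # membership test for lexical forms
    facets: FrozenSet[str]                     # facet names; left empty below
    lexical_to_value: Callable[[str], object]  # the proposed fourth component

    def value_of(self, lexical: str) -> object:
        if not self.lexical_space(lexical):
            raise ValueError(f"{lexical!r} is not in the lexical space")
        value = self.lexical_to_value(lexical)
        # Being a function, the map yields exactly one value per lexical
        # form; check that this value lies in the value space.
        if not self.value_space(value):
            raise ValueError(f"{lexical!r} maps outside the value space")
        return value


INTEGER = Datatype(
    value_space=lambda v: isinstance(v, int),
    lexical_space=lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
    facets=frozenset(),
    lexical_to_value=int,
)

print(INTEGER.value_of("+007"))  # 7
```

With the map stated as a component, the property that each lexical form denotes exactly one value is explicit in the data structure rather than inferred from scattered prose.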

>>  You do, though, raise an important question -- whether it
>>  is possible to define XML Schema simple data types which
>>  do not have a N:1 mapping from lexical space to value space.
>>  If we can have 1:N or N:N mappings, then we are going to
>>  have problems, and that might mean that perhaps XML Schema
>>  may need to be more constrained with regards to some
>>  simple type derivations.
>No XML Schema datatype has a 1:N or N:N lexical-to-value map.  It is not
>the presence of such datatypes that causes problems.
>Instead, again, it is the presence of two (different) datatypes that have
>overlapping value spaces but different lexical-to-value maps within this

Which are, exactly? As far as I can see, this situation never arises 
with the combinations [integer], [string], [integer union string],
and [string union integer].

The lexical and value spaces are, respectively:

[integer]                numerals     --->  N (integers)
[string]                 S (strings)  --->  S (strings)
[integer union string]   S            --->  N union (S - numerals)
[string union integer]   (same as [string])

None of these have the pathological behavior that you describe, since 
their lexical-to-value mappings coincide on the parts of the value 
spaces that overlap.
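One way to make the "mappings coincide on the overlap" condition concrete, as a sketch: for each pair of datatypes and each sample lexical form in both lexical spaces, flag a conflict when each datatype's value also lies in the other's value space but the two values differ. The four datatypes are transcribed from the table above with simplified lexical spaces; `DATATYPES`, `int_or_str` and `conflicts` are hypothetical names, and since the sample is finite an empty result is evidence, not proof.

```python
import re
from itertools import combinations


def is_numeral(s: str) -> bool:
    # Simplified xsd:integer lexical space: optional sign, then ASCII digits.
    return re.fullmatch(r"[+-]?[0-9]+", s) is not None


def int_or_str(s: str) -> object:
    # [integer union string]: numerals become integers, all else stays a string.
    return int(s) if is_numeral(s) else s


# name -> (lexical-space test, value-space test, lexical-to-value map)
DATATYPES = {
    "integer": (is_numeral, lambda v: isinstance(v, int), int),
    "string": (lambda s: True, lambda v: isinstance(v, str), lambda s: s),
    "integer union string": (
        lambda s: True,
        lambda v: isinstance(v, int) or (isinstance(v, str) and not is_numeral(v)),
        int_or_str,
    ),
    "string union integer": (lambda s: True, lambda v: isinstance(v, str), lambda s: s),
}


def conflicts(sample):
    """Return (d1, d2, lexical) triples where both values fall in the overlap
    of the two value spaces but differ."""
    found = []
    pairs = combinations(DATATYPES.items(), 2)
    for (n1, (lex1, vs1, map1)), (n2, (lex2, vs2, map2)) in pairs:
        for s in sample:
            if lex1(s) and lex2(s):
                v1, v2 = map1(s), map2(s)
                if vs2(v1) and vs1(v2) and v1 != v2:
                    found.append((n1, n2, s))
    return found


print(conflicts(["7", "-3", "abc", ""]))  # []
```

For example, [string] and [integer union string] disagree on "7", but the string "7" is outside the union's value space, so this is not the overlapping-and-disagreeing case.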
>>  I'm presuming, of course, that RDF is only concerned with
>>  simple data types, not all XML Schema definable types in
>>  general.
>This is true even if you include all XML Schema datatypes, even the
>composite ones.
>>  > For example, what is the theory of rdf:type on datatype classes?
>>  Good question. I'm not the best person to offer an answer,
>>  insofar as the formal MT is concerned, but I would expect
>>  that the theory of rdf:type is the same for all classes, datatype
>>  or otherwise, and it is the knowledge about a particular class
>>  that tells us it is a data type class, and data type classes
>>  have distinct characteristics, such as defining a value space
>>  and (optionally) lexical space.

Right, exactly.

>>  If we declare that literals
>>  may only be bound to data type classes,

No need to say this; only that any binding to a non-datatype class 
does not fix the interpretation of the literal. Only datatype class 
bindings have the semantic power to constrain the lexical-to-value 
mapping used in the interpretation, but other bindings are harmless, 
even if datatype-uninformative.

>>  then we know that a
>>  given class is a data type class if it is bound to a literal,
>>  and thus know how to interpret the pairing of literal (lexical
>>  form) to data type.
>But if you don't provide a theory of rdf:type on datatype classes, then
>others cannot evaluate your mechanism, as it uses rdf:type to determine the
>lexical-to-value mapping.

Right, exactly. If one tries to apply ordinary RDFS reasoning to the 
style adopted in Peter's proposal, which is to use rdfs:range on the 
predicate to define the datatype of the literal in the object 
position, e.g.

aaa eg:prop "10" .
eg:prop rdfs:range xsd:integer .

then the closure rules for RDFS reasoning will not generate the 
required restrictions of the literal occurrence unless the literal is 
somehow allowed to be the subject of a triple of the form

"10" rdf:type xsd:integer .

It is worth bearing in mind that all of the rdfs: vocabulary is 
eliminable in RDF, and can all be defined in terms of rdf:type. That 
includes rdfs:range.
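A toy forward-chaining sketch of the rdfs:range rule discussed above, assuming triples are plain Python tuples and literals are wrapped in a hypothetical `Literal` class so they can be told apart from URIs. The function `range_closure` and its `allow_literal_subjects` flag are illustrative, not any RDF library's API. The point it shows is that the only triple the rule can infer here has the literal in subject position, so a system that forbids literal subjects cannot record it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    lexical: str


def range_closure(triples, allow_literal_subjects):
    """Apply the rdfs:range rule once: from (p rdfs:range C) and (x p y),
    infer (y rdf:type C). Inferences with a literal subject are dropped
    unless allow_literal_subjects is True."""
    ranges = [(s, o) for (s, p, o) in triples if p == "rdfs:range"]
    inferred = set()
    for prop, cls in ranges:
        for (_x, p, y) in triples:
            if p == prop:
                if isinstance(y, Literal) and not allow_literal_subjects:
                    continue  # cannot state a triple with a literal subject
                inferred.add((y, "rdf:type", cls))
    return inferred


graph = {
    ("aaa", "eg:prop", Literal("10")),
    ("eg:prop", "rdfs:range", "xsd:integer"),
}

print(range_closure(graph, allow_literal_subjects=False))  # set()
print(range_closure(graph, allow_literal_subjects=True))
# {(Literal(lexical='10'), 'rdf:type', 'xsd:integer')}
```

The rule is phrased entirely in terms of rdf:type, in line with the remark that rdfs:range can be defined through rdf:type.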

Pat Hayes

IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
