Re: added diagrams to "Using XML Schema Data Types..."

From: pat hayes (phayes@ai.uwf.edu)
Date: 02/07/01


>[Dan, to Peter:]
>I'm not sure what you mean by "the definition of a property".
>Do you mean that there's always exactly one such definition?
>how much stuff goes in a definition? I know what documents
>are and I know what statements/formulas/assertions are.
>Maybe I can read "a document containing statements relevant
>to some property" in place of "the definition of a property."

Let me suggest that a better phrasing might be "a collection of 
statements which together are sufficient to define a property". The 
point being that while these might not all be in one place (the web 
being what it is), and while they may need to be assembled in a 
piecemeal fashion, nevertheless they do all need to be somehow 
gathered together before certain operations can be done reliably. 
(That is, one can jump ahead without having them all, maybe, but at 
the risk of needing to make corrections later; whereas if one knows 
that one has them all in hand, one can be sure that the next step is 
sound.) An analogy might be the packets which comprise an email 
message, which all have to be gathered together to reconstruct the 
message reliably in toto, though one could make do with only some of 
them in an emergency (but would later have to fill in the gaps, etc.)
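
To make the delay-versus-undo point concrete, here is a minimal 
sketch in Python of the packet analogy (the names and the protocol 
are invented purely for illustration):

    def try_reassemble(fragments, expected_count):
        """Return the whole message if complete, else None (defer)."""
        if len(fragments) < expected_count:
            return None               # not enough yet: wait, don't guess
        ordered = [fragments[i] for i in sorted(fragments)]
        return b"".join(ordered)      # safe: nothing to retract later

    buffer = {}                       # fragments keyed by sequence number
    buffer[1] = b"world"
    print(try_reassemble(buffer, 2))  # None -- still waiting
    buffer[0] = b"hello, "
    print(try_reassemble(buffer, 2))  # b'hello, world'

Returning None is a 'not yet', not a retraction: nothing already 
concluded has to be withdrawn when the missing fragment arrives.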

Your beef with Peter seems to be, if I follow you, that since we have 
to allow 'extendability' by adding further information (since it 
might be scattered around anywhere and we might find it at any time, 
this being a pull technology, etc.), the web logic cannot permit the 
use of such 'required' conditions. But I think there are two separate 
issues: one, we may have to gather data where we find it and take it 
as it comes; and two, we shouldn't always be in a nonmonotonic 
position of maybe having to undo things that have been done. One can 
have both of these, at the cost of delaying some things until one is 
certain one knows what to do. But delaying isn't the same as undoing.

>As to why not defer parsing... a major goal of all this
>logic-and-the-web stuff, to me, is a certain sort of scalability
>where we can analyze documents independently, see
>what they mean (i.e. what formulas they state) and then
>merge information from multiple documents in a monotonic
>fashion. This is a major feature of RDF. It's something I
>would require even if we threw out RDF syntax and started over
>with XML and URIs.

But if we take this position very strictly, and if a 'document' can 
be a sufficiently small piece of information, then it becomes 
impossible to say anything nontrivial. You will HAVE to compromise on 
this (a little) or web logic is never going to get off the ground (or 
else you will have to allow a 'document' to have a certain minimal 
complexity, so that the content comes in sufficiently well-organized 
chunks.)

Some content can be rendered down into a huge conjunction of small 
parts, but a lot of it - almost anything that might be called 
'structured data' - cannot be. You can't, for example, express the 
content of a quantified statement that way, or a reasonably large
negation, let alone things like images or arrays or most programming 
language expressions.
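
To spell out one standard example (the predicates here are purely 
illustrative): a universally quantified claim such as

    \forall x\,(\mathrm{Bird}(x) \rightarrow \mathrm{CanFly}(x))

is not equivalent to any finite conjunction of ground atoms

    \mathrm{CanFly}(b_1) \wedge \mathrm{CanFly}(b_2) \wedge \ldots \wedge \mathrm{CanFly}(b_n)

since the quantifier also covers individuals that no document happens 
to mention; and a negation like \neg(P(a) \wedge Q(b)) resists the 
same treatment, since it rules a combination out without asserting 
any atom.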

>To have the parsing of one document depend on the
>contents of another conflicts with that goal/principle.

Suppose that one document contains a large collection of declarations 
of a vocabulary of symbols, and someone wants to write another 
document which uses that vocabulary, in some reasonably expressive 
language. Your requirement has the consequence that it is impossible 
to parse the second document, in general, since without the 
declarations the parser doesn't know how to proceed.
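
Here is a toy illustration in Python of how the parse itself - not 
merely the interpretation - can depend on declarations held 
elsewhere. The syntax is an invented parenthesis-free prefix 
notation, nothing to do with RDF's actual grammar, and the arity 
table stands in for the first document:

    # Toy example: the parse of doc2 depends on arities declared in doc1.
    doc1_declarations = {"loves": 2, "mother": 1, "john": 0, "mary": 0}

    def parse(tokens, arities, pos=0):
        """Parse one term; fails (KeyError) on an undeclared symbol."""
        head = tokens[pos]
        n = arities[head]              # cannot proceed without doc1
        args = []
        pos += 1
        for _ in range(n):
            arg, pos = parse(tokens, arities, pos)
            args.append(arg)
        return (head, args), pos

    doc2_tokens = ["loves", "john", "mother", "mary"]
    term, _ = parse(doc2_tokens, doc1_declarations)
    print(term)  # ('loves', [('john', []), ('mother', [('mary', [])])])

Without the arity table there is simply no fact of the matter about 
where one argument ends and the next begins.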

One way around the letter, but not the substance, of this objection 
is to make the 'language' of these documents be not the content 
language, but a meta-language of that language. This is what you are 
doing with RDF (for anything other than a conjunction of ground 
atoms.) Since, on the whole, *expressions* of languages are fairly 
simple structures (directed labelled graphs will probably do the 
trick, with a little tinkering), one can have a uniform syntax for 
this meta-language which will enable one to describe any expression 
in any language (like XML); and since this is universal, one can 
write a parser for it, and Bob's your uncle: you can parse the second 
document without looking at the first document (in fact, without 
looking at ANY other documents.)

Well, yes; but also no, since you are, remember, now working in a 
metalanguage, not the content language. Parsing an expression in the 
metalanguage only gives you a *description* of the expression you 
wanted originally: it doesn't give you the expression itself. You 
still need to parse *that* in a way that uncovers enough of its 
structure (e.g. variable bindings, scopes of complex operators, etc.) 
to enable your reasoners/interpreters/whatever to do their stuff 
properly, and you still can't do that without reference to the first 
document, because the relevant information just can't be gotten any 
other way (except maybe by prayer).
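
To make the two levels concrete, here is a sketch (using an invented 
description vocabulary, not RDF's actual reification terms) of what 
the metalanguage parse delivers: a graph-like *description* of the 
expression, as Python data:

    # A description of the formula (forall x)(Bird(x) -> CanFly(x)),
    # as node/property/value triples.  The vocabulary is made up.
    description = [
        ("e1", "operator",  "forall"),
        ("e1", "variable",  "x"),
        ("e1", "body",      "e2"),
        ("e2", "operator",  "implies"),
        ("e2", "arg1",      "e3"),
        ("e2", "arg2",      "e4"),
        ("e3", "predicate", "Bird"),    # unary predicate? class? datatype?
        ("e4", "predicate", "CanFly"),  # only the declaring document says.
    ]

Parsing this list requires no other document; turning it back into a 
well-formed quantified expression, with its scopes and bindings and 
the right reading of Bird and CanFly, still does.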

>Another way to state this principle is that
>the knowledge contained in two documents, X and Y,
>is always the conjunction of the knowledge in X with
>the knowledge in Y. To allow X to change what Y says
>in some non-monotonic way doesn't seem scalable/workable
>to me.

Two points:

(1) You need to get clear about what counts as a document and what 
you take X and Y to be *about* (content language or meta-language; 
see above). If 'documents' are any amount of information about 
anything, then this is an impossible requirement. (Suppose I want to 
say 'either X or Y'. If I conjoin either X or Y then I've said too much; if 
I don't include them then nobody is going to know what I'm talking 
about.)
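
To put the parenthetical example in symbols: the disjunction is 
strictly weaker than the conjunction,

    X \wedge Y \models X \vee Y \quad\text{but not conversely,}

so conjoining X and Y says more than I meant to say, while omitting 
both says nothing; and no merging-by-conjunction of documents, 
however many, ever yields the disjunction itself.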

(2) Jim Hendler is right. If the meaning of Y depends in part on the 
meaning of X, that doesn't automatically produce nonmonotonicity. It 
just means that one has to use X to figure out the meaning of Y. The 
nonmonotonicity would be produced, maybe, if one insisted on making 
assumptions about the meaning of Y without knowing X, which might 
later turn out to be wrong. But sometimes you can wait until you do 
know, or report back to something else that you can't make progress 
until you are told, or whatever. I agree these complexities make for 
boring reading, but I don't think that there is any way out, in 
general: we are going to have to deal with things like this whether 
we want to or not, ultimately because INFORMATION is like this. It 
just is, however we encode it. Life is not a monotonic conjunction of 
cherries.

> > In particular, it seems to me that your proposal has exactly the same
> > problem.  You also depend on external information on how properties should
> > work.
>
>But the various bits of information accumulate in
>the normal monotonic fashion; I don't have
>the situation where I initially parsed it as
>a string, but then I discover I was wrong or something
>and I have to undo stuff.

If you don't parse until you know how to parse, you won't get into 
this position. So what you really want, I think, is to know (locally 
and absolutely, i.e. monotonically with respect to new info from 
elsewhere) whether or not you do have enough information to parse 
monotonically. (In many languages, for example, a parser on finding 
an identifier will look it up in a table of declared forms and, if it 
isn't there, will post an error condition and refuse to compile.) But 
that's not the same as requiring that you must have, locally, enough 
information to parse. Same point applies to other things as well as 
parsing, of course.
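
The compiler-style check just mentioned, in miniature (Python, with 
invented names): the local decision is whether everything used has 
been declared, and the answer 'not yet' is a deferral, not a 
conclusion that later information will force us to retract:

    def check_identifiers(used, declared):
        """Return the undeclared identifiers; empty means safe to parse."""
        return sorted(set(used) - set(declared))

    declared_so_far = {"Person", "age", "name"}
    used_in_document = ["Person", "age", "shoeSize"]

    missing = check_identifiers(used_in_document, declared_so_far)
    if missing:
        print("deferred: undeclared", missing)   # wait for more info
    else:
        print("safe to parse monotonically")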

Pat Hayes

PS.  Dan, thanks for stating these design principles in black and 
white. I think that more discussions at this level about what people 
see as their over-riding aims and basic assumptions would be very 
useful. Knowing them would have been very helpful to me in 
understanding the rationale for the RDF design decisions, for 
example.

PPS. Any objections to putting this discussion onto rdf-logic? It 
seems to belong there.

---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes

