Re: added diagrams to "Using XML Schema Data Types..."

From: Dan Connolly (connolly@w3.org)
Date: 02/08/01


pat hayes wrote:
> PPS. Any objections to putting this discussion onto rdf-logic? It
> seems to belong there.

no, no objection.


[...]
> >Another way to state this principle is that
> >the knowledge contained in two documents, X and Y,
> >is always the conjunction of the knowledge in X with
> >the knowledge in Y. To allow X to change what Y says
> >in some non-monotonic way doesn't seem scalable/workable
> >to me.
> 
> two points:
> 
> (1) you need to get clear about what counts as a document and what
> you take X and Y to be *about*. (content language or meta-language,
> see above.) If 'documents' are any amount of information about
> anything, then this is an impossible requirement. (Suppose I want to
> say 'either X or Y'. If I conjoin X or Y then I've said too much; if
> I don't include them then nobody is going to know what I'm talking
> about.)
> 
> (2) Jim Hendler is right. If the meaning of Y depends in part on the
> meaning of X, that doesn't automatically produce nonmonotonicity. It
> just means that one has to use X to figure out the meaning of Y. The
> nonmonotonicity would be produced, maybe, if one insisted on making
> assumptions about the meaning of Y without knowing X, which might
> later turn out to be wrong. But sometimes you can wait until you do
> know, or report back to something else that you can't make progress
> until you are told, or whatever. I agree these complexities make for
> boring reading, but I don't think that there is any way out, in
> general: we are going to have to deal with things like this whether
> we want to or not, ultimately because INFORMATION is like this. It
> just is, however we encode it. Life is not a monotonic conjunction of
> cherries.

I am not convinced; I'm not even quite sure we're talking
about the same thing. I'm not talking about any deep
sense of what the formulas mean; I'm just talking about
knowing what formula you're looking at without having
to look anywhere else.

Let me try a few mundane examples
from programming languages to illustrate...

Consider ex1.c:

	#include "otherThingy.h"
	#include <stdio.h>

	int main(int argc, char **argv)
	{
		if(OTHER_THINGY_CONSTANT < 10){
			printf("x");
		}else{
			printf("y");
		}
		return 0;
	}

The meaning of ex1.c -- what program it contains, or how
to compile it to ex1.o -- depends on otherThingy.h. It's
either a program that prints "x" or a program that prints "y", but
we're not sure which without looking in otherThingy.h.
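
To make that dependence concrete, here is a minimal sketch in which the two possible header contents are simulated with #define/#undef inside one file. The values 5 and 50 and the function names are assumptions for illustration only; nothing in ex1.c says which value a real otherThingy.h would supply.

```c
/* Assumed contents of one hypothetical version of otherThingy.h. */
#define OTHER_THINGY_CONSTANT 5
static char ex1_with_first_header(void)
{
	/* Same source text as the test in ex1.c. */
	return (OTHER_THINGY_CONSTANT < 10) ? 'x' : 'y';
}
#undef OTHER_THINGY_CONSTANT

/* Assumed contents of a different hypothetical version. */
#define OTHER_THINGY_CONSTANT 50
static char ex1_with_second_header(void)
{
	/* Identical source text, but a different program after preprocessing. */
	return (OTHER_THINGY_CONSTANT < 10) ? 'x' : 'y';
}
#undef OTHER_THINGY_CONSTANT
```

The two function bodies are character-for-character the same; which branch survives is decided entirely by text that lives elsewhere.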

On the other hand, consider ex2.c:

	int printf(const char*, ...);
	int otherThing(void);

	int main(void)
	{
		if(otherThing() < 10){
			printf("a");
		}else{
			printf("b");
		}
		return 0;
	}

Here we can tell what this compilation unit means; we can
compile it to ex2.o. It's a program that calls the otherThing()
function, and depending on the result, prints "a" or "b".
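
For contrast with ex1.c, a sketch in which the text of the function is fixed and only the result of the call varies. A function pointer stands in for whatever definition of otherThing() the linker would eventually supply; ex2_choice, returns_small and returns_large are hypothetical names, not part of ex2.c.

```c
/* The formula is fixed: call something, compare with 10, pick a letter. */
static char ex2_choice(int (*other_thing)(void))
{
	return (other_thing() < 10) ? 'a' : 'b';
}

/* Two hypothetical definitions that could be supplied later. */
static int returns_small(void) { return 3; }
static int returns_large(void) { return 42; }
```

Either definition changes what gets printed, but neither changes what ex2_choice says.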

I'm happy to have one RDF/DAML+OIL document interact with
another in ex2 style, but I'm not happy for the syntactic
meaning of an RDF/DAML+OIL document to be dependent on another
in the way that ex1.c depends on otherThingy.h.

I don't see how you can claim that there is something fundamental
about information that forces our language to include
garbage like the C preprocessor. Surely we can design our
language so that the formulas contained in each document
are syntactically evident, no?

KIF has this property, after all. You either write

	(size x "10")
or
	(size x 10)

and there's nothing that can be written in some other
file (or some other part of this file, for that matter)
that will change the formula that a given piece
of KIF represents.

CycL is an example of making the choice the
other way: the syntax of formulas depends on context;
this
	(both (tall Fred) (green TheCar))
might be a two-place predicate applied to two
function terms, or a new logical connective
applied to two predicates, depending on whether
(isa both Connective) is true when it's parsed.
cf http://www.cyc.com/cycl.html#logical_connectives
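
A sketch of what that context dependence costs a parser: the classification of the head symbol takes the current set of known connectives as an input, so the same text can yield different parses. The enum, the function name and the table are assumptions for illustration; this is not CycL's actual machinery.

```c
#include <stddef.h>
#include <string.h>

enum head_kind { HEAD_PREDICATE, HEAD_CONNECTIVE };

/* Classify the head symbol of a form, given the connectives asserted so far. */
static enum head_kind classify_head(const char *head,
				    const char *const *connectives, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(head, connectives[i]) == 0)
			return HEAD_CONNECTIVE;
	}
	/* Anything not known to be a connective is read as a predicate. */
	return HEAD_PREDICATE;
}
```

With an empty table, "both" is read as a predicate; once "both" is in the table, the very same form is read as a connective.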

A related issue came up in the design of
XML namespaces. It might seem more convenient
if you could just open a bunch of namespaces
and use names from any of them ala:

	<aDoc xmlnss="http://example/fruits
			http://example/vegetables
			http://example/minerals">
	<apple/>
	<tomatoe/>
	<sandstone/>
	</aDoc>

but that's no good because it's not syntactically
evident whether tomatoe is a fruit or a vegetable
(or... if you're a dumb computer: a mineral).
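
A sketch of the ambiguity, assuming (hypothetically) that both the fruits and the vegetables vocabularies define tomatoe. The vocabulary contents and the function name are invented for illustration; the point is only that the lookup can come back with more than one candidate.

```c
#include <stddef.h>
#include <string.h>

struct vocabulary {
	const char *uri;
	const char *names[4];	/* NULL-terminated list of local names */
};

/* Hypothetical contents; nothing is assumed about real documents at these URIs. */
static const struct vocabulary open_namespaces[] = {
	{ "http://example/fruits",     { "apple", "tomatoe", NULL } },
	{ "http://example/vegetables", { "tomatoe", NULL } },
	{ "http://example/minerals",   { "sandstone", NULL } },
};

/* Count how many of the open namespaces define the given local name. */
static size_t count_candidates(const char *local_name)
{
	size_t hits = 0;
	size_t n = sizeof open_namespaces / sizeof open_namespaces[0];

	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; open_namespaces[i].names[j] != NULL; j++) {
			if (strcmp(open_namespaces[i].names[j], local_name) == 0)
				hits++;
		}
	}
	return hits;
}
```

Under these assumptions apple has one candidate and tomatoe has two, and the answer can change whenever any of the three vocabularies changes.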

This was part of a whole requirements document
about extensible languages:

  Lack of ambiguity 

  Some programming languages allow one to introduce
  identifiers from new name spaces in such a way that
  it is not possible to know which namespace a local identifier
  belongs to without accessing both the module interface
  specifications and checking which one has with the highest
  priority, or  most recently in the document, redefined a given
  local identifier. 

  This may have some uses in a programming language such
  as Java[Java], but it has a serious flaw in that when one
  module changes (without the knowledge of the designers of the
  other module), it can unwittingly redefine a local identifier
  used by the second module, completely changing the meaning of
  a previously written document. Clearly, in the Web world in
  which modules evolve but documents must have clearly defined
  meanings, this is unacceptable.  Contrast with Modula-3,
  where all names are either lexically scoped or fully
  qualified [SPwM3]. 

  -- http://www.w3.org/TR/NOTE-webarch-extlang#Ambiguity


> > > In particular, it seems to me that your proposal has exactly the same
> > > problem.  You also depend on external information on how properties should
> > > work.
> >
> >But the various bits of information accumulate in
> >the normal monotonic fashion; I don't have
> >the situation where I initially parsed it as
> >a string, but then I discover I was wrong or something
> >and I have to undo stuff.
> 
> If you don't parse until you know how to parse, you won't get into
> this position. So what you really want, I think, is to know (locally
> and absolutely, ie monotonically with respect to new info. from
> elsewhere) whether or not you do have enough information to parse
> monotonically. (In many languages, for example, a parser on finding
> an identifier will look it up in a table of declared forms and if it
> isn't there, will post an error condition and refuse to compile.) But
> that's not the same as requiring that you must have, locally, enough
> information to parse. Same point applies to other things as well as
> parsing, of course.

Yes, we could go there without introducing the problem
I'm trying to avoid, but I don't see sufficient reason to.


> Pat Hayes
> 
> PS.  Dan, thanks for stating these design principles in black and
> white. I think that more discussions at this level about what people
> see as their over-riding aims and basic assumptions would be very
> useful. Knowing them would have been very helpful to me in
> understanding the rationale for the  RDF design decisions, for
> example.

OK.

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
office: tel:+1-913-491-0501
pager: mailto:connolly.pager@w3.org
  (put return phone number in from/subject)

