Re: on behalf of sandro (fwd)

From: Ian Horrocks ([email protected])
Date: 10/27/01
Next message: Jim Hendler: "Re: announcement by W3C"
Previous message: Peter F. Patel-Schneider: "announcement by W3C"
Maybe in reply to: Ian Horrocks: "Re: on behalf of sandro (fwd)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]
I think I forgot to copy this to the group list.
------- start of forwarded message -------
From: Ian Horrocks <[email protected]>
To: Pat Hayes <[email protected]>
Date: Thu, 25 Oct 2001 23:32:29 -0400
Subject: Re: on behalf of sandro
Reply-To: Ian Horrocks <[email protected]>

On October 25, Pat Hayes writes:
> >On October 24, Pat Hayes writes:
> >>  >On October 24, Richard Fikes writes:
> >>  >>  >The knowledge base contains the statements: "Pat's car is blue, and
> >>  >>  >there is something colored red."  Somewhat more formally:
> >>  >>  >
> >>  >>  >   RDF(PatsCar, color, blue).
> >>  >>  >   exists x (RDF(x, color, red)).
> >>  >>
> >>  >>  Sorry to be dense, but how does one state "there is something colored
> >>  >>  red" in DAML+OIL?
> >>  >
> >>  >The simple answer is that you can't without either naming it (i.e.,
> >>  >asserting that some named individual is red) or connecting it to some
> >>  >named individual via properties. This "collapsed model property" is
> >>  >one of the basic properties of description logics on which their
> >>  >decision procedures depend.
> >>
> >>  Hmm. This seems easy in RDF:
> >>
> >>  _:xxx hasColor Red .
> >
> >Firstly, I can do this easily enough in a DL because we are connected
> >to a named individual, i.e., I can just say that Red is the colour of
> >at least one thing. What I was talking about was whether or not it is
> >possible to assert/deduce their existence of some unnamed individual
> >that is not connected to any named individual by any path. I guess
> >that in RDF you can just say:
> >
> >  _:xxx type Red
> >
> >Secondly, this is, in my mind, a totally different animal to the kind
> >of individual that only exists "in the mind of the reasoner".
> 
> Hmm.  Ever hear of Herbrand's theorem?
> 
> >For one
> >thing, there can only be finitely many of this kind of individual,
> >whereas there could be infinitely many of the "mind of the reasoner"
> >kind. Moreover, we have some handle on them, even if this handle is
> >in some sense "anonymous". In fact given that we don't make a unique
> >name assumption there doesn't seem to be much difference whether we
> >give the node an anonymous name or a "real" one. As far as DL's are
> >concerned, I can equally well assert something like "_:xxx is an
> >instance of Red". From a logical point of view, _:xxx would of course
> >be treated in the same way as any other individual, but I could build
> >a system that treated it differently at the user interface - e.g.,
> >would not return is part of a query answer.
> 
> You would also need to treat it differently during inference. (even 
> in DLs, there is a distinction between universal and existential 
> quantification, I presume?)

Of course, but this is just the existential kind. So I can just invent
a "fresh" name and use that. In what way would I have to treat that
name differently from your _:xxx?

> >>  That would seem to imply that RDF can express something that DAML+OIL
> >>  cannot express! Which is fine, I guess, but it doesn't jibe with what
> >>  I had (perhaps naively) thought was the intended relationship between
> >>  RDF/S and DAML+OIL.
> >>
> >>  (Or does this example satisfy your c.m.property by using 'Red' as the
> >>  named individual??)
> >
> >Yes - see above.
> >
> >>  >To be honest, I think we are getting way off base here with the whole
> >>  >discussion. If we are talking about querying a DAML+OIL KB, the
> >>  >problem is whether a query such as "give me all the red objects in the
> >>  >KB" should return individuals whose existence is only implied by the
> >>  >logic, i.e., individuals that don't exist at all in terms of the rdf,
> >>  >either as named or anonymous nodes.
> >>
> >>  That is yet another problem.
> >>
> >>  >E.g., if I assert that the
> >>  >individual Ian is an instance of the class of people that own at least
> >>  >one car and all of whose cars are red, then a REASONER can INFER that
> >>  >some instance of red coloured car must exist, but there may be no RDF
> >>  >node representing that individual. In this case, we can't/shouldn't
> >>  >return it as part of the answer to our query.
> >>
> >>  But what if the reasoner can infer that a blank RDF node represents
> >>  such an individual? That is the case that seems to be tricky here,
> >>  when there is no name to be returned.
> >
> >From a theoretical point of view, this is an easy case compared to the
> >one I mentioned. We could treat these "blank" individuals exactly like
> >named individuals within the query answering machinery and simply
> >decide what to do with then in the user interface - e.g., discard
> >tuples containing any such individuals, or return the tuples with
> >"blanks" in.
> 
> I don't think you are taking the RDF seriously enough here.

You must admit that it is hard to take RDF seriously!

> There is 
> a perfectly clear semantics for blank nodes: they are, in effect, 
> existentially quantified variables whose scope is the RDF document or 
> graph.

Yes, but as I pointed out above, what is the difference between that
and just inventing a "fresh" name.

> >>  >On the other hand, if
> >>  >the query asks for all people owning red cars, then we CAN return Ian
> >>  >as part of the answer, even if there is no RDF node representing the
> >  > >red car Ian owns, because the reasoner can infer that in all models
> >>  >satisfying the KB such an individual must exist.
> >>  >
> >>  >As far as I can understand it we were all agreed on this point - i.e.,
> >>  >that the answer to queries should only include named individuals, and
> >>  >not those whose existence is only implicit.
> >>
> >>  Well, I can see a reasonable sense in which if the query is simply
> >>  'does such a thing exist?' then the answer could be simply 'yes',
> >>  even when no name of any thing is known, as in this case. Did you
> >>  meant to allow this kind of simple query?
> >
> >Yes, but I think that this is a different query. It is asking if Ian
> >is an instance of the class of red car owning people (or
> >alternatively, is Ian in the answer to the query "return all the
> >people owning red cars"). This isn't the same as asking for all the
> >red cars owned by Ian.
> 
> Ian is a member of this class just in case he owns a red car, right? 
> So if I phrase the query in terms of class membership, or in terms of 
> the existence of a car that he owns, why should this be considered a 
> different query? (If anything, *that* seems to me to be a 
> user-interface issue.) And the difference between
> ?[x] (exists y)....
> and
> ?[x y]....
> seems to me to make virtually no difference to a reasoner other than 
> to request an answer in a different form.

I already pointed out one major difference. In the second case queries
are not, in general, decidable because there could be infinitely many
answers. 

> >>  >Things seem to get a bit more tricky when the query asks for a
> >>  >tuple. E.g., we ask for all <x,y> such that x is a person, y is a red
> >>  >car, and x owns y. There may be a temptation to want to return the
> >>  >tuple consisting of Ian and the "anonymous" car which we know he must
> >>  >own. I am convinced we should NOT do this. If there is no individual
> >>  >in the KB that we can infer to be a red car owned by Ian, then Ian
> >>  >should not form part of the answer to the query - if we are really
> >>  >only interested in the owners, then the query should simply have asked
> >>  >for the instances of the class of red car owners.
> >>  >
> >>  >As far as I can understand it, most/all of us were agreed on this
> >>  >point (at least Richard, Peter, Pat and I).
> >>
> >>  Hmmm. I'm not so sure. Consider a query formalization like this:
> >>
> >>  ?[x,y] ( P(x) & RC(y) & O(x,y) )
> >>
> >>  where the query marker binds the variables defining the required
> >>  answer. Then we can distinguish this from
> >>
> >>  ?[x] (exists (y) ( P(x) & RC(y) & O(x,y) ) )
> >>
> >>  , right? In your example, the second query will return [x/Ian], but
> >>  the first one will return nothing. But it seems to me to be
> >>  reasonable for the first query to return something like [x/Ian,
> >>  y/<blank>], meaning that something exists which satisfies the query,
> >>  but no name is available for it (contrast [x/Ian, y/y], which says
> >>  that Ian owns all the red cars in the universe), thereby answering
> >>  the second query form automatically.  Any reasoner that could answer
> >>  the second query could in fact generate this information in response
> >>  to the first query, so it seems kind of petty to withhold such an
> >>  answer on the grounds that the query wasn't formulated in exactly the
> >>  right way.
> >
> >I don't think it is a good idea to try to give intensional answers to
> >queries in this way.
> 
> I don't follow you. There are no intensions involved: this is all 
> perfectly straightforward FOL.  But in any case, I think we have to 
> allow this kind of answer at some point in the process.

Intensional and extensional are often used in KR to distinguish these
two cases - concrete facts, like Ian owns all the cars I know about,
and inferences like the fact that the class of cars not owned by Ian
must be empty because I know that Ian owns all cars.

> >For one thing, as I mentioned in another message,
> >the situation could easily arise where the answer to such a query was
> >infinite.
> 
> But that could arise with any kind of query. Suppose the KB knows 
> that Ian owns every red car in the universe, for example, and someone 
> asks for all the cars he owns.

But only if you allow "anonymous" nodes in the answer! This is exactly
my point. (By the way - there is no guarantee that the universe is
infinite. In fact DAML+OIL is expressive enough to fix an upper bound
on the size of the universe.)

> >I believe that this is best avoided. I also believe we
> >should initially try to devise the simplest language/formalism that is
> >capable of answering the queries we are interested in.
> 
> I agree, and part of my point here is that the same query form can 
> handle both kinds of query, which makes things even simpler, right?

I agree on this, but I think that in this case trying to force
everything into this one query type ends up making things much more
complex, for the reasons I have tried to indicate. Of course if we
define query semantics such that the answer is always empty when the
KB is unsatisfiable (which is probably sensible) then we can easily
answer subsumption and satisfiability queries (e.g., C subsumes D iff
adding a "fresh" individual or type (D and (not C)) to the KB makes it
unsatisfiable - which we can determine by querying for all instances
of the class (D and (not C))). This seems a bit clunky though for what
I imagine will be a very common query.

> >Many of these
> >issues we have been discussing could then be solved by a suitable user
> >interface (or language extension if you prefer) that called on the
> >simpler language, e.g., to automatically investigate an intensional
> >answer in the case that the extensional answer is empty.
> 
> But my point is that this would be inefficient, since the same 
> machinery that would determine the lack of an extensional answer 
> would in fact have already determined the intensional one (that is, 
> the existential form of the query.) The query-answering process makes 
> no distinction between variables bound by ?[] and those bound with 
> (exists) during the answer-finding process (it cannot, right? since 
> otherwise it would miss some genuine answers); the distinction only 
> arises when it is time to deliver the answers.

I think we can worry about efficiency once we have a well defined and
well understood basic framework. It also isn't necessarily the case
that the intensional query is answered while determining the
extensional one, because if we are limiting ourself to named
individuals, we only need to check if they satisfy the query. E.g., in
the case where there are NO named individuals, I can answer an
extensional query like "return all cars owned by Ian" without doing
any work at all, but I still have to call the reasoner to check if Ian
is an instance of the class of people who don't own any cars to answer
the intensional query "is it possible for Ian to own a car".

> >>  >As far as the case where Ian owns two red cars is concerned, this is
> >>  >straightforwardly dealt with by returning 2 tuples: <Ian, RedCar1>,
> >>  ><Ian,RedCar2>. Note that we already agreed that all names should
> >>  >really be sets of equivalent names, we can deal with the case where
> >>  >RedCar1 and RedCar2 are inferred to be the same car by returning the
> >>  >tuple <Ian, {RedCar1,RedCar2}>. Anyway, these are relatively minor
> >>  >details compared to the above.
> >>
> >>    I agree here.
> >>
> >>  >
> >>  >Apart from this kind of query, I think we also need a second kind of
> >>  >logical query relating to subsumption/satisfiability questions, e.g.,
> >>  >is the class C satisfiable and does C subsume D. These kinds of query
> >>  >can all be reduced to KB satisfiability.
> >>
> >>  Can't they all be so reduced?
> >
> >Maybe - but that isn't clear yet because we didn't formally define the
> >query language. However, we can't reduce this kind of intensional
> >query to the extensional kind, so I think we still need to consider
> >at least these two kinds of query.
> 
> Maybe I am not following something subtle about DLs, but it seems to 
> me that the answer-finding process involved is the same in the two 
> cases.

Actually, if we define the semantics of extensional queries to always
return an empty answer in the case that the KB is unsatisfiable, then
we can reduce subsumption/satisfiability to an extensional query as I
mentioned above. It isn't clear if all extensional queries can be
reduced to KB satisfiability - its quite trivial in the case where
queries are trees, but is complex (maybe impossible) in the case where
there are cycles in the non distinguished variables (ones that don't
occur in the answer) of the query.

> >>  >Then, there is a third kind of query that asks about the structure of
> >>  >the current class hierarchy, e.g., give me all the (direct)
> >>  >sub-classes of C. These kinds of query are not really "logical"
> >>  >queries in the same sense as the first two types - the query asks
> >>  >about the structure of the class hierarchy based on (possibly implicit)
> >>  >subsumption relationships. For this reason I don't really believe that
> >>  >the "non-monotonicity" of the answers to the direct subsumer query is
> >>  >a problem - DL systems have been living with this for years without
> >>  >anyone worrying about it (so it must be OK!). In any case, this is a
> >>  >relatively minor detail.
> >>
> >>  I don't agree either that it is minor or that it is so easily
> >>  dismissed. If we are just considering querying in isolation, I might
> >>  agree. But if you put it into a broader semweb context, I think this
> >>  needs to be looked at more carefully.
> >
> >I'm open to persuasion, but I still don't understand your worries
> >about this. And from a pragmatic point of view, this will be one of
> >THE most common queries in many applications, e.g., those using an
> >ontology to help refine/broaden queries.
> 
> Heres my worry. Suppose I embed this query engine in something 
> slightly larger which publishes the 'answers' it finds on the web in 
> some way. Now, this seems perfectly reasonable; after all, the KB is 
> kosher, let us suppose, and the QE has used kosher querying means to 
> get its answers. But suddenly the combination has become a 
> nonmonotonic reasoner which is publishing invalid conclusions, yet 
> saying (correctly, in a sense) that it has drawn them from the KB. 
> To put the point in another way; if we allow the querying process to 
> access information which is not strictly entailed by the KB (but is 
> in some sense 'about' the KB), then we will need to have some fairly 
> elaborate distinctions made in order to protect the perceived 
> integrity of the KB, if those conclusions are published. And my worry 
> is that those elaborate distinctions will not in fact get made, and 
> the result will be a kind of global corruption of entailment, which I 
> think is exactly the kind of issue that the SW needs to be very 
> sensitive about, as there simply will not be time for electronic 
> agents to find truth in the resulting non-monotonic mist.

I see what you are getting at, but I still think that from a pragmatic
perspective you will have difficulty persuading users to live without
this kind of query. Anyway, I don't believe that this is central to
the discussion - let's leave it on the side for the moment.

> >>  >Finally (maybe), we could also consider queries relating to the
> >>  >syntactic structure of the asserted facts, but I'm not sure that this
> >>  >should be part of a basic query language.
> >>
> >>  How about answers which indicate contradictions? eg if I ask
> >>
> >>  ?[x] (P(x))
> >>
> >>  and the system is able to prove that the class of P's has *no*
> >>  members. Should it just say 'no answer', or should it tell you that
> >>  your assumptions are wrong?
> >
> >I'll repeat what I said above. I believe we should initially try to
> >devise the simplest language/formalism that is capable of answering
> >the queries we are interested in.
> 
> Well, let me propose the language I sketched above. One takes the 
> existing assertional language, generalizes it if necessary to allow 
> variables, and then a query has the form
> 
> ?[<queryvarlist>] (exists (<varlist>) <exp>)
> 
> where varlist and queryvarlist are disjoint lists of variables. (OR, 
> one could just omit the 'exists' part and take it that all unbound 
> variables in a query are understood existentially. )
> 
> Could hardly be simpler, right?
> 
> An answer is a list of bindings, where each binding is a list of 
> pairs of a variable in the queryvarlist and a denoting expression or 
> <blank>. An empty list of bindings means 'no'. An empty binding in a 
> binding list means 'yes (but I'm not telling you anything else'. A 
> blank is interpreted as an existential, ie an instruction to treat 
> the variable as existentially quantified. A missing query variable in 
> a binding can be interpreted as a universal. Any variables in the 
> denoting expressions are understood as universals.
> 
> In our example
> ?[x,y](Px & RCy & Oxy)
> the answers might be:
> [] - no (or: Idunno.)

We obviously need to be able to differentiate no and don't know.

> [[]] - yes.

I'm not clear as to what this answer means and/or how it differs from
the next one.

> [[x/blank y/blank]] -Yes, someone owns a red car
> [[x/Ian y/blank]] - Ian owns (at least one) red car
> [[x/Ian y/Car4][x/Ian y/Car5]] - Ian owns at least two red cars
> [[x/blank y/Car4]] - Somebody owns this red car

It could work. We would need to define the semantics a bit more
clearly - e.g., is it possible to return an answer like:

[[x/blank y/blank][x/blank y/blank]] - at least 2 different people own
red cars (warning, this may lead to infinite answers)
[[x/Ian y/Car4][x/Ian y/Car5][x/blank y/blank]] - Ian owns at least
two red cars, and someone else who is not Ian also owns a red car

Ian

> 
> Pat
> -- 
> ---------------------------------------------------------------------
> IHMC					(850)434 8903   home
> 40 South Alcaniz St.			(850)202 4416   office
> Pensacola,  FL 32501			(850)202 4440   fax
> [email protected] 
> http://www.coginst.uwf.edu/~phayes
------- end of forwarded message -------
Next message: Jim Hendler: "Re: announcement by W3C"
Previous message: Peter F. Patel-Schneider: "announcement by W3C"
Maybe in reply to: Ian Horrocks: "Re: on behalf of sandro (fwd)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]
This archive was generated by hypermail 2.1.4 : 04/02/02 EST