Re: Query Language Issues

From: Pat Hayes (phayes@ai.uwf.edu)
Date: 11/13/01


>This message is to reply to the points in your earlier messages that I
>have not replied to --
>
>>  > I think all that matters for our formalization and core
>>  > design work is that a query answer may make use of the
>>  > bindings to only a subset of the variables in the query
>>  > pattern.  So, my recommendation is that for now we consider
>>  > a query answer to consist of a set of bindings for a subset
>>  > of the variables, call them “query variables”, and that the
>>  > query specifies which variables are in that subset.
>
>[the above excerpt has been edited to match the message I sent out last
>evening.]
>
>>  That would be my natural inclination also, but I think it needs to be
>>  extended some. For example, we might want to distinguish between the
>>  case where the query is said to be OK but no bindings are provided,
>>  from the case where the query is simply answered with 'no' or 'fail'
>>  or whatever. Ian wants to distinguish the 'don't know' (ie unprovable
>>  from KB) answer case from the 'no' (ie, contradictory with the KB)
>>  cases. So in general there might be other information in an answer
>>  than just the bindings alone.
>
>Yes, there will be more information in a query result than just bindings

OK, but I feel we are going in circles here. Doesn't the first-quoted 
paragraph above say the exact opposite of this? (Or are you making a 
distinction between a query ANSWER and a query RESULT ?)

>(and more information in a query than just a query pattern).  That is a
>core motivation for the various components of the query language design
>we have been putting together and I have been proposing to the
>committee.
>
>If a query has no query variables, then a "yes" answer to the query
>would be represented by a query answer containing an empty set of
>bindings.  When no answers are found for a query, the query result would
>contain no query answers. 
>
>Note that the proposal is for a query result to contain a specification
>of how many query answers are entailed ("# answers entailed") in the
>form of an ordered paired denoting a closed interval.

As you know from the telecon, I do not like this proposal. Well, it 
depends on what it means. If this is just a matter of keeping count 
of the number of bindings produced, then of course it is harmless, 
though also kind of pointless (and there is no need for the closed 
interval.)  But I gather you mean this to be a much more powerful 
idea, to be used to convey intensional information about cardinality 
of the answer set even in cases where this cannot be inferred from 
the KB itself. I do not think that we should undertake to propose 
such a construct in a querying language unless we have a VERY good 
idea about how these answers are going to be determined, in general.

>  So, the value of
>#-Answers-Entailed distinguishes the 'don't know" case from the 'no'
>case in that if the server produces no answers and doesn't know how many
>answers there are, it would produce a query result with
>#-Answers-Entailed set to [0,0-0], but if the server knows there are no
>answers, then it would produce a query result with #-Answers-Entailed
>set to [0,0].
>
>>  >, and that the query specifies which query
>>  >variables are in that subset.  I will make that assumption in the
>>  >remainder of this document.
>>
>>  Wait. The *query* specifies which query variables are in the set? So
>>  there could be query variables which are not being, as it were,
>>  queried? (Then why did you call them query variables?)
>
>In response to your messages, I have changed my terminology about
>variables in the query pattern in the proposal I sent out last evening
>(as per the excerpt repeated above).  I.e., I am now calling "query
>variables" the subset of the variables in the query pattern that are
>specified by the query to be in the query answer.  I think that
>corresponds to the terminology you have been using in your messages.

I gather than Ian now wants us to use a different terminology.

>  > >Uniqueness of Answers
>  > >
>>  >In general, a query result may contain multiple query answers.
>>  >
>>  >ISSUE: What guarantees do we want to make about the distinctiveness of
>>  >multiple query answers in a query result?  We may want to guarantee that
>>  >no two answers consist of identical sets of bindings.
>>
>>  Why would we want to do that? This might be useful information, eg if
>>  one queries
>>  [?x ?y](exists (?z)(and (P ?x ?z)(P ?z ?y)))
>>
>>  (Sorry about the KIF; query variables in brackets)
>>
>>  then getting back two copies of the binding might allow you to infer
>>  (if the KB is closed) that there are precisely two ?z's involved. (If
>>  P is parent-of, this detects  a certain family arrangement which is
>>  universally frowned upon, for example.)
>
>I thought the committee agreed that each query answer would have a
>distinct set of bindings of the query variables.  (The minutes of the
>9/25 telecon say: RESOLVED: DQL should provide some mechanism for
>returning subsets of the variables mentioned in the query. The user can
>get
>multiple occurrences of the same bindings for some variables by adding
>the variables occurring along multiple distinct paths.)

The issue in which this arose was whether we should allow a query to 
specify that duplicate bindings should be suppressed., as in SQL's 
'select distinct'. Providing a mechanism to allow suppression of 
duplicates is not a decision to make duplications illegal. In fact, 
in the context indicated, I think that the last sentence of the 
resolution can be read as stating that duplications *are* permitted, 
by exactly the kind of mechanism suggested.

>  Note that in
>your example, ?z is not a query variable and so wasn't ask about by the
>query.  The rationale for requiring binding sets to be distinct is that
>if one wants to distinguish cases involving multiple bindings of
>variables in the query pattern (e.g., ?z in your example), then those
>variables should be included as query variables.  So, in your example,
>if you want to find out how many ?z's are involved, then include ?z as a
>query variable.
>
>>  > My recommendation is that we guarantee there are no identical
>>  >sets of bindings
>>
>>  I disagree. This is both needless, and onerous for the implementer.
>
>Are you disagreeing that an answer result should not have query answers
>having the same set of bindings?

Yes, I am disagreeing with that. I think that our decision was to 
allow the querier to specify whether or not duplications are 
permitted, and I think the default case should be that they are, as 
it puts less of a burden on the query-answering engine.

I would like us to invent a very simple, basic, vanilla 
query-answering engine which rapidly produces accurate, if dumb, 
answers, and whose answer stream, if I may express it in that way, 
can be 'filtered' by smarter query-answering interfaces to give more 
sensitive kinds of answers. Removing duplicates would be a very 
natural kind of filter. Other kinds of smarter query-answering 
interface might do things like check for identity, try drawing 
conclusions about cardinalities, etc. .

BTW, given that you want to allow cardinalities on numbers of 
answers, aren't you obliged to at least *count* duplications?

>If so, that is a substantive
>disagreement that needs further airing.  Requiring the binding sets to
>be distinctive seems to me to be a rather basic requirement.

Why?

>  What am I
>missing?
>
>>  >and that we enable a query to include an indicator as
>>  >to whether equal sets of bindings are acceptable.
>>
>>  Again, I disagree. First, who is responsible for knowing that two
>>  things are equal? The KB may not have the resources to be able to
>>  prove this, but the querying engine might. But in any case, what is
>>  the utility? We can allow people to include statements of equality in
>>  the query if this issue bothers them.
>
>So, this is more on the issue of what is acceptable as multiple answers
>to a query.  I don't see how one could include in a query pattern that
>multiple answers must denote different objects.

No, but one could in a subsequent query ask whether two of the 
answers to a previous query were identical or not. (This is a good 
example of how allowing multiple queries allows each query to be 
simpler, by the way.)

>The query pattern only
>specifies relationships that must hold among a set of objects that are
>*one* answer to the query.  The issue here is what is allowable as
>multiple sets of objects, each set of which constitutes a query answer.
>
>By the way, it is the server that would be determining that equivalentTo
>holds for two constants.

Yes, of course, but my point above is that the querier may have 
access to information that isn't available to the KB. (BTW, I really 
dislike that term 'server'; it suggests an asymmetry between the 
querier and queried that isn't likely to survive in the broader SW 
world, I think.)

>
>We could, indeed, not provide for a query to include the requirement
>that multiple answers denote distinctive sets of objects.  The client
>could then test the distinctiveness of multiple answers with follow-up
>queries (using differentIndividualFrom).  Note that such testing would
>require many additional queries when there are multiple query variables
>and/or more than two sets of bindings.

Right. If you want to know all this, you have to do more work. But if 
you weren't interested in knowing it, you shouldn't be obliged to do 
all that work. If you just want a quick dirty answer, you should be 
able to get one at minimal cost.

Notice also that these subsequent queries might be entirely generated 
inside a more sophisticated query-answering engine which would 
'compile' a sophisticated query into a series of smaller, dumber, 
queries and then ask them all in turn (keeping track of the common 
ground being created, ie the state of the 'conversation'), and 
finally deliver its more sophisticated answer. That is, we can 
imagine building more sophisticated query engines on top of less 
sophisticated ones, so we do not have to get it all done at once, 
maybe :-).

>
>So, I think enabling a query to include a request for answers to denote
>distinct sets of objects is an optional feature that we can decide do
>include or not in our query language.
>
>>  >I assume our query language needs to enable a query to specify what is
>>  >being asked for and our query result language needs to enable a query
>>  >result to contain the information requested in a query.
>>
>>  You havn't said what this means, though. What IS the information
>>  requested in a query? (Not just a debating point, this seems like a
>>  central issue that we need to get clear, as many other things would
>>  then be also clarified, I think.)
>
>My message that I resent last evening laying out issues and proposals
>describes what I mean by the information requested in a query.  That
>information includes bindings for the query pattern, the number of
>answers entailed by the KB, and justifications for answers.

OK, but you need to say exactly what counts as a 'justification'. 
That term has no self-evident meaning in logic or DL. (See later.)

>
>>  This all seems rather like overkill. Do we really want to get into
>>  infinite ordinals?
>>
>>  Why not have the following distinctions:
>>  query forms:
>>  1.  one answer (ie first one found)/
>>  2. (up to) N answers/
>>  3. all answers.
>>  replies:
>>  1. an answer or 'not provable'/
>>  2. list of M answers, for some M<=N; if M<N then this means these are
>>  all the answers/
>>  3. some list of answers.
>>  Issues: (1) what if there are infinitely many answers? Should we
>>  provide some way to indicate this even though not all the bindings
>>  can be provided?
>>  (2) what about time? For example, it might be useful to get the first
>>  answer when it is available, even if more have been requested. Which
>>  suggests another dimension, viz. do you want the answers all at once
>>  when the entire list is done, or each one when it is fresh?
>
>I think I have covered all of those cases in my proposals, and rather
>succinctly I would claim.  I don't want to go back through the proposals
>here again comparing them to your list above.  I will do so if you
>insist.

But I am suggesting that this is *simpler* than your proposal. It is 
less expressive, but I suggest that it would be adequate, and easier 
to implement. Obviously you can express all these distinctions using 
closed intervals; the questions are, do you need that more expressive 
notation (why?), and do you have any idea how those more general 
queries could be answered?

And BTW, what about the second issue raised above, about timely 
delivery of answers when others are still being discovered (? If you 
have covered this somewhere, forgive me for missing it.)

>
>>  >Justification For Answers
>>  >
>>  >Under the assumption that our query language needs to enable a query to
>>  >specify what is being asked for and our query result language needs to
>>  >enable a query result to contain the information requested in a query,
>>  >the query language needs to enable inclusion in a query a request for a
>>  >justification for each query answer and the query result language needs
>>  >to enable inclusion of such justifications in a query result.
>>
>>  I don't see how this follows.  Again, I think we need to get clearer
>>  on what exactly counts as the information requested. If I hold up a
>>  query then I am asking the KB to prove it for me. What do I want
>>  back: the fact that it has been proved, some of the bindings in the
>>  proof, the expressions used as assumptions in the proof, the entire
>>  proof itself, all the proofs, or what? The last is probably the most
>>  general form, but it would be overkill for many querying
>>  applications. Still, it might be worth thinking of 'information
>>  requested' as being ultimately the entire set of proofs, and then
>>  defining various kinds of 'pruning' or simplification of this. That
>>  would certainly cover the variable-bindings, for example, and it
>>  might also cover justifications. (BTW, what is a justification,
>>  exactly??)
>
>As I said in my message, we have the following issue:
>
>>  ISSUE: Whether and how to include provisions for justifications
>>  to be requested in a query and be included in a query result.
>>  What kind of justification language(s) do we include in our query
>>  results language?
>
>I think we need to enable queries to ask for justifications and for
>query answers to include justifications, but that is only my opinion.

Since you yourself say that you don't know what 'justification' 
means, it hardly amounts to an opinion at present.

>  I
>thought the need for justifications to accompany results in the Semantic
>Web was one of the compelling points that Jim Hendler made repeatedly.
>If we believe that point, then it seems to me we need to include a
>provision for justifications in our query language.
>
>Now, what is a justification?  I don't have an answer to that.

Well then let us work on answering it. (Passing resolutions using 
words when we have no idea what they mean is just pissing into the 
wind.)

As  a start, let me suggest that a 'maximal' justification - the most 
information that one could possibly hope to get from any kind of 
deductive query-answering mechanism - is a fully annotated proof (in 
some suitable format) of the query from the KB, including all the 
antecedents used in the proof, presented in such a way that it could 
be checked mechanically with no more computational effort than was 
used to generate it. Any kind of query that expects more than this 
would be asking too much.

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes


This archive was generated by hypermail 2.1.4 : 04/02/02 EST