Re: Query Language Issues

From: Richard Fikes (fikes@ksl.stanford.edu)
Date: 11/12/01


This message is to reply to the points in your earlier messages that I
have not replied to --

> > I think all that matters for our formalization and core
> > design work is that a query answer may make use of the
> > bindings to only a subset of the variables in the query
> > pattern.  So, my recommendation is that for now we consider
> > a query answer to consist of a set of bindings for a subset
> > of the variables, call them “query variables”, and that the
> > query specifies which variables are in that subset.

[the above excerpt has been edited to match the message I sent out last
evening.]

> That would be my natural inclination also, but I think it needs to be 
> extended some. For example, we might want to distinguish between the 
> case where the query is said to be OK but no bindings are provided, 
> from the case where the query is simply answered with 'no' or 'fail' 
> or whatever. Ian wants to distinguish the 'don't know' (ie unprovable 
> from KB) answer case from the 'no' (ie, contradictory with the KB) 
> cases. So in general there might be other information in an answer 
> than just the bindings alone.

Yes, there will be more information in a query result than just bindings
(and more information in a query than just a query pattern).  That is a
core motivation for the various components of the query language design
we have been putting together and I have been proposing to the
committee.

If a query has no query variables, then a "yes" answer to the query
would be represented by a query answer containing an empty set of
bindings.  When no answers are found for a query, the query result would
contain no query answers.  

Note that the proposal is for a query result to contain a specification
of how many query answers are entailed ("# answers entailed") in the
form of an ordered paired denoting a closed interval.  So, the value of
#-Answers-Entailed distinguishes the 'don't know" case from the 'no'
case in that if the server produces no answers and doesn't know how many
answers there are, it would produce a query result with
#-Answers-Entailed set to [0,0-0], but if the server knows there are no
answers, then it would produce a query result with #-Answers-Entailed
set to [0,0].

> >, and that the query specifies which query
> >variables are in that subset.  I will make that assumption in the
> >remainder of this document.
> 
> Wait. The *query* specifies which query variables are in the set? So 
> there could be query variables which are not being, as it were, 
> queried? (Then why did you call them query variables?)

In response to your messages, I have changed my terminology about
variables in the query pattern in the proposal I sent out last evening
(as per the excerpt repeated above).  I.e., I am now calling "query
variables" the subset of the variables in the query pattern that are
specified by the query to be in the query answer.  I think that
corresponds to the terminology you have been using in your messages.

> >Uniqueness of Answers
> >
> >In general, a query result may contain multiple query answers.
> >
> >ISSUE: What guarantees do we want to make about the distinctiveness of
> >multiple query answers in a query result?  We may want to guarantee that
> >no two answers consist of identical sets of bindings.
> 
> Why would we want to do that? This might be useful information, eg if 
> one queries
> [?x ?y](exists (?z)(and (P ?x ?z)(P ?z ?y)))
> 
> (Sorry about the KIF; query variables in brackets)
> 
> then getting back two copies of the binding might allow you to infer 
> (if the KB is closed) that there are precisely two ?z's involved. (If 
> P is parent-of, this detects  a certain family arrangement which is 
> universally frowned upon, for example.)

I thought the committee agreed that each query answer would have a
distinct set of bindings of the query variables.  (The minutes of the
9/25 telecon say: RESOLVED: DQL should provide some mechanism for
returning subsets of the variables mentioned in the query. The user can
get
multiple occurrences of the same bindings for some variables by adding
the variables occurring along multiple distinct paths.)  Note that in
your example, ?z is not a query variable and so wasn't ask about by the
query.  The rationale for requiring binding sets to be distinct is that
if one wants to distinguish cases involving multiple bindings of
variables in the query pattern (e.g., ?z in your example), then those
variables should be included as query variables.  So, in your example,
if you want to find out how many ?z's are involved, then include ?z as a
query variable. 

> > My recommendation is that we guarantee there are no identical
> >sets of bindings
> 
> I disagree. This is both needless, and onerous for the implementer.

Are you disagreeing that an answer result should not have query answers
having the same set of bindings?  If so, that is a substantive
disagreement that needs further airing.  Requiring the binding sets to
be distinctive seems to me to be a rather basic requirement.  What am I
missing? 

> >and that we enable a query to include an indicator as
> >to whether equal sets of bindings are acceptable.
> 
> Again, I disagree. First, who is responsible for knowing that two 
> things are equal? The KB may not have the resources to be able to 
> prove this, but the querying engine might. But in any case, what is 
> the utility? We can allow people to include statements of equality in 
> the query if this issue bothers them.

So, this is more on the issue of what is acceptable as multiple answers
to a query.  I don't see how one could include in a query pattern that
multiple answers must denote different objects.  The query pattern only
specifies relationships that must hold among a set of objects that are
*one* answer to the query.  The issue here is what is allowable as
multiple sets of objects, each set of which constitutes a query answer.

By the way, it is the server that would be determining that equivalentTo
holds for two constants.  

We could, indeed, not provide for a query to include the requirement
that multiple answers denote distinctive sets of objects.  The client
could then test the distinctiveness of multiple answers with follow-up
queries (using differentIndividualFrom).  Note that such testing would
require many additional queries when there are multiple query variables
and/or more than two sets of bindings.

So, I think enabling a query to include a request for answers to denote
distinct sets of objects is an optional feature that we can decide do
include or not in our query language.

> >I assume our query language needs to enable a query to specify what is
> >being asked for and our query result language needs to enable a query
> >result to contain the information requested in a query.
> 
> You havn't said what this means, though. What IS the information 
> requested in a query? (Not just a debating point, this seems like a 
> central issue that we need to get clear, as many other things would 
> then be also clarified, I think.)

My message that I resent last evening laying out issues and proposals
describes what I mean by the information requested in a query.  That
information includes bindings for the query pattern, the number of
answers entailed by the KB, and justifications for answers.  

> This all seems rather like overkill. Do we really want to get into 
> infinite ordinals?
> 
> Why not have the following distinctions:
> query forms:
> 1.  one answer (ie first one found)/
> 2. (up to) N answers/
> 3. all answers.
> replies:
> 1. an answer or 'not provable'/
> 2. list of M answers, for some M<=N; if M<N then this means these are 
> all the answers/
> 3. some list of answers.
> Issues: (1) what if there are infinitely many answers? Should we 
> provide some way to indicate this even though not all the bindings 
> can be provided?
> (2) what about time? For example, it might be useful to get the first 
> answer when it is available, even if more have been requested. Which 
> suggests another dimension, viz. do you want the answers all at once 
> when the entire list is done, or each one when it is fresh?

I think I have covered all of those cases in my proposals, and rather
succinctly I would claim.  I don't want to go back through the proposals
here again comparing them to your list above.  I will do so if you
insist.

> >Justification For Answers
> >
> >Under the assumption that our query language needs to enable a query to
> >specify what is being asked for and our query result language needs to
> >enable a query result to contain the information requested in a query,
> >the query language needs to enable inclusion in a query a request for a
> >justification for each query answer and the query result language needs
> >to enable inclusion of such justifications in a query result.
> 
> I don't see how this follows.  Again, I think we need to get clearer 
> on what exactly counts as the information requested. If I hold up a 
> query then I am asking the KB to prove it for me. What do I want 
> back: the fact that it has been proved, some of the bindings in the 
> proof, the expressions used as assumptions in the proof, the entire 
> proof itself, all the proofs, or what? The last is probably the most 
> general form, but it would be overkill for many querying 
> applications. Still, it might be worth thinking of 'information 
> requested' as being ultimately the entire set of proofs, and then 
> defining various kinds of 'pruning' or simplification of this. That 
> would certainly cover the variable-bindings, for example, and it 
> might also cover justifications. (BTW, what is a justification, 
> exactly??)

As I said in my message, we have the following issue:

> ISSUE: Whether and how to include provisions for justifications
> to be requested in a query and be included in a query result.
> What kind of justification language(s) do we include in our query
> results language?

I think we need to enable queries to ask for justifications and for
query answers to include justifications, but that is only my opinion.  I
thought the need for justifications to accompany results in the Semantic
Web was one of the compelling points that Jim Hendler made repeatedly. 
If we believe that point, then it seems to me we need to include a
provision for justifications in our query language.

Now, what is a justification?  I don't have an answer to that.  We
probably would want to have a standard ontology of justifications that
we would seed with some obvious ones (e.g., standard FOL proofs) and
that people would extend on an as needed basis.  ... 

Ok.  I think I have responded to all the points in your messages.  

Richard


This archive was generated by hypermail 2.1.4 : 04/02/02 EST