Re: DQL Query Patterns and Variable Bindings

From: Pat Hayes (phayes@ai.uwf.edu)
Date: 12/11/01


>We seem to have reached some significant agreements in last week's
>telecon regarding DQL, the query language for DAML+OIL.  This message is
>an attempt to record those agreements.
>
>As a prelude, Pat in particular has been asking for an overview of what
>is in a DQL query and what is in the response to a query.  We haven't
>agreed on all the components of a query or a query response, but
>attached below is an outline that hopefully will provide some context
>for the agreements we have reached and help clarify what else is to be
>done.

OK, I've inserted suggestions, either as comments or as inline text marked /thus/.

>
>Here is what I think we agreed on in last week's telecon:
>
>A query contains a query pattern that specifies a conjunction

Do we want to allow conjunctions in the first instance? I would 
suggest not, ie say 'a DAML+OIL expression'.

>of
>DAML+OIL sentences in which /some/ literals and/or resource /name/s 
>have been replaced
>by variables.  A query includes a specification of which of the
>variables in the query pattern are distinguished variables.  A query
>answer includes a binding for each distinguished variable such that the
>query knowledge base entails the sentence produced by replacing each
>distinguished variable  in the query pattern by its binding and
>considering the remaining variables in the query pattern to be
>existentially quantified.

There may also be restrictions on the ways that variables can occur 
in the query pattern, eg requirements of no 'looping'. (Maybe we 
should keep this possibility open?)
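
To make the definition of a query answer concrete, here is a minimal
sketch in Python. It rests on loud assumptions: the KB is a plain list
of triples, variables are strings starting with "?", and "entails" is
approximated by matching against asserted triples (a real DAML+OIL
server would need a reasoner). The names is_var, match and answers,
and all the ex: resources, are hypothetical and used only for
illustration.

```python
def is_var(term):
    """A term is a variable if it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, kb, env=None):
    """Yield every assignment of variables under which each triple of
    the pattern appears in the KB (a stand-in for entailment)."""
    env = env or {}
    if not pattern:
        yield dict(env)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in kb:
        new_env = dict(env)
        ok = True
        for p, t in zip(first, triple):
            if is_var(p):
                # Bind the variable, or check it against an earlier binding.
                if new_env.setdefault(p, t) != t:
                    ok = False
                    break
            elif p != t:
                ok = False
                break
        if ok:
            yield from match(rest, kb, new_env)


def answers(pattern, distinguished, kb):
    """Keep only the bindings of the distinguished variables. The other
    variables are treated as existentially quantified: some value must
    exist, but it is not reported."""
    seen = set()
    result = []
    for env in match(pattern, kb):
        key = tuple(env[v] for v in distinguished)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(distinguished, key)))
    return result


kb = [
    ("ex:joe", "ex:owns", "ex:car1"),
    ("ex:car1", "rdf:type", "ex:Car"),
    ("ex:sue", "ex:owns", "ex:bike1"),
]
pattern = [("?who", "ex:owns", "?thing"), ("?thing", "rdf:type", "ex:Car")]
result = answers(pattern, ["?who"], kb)
print(result)
```

Here ?who is distinguished and ?thing is not, so the answer reports
who owns a car without saying which car.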

>A binding for a distinguished variable can be one of the following:
>
>* A literal;
>
>* The name of a resource

Does that mean a URI? If so, best to say that explicitly. (If not, 
what other kind of resource names are contemplated?)

>                      that occurs in the query kb or that is defined
>in the DAML+OIL language; or
>
>* A "blank" name that corresponds to an anonymous resource that occurs
>in the query kb.

Rephrase:
* A "blank" name that indicates that a binding has been found, but its 
scope is local to the KB.

HOWEVER, I now think that we don't really need the blanks after all; 
sorry I caused such a fuss. (See other message.)

>There is a given set of DAML+OIL property names and class names that
>cannot be bindings for a distinguished variable in a query answer.  That
>set is yet to be specified.
>
>
>If you disagree with any of the above, I would like for you to be
>constructive in your disagreement by providing an edited version of the
>above with which you do agree.

OK, did my best.

>
>Richard
>
>---------------------------------------------------------------------
>
>TERMINOLOGY
>
>I am referring to the agent sending the query as the "client" and the
>agent receiving the query as the "server".  I am referring to the
>response sent by the server to the client as a "query result" and
>assuming that a query result may contain one or more "query answers".
>
>WHAT'S IN A QUERY
>
>* Knowledge Base - I think we all agree that a query is posed with
>respect to a DAML+OIL knowledge base.  Thus, a query needs to include a
>reference to a DAML+OIL knowledge base.  I am referring to that
>knowledge base as the "query KB".

It would make more sense to call it the 'server KB'.

Do we in fact want to assume that there is a unique KB for each 
query? Eg consider a 'services' setting in which a query can be 
published, meaning 'any site that can prove this, give me an answer'. 
The RDF core WG considered such a possibility, where one might 
publish a piece of RDF that said, in effect, please prove that I can 
get flowers from you amounting to this quantity before this date at 
less than this price (and then you and I have a deal).

This might well be a natural way to deal with 'queries', in fact, in 
a commercial B2B context; the logic is the same, so why not allow it 
as a possibility? In other words, such a publication is a kind of 
open-ended query in which the KB - ie the identification of the KB - 
is itself part of the answer.

>
>* Premise - I have proposed that a query optionally include a premise to
>facilitate if-then queries while still remaining within the
>expressiveness of DAML+OIL.  Specifically, I have proposed that a
>premise be an arbitrary DAML+OIL knowledge base.  There has been no
>formal agreement on whether or not DQL will allow a "query premise".

I would vote not, in the first draft. It smacks of tiptoeing into 
'rules' territory, and it ought to be definable in any case by 
querying a KB containing the premise and an import of the previous KB.
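
A minimal sketch of that reduction, in Python, under stated
assumptions: KBs are sets of triples, "import" is modelled as set
union, and entailment is approximated by closing the KB under one
subclass rule. The function names and the ex: resources are
hypothetical; none of this is part of any DQL proposal.

```python
def closure(kb):
    """Saturate the KB under one illustrative rule:
    (x rdf:type C) and (C rdfs:subClassOf D) give (x rdf:type D)."""
    facts = set(kb)
    changed = True
    while changed:
        changed = False
        for (x, p, c) in list(facts):
            if p != "rdf:type":
                continue
            for (c2, q, d) in list(facts):
                if q == "rdfs:subClassOf" and c2 == c:
                    derived = (x, "rdf:type", d)
                    if derived not in facts:
                        facts.add(derived)
                        changed = True
    return facts


def entails(kb, fact):
    """Stand-in for entailment: membership in the saturated KB."""
    return fact in closure(kb)


def entails_given_premise(server_kb, premise, fact):
    """An if-then query without a 'premise' slot: query a new KB that
    contains the premise together with (an import of) the server KB."""
    combined = set(server_kb) | set(premise)
    return entails(combined, fact)


server_kb = {("ex:Dog", "rdfs:subClassOf", "ex:Animal")}
premise = {("ex:fido", "rdf:type", "ex:Dog")}
goal = ("ex:fido", "rdf:type", "ex:Animal")

print(entails(server_kb, goal))                         # without the premise
print(entails_given_premise(server_kb, premise, goal))  # with the premise
```

The query itself is unchanged; only the KB it is posed against
differs.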

>
>* Pattern - I think we are agreed that a query necessarily contains a
>"query pattern" that specifies relationships among unknown sets of
>objects in a domain of discourse.  Each unknown object is represented in
>the query pattern by a variable.
>
>* Distinguished Variables - I think we all agree that a query
>necessarily contains a specification of which of the variables that
>occur in the query pattern are "distinguished variables".  Each query
>answer will include a binding for each distinguished variable.  I am
>referring to the variables in the query pattern that are not
>distinguished variables as "non-distinguished variables".

undistinguished variables?

>
>* Number of Answers Requested - I have proposed that a query contain a
>specification of how many query answers are being requested and whether
>those answers are to be returned all at once or in batches accompanied
>by an "enumerator" that allows for requesting the next batch of
>answers.  We have not agreed on that proposal nor have we agreed on what
>makes two query answers distinct when counting them.

I would like to leave this out of the query at present, but retain 
the core of the 'batch' idea, in that an answer may contain an 
indication of whether or not the querying process is 'finished'. That 
is, it can say that there may be other answers to be found, and 
provide a way to continue the process. A response to such an answer 
is allowed to be 'continue, find the next one'. Notice that no 
numbers are involved in this. This gives us the functionality 
(essentially a primitive API) to define more elaborate processes that 
find all answers, a certain number of answers, and so on, but it 
keeps arithmetic out of the basic query-answering process, which I 
think is semantically crucial. (See PS.)
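
As a rough illustration of how counting can live entirely outside the
basic process, here is a Python sketch. It assumes the answering
process is modelled as a Python iterator and that the single primitive
is "give me the next answer, or tell me you are finished". The names
continue_query, find_all and find_up_to are hypothetical.

```python
def continue_query(process):
    """The only primitive: return the next answer, or None when the
    process reports that there are no further answers."""
    return next(process, None)


def find_all(process):
    """Client-side helper: keep saying 'continue' until finished."""
    found = []
    while True:
        answer = continue_query(process)
        if answer is None:
            return found
        found.append(answer)


def find_up_to(process, limit):
    """Client-side helper: the arithmetic (the limit) is here, in the
    client, and never reaches the answering process."""
    found = []
    while len(found) < limit:
        answer = continue_query(process)
        if answer is None:
            break
        found.append(answer)
    return found


# A stand-in for a server's answering process over three bindings.
process = iter([{"?x": "ex:a"}, {"?x": "ex:b"}, {"?x": "ex:c"}])
first_two = find_up_to(process, 2)
the_rest = find_all(process)
print(first_two, the_rest)
```

Because the process keeps its own state, the second call picks up
where the first one stopped.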

>
>* Justifications Requested - I have proposed that a query can optionally
>contain a request for a justification of each query answer.  We have not
>agreed on that proposal nor on what a justification for a query answer
>is.

I like that idea, and would propose that a justification is a proof. 
What a DAML+OIL proof is has yet to be determined, of course; but 
whatever it turns out to be, that is what a justification should be 
as well. (Later: maybe this is more complicated, now I think about 
it: see end of message.)

>* What else?  
>
>
>WHAT'S IN A QUERY RESPONSE
>
>* Query - The query to which this is a response.
>
>* Server - The server that produced this response.

? This seems rather like having a piece of code sign its name to 
everything it does. Surely, if I am querying a KB, I already know 
what the query was. Why do I need to be told this again?

But in any case, what exactly *is* the 'server' here? You seem to be 
assuming that servers are genuine things on the web, but that seems 
to be something that we haven't really decided on yet. How does DAML 
refer to agents, so it can express this response? (Or indeed to 
queries, for that matter)?

>
>* Answers - Zero or more answers to the query.

Right, but how is 'zero answers' indicated? See below for a suggestion.

>
>* Number of Answers - I have proposed that a query result optionally
>contain a specification provided by the server of the number of answers
>to the query that are entailed by the query kb.  We have not agreed on
>that proposal.

I suggest not, as part of the basic querying protocol. I don't want 
the basic query language to assume arithmetic, or to require that 
*all* query-answering processes must support counting and tallying. 
That seems to be an optional extra, involving processes that are 
external to the basic query-answering process.

>
>* What else?  
>
>WHAT'S IN A QUERY ANSWER

I would like each response to deliver precisely one answer (which 
might be an indication of the failure to find a(nother) binding). 
However, it also retains a state, in the sense that the process of 
answering can be resumed, in general. Abstractly, then, a response 
could be defined as either 'none', meaning that there are no further 
bindings; or a pair consisting of an answer (ie a set of bindings) 
and a server continuation (possibly temporary and time-stamped in 
some way; but the basic querying protocol needn't go into the details 
of that.) See PS.
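
The abstract shape of such a response can be sketched in Python as
follows. This is only an illustration under assumptions: the server
continuation is modelled as a zero-argument function, the bindings are
precomputed in a list, and timeouts or time-stamps are ignored. The
names Response and respond are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Answer = Dict[str, str]  # a set of bindings: variable name -> value

# A response is either None ('none': no further bindings) or a pair of
# an answer and a continuation that resumes the answering process.
Response = Optional[Tuple[Answer, Callable[[], Any]]]


def respond(bindings: List[Answer], position: int = 0) -> Response:
    """Deliver exactly one answer per response, plus a way to go on."""
    if position >= len(bindings):
        return None

    def continuation() -> Response:
        # Resuming the process yields the response for the next binding.
        return respond(bindings, position + 1)

    return bindings[position], continuation


found = [{"?x": "ex:a"}, {"?x": "ex:b"}]

answer1, k1 = respond(found)
answer2, k2 = k1()   # the client says 'continue'
third = k2()         # nothing left: the response is None
print(answer1, answer2, third)
```

Note that 'zero answers' needs no special case: the very first
response is simply 'none'.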

>* Bindings - I think we all agree that each query answer will contain a
>binding for each of the query's distinguished variables.  We have not
>agreed on whether bindings for the non-distinguished variables are to be
>included in a query answer.

Surely not, or else there is no need to even make the "distinguished" 
distinction.

>* Justification - If we include a request for justifications in a query,
>then each query answer would include a justification when the query
>contains such a request.

In this case I think we have to re-think what an answer amounts to. 
There seems to be a continuum here,  since the 'justification' kind 
of answer includes all the information in the 'binding' kind of 
answer. The real question might be phrased as, how much of the 
'justification' is going to be included in an answer? There are other 
options here as well as the two extremes; eg one might want to know 
the bindings and the premises used, but not the full proof; or the 
bindings and the names of the source ontologies of the premises; or 
even something like knowing that a certain property was true of those 
sources (eg that they were all warranted by a certain authority, or 
had transaction dates later than some named date, etc.).

In the first instance, however, I think we should just stick to 
bindings, to keep things simple.

>* What else?

The continuations, mentioned above, would be my suggestion.

Pat

PS I've summed this up in a separate message.
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes

