DQL

From: Ian Horrocks (horrocks@cs.man.ac.uk)
Date: 08/20/02

    
    
    Re: DQL next draft (for comments)
    

    On July 12, pat hayes writes:
    > This is here in plain text and also as attached HTML. Sorry it took so long.

    Here is an updated version with some minor errors corrected and some
    questions regarding trickier issues. The main point is that I think we
    need a bit more precision w.r.t. what constitutes an answer. I
    suggested a MT style definition. Let me know what you think. I have
    highlighted comments and changes - I have found this to be a useful
    technique for cooperative working on html documents.
    I'm afraid I didn't bother with the plain text. Sorry it took so long.




    Regards, Ian
    Informal DQL Specification
    DAML Joint Committee
    Richard Fikes, Pat Hayes, Ian Horrocks, editors
    June 12, 2002
    1. Overview
    DQL is a formal language and protocol for posing queries from a querying agent (which we refer to as the "client") to an answering agent (which we refer to as the "server").  A DQL query contains a "query pattern" that is a collection of DAML+OIL sentences in which some literals and/or urirefs have been replaced by variables. An answer to the query provides bindings of terms to some of these variables such that the conjunction of the answer sentences - produced by applying the bindings to the query pattern and considering the remaining variables in the query pattern to be existentially quantified - is entailed by a knowledge base called the "answer KB".

    1. You use "sentences" above. Is this wise? What is the difference between these and the "assertions" you refer to below?

    None. Maybe it would be better to use 'expressions', but that is capable of being misunderstood, since eg terms are expressions. I was using 'sentence' in the usual logical sense of 'expression with a truthvalue'.

    You are right that I have been sloppy with collection of sentences versus conjunction of sentences. We should clarify that. I think it would be less confusing if we just said that collection of sentences in a query is treated as a conjunction, and then to assume that all queries are single sentences.

     I'm not sure why we don't say that a query is a DAML+OIL KB where some of the literals and/or urirefs have been replaced by variables?

    Sure, OK. I guess it seems odd to call a query a knowledge base, which is why I prefer to use a more neutral term like 'sentence'. But whatever....

    2. The use of "collection" is a bit vague. When you say the answer sentence is entailed, do you mean the conjunction of answer sentences (as I wrote above), or each of them independently?

    Each independently. That is, each answer should be entailed by the server KB.

     There is obviously a big difference in the case where they share variables and/or constants.

    I didn't anticipate that two answers would ever share a variable. How could that happen?

    Oh, wait. I see your point. Right, I meant that the query pattern with the answer binding is entailed. If the pattern is a conjunction, then of course it's the conjunction that is entailed. Sorry about that confusion.

    3. We really need to make it clear that the variables are existentially quantified at the outer level.

    I thought that was clear, but feel free to change the wording.

    A formal definition would be useful (if not here, then in the detailed spec below), but I'm not sure how to do it in HTML.


    Each binding in a query answer is a uri-ref or a literal that either explicitly occurs as a term in the answer KB or is a term in DAML+OIL.  That is, DQL is designed for answering queries of the form "What uri-refs and literals from the answer KB and DAML+OIL denote objects that make the query pattern true?" We note that this does not require servers to generate existential conclusions from 'implicit' knowledge in order to answer queries.

    I don't understand the last sentence.

    It is imprecise, but it was intended to address the point that you originally raised in this  discussion. I just wanted to emphasize that we are not expecting that every logical entailment will be reflected in a possible answer.


    Variables in queries may be designated as "must bind" or "may bind" variables. Answers are required to provide bindings for all "must bind" variables, and may provide bindings for "may bind" variables. Queries may optionally provide or require information about the knowledge base used to answer the query and impose constraints on the dynamics of the answering process. Answers provided by the server must conform to these requirements, but a server may restrict its answers to certain classes of query pattern, to a certain class of knowledge bases, or to a limited range of bindings.
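As a rough illustration of the "must bind" rule above, the following Python sketch checks whether a candidate set of bindings covers every "must bind" variable and binds only variables that occur in the query. The representation (variables as strings beginning with "?", bindings as a dict) and the name `is_acceptable_binding` are assumptions made for this example; DQL itself does not prescribe them.

```python
def is_acceptable_binding(must_bind, query_vars, bindings):
    """Check the variable-coverage part of an answer.

    must_bind  -- set of variables that every answer has to bind
    query_vars -- set of all variables occurring in the query
    bindings   -- dict mapping variables to urirefs or literals
    """
    bound = set(bindings)
    # Every must-bind variable is bound, and nothing outside the query is bound.
    return set(must_bind) <= bound and bound <= set(query_vars)


# Hypothetical query with variables ?x (must bind) and ?y (may bind).
must = {"?x"}
all_vars = {"?x", "?y"}

print(is_acceptable_binding(must, all_vars, {"?x": "ex:Fido"}))                  # True
print(is_acceptable_binding(must, all_vars, {"?x": "ex:Fido", "?y": "ex:Rex"}))  # True
print(is_acceptable_binding(must, all_vars, {"?y": "ex:Rex"}))                   # False
```

This only covers the bookkeeping of which variables are bound; whether the resulting answer sentence is entailed by the answer KB is a separate question.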

    A single query may have zero or more answers. The set of all answers provided by the server in response to a query is called the "response set" of that query. Not all the answers in the response set need be produced at once: in general, answers will be delivered in groups. A query may specify an upper bound on the number of answers that are delivered in a single group.

    The set of DAML assertions which are used by the server in answering a query is referred to as the "answer KB". This may be an actual knowledge base (or a finite set of knowledge bases) or it may be a virtual entity representing the total information available to the server at the time of answering; however, all servers are required to be able to provide a reference to a resource representing the answer KB. We will assume that such a reference to an answer KB has the form of a uriref; in many cases this may be a URL which can be used to access the KB or communicate with the server, but this is not required. A DQL query contains an "answer KB expression" which is either a variable or a reference to a KB. If the answer KB expression in a query is a reference to a KB, then all answer sentences of answers in the response set must be entailed by that KB. If it is a variable, then the server is free to select or to generate an answer KB, but if the variable is "must bind" then the answer must provide a binding to this variable which references the answer KB. 

    DQL specifies a core set of protocol elements that are to be used by a client to obtain query answers from a server.  Specifically, DQL specifies that a client initiates a query-answering dialogue with a server by sending the server a DQL query.  The server is expected to respond by sending answers to the client one or more at a time along with a server continuation that is either a process handle which the client can use to request additional answers or a token indicating that the server will not provide any more answers to the query. A process handle is an atomic entity with no internal structure visible to the client; its role is only to allow the server to record the state of its answer search. The token can be 'none', meaning that the server is claiming that there are no further answers entailed by the answer KB, or 'end', meaning that the server is making no claims as to whether there are more answers entailed by the answer KB. Other token values may be allowed, but in all cases it is required that a token be clearly distinguishable from a process handle.  No attempt is made here to specify a complete inter-agent protocol (e.g., with provisions for time-outs, error handling, resource budgets, etc.).  Query answering servers are required to support the specified core protocol elements and are not constrained by the DQL specification as to how additional protocol functionality is provided.
    2. Detailed specification
    The client initiates a dialog with the server by sending a query. The typical response is a bundle of answers plus a server continuation which can be sent back by the client to the server. On receiving a server continuation, the server responds similarly until the continuation in the response is a termination token. The set of all answers in all groups sent from the server to the client between the query and the termination token is the response set of the query.
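The dialog just described can be sketched as a small client loop. Everything here is hypothetical scaffolding: `ToyServer`, its `query` and `resume` methods, and the string handles are invented for illustration, and the toy server simply pages through a fixed list of answers rather than doing any reasoning.

```python
TERMINATION_TOKENS = {"none", "end"}  # the two tokens with fixed meanings


class ToyServer:
    """Hypothetical in-memory server that delivers answers in bundles."""

    def __init__(self, answers, bundle_size):
        self._answers = list(answers)
        self._bundle_size = bundle_size
        self._positions = {}   # opaque handle -> index of the next answer
        self._next_handle = 0

    def _bundle(self, start):
        stop = start + self._bundle_size
        answers = self._answers[start:stop]
        if stop >= len(self._answers):
            # The toy server knows its list is exhaustive, so it may say "none".
            return answers, "none"
        handle = f"handle-{self._next_handle}"
        self._next_handle += 1
        self._positions[handle] = stop
        return answers, handle

    def query(self, pattern):
        # The pattern is ignored in this toy; a real server would evaluate it.
        return self._bundle(0)

    def resume(self, continuation):
        return self._bundle(self._positions.pop(continuation))


def collect_response_set(server, pattern):
    """Run the dialog to termination; return (response set, final token)."""
    response_set = []
    answers, tag = server.query(pattern)
    response_set.extend(answers)
    while tag not in TERMINATION_TOKENS:
        answers, tag = server.resume(tag)   # send the continuation back
        response_set.extend(answers)
    return response_set, tag


server = ToyServer(["a1", "a2", "a3", "a4", "a5"], bundle_size=2)
print(collect_response_set(server, pattern=None))
# (['a1', 'a2', 'a3', 'a4', 'a5'], 'none')
```

The client treats the continuation as an opaque value and only tests whether it is a termination token, which is all the protocol asks of it.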
    Query
    A DQL query necessarily includes:
    A DQL query may also optionally include:
    Answer
    An answer to a query must contain:
    1.      A variable in the answer KB pattern is bound to a reference to the answer KB;
    2.      All variables in the query pattern are bound to terms which occur in the DAML+OIL language or in the answer KB;
    3.      The answer KB entails the answer sentence got by replacing all variables in the query pattern which are bound in the answer by their bindings, and replacing all other variables by new RDF blank nodes.

    Right. I guess I have gotten too familiar with the usual convention whereby a collection of sentences is considered to be a conjunction (which is a sentence). This is so automatic that I often don't notice it, but we should be more careful.


    Well, I guess I was thinking of that as synonymous with 'existential variable' but expressed in RDF-friendly terminology, is all. But I now agree that it is not adequate and should be changed.


    But it is clear from other parts of the DAML spec, right?


    That would just be a mistake, since DAML is defined to be RDF and RDF is defined to be the RDF graph. But maybe we should be more explicit about this.


    This is one reason why RDF/XML can't be used as a reference language, by the way.


    Ah, good point. OK, let's not refer to blank nodes at all :-)


    OK, though I think it can be made more readable:

    Suppose Q is a query pattern, i.e. a KB in which some urirefs and/or literals have been replaced by variables. A binding for Q is a lexical mapping which associates a uriref or literal in the answer KB to every must-bind variable and possibly to some of the other variables in Q. We write B(Q) to refer to the KB got by applying the binding mapping B to Q, i.e. substituting B(v) for every variable v which is bound by B. B(Q) may contain some variables from Q which are not replaced by B; these are called remaining variables.

    An interpretation I satisfies B(Q) if there is a mapping C from the remaining variables of B(Q) to the universe of I such that I+C satisfies B(Q); that is, if the interpretation can be extended to provide interpretations of the remaining variables in some way which makes B(Q) true. Then, in the usual way, we say that the answer KB entails B(Q) just in case B(Q) is true in every interpretation which makes the answer KB true. Intuitively, this means that the remaining variables are treated as existential 'blanks', which indicate that something exists without saying what it is.
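To make the notation concrete, here is a minimal Python sketch of computing B(Q) and its remaining variables. It assumes, purely for illustration, that a query pattern is a list of triples of strings, that variables are the strings starting with "?", and that a binding is a dict; the names `apply_binding`, `ex:Person` and `ex:hasParent` are hypothetical. The sketch performs only the lexical substitution and makes no attempt to test entailment.

```python
def is_variable(term):
    return term.startswith("?")


def apply_binding(pattern, binding):
    """Return (B(Q), remaining variables) for pattern Q and binding B."""
    instantiated = [
        tuple(binding.get(term, term) for term in triple)
        for triple in pattern
    ]
    remaining = {
        term
        for triple in instantiated
        for term in triple
        if is_variable(term)
    }
    return instantiated, remaining


# Q: "?x is a person and has some parent ?p".
pattern = [
    ("?x", "rdf:type", "ex:Person"),
    ("?x", "ex:hasParent", "?p"),
]
binding = {"?x": "ex:Ian"}   # B binds ?x only

answer_sentence, remaining = apply_binding(pattern, binding)
print(answer_sentence)
# [('ex:Ian', 'rdf:type', 'ex:Person'), ('ex:Ian', 'ex:hasParent', '?p')]
print(remaining)
# {'?p'}
```

Here ?p is a remaining variable: the answer says that ex:Ian has some parent without naming one, which is the existential reading described above.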

    In addition, an answer may contain:
    Answer bundle
    An answer bundle is a finite set of answers plus a tag consisting of either a server continuation or one or more termination tokens. The number of answers in an answer bundle given in response to a query must not exceed the answer bundle size bound in the query, if present.

    Server continuations are atomic entities with no internal structure visible to the client. A server must continue the answering process when sent a server continuation by sending back another answer bundle. Servers should encode sufficient information in the server continuation to enable them to continue the answering dialog even if they have been engaged in other activities since sending the previous answer bundle. Every dialog should terminate after a finite number of exchanges of server continuations and answer bundles between the client and server. A dialog is said to have terminated when the server sends a bundle containing a termination token. If a server is sent a termination token as a server continuation, the server should reply with a bundle containing no answers and the same termination token, thereby terminating the dialog.

    Clients must be able to distinguish termination tokens from server continuations. Termination tokens may be used to convey information about the response set; in particular, two termination tokens have fixed meanings. The token "end" simply means that the server is unable to deliver any further answers, but makes no claim of completeness. The token "none" indicates that the answer KB does not entail any other answers not in the response set. We note that the use of the "none" token should be restricted to those cases where the server is able to make a positive affirmation that no other answers exist, i.e., to provide a guarantee that there are no other possible bindings to the query variables which would produce an answer sentence that would be entailed by the answer KB. Other termination tags may also be used, but the "end" tag is the recommended way to indicate termination of a question-answering dialog.

    We said above that a bundle ends with a single token; here with one or more tokens. Which do we mean?

    One or more. Sorry, that was just a slip.
    I would say just one token, but maybe you are thinking of cases where you want/need more. If only one, then we can't say that "end" is the recommended terminator. If we allow more than one, what would it mean if I say "none end" as opposed to "end none"?

    Order is immaterial. We should say that explicitly.


    There is no provision in DQL for a query to indicate an upper bound on the total number of answers in a dialog, but a client can terminate a question-answering dialog at any time by sending the "end" token as a server continuation, or simply by not requesting any further continuations.

    Response Set
    While there are no global requirements on a response set other than that all its members are correct answers, it is recommended that servers ensure that answer bundles do not contain duplicate or redundant answers, i.e. answers which are subsumed by other answers.  One answer subsumes another if its bindings are a superset of the bindings in the other answer.  Servers which are able to guarantee that their response sets contain no duplicate answers can be called "non-repeating". Servers which are able to guarantee that their response sets contain no duplicate or redundant answers can be called "terse" or "non-verbose".  Servers which are able to guarantee that their response sets will be correctly terminated with "none" can be called "complete".  
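The subsumption test above is purely syntactic, so it is easy to sketch. The following Python fragment, which assumes answers are represented as dicts from variables to terms (an assumption of this example, as are the function names), drops duplicate answers and then drops any answer whose bindings are contained in those of a different answer.

```python
def subsumes(a, b):
    """True if the bindings of answer a are a superset of those of answer b."""
    return b.items() <= a.items()


def make_terse(answers):
    """Remove duplicate answers, then answers subsumed by a different answer."""
    unique = []
    for answer in answers:
        if answer not in unique:
            unique.append(answer)
    return [
        a for a in unique
        if not any(other != a and subsumes(other, a) for other in unique)
    ]


answers = [
    {"?x": "ex:a"},
    {"?x": "ex:a", "?y": "ex:b"},   # subsumes the first answer
    {"?x": "ex:a"},                 # duplicate of the first answer
    {"?x": "ex:c"},
]
print(make_terse(answers))
# [{'?x': 'ex:a', '?y': 'ex:b'}, {'?x': 'ex:c'}]
```

A server that filtered only duplicates would be "non-repeating" in the sense above; filtering subsumed answers as well is what "terse" adds.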

    The answer set of a query is the largest set of answers which are entailed by the answer KB and none of which are entailed by any other answer. Notice that this definition is semantic rather than operational. A complete server is one whose response set contains the answer set of the query. A terse complete server is one whose response set is precisely the answer set of the query.

    It may be impossible to implement a server that can guarantee to be terse and complete for all KBs and query patterns.
    3. Other Issues
    Restricted query patterns
    The specification of query pattern above allows for arbitrary patterns of variable replacement in DAML expressions. Particular servers, however, may restrict themselves to particular such query patterns, or provide guarantees of giving meaningful answers only when given particular kinds of query pattern. To allow for such cases we introduce the notion of a query class, defined simply as a class of patterns. Server specifications may refer to any well-defined query pattern restriction and define their performance, in the terms of this standard, to the case where all references to a query pattern are understood to refer only to query patterns in that class.

    If a server accepts only query patterns in a certain class, it is said to "apply to" that class; the notions of completeness and terseness may also be relativized to queries of a certain class, when stating the conformance of a server to this specification.

    For example, one class of query patterns might be those of one of the forms:

    ?x rdf:type CCC .
    ?x daml:subClassOf ?y .

    where CCC is some DAML class expression, or of the form

    ?x PPP ?y .

    where PPP is some DAML property expression other than those used in RDF(S) or DAML+OIL syntax.
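A server that applies only to this example class could screen incoming patterns with a check along the following lines. This is a simplified Python sketch under stated assumptions: a pattern is a single triple of strings, variables start with "?", CCC is taken to be any non-variable term (rather than an arbitrary DAML class expression), and "used in RDF(S) or DAML+OIL syntax" is approximated by the prefixes `rdf:`, `rdfs:` and `daml:`. None of these conventions comes from the specification.

```python
RESERVED_PREFIXES = ("rdf:", "rdfs:", "daml:")   # assumed vocabulary prefixes


def is_variable(term):
    return term.startswith("?")


def in_example_class(triple):
    """True if the triple has one of the three example forms."""
    subject, predicate, obj = triple
    if not is_variable(subject):
        return False
    if predicate == "rdf:type":            # ?x rdf:type CCC .
        return not is_variable(obj)
    if predicate == "daml:subClassOf":     # ?x daml:subClassOf ?y .
        return is_variable(obj)
    # ?x PPP ?y .  with PPP outside the RDF(S) and DAML+OIL vocabulary
    return (
        not is_variable(predicate)
        and not predicate.startswith(RESERVED_PREFIXES)
        and is_variable(obj)
    )


print(in_example_class(("?x", "rdf:type", "ex:Person")))    # True
print(in_example_class(("?x", "daml:subClassOf", "?y")))    # True
print(in_example_class(("?x", "ex:knows", "?y")))           # True
print(in_example_class(("?x", "rdfs:label", "?y")))         # False
```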

    When performance is defined relative to a query class, the termination token 'none' should not be used to mean 'none relative to the class'. To avoid confusion, the use of other termination tokens, each with a meaning defined relative to the particular class, is required.

    Future versions of this specification may define particular query classes and corresponding termination tokens.
    "How Many" Queries
    The language and protocol contain no explicit constructs for asking how many (or how many more) answers there are to a given query.  Defining what is meant by "how many" is problematic in that there can be multiple bindings for a given distinguished variable that all denote the same object in the domain of discourse, so that how many answer bindings there are for a given distinguished variable will in general differ from how many answer objects in the domain of discourse that variable can denote.  The core protocol could reasonably be extended to support "how many" queries, where "how many" means how many answers containing distinct sets of bindings can the server produce.  The difficulty of a server determining how many answers it can produce to a query without actually producing the answers has been the primary rationale for not including a "how many" construct in the query language.
    Inability to respond to queries
    Servers are not required to deliver answers to queries. Under various circumstances, a query may be phrased in a form which makes it impossible for a server to respond with any answers; for example, if the query specifies an answer KB which the server is unable to access or use, or where no bindings are available for "must bind" variables. Under these circumstances, the server should terminate the dialog with a bundle containing no answers and an appropriate termination tag or tags, one of which should be 'end'.



    -----


    -- 
    
    ---------------------------------------------------------------------
    IHMC                                       (850)434 8903   home
    40 South Alcaniz St.                        (850)202 4416   office
    Pensacola,  FL 32501                      (850)202 4440   fax
    phayes@ai.uwf.edu                              http://www.coginst.uwf.edu/~phayes

