Re: DQL

From: Deborah McGuinness (dlm@ksl.stanford.edu)
Date: 08/20/02

    this was the document that i found when i was hunting yesterday.
    the file was called 6-24-02  but had an embedded date of may 24
    
    Ian Horrocks wrote:
    
    >> On July 12, pat hayes writes:
    >> > This is here in plain text and also as attached HTML. Sorry it
    >> > took so long.
    >>
    >> Here is an updated version with some minor errors corrected and some
    >> questions regarding trickier issues. The main point is that I think
    >> we need a bit more precision w.r.t. what constitutes an answer. I
    >> suggested an MT-style definition. Let me know what you think. I have
    >> highlighted comments and changes - I have found this to be a useful
    >> technique for cooperative working on HTML documents.
    >
    >> I'm afraid I didn't bother with the plain text. Sorry it took so
    >> long.
    >
    >
    >
    >>
    >> Regards, Ian
    >
    >> Informal DQL Specification
    >
    >> DAML Joint Committee
    >
    >> Richard Fikes, Pat Hayes, Ian Horrocks, editors
    >
    >> June 12, 2002
    >
    >> 1. Overview
    >
    >> DQL is a formal language and protocol for posing queries from a
    >> querying agent (which we refer to as the "client") to an answering
    >> agent (which we refer to as the "server").  A DQL query contains a
    >> "query pattern" that is a collection of DAML+OIL sentences in which
    >> some literals and/or urirefs have been replaced by variables. An
    >> answer to the query provides bindings of terms to some of these
    >> variables such that the conjunction of the answer sentences -
    >> produced by applying the bindings to the query pattern and
    >> considering the remaining variables in the query pattern to be
    >> existentially quantified - is entailed by a knowledge base called
    >> the "answer KB".
    >>
    >> 1. You use "sentences" above. Is this wise? What is the difference
    >> between these and the "assertions" you refer to below?
    >
    >  None. Maybe it would be better to use 'expressions', but that is
    > capable of being misunderstood, since eg terms are expressions. I was
    > using 'sentence' in the usual logical sense of 'expression with a
    > truthvalue'. You are right that I have been sloppy with collection of
    > sentences versus conjunction of sentences. We should clarify that. I
    > think it would be less confusing if we just said that collection of
    > sentences in a query is treated as a conjunction, and then to assume
    > that all queries are single sentences.
    >
    >>  I'm not sure why we don't say that a query is a DAML+OIL KB where
    >> some of literals and/or urirefs have been replaced by variables?
    >
    >  Sure, OK. I guess it seems odd to call a query a knowledge base,
    > which is why I prefer to use a more neutral term like 'sentence'. But
    > whatever....
    >
    >> 2. The use of "collection" is a bit vague. When you say the answer
    >> sentence is entailed, do you mean the conjunction of answer
    >> sentences (as I wrote above), or each of them independently?
    >
    >  Each independently. That is, each answer should be entailed by the
    > server KB.
    >
    >>  There is obviously a big difference in the case where they share
    >> variables and/or constants.
    >
    >  I didn't anticipate that two answers would ever share a variable. How
    > could that happen? Oh, wait. I see your point. Right, I meant that the
    > query pattern with the answer binding is entailed. If the pattern is a
    > conjunction, then of course it's the conjunction that is entailed.
    > Sorry about that confusion.
    >
    >> 3. We really need to make it clear that the variables are
    >> existentially quantified at the outer level.
    >
    >  I thought that was clear, but feel free to change the wording.
    >
    >> A formal definition would be useful (if not here, then in the
    >> detailed spec below), but I'm not sure how to do it in HTML.
    >
    >
    >
    >>
    >> Each binding in a query answer is a uri-ref or a literal that either
    >> explicitly occurs as a term in the answer KB or is a term in
    >> DAML+OIL.  That is, DQL is designed for answering queries of the
    >> form "What uri-refs and literals from the answer KB and DAML+OIL
    >> denote objects that make the query pattern true?" We note that this
    >> does not require servers to generate existential conclusions from
    >> 'implicit' knowledge in order to answer queries.
    >>
    >> I don't understand the last sentence.
    >
    >  It is imprecise, but it was intended to address the point that you
    > originally raised in this  discussion. I just wanted to emphasize that
    > we are not expecting that every logical entailment will be reflected
    > in a possible answer.
    >
    >>
    >> Variables in queries may be designated as "must bind" or "may bind"
    >> variables. Answers are required to provide bindings for all "must
    >> bind" variables, and may provide bindings for "may bind" variables.
    >> Queries may optionally provide or require information about the
    >> knowledge base used to answer the query and impose constraints on
    >> the dynamics of the answering process. Answers provided by the
    >> server must conform to these requirements, but a server may restrict
    >> its answers to certain classes of query pattern,  to a certain class
    >> of knowledge bases, or to a limited range of bindings.
    >>
    >> A single query may have zero or more answers. The set of all answers
    >> provided by the server in response to a query is called the
    >> "response set" of that query. Not all the answers in the response
    >> set need be produced at once: in general, answers will be delivered
    >> in groups. A query may specify an upper bound on the number of
    >> answers that are delivered in a single group.
    >>
    >> The set of DAML assertions which are used by the server in answering
    >> a query is referred to as the "answer KB". This may be an actual
    >> knowledge base (or a finite set of knowledge bases) or it may be a
    >> virtual entity representing the total information available to the
    >> server at the time of answering; however, all servers are required
    >> to be able to provide a reference to a resource representing the
    >> answer KB. We will assume that such a reference to an answer KB has
    >> the form of a uriref; in many cases this may be a URL which can be
    >> used to access the KB or communicate with the server, but this is
    >> not required. A DQL query contains an "answer KB expression" which
    >> is either a variable or a reference to a KB. If the answer KB
    >> expression in a query is a reference to a KB, then all answer
    >> sentences of answers in the response set must be entailed by that
    >> KB. If it is a variable, then the server is free to select or to
    >> generate an answer KB, but if the variable is "must bind" then the
    >> answer must provide a binding to this variable which references the
    >> answer KB.
    >>
    >> DQL specifies a core set of protocol elements that are to be used by
    >> a client to obtain query answers from a server.  Specifically, DQL
    >> specifies that a client initiates a query-answering dialogue with a
    >> server by sending the server a DQL query.  The server is expected to
    >> respond by sending answers to the client one or more at a time along
    >> with a server continuation that is either a process handle which the
    >> client can use to request additional answers or a token indicating
    >> that the server will not provide any more answers to the query. A
    >> process handle is an atomic entity with no internal structure
    >> visible to the client; its role is only to allow the server to
    >> record the state of its answer search. The token can be 'none',
    >> meaning that the server is claiming that there are no further
    >> answers entailed by the answer KB, or 'end', meaning that the server
    >> is making no claims as to whether there are more answers entailed by
    >> the answer KB. Other token values may be allowed, but in all cases
    >> it is required that a token be clearly distinguishable from a
    >> process handle.  No attempt is made here to specify a complete
    >> inter-agent protocol (e.g., with provisions for time-outs, error
    >> handling, resource budgets, etc.).  Query answering servers are
    >> required to support the specified core protocol elements and are not
    >> constrained by the DQL specification as to how additional protocol
    >> functionality is provided.
    >
    >> 2. Detailed specification
    >
    >> The client initiates a dialog with the server by sending a query.
    >> The typical response is a bundle of answers plus a server
    >> continuation which can be sent back by the client to the server. On
    >> receiving a server continuation, the server responds similarly until
    >> the continuation in the response is a termination token. The set of
    >> all answers in all groups sent from the server to the client between
    >> the query and the termination token is the response set of the
    >> query.
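
The dialog described above can be sketched from the client side as a simple loop. This is only an illustration under stated assumptions, not part of the DQL draft: the names `Bundle`, `run_dialog` and `ToyServer` are hypothetical, process handles are assumed to be strings of the form `h:<offset>` (so they cannot be confused with the tokens "none" and "end"), and the toy server simply pages through a fixed list of answers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Termination tokens named in the draft; any other tag is treated as a handle.
TERMINATION_TOKENS = {"none", "end"}

Answer = Dict[str, str]  # variable name -> uriref or literal


@dataclass
class Bundle:
    answers: List[Answer]
    continuation: str  # process handle or termination token


def run_dialog(send_query: Callable[[], Bundle],
               send_continuation: Callable[[str], Bundle],
               max_answers: int = 1000):
    """Collect the response set by following server continuations."""
    response_set: List[Answer] = []
    bundle = send_query()
    while True:
        response_set.extend(bundle.answers)
        if bundle.continuation in TERMINATION_TOKENS:
            return response_set, bundle.continuation
        if len(response_set) >= max_answers:
            # Client-side cut-off: the client simply stops asking for more.
            return response_set, "end"
        bundle = send_continuation(bundle.continuation)


class ToyServer:
    """Hypothetical in-memory server that hands out answers in small bundles."""

    def __init__(self, answers: List[Answer], bundle_size: int = 2):
        self._answers = answers
        self._size = bundle_size

    def query(self) -> Bundle:
        return self._bundle(0)

    def resume(self, handle: str) -> Bundle:
        # The handle encodes the search state (here just an offset).
        return self._bundle(int(handle.split(":")[1]))

    def _bundle(self, start: int) -> Bundle:
        end = start + self._size
        chunk = self._answers[start:end]
        tag = "none" if end >= len(self._answers) else f"h:{end}"
        return Bundle(chunk, tag)


server = ToyServer([{"?x": f"ex:a{i}"} for i in range(5)])
answers, token = run_dialog(server.query, server.resume)
```

The toy server claims "none" only because it knows its list is exhaustive; a real server that cannot make that claim would finish with "end" instead.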
    >
    >> Query
    >
    >> A DQL query necessarily includes:
    >
    >>    * a query pattern, which is a collection of DAML+OIL sentences in
    >>      which some of the literals and urirefs have been replaced by
    >>      variables;
    >>    * an answer KB pattern, which is either a single variable or a
    >>      reference to a KB;
    >>    * It isn't clear how this is compatible with allowing the answer
    >>      KB to be a finite set of KBs, as promised above.
    >>    * an indication of which of the variables in the patterns are
    >>      "must bind" or "may bind" variables. No variable can be both
    >>      "must bind" and "may bind".
    >>
    >> A DQL query may also optionally include:
    >
    >>    * a query premise, which is either a DAML+OIL KB or a reference
    >>      to a KB.  When a query premise is specified, the sentences in
    >>      the query premise are considered to be included in the answer
    >>      KB. This option is intended to facilitate if-then queries while
    >>      still remaining within the expressiveness of DAML+OIL.
    >>      Omitting the query premise is equivalent to providing an empty
    >>      query premise.
    >>    * a justification request. A DQL query can optionally include a
    >>      request for a justification for each query answer.  (This
    >>      option is noted here for future reference but no further
    >>      details are provided, and servers may ignore this part of a
    >>      query.  The content and structure of a justification for a
    >>      query answer has not yet been determined.  The intent is to
    >>      specify various types of justifications that can be requested
    >>      in a query. Examples of justification range from the set of
    >>      sentences used to derive the answer ('set of support') to a
    >>      complete proof or derivation of the answer in some
    >>      proof-theoretic framework.)
    >>    * an answer bundle size bound, which is a positive integer.
    >>      Omitting the answer bundle size bound effectively sets it to
    >>      infinity.
    >>
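
The required and optional parts of a query listed above could be held in a small record like the one below. This is a sketch only: the class name `Query`, the triple representation, and the convention that variables start with "?" are assumptions made for illustration, and the justification request is left out because the draft does not yet define it.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def is_var(term: str) -> bool:
    # Assumed convention: variables are written with a leading "?".
    return term.startswith("?")


@dataclass(frozen=True)
class Query:
    pattern: List[Triple]                     # query pattern
    answer_kb: str                            # a variable or a KB reference
    must_bind: FrozenSet[str]
    may_bind: FrozenSet[str] = frozenset()
    premise: Optional[str] = None             # omitted = empty premise
    bundle_size_bound: Optional[int] = None   # omitted = unbounded

    def __post_init__(self):
        both = self.must_bind & self.may_bind
        if both:
            raise ValueError(f"both must-bind and may-bind: {sorted(both)}")
        variables = {t for triple in self.pattern for t in triple if is_var(t)}
        if is_var(self.answer_kb):
            variables.add(self.answer_kb)
        unknown = (self.must_bind | self.may_bind) - variables
        if unknown:
            raise ValueError(f"not in any pattern: {sorted(unknown)}")
        if self.bundle_size_bound is not None and self.bundle_size_bound < 1:
            raise ValueError("bundle size bound must be a positive integer")


q = Query(
    pattern=[("?x", "rdf:type", "ex:Person")],
    answer_kb="?kb",
    must_bind=frozenset({"?x"}),
    may_bind=frozenset({"?kb"}),
)
```

Variables that are in neither set are accepted here; whether the draft intends every variable to be designated is not settled by the text above.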
    >> Answer
    >
    >> An answer to a query must contain:
    >
    >>    * a binding of a uriref or a literal to each of the "must bind"
    >>      and zero or more of the "may bind" variables which satisfies
    >>      the following:
    >>
    >>                1. A variable in the answer KB pattern
    >>                   is bound to a reference to the answer
    >>                   KB;
    >>
    >>                2. All variables in the query pattern
    >>                   are bound to terms which occur in the
    >>                   DAML+OIL language or in the answer KB;
    >>
    >>                3. The answer KB entails the answer
    >>                   sentence got by replacing all
    >>                   variables in the query pattern which
    >>                   are bound in the answer by their
    >>                   bindings, and replacing all other
    >>                   variables by new RDF blank nodes.
    >>
    >>    * Again, we went from collection of sentences in the query to
    >>      sentence in the specification of the answer.
    >>
    >  Right. I guess I have gotten too familiar with the usual convention
    > whereby a collection of sentences is considered to be a conjunction
    > (which is a sentence). This is so automatic that I often don't notice
    > it, but we should be more careful.
    >
    >>    * I'm also rather concerned about the use of RDF blank nodes in
    >>      this context.
    >>
    >  Well, I guess I was thinking of that as synonymous with 'existential
    > variable' but expressed in RDF-friendly terminology, is all. But I now
    > agree that it is not adequate and should be changed.
    >
    >>    * This may be OK if we are thinking of the query as an RDF graph
    >>      (which we didn't make clear up to now)
    >>
    >  But it is clear from other parts of the DAML spec, right?
    >
    >>    * , so a single variable gets replaced by a single blank node,
    >>      but if it is a collection of XML serialised triples
    >>
    >  That would just be a mistake, since DAML is defined to be RDF and RDF
    > is defined to be the RDF graph. But maybe we should be more explicit
    > about this.
    >
    >>    * , say, we need to be sure that a given variable is always
    >>      replaced with the same new blank node or we will lose the
    >>      co-reference constraint on answers.
    >>
    >  This is one reason why RDF/XML can't be used as a reference language,
    > by the way.
    >
    >>    *  (Also, by using RDF blank nodes, aren't we precluding the case
    >>      where a variable corresponds to a property?
    >>
    >  Ah, good point. OK, let's not refer to blank nodes at all :-)
    >
    >>    *  Is this deliberate?) All in all, I think we need to be rather
    >>      more precise at this point.
    >>
    >>
    >>      Here is a rough cut at a MT style of defining what constitutes
    >>      a valid answer binding if we consider a query to be a KB:
    >>
    >>      Let K be the answer KB, U the set of urirefs and literals
    >>      occurring in K, Q a query KB in which some urirefs and literals
    >>      have been replaced with variables, V the set of variables in Q,
    >>      Vm (a subset of V) the set of must-bind variables in Q, B a
    >>      binding that maps every element of Vm to an element of U and
    >>      zero or more elements of V-Vm to elements of U, and B(Q) the KB
    >>      that results from applying the binding B to the KB Q. A model I
    >>      of K satisfies B(Q) if the interpretation function can be
    >>      extended to any remaining variables in B(Q) in such a way that
    >>      I is a model of B(Q). K entails B(Q) if every model I of K
    >>      satisfies B(Q).
    >>
    >  OK, though I think it can be made more readable: Suppose Q is a query
    > pattern, ie a KB in which some urirefs and/or literals have been
    > replaced by variables. A binding for Q is a lexical mapping which
    > associates a uriref or literal  in the answer KB to every must-bind
    > variable and possibly to some of the other variables in Q. We write
    > B(Q) to refer to the KB got by applying the binding mapping B to Q, ie
    > substituting B(v) for every variable v which B binds. B(Q) may
    > contain some variables from Q which are not replaced by B; these are
    > called remaining variables. An interpretation I satisfies B(Q) if
    > there is a mapping C from the remaining variables of B(Q) to the
    > universe of I such that I+C satisfies B(Q); that is, if the
    > interpretation can be extended to provide interpretations of the
    > remaining variables in some way which makes B(Q) true. Then, in the
    > usual way, we say that the answer KB entails B(Q) just in case B(Q) is
    > true in every interpretation which makes the answer KB true.
    > Intuitively, this means that the remaining variables are treated as
    > existential 'blanks', which indicate that something exists without
    > saying what it is.
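
For reference, the two formulations above can be written compactly as follows. The symbols K, U, Q, V, V_m, B and C are those used in the messages; V_B (the set of variables that B actually binds) and \Delta^{I} (the universe of I) are notation introduced here only for the restatement.

```latex
\begin{align*}
& B : V_B \to U, \qquad V_m \subseteq V_B \subseteq V \\
& I \models B(Q) \iff \exists\, C : (V \setminus V_B) \to \Delta^{I}
    \ \text{such that}\ I + C \models B(Q) \\
& K \models B(Q) \iff \forall I\, \bigl( I \models K \implies I \models B(Q) \bigr)
\end{align*}
```

The second line is where the remaining variables receive their existential reading: C supplies a witness for each of them, and only the existence of some such C is required.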
    >
    >>    * the query to which it is the answer;
    >>    * a reference to the server which produced the answer.
    >>
    >> In addition, an answer may contain:
    >
    >>    * An answer justification.
    >>
    >> Answer bundle
    >
    >> An answer bundle is a finite set of answers plus a tag consisting of
    >> either a server continuation or one or more termination tokens. The
    >> number of answers in an answer bundle given in response to a query
    >> must not exceed the answer bundle size bound in the query, if
    >> present.
    >>
    >> Server continuations are atomic entities with no internal structure
    >> visible to the client. A server must continue the answering process
    >> when sent a server continuation by sending back another answer
    >> bundle. Servers should encode sufficient information in the server
    >> continuation to enable them to continue the answering dialog even if
    >> they have been engaged in other activities since sending the
    >> previous answer bundle. Every dialog should terminate after a finite
    >> number of exchanges of server continuations and answer bundles
    >> between the client and server. A dialog is said to have terminated
    >> when the server sends a bundle containing a termination token. If a
    >> server is sent a termination token as a server continuation, the
    >> server should reply with a bundle containing no answers and the same
    >> termination token, thereby terminating the dialog.
    >>
    >> Clients must be able to distinguish termination tokens from server
    >> continuations. Termination tokens may be used to convey information
    >> about the response set; in particular, two termination tokens have
    >> fixed meanings. The token "end" simply means that the server is
    >> unable to deliver any further answers, but makes no claim of
    >> completeness. The token "none" indicates that the answer KB does not
    >> entail any other answers not in the response set. We note that the
    >> use of the "none" token should be restricted to those cases where
    >> the server is able to make a positive affirmation that no other
    >> answers exist, i.e., to provide a guarantee that there are no other
    >> possible bindings to the query variables which would produce an
    >> answer sentence that would be entailed by the answer KB. Other
    >> termination tags may also be used, but the "end" tag is the
    >> recommended way to indicate termination of a question-answering
    >> dialog.
    >>
    >> We said above that a bundle ends with a single token; here with one
    >> or more tokens. Which do we mean?
    >
    >
    > One or more. Sorry, that was just a slip.
    >
    >> I would say just one token, but maybe you are thinking of cases
    >> where you want/need more. If only one, then we can't say that "end"
    >> is the recommended terminator. If we allow more than one, what would
    >> it mean if I say "none end" as opposed to "end none"?
    >
    >  Order is immaterial. We should say that explicitly.
    >
    >>
    >> There is no provision in DQL for a query to indicate an upper bound
    >> on the total number of answers in a dialog, but a client can
    >> terminate a question-answering dialog at any time by sending the
    >> "end" token as a server continuation, or simply by not requesting
    >> any further continuations.
    >
    >>
    >
    >> Response Set
    >
    >> While there are no global requirements on a response set other than
    >> that all its members are correct answers, it is recommended that
    >> servers ensure that answer bundles do not contain duplicate or
    >> redundant answers, i.e. answers which are subsumed by other
    >> answers.  One answer subsumes another if its bindings are a superset
    >> of the bindings in the other answer.  Servers which are able to
    >> guarantee that their response sets contain no duplicate answers can
    >> be called "non-repeating". Servers which are able to guarantee that
    >> their response sets contain no duplicate or redundant answers can be
    >> called "terse" or "non-verbose".  Servers which are able to
    >> guarantee that their response sets will be correctly terminated with
    >> "none" can be called "complete".
    >>
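
The subsumption rule above (one answer subsumes another if its bindings are a superset of the other's) can be sketched as a filter over a list of answers. The function names `subsumes` and `make_terse` and the dictionary representation of an answer are illustrative assumptions, not terms from the draft.

```python
from typing import Dict, List

Answer = Dict[str, str]  # variable name -> uriref or literal


def subsumes(a: Answer, b: Answer) -> bool:
    """True if a's bindings are a superset of b's bindings."""
    return all(var in a and a[var] == val for var, val in b.items())


def make_terse(answers: List[Answer]) -> List[Answer]:
    """Drop duplicates, then drop answers subsumed by a different answer."""
    unique: List[Answer] = []
    for ans in answers:
        if ans not in unique:
            unique.append(ans)
    return [a for a in unique
            if not any(b != a and subsumes(b, a) for b in unique)]


answers = [
    {"?x": "ex:a"},
    {"?x": "ex:a", "?y": "ex:b"},
    {"?x": "ex:a"},
    {"?x": "ex:c"},
]
terse = make_terse(answers)
# terse == [{"?x": "ex:a", "?y": "ex:b"}, {"?x": "ex:c"}]
```

Stopping after the de-duplication step would give the "non-repeating" property only; the second step is what removes redundant answers as well.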
    >> The answer set of a query is the largest set of answers which are
    >> entailed by the answer KB and none of which are entailed by any
    >> other answer. Notice that this definition is semantic rather than
    >> operational. A complete server is one whose response set contains
    >> the answer set of the query. A terse complete server is one whose
    >> response set is precisely the answer set of the query.
    >>
    >> It may be impossible to implement a server that can guarantee to be
    >> terse and complete for all KBs and query patterns.
    >
    >> 3. Other Issues
    >
    >> Restricted query patterns
    >
    >> The specification of query pattern above allows for arbitrary
    >> patterns of variable replacement in DAML expressions. Particular
    >> servers, however, may restrict themselves to particular such query
    >> patterns, or provide guarantees of giving meaningful answers only
    >> when given particular kinds of query pattern. To allow for such
    >> cases we introduce the notion of a query class, defined simply as a
    >> class of patterns. Server specifications may refer to any
    >> well-defined query pattern restriction and define their performance,
    >> in the terms of this standard, relative to the case where all
    >> references to a query pattern are understood to refer only to query
    >> patterns in that class.
    >>
    >> If a server accepts only query patterns in a certain class, it is
    >> said to "apply to" that class; the notions of completeness and
    >> terseness may also be relativized to queries of a certain class,
    >> when stating the conformance of a server to this specification.
    >>
    >> For example, one class of query patterns might be those of one of
    >> the forms:
    >>
    >> ?x rdf:type CCC .
    >> ?x daml:subClassOf ?y .
    >>
    >> where CCC is some DAML class expression, or of the form
    >>
    >> ?x PPP ?y .
    >>
    >> where PPP is some DAML property expression other than those used in
    >> RDF(S) or DAML+OIL syntax.
    >>
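
A server restricted to the example class above might test incoming patterns along these lines. This is a rough sketch under several assumptions: a pattern is a single triple of strings, variables start with "?", and the prefixes `rdf:`, `rdfs:` and `daml:` stand in for "those used in RDF(S) or DAML+OIL syntax". It does not check that CCC is a well-formed class expression, only that it is not a variable.

```python
from typing import Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Assumed stand-ins for the RDF(S) and DAML+OIL vocabularies.
RESERVED_PREFIXES = ("rdf:", "rdfs:", "daml:")


def is_var(term: str) -> bool:
    return term.startswith("?")


def in_example_class(triple: Triple) -> bool:
    """Check a triple pattern against the three example forms."""
    s, p, o = triple
    if not is_var(s):
        return False
    if p == "rdf:type":
        return not is_var(o)            # ?x rdf:type CCC .
    if p == "daml:subClassOf":
        return is_var(o)                # ?x daml:subClassOf ?y .
    if is_var(p) or p.startswith(RESERVED_PREFIXES):
        return False
    return is_var(o)                    # ?x PPP ?y .


accepted = in_example_class(("?x", "ex:hasParent", "?y"))
rejected = in_example_class(("?x", "rdfs:label", "?y"))
```

A server using such a check would still have to pick a class-specific termination token rather than reuse 'none', as the following paragraph requires.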
    >> When performance is defined relative to a query class, the
    >> termination token 'none' should not be used to mean 'none relative
    >> to the class'. To avoid confusion, the use of other termination
    >> tokens, each with a meaning defined relative to the particular
    >> class, is required.
    >>
    >> Future versions of this specification may define particular query
    >> classes and corresponding termination tokens.
    >
    >> "How Many" Queries
    >
    >> The language and protocol contain no explicit constructs for asking
    >> how many (or how many more) answers there are to a given query.
    >> Defining what is meant by "how many" is problematic in that there
    >> can be multiple bindings for a given distinguished variable that all
    >> denote the same object in the domain of discourse, so that how many
    >> answer bindings there are for a given distinguished variable will in
    >> general differ from how many answer objects in the domain of
    >> discourse that variable can denote.  The core protocol could
    >> reasonably be extended to support "how many" queries, where "how
    >> many" means how many answers containing distinct sets of bindings
    >> can the server produce.  The difficulty of a server determining how
    >> many answers it can produce to a query without actually producing
    >> the answers has been the primary rationale for not including a "how
    >> many" construct in the query language.
    >
    >> Inability to respond to queries
    >
    >> Servers are not required to deliver answers to queries. Under
    >> various circumstances, a query may be phrased in a form which makes
    >> it impossible for a server to respond with any answers; for example,
    >> if the query specifies an answer KB which the server is unable to
    >> access or use, or where no bindings are available for "must bind"
    >> variables. Under these circumstances, the server should terminate
    >> the dialog with a bundle containing no answers and an appropriate
    >> termination tag or tags, one of which should be 'end'.
    >
    >>
    >>
    >>
    >> -----
    >
    >
    >
    > --
    >
    > ---------------------------------------------------------------------
    > IHMC                    (850)434 8903   home
    > 40 South Alcaniz St.    (850)202 4416   office
    > Pensacola, FL 32501     (850)202 4440   fax
    > phayes@ai.uwf.edu       http://www.coginst.uwf.edu/~phayes
    
    --
     Deborah L. McGuinness
     Knowledge Systems Laboratory
     Gates Computer Science Building, 2A Room 241
     Stanford University, Stanford, CA 94305-9020
     email: dlm@ksl.stanford.edu
     URL: http://ksl.stanford.edu/people/dlm/index.html
     (voice) 650 723 9770    (stanford fax) 650 725 5850   (computer fax)
    801 705 0941
    
    
    
    


