From: Deborah McGuinness ([email protected])
Date: 08/20/02
This was the document that I found when I was hunting yesterday.
The file was called 6-24-02 but had an embedded date of May 24.
Ian Horrocks wrote:
>>
>> On July 12, pat hayes writes:
>> > This is here in plain text and also as attached HTML. Sorry it
>> > took so long.
>>
>> Here is an updated version with some minor errors corrected and some
>> questions regarding trickier issues. The main point is that I think
>> we need a bit more precision w.r.t. what constitutes an answer. I
>> suggested a MT style definition. Let me know what you think. I have
>> highlighted comments and changes - I have found this to be a useful
>> technique for cooperative working on html documents.
>
>> I'm afraid I didn't bother with the plain text. Sorry it took so
>> long.
>
>
>
>>
>> Regards, Ian
>
>> Informal DQL Specification
>
>> DAML Joint Committee
>
>> Richard Fikes, Pat Hayes, Ian Horrocks, editors
>
>> June 12, 2002
>
>> 1. Overview
>
>> DQL is a formal language and protocol for posing queries from a
>> querying agent (which we refer to as the "client") to an answering
>> agent (which we refer to as the "server"). A DQL query contains a
>> "query pattern" that is a collection of DAML+OIL sentences in which
>> some literals and/or urirefs have been replaced by variables. An
>> answer to the query provides bindings of terms to some of these
>> variables such that the conjunction of the answer sentences -
>> produced by applying the bindings to the query pattern and
>> considering the remaining variables in the query pattern to be
>> existentially quantified - is entailed by a knowledge base called
>> the "answer KB".
>>
>> 1. You use "sentences" above. Is this wise? What is the difference
>> between these and the "assertions" you refer to below?
>
> None. Maybe it would be better to use 'expressions', but that is
> capable of being misunderstood, since eg terms are expressions. I was
> using 'sentence' in the usual logical sense of 'expression with a
> truthvalue'. You are right that I have been sloppy with collection of
> sentences versus conjunction of sentences. We should clarify that. I
> think it would be less confusing if we just said that a collection of
> sentences in a query is treated as a conjunction, and then assumed
> that all queries are single sentences.
>
>> I'm not sure why we don't say that a query is a DAML+OIL KB where
>> some of the literals and/or urirefs have been replaced by variables?
>
> Sure, OK. I guess it seems odd to call a query a knowledge base,
> which is why I prefer to use a more neutral term like 'sentence'. But
> whatever....
>
>> 2. The use of "collection" is a bit vague. When you say the answer
>> sentence is entailed, do you mean the conjunction of answer
>> sentences (as I wrote above), or each of them independently?
>
> Each independently. That is, each answer should be entailed by the
> server KB.
>
>> There is obviously a big difference in the case where they share
>> variables and/or constants.
>
> I didn't anticipate that two answers would ever share a variable. How
> could that happen? Oh, wait. I see your point. Right, I meant that the
> query pattern with the answer binding is entailed. If the pattern is a
> conjunction, then of course it's the conjunction that is entailed.
> Sorry about that confusion.
>
>> 3. We really need to make it clear that the variables are
>> existentially quantified at the outer level.
>
> I thought that was clear, but feel free to change the wording.
>
>> A formal definition would be useful (if not here, then in the
>> detailed spec below), but I'm not sure how to do it in HTML.
>
>
>
>>
>> Each binding in a query answer is a uri-ref or a literal that either
>> explicitly occurs as a term in the answer KB or is a term in
>> DAML+OIL. That is, DQL is designed for answering queries of the
>> form "What uri-refs and literals from the answer KB and DAML+OIL
>> denote objects that make the query pattern true?" We note that this
>> does not require servers to generate existential conclusions from
>> 'implicit' knowledge in order to answer queries.
>>
>> I don't understand the last sentence.
>
> It is imprecise, but it was intended to address the point that you
> originally raised in this discussion. I just wanted to emphasize that
> we are not expecting that every logical entailment will be reflected
> in a possible answer.
>
>>
>> Variables in queries may be designated as "must bind" or "may bind"
>> variables. Answers are required to provide bindings for all "must
>> bind" variables, and may provide bindings for "may bind" variables.
>> Queries may optionally provide or require information about the
>> knowledge base used to answer the query and impose constraints on
>> the dynamics of the answering process. Answers provided by the
>> server must conform to these requirements, but a server may restrict
>> its answers to certain classes of query pattern, to a certain class
>> of knowledge bases, or to a limited range of bindings.
>>
>> A single query may have zero or more answers. The set of all answers
>> provided by the server in response to a query is called the
>> "response set" of that query. Not all the answers in the response
>> set need be produced at once: in general, answers will be delivered
>> in groups. A query may specify an upper bound on the number of
>> answers that are delivered in a single group.
>>
>> The set of DAML assertions which are used by the server in answering
>> a query is referred to as the "answer KB". This may be an actual
>> knowledge base (or a finite set of knowledge bases) or it may be a
>> virtual entity representing the total information available to the
>> server at the time of answering; however, all servers are required
>> to be able to provide a reference to a resource representing the
>> answer KB. We will assume that such a reference to an answer KB has
>> the form of a uriref; in many cases this may be a URL which can be
>> used to access the KB or communicate with the server, but this is
>> not required. A DQL query contains an "answer KB expression" which
>> is either a variable or a reference to a KB. If the answer KB
>> expression in a query is a reference to a KB, then all answer
>> sentences of answers in the response set must be entailed by that
>> KB. If it is a variable, then the server is free to select or to
>> generate an answer KB, but if the variable is "must bind" then the
>> answer must provide a binding to this variable which references the
>> answer KB.
>>
>> DQL specifies a core set of protocol elements that are to be used by
>> a client to obtain query answers from a server. Specifically, DQL
>> specifies that a client initiates a query-answering dialogue with a
>> server by sending the server a DQL query. The server is expected to
>> respond by sending answers to the client one or more at a time along
>> with a server continuation that is either a process handle which the
>> client can use to request additional answers or a token indicating
>> that the server will not provide any more answers to the query. A
>> process handle is an atomic entity with no internal structure
>> visible to the client; its role is only to allow the server to
>> record the state of its answer search. The token can be 'none',
>> meaning that the server is claiming that there are no further
>> answers entailed by the answer KB, or 'end', meaning that the server
>> is making no claims as to whether there are more answers entailed by
>> the answer KB. Other token values may be allowed, but in all cases
>> it is required that a token be clearly distinguishable from a
>> process handle. No attempt is made here to specify a complete
>> inter-agent protocol (e.g., with provisions for time-outs, error
>> handling, resource budgets, etc.). Query answering servers are
>> required to support the specified core protocol elements and are not
>> constrained by the DQL specification as to how additional protocol
>> functionality is provided.
>
>> 2. Detailed specification
>
>> The client initiates a dialog with the server by sending a query.
>> The typical response is a bundle of answers plus a server
>> continuation which can be sent back by the client to the server. On
>> receiving a server continuation, the server responds similarly until
>> the continuation in the response is a termination token. The set of
>> all answers in all groups sent from the server to the client between
>> the query and the termination token is the response set of the
>> query.
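
The dialog described above can be sketched from the client side as a simple loop. This is only an illustration, not part of the specification: the names `Bundle`, `run_dialog` and `make_toy_server` are hypothetical, answers are modelled as plain dicts from variable names to bindings, and the toy server exists only so the loop has something to talk to.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# The two termination tokens with fixed meanings in the draft.
TERMINATION_TOKENS = {"none", "end"}

Answer = Dict[str, str]  # assumed representation: variable name -> binding


@dataclass(frozen=True)
class Bundle:
    answers: List[Answer]
    continuation: str  # an opaque process handle or a termination token


def run_dialog(send_query: Callable[[str], Bundle],
               send_continuation: Callable[[str], Bundle],
               query: str) -> Tuple[List[Answer], str]:
    """Collect the response set: every answer received between the query
    and the bundle that carries a termination token."""
    bundle = send_query(query)
    response_set = list(bundle.answers)
    while bundle.continuation not in TERMINATION_TOKENS:
        bundle = send_continuation(bundle.continuation)
        response_set.extend(bundle.answers)
    return response_set, bundle.continuation


def make_toy_server(all_answers: List[Answer], bundle_size: int):
    """Hypothetical in-memory server that hands out answers in groups."""
    positions: Dict[str, int] = {}

    def bundle_from(start: int) -> Bundle:
        chunk = all_answers[start:start + bundle_size]
        nxt = start + bundle_size
        if nxt >= len(all_answers):
            # This toy server knows its answer list is exhaustive.
            return Bundle(chunk, "none")
        handle = f"handle-{nxt}"  # cannot collide with a termination token
        positions[handle] = nxt
        return Bundle(chunk, handle)

    return (lambda query: bundle_from(0)), (lambda h: bundle_from(positions[h]))


if __name__ == "__main__":
    send_query, send_continuation = make_toy_server(
        [{"?x": "ex:a"}, {"?x": "ex:b"}, {"?x": "ex:c"}], bundle_size=2)
    print(run_dialog(send_query, send_continuation, "some query"))
```

The client never inspects a handle; it only tests whether the continuation is a termination token, which is why the draft insists that the two be distinguishable.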
>
>> Query
>
>> A DQL query necessarily includes:
>
>> * a query pattern, which is a collection of DAML+OIL sentences in
>> which some of the literals and urirefs have been replaced by
>> variables;
>> * an answer KB pattern, which is either a single variable or a
>> reference to a KB;
>> * It isn't clear how this is compatible with allowing the answer
>> KB to be a finite set of KBs, as promised above.
>> * an indication of which of the variables in the patterns are
>> "must bind" or "may bind" variables. No variable can be both
>> "must bind" and "may bind".
>>
>> A DQL query may also optionally include:
>
>> * a query premise, which is either a DAML+OIL KB or a reference
>> to a KB. When a query premise is specified, the sentences in
>> the query premise are considered to be included in the answer
>> KB. This option is intended to facilitate if-then queries while
>> still remaining within the expressiveness of DAML+OIL.
>> Omitting the query premise is equivalent to providing an empty
>> query premise.
>> * a justification request. A DQL query can optionally include a
>> request for a justification for each query answer. (This
>> option is noted here for future reference but no further
>> details are provided, and servers may ignore this part of a
>> query. The content and structure of a justification for a
>> query answer has not yet been determined. The intent is to
>> specify various types of justifications that can be requested
>> in a query. Examples of justification range from the set of
>> sentences used to derive the answer ('set of support') to a
>> complete proof or derivation of the answer in some
>> proof-theoretic framework.)
>> * an answer bundle size bound, which is a positive integer.
>> Omitting the answer bundle size bound effectively sets it to
>> infinity.
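
As a rough illustration of the query components listed above, here is one way a client might hold them in memory. Everything about the representation is an assumption made for the sketch: the class name `DQLQuery`, the `?name` convention for variables, triples as 3-tuples of strings, and the choice to reject designated variables that do not occur in either pattern. The justification request is left out because the draft does not yet define it.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]


def is_variable(term: str) -> bool:
    # Assumed convention for this sketch: variables are written "?name".
    return term.startswith("?")


@dataclass(frozen=True)
class DQLQuery:
    pattern: Tuple[Triple, ...]              # query pattern
    answer_kb: str                           # a variable or a reference to a KB
    must_bind: FrozenSet[str]
    may_bind: FrozenSet[str] = frozenset()
    premise: Optional[str] = None            # omitted = empty query premise
    bundle_size_bound: Optional[int] = None  # omitted = no bound

    def __post_init__(self) -> None:
        # No variable can be both "must bind" and "may bind".
        if self.must_bind & self.may_bind:
            raise ValueError("a variable cannot be both must-bind and may-bind")
        used = {t for triple in self.pattern for t in triple if is_variable(t)}
        if is_variable(self.answer_kb):
            used.add(self.answer_kb)
        # Assumption: designated variables must occur in one of the patterns.
        if not (self.must_bind | self.may_bind) <= used:
            raise ValueError("designated variable does not occur in the patterns")
        if self.bundle_size_bound is not None and self.bundle_size_bound < 1:
            raise ValueError("answer bundle size bound must be a positive integer")


if __name__ == "__main__":
    print(DQLQuery(pattern=(("?x", "rdf:type", "ex:Person"),),
                   answer_kb="?kb",
                   must_bind=frozenset({"?x"}),
                   may_bind=frozenset({"?kb"}),
                   bundle_size_bound=10))
```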
>>
>> Answer
>
>> An answer to a query must contain:
>
>> * a binding of a uriref or a literal to each of the "must bind"
>> and zero or more of the "may bind" variables which satisfies
>> the following:
>>
>> 1. A variable in the answer KB pattern is bound to a reference
>> to the answer KB;
>>
>> 2. All variables in the query pattern are bound to terms which
>> occur in the DAML+OIL language or in the answer KB;
>>
>> 3. The answer KB entails the answer sentence got by replacing
>> all variables in the query pattern which are bound in the
>> answer by their bindings, and replacing all other variables
>> by new RDF blank nodes.
>>
>> *
>> * Again, we went from collection of sentences in the query to
>> sentence in the specification of the answer.
>>
> Right. I guess I have gotten too familiar with the usual convention
> whereby a collection of sentences is considered to be a conjunction
> (which is a sentence). This is so automatic that I often don't notice
> it, but we should be more careful.
>
>> * I'm also rather concerned about the use of RDF blank nodes in
>> this context.
>>
> Well, I guess I was thinking of that as synonymous with 'existential
> variable' but expressed in RDF-friendly terminology, is all. But I now
> agree that it is not adequate and should be changed.
>
>> * This may be OK if we are thinking of the query as an RDF graph
>> (which we didn't make clear up to now)
>>
> But it is clear from other parts of the DAML spec, right?
>
>> * , so a single variable gets replaced by a single blank node,
>> but if it is a collection of XML serialised triples
>>
> That would just be a mistake, since DAML is defined to be RDF and RDF
> is defined to be the RDF graph. But maybe we should be more explicit
> about this.
>
>> * , say, we need to be sure that a given variable is always
>> replaced with the same new blank node or we will lose the
>> co-reference constraint on answers.
>>
> This is one reason why RDF/XML can't be used as a reference language,
> by the way.
>
>> * (Also, by using RDF blank nodes, aren't we precluding the case
>> where a variable corresponds to a property?
>>
> Ah, good point. OK, let's not refer to blank nodes at all :-)
>
>> * Is this deliberate?) All in all, I think we need to be rather
>> more precise at this point.
>>
>>
>> Here is a rough cut at a MT style of defining what constitutes
>> a valid answer binding if we consider a query to be a KB:
>>
>> Let K be the answer KB, U the set of urirefs and literals
>> occurring in K, Q a query KB in which some urirefs and literals
>> have been replaced with variables, V the set of variables in Q,
>> Vm (a subset of V) the set of must-bind variables in Q, B a
>> binding that maps every element of Vm to an element of U and
>> zero or more elements of V-Vm to elements of U, and B(Q) the KB
>> that results from applying the binding B to the KB Q. A model I
>> of K satisfies B(Q) if the interpretation function can be
>> extended to any remaining variables in B(Q) in such a way that
>> I is a model of B(Q). K entails B(Q) if every model I of K
>> satisfies B(Q).
>>
> OK, though I think it can be made more readable: Suppose Q is a query
> pattern, ie a KB in which some urirefs and/or literals have been
> replaced by variables. A binding for Q is a lexical mapping which
> associates a uriref or literal in the answer KB with every must-bind
> variable and possibly with some of the other variables in Q. We write
> B(Q) to refer to the KB got by applying the binding mapping B to Q, ie
> substituting B(v) for every variable v which is bound by B. B(Q) may
> contain some variables from Q which are not replaced by B; these are
> called remaining variables. An interpretation I satisfies B(Q) if
> there is a mapping C from the remaining variables of B(Q) to the
> universe of I such that I+C satisfies B(Q); that is, if the
> interpretation can be extended to provide interpretations of the
> remaining variables in some way which makes B(Q) true. Then, in the
> usual way, we say that the answer KB entails B(Q) just in case B(Q) is
> true in every interpretation which makes the answer KB true.
> Intuitively, this means that the remaining variables are treated as
> existential 'blanks', which indicate that something exists without
> saying what it is.
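
To make the roles of B, B(Q) and the remaining variables concrete, here is a small sketch under a strong simplifying assumption: the answer KB is a finite set of ground triples and "entails" is approximated by matching against those asserted triples only. Real DAML+OIL entailment is much richer than this, and bindings to terms of the DAML+OIL language itself are not modelled. The function names and the `?name` variable convention are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def is_variable(term: str) -> bool:
    # Assumed convention for this sketch: variables are written "?name".
    return term.startswith("?")


def apply_binding(pattern: List[Triple], binding: Dict[str, str]) -> List[Triple]:
    """Compute B(Q): substitute B(v) for every variable v bound by B."""
    return [(binding.get(s, s), binding.get(p, p), binding.get(o, o))
            for s, p, o in pattern]


def _unify(pattern_term: str, fact_term: str, witnesses: Dict[str, str]) -> bool:
    if not is_variable(pattern_term):
        return pattern_term == fact_term
    if pattern_term in witnesses:
        # The same remaining variable must get the same witness everywhere.
        return witnesses[pattern_term] == fact_term
    witnesses[pattern_term] = fact_term
    return True


def has_witnesses(pattern: List[Triple], kb: Set[Triple],
                  witnesses: Optional[Dict[str, str]] = None) -> bool:
    """True if the remaining variables can be mapped consistently so that
    every triple of the pattern is an asserted triple of kb."""
    if not pattern:
        return True
    first, rest = pattern[0], pattern[1:]
    for fact in kb:
        trial = dict(witnesses or {})
        if all(_unify(p, f, trial) for p, f in zip(first, fact)):
            if has_witnesses(rest, kb, trial):
                return True
    return False


def is_answer(pattern: List[Triple], binding: Dict[str, str],
              must_bind: Set[str], kb: Set[Triple]) -> bool:
    kb_terms = {term for fact in kb for term in fact}
    if not must_bind <= set(binding):
        return False  # every must-bind variable needs a binding
    if not set(binding.values()) <= kb_terms:
        return False  # bindings must be terms occurring in the answer KB
    return has_witnesses(apply_binding(pattern, binding), kb)


if __name__ == "__main__":
    kb = {("ex:alice", "ex:knows", "ex:bob"), ("ex:bob", "ex:knows", "ex:carol")}
    q = [("?x", "ex:knows", "?y"), ("?y", "ex:knows", "?z")]
    print(is_answer(q, {"?x": "ex:alice"}, {"?x"}, kb))
    print(is_answer(q, {"?x": "ex:bob"}, {"?x"}, kb))
```

Here `?y` and `?z` play the part of the existential 'blanks': the check only asks that some consistent choice exists, and the shared `?y` keeps the co-reference between the two triples.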
>
>> *
>> * the query to which it is the answer;
>> * a reference to the server which produced the answer.
>>
>> In addition, an answer may contain:
>
>> * An answer justification.
>>
>> Answer bundle
>
>> An answer bundle is a finite set of answers plus a tag consisting of
>> either a server continuation or one or more termination tokens. The
>> number of answers in an answer bundle given in response to a query
>> must not exceed the answer bundle size bound in the query, if
>> present.
>>
>> Server continuations are atomic entities with no internal structure
>> visible to the client. A server must continue the answering process
>> when sent a server continuation by sending back another answer
>> bundle. Servers should encode sufficient information in the server
>> continuation to enable them to continue the answering dialog even if
>> they have been engaged in other activities since sending the
>> previous answer bundle. Every dialog should terminate after a finite
>> number of exchanges of server continuations and answer bundles
>> between the client and server. A dialog is said to have terminated
>> when the server sends a bundle containing a termination token. If a
>> server is sent a termination token as a server continuation, the
>> server should reply with a bundle containing no answers and the same
>> termination token, thereby terminating the dialog.
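
The server-side behaviour in the paragraph above might be sketched as follows. This is a toy under stated assumptions, not a proposed implementation: `ToyServer` and its method names are hypothetical, answers are plain dicts, handles are random strings prefixed with `h-` so they cannot be mistaken for tokens, and the server always finishes with "end" because it makes no completeness claim.

```python
import secrets
from typing import Dict, List, Optional, Tuple

TERMINATION_TOKENS = {"none", "end"}

Answer = Dict[str, str]  # assumed representation: variable name -> binding


class ToyServer:
    def __init__(self, answers: List[Answer]) -> None:
        self._answers = list(answers)
        # handle -> (position of the next answer, bundle size bound)
        self._sessions: Dict[str, Tuple[int, Optional[int]]] = {}

    def query(self, bundle_size_bound: Optional[int] = None) -> Tuple[List[Answer], str]:
        return self._bundle(0, bundle_size_bound)

    def resume(self, continuation: str) -> Tuple[List[Answer], str]:
        if continuation in TERMINATION_TOKENS:
            # A termination token sent as a continuation is echoed back
            # with no answers, which terminates the dialog.
            return [], continuation
        position, bound = self._sessions[continuation]
        return self._bundle(position, bound)

    def _bundle(self, position: int, bound: Optional[int]) -> Tuple[List[Answer], str]:
        stop = len(self._answers) if bound is None else position + bound
        chunk = self._answers[position:stop]  # never more than the bound
        if stop >= len(self._answers):
            return chunk, "end"
        handle = "h-" + secrets.token_hex(8)  # opaque to the client
        self._sessions[handle] = (stop, bound)
        return chunk, handle


if __name__ == "__main__":
    server = ToyServer([{"?x": f"ex:item{i}"} for i in range(5)])
    answers, tag = server.query(bundle_size_bound=2)
    while tag not in TERMINATION_TOKENS:
        print(answers)
        answers, tag = server.resume(tag)
    print(answers, tag)
```

Because the state needed to continue is looked up from the handle alone, the server can do other work between bundles, which is the property the draft asks for.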
>>
>> Clients must be able to distinguish termination tokens from server
>> continuations. Termination tokens may be used to convey information
>> about the response set; in particular, two termination tokens have
>> fixed meanings. The token "end" simply means that the server is
>> unable to deliver any further answers, but makes no claim of
>> completeness. The token "none" indicates that the answer KB does not
>> entail any other answers not in the response set. We note that the
>> use of the "none" token should be restricted to those cases where
>> the server is able to make a positive affirmation that no other
>> answers exist, i.e., to provide a guarantee that there are no other
>> possible bindings to the query variables which would produce an
>> answer sentence that would be entailed by the answer KB. Other
>> termination tags may also be used, but the "end" tag is the
>> recommended way to indicate termination of a question-answering
>> dialog.
>>
>> We said above that a bundle ends with a single token; here with one
>> or more tokens. Which do we mean?
>
>
> One or more. Sorry, that was just a slip.
>
>> I would say just one token, but maybe you are thinking of cases
>> where you want/need more. If only one, then we can't say that "end"
>> is the recommended terminator. If we allow more than one, what would
>> it mean if I say "none end" as opposed to "end none"?
>
> Order is immaterial. We should say that explicitly.
>
>>
>> There is no provision in DQL for a query to indicate an upper bound
>> on the total number of answers in a dialog, but a client can
>> terminate a question-answering dialog at any time by sending the
>> "end" token as a server continuation, or simply by not requesting
>> any further continuations.
>
>>
>
>> Response Set
>
>> While there are no global requirements on a response set other than
>> that all its members are correct answers, it is recommended that
>> servers ensure that answer bundles do not contain duplicate or
>> redundant answers, i.e. answers which are subsumed by other
>> answers. One answer subsumes another if its bindings are a superset
>> of the bindings in the other answer. Servers which are able to
>> guarantee that their response sets contain no duplicate answers can
>> be called "non-repeating". Servers which are able to guarantee that
>> their response sets contain no duplicate or redundant answers can be
>> called "terse" or "non-verbose". Servers which are able to
>> guarantee that their response sets will be correctly terminated with
>> "none" can be called "complete".
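
The subsumption rule above is easy to state in code. A minimal sketch, assuming an answer is represented as a dict from variable names to bindings (the function names `subsumes` and `make_terse` are hypothetical):

```python
from typing import Dict, List

Answer = Dict[str, str]  # assumed representation: variable name -> binding


def subsumes(a: Answer, b: Answer) -> bool:
    """a subsumes b if a's bindings are a proper superset of b's bindings."""
    return a != b and b.items() <= a.items()


def make_terse(answers: List[Answer]) -> List[Answer]:
    """Drop duplicate answers, then drop answers subsumed by another answer."""
    unique: List[Answer] = []
    for answer in answers:
        if answer not in unique:
            unique.append(answer)
    return [b for b in unique if not any(subsumes(a, b) for a in unique)]


if __name__ == "__main__":
    print(make_terse([
        {"?x": "ex:a"},
        {"?x": "ex:a", "?y": "ex:b"},  # subsumes the first answer
        {"?x": "ex:a"},                # duplicate of the first answer
        {"?x": "ex:c"},
    ]))
```

Stopping after the first loop gives the "non-repeating" property; the final filter is what makes the result "terse".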
>>
>> The answer set of a query is the largest set of answers which are
>> entailed by the answer KB and none of which are entailed by any
>> other answer. Notice that this definition is semantic rather than
>> operational. A complete server is one whose response set contains
>> the answer set of the query. A terse complete server is one whose
>> response set is precisely the answer set of the query.
>>
>> It may be impossible to implement a server that can guarantee to be
>> terse and complete for all KBs and query patterns.
>
>> 3. Other Issues
>
>> Restricted query patterns
>
>> The specification of query pattern above allows for arbitrary
>> patterns of variable replacement in DAML expressions. Particular
>> servers, however, may restrict themselves to particular such query
>> patterns, or provide guarantees of giving meaningful answers only
>> when given particular kinds of query pattern. To allow for such
>> cases we introduce the notion of a query class, defined simply as a
>> class of patterns. Server specifications may refer to any
>> well-defined query pattern restriction and define their performance,
>> in the terms of this standard, relative to that restriction, so that
>> all references to a query pattern are understood to refer only to
>> query patterns in that class.
>>
>> If a server accepts only query patterns in a certain class, it is
>> said to "apply to" that class; the notions of completeness and
>> terseness may also be relativized to queries of a certain class,
>> when stating the conformance of a server to this specification.
>>
>> For example, one class of query patterns might be those of one of
>> the forms:
>>
>> ?x rdf:type CCC .
>> ?x daml:subClassOf ?y .
>>
>> where CCC is some DAML class expression, or of the form
>>
>> ?x PPP ?y .
>>
>> where PPP is some DAML property expression other than those used in
>> RDF(S) or DAML+OIL syntax.
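
A server that applies only to this example class could test membership along the following lines. The sketch reads each of the three forms as a separate single-triple pattern, treats a class or property expression as a single term, and recognises RDF(S) and DAML+OIL vocabulary by the prefixes `rdf:`, `rdfs:` and `daml:`. All three are simplifying assumptions, and the function name is hypothetical.

```python
from typing import Tuple

Triple = Tuple[str, str, str]

# Assumed prefixes for the RDF(S) and DAML+OIL vocabularies.
RESERVED_PREFIXES = ("rdf:", "rdfs:", "daml:")


def is_variable(term: str) -> bool:
    # Assumed convention for this sketch: variables are written "?name".
    return term.startswith("?")


def in_example_class(triple: Triple) -> bool:
    s, p, o = triple
    if not is_variable(s) or is_variable(p):
        return False
    if p == "rdf:type":
        return not is_variable(o)   # ?x rdf:type CCC .
    if p == "daml:subClassOf":
        return is_variable(o)       # ?x daml:subClassOf ?y .
    if p.startswith(RESERVED_PREFIXES):
        return False                # PPP must not be RDF(S) or DAML+OIL syntax
    return is_variable(o)           # ?x PPP ?y .


if __name__ == "__main__":
    for t in [("?x", "rdf:type", "ex:Person"),
              ("?x", "ex:knows", "?y"),
              ("?x", "rdfs:label", "?y")]:
        print(t, in_example_class(t))
```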
>>
>> When performance is defined relative to a query class, the
>> termination token 'none' should not be used to mean 'none relative
>> to the class'. To avoid confusion, the use of other termination
>> tokens, each with a meaning defined relative to the particular
>> class, is required.
>>
>> Future versions of this specification may define particular query
>> classes and corresponding termination tokens.
>
>> "How Many" Queries
>
>> The language and protocol contain no explicit constructs for asking
>> how many (or how many more) answers there are to a given query.
>> Defining what is meant by "how many" is problematic in that there
>> can be multiple bindings for a given distinguished variable that all
>> denote the same object in the domain of discourse, so that how many
>> answer bindings there are for a given distinguished variable will in
>> general differ from how many answer objects in the domain of
>> discourse that variable can denote. The core protocol could
>> reasonably be extended to support "how many" queries, where "how
>> many" means how many answers containing distinct sets of bindings
>> can the server produce. The difficulty of a server determining how
>> many answers it can produce to a query without actually producing
>> the answers has been the primary rationale for not including a "how
>> many" construct in the query language.
>
>> Inability to respond to queries
>
>> Servers are not required to deliver answers to queries. Under
>> various circumstances, a query may be phrased in a form which makes
>> it impossible for a server to respond with any answers; for example,
>> if the query specifies an answer KB which the server is unable to
>> access or use, or where no bindings are available for "must bind"
>> variables. Under these circumstances, the server should terminate
>> the dialog with a bundle containing no answers and an appropriate
>> termination tag or tags, one of which should be 'end'.
>
>>
>>
>>
>> -----
>
>
>
> --
>
> ---------------------------------------------------------------------
> IHMC                                    (850)434 8903   home
> 40 South Alcaniz St.                    (850)202 4416   office
> Pensacola, FL 32501                     (850)202 4440   fax
> [email protected]                       http://www.coginst.uwf.edu/~phayes
--
Deborah L. McGuinness
Knowledge Systems Laboratory
Gates Computer Science Building, 2A Room 241
Stanford University, Stanford, CA 94305-9020
email: [email protected]
URL: http://ksl.stanford.edu/people/dlm/index.html
(voice) 650 723 9770 (stanford fax) 650 725 5850 (computer fax) 801 705 0941