March 2003

Abstract Specification[1]

DAML Joint Committee [2]

Richard Fikes, Pat Hayes, Ian Horrocks, editors

Overview

DAML Query Language (DQL) is a formal language and protocol for a querying agent and an answering agent to use in conducting a query-answering dialogue using knowledge represented in DAML+OIL. A DQL query contains a query pattern that is a collection of DAML+OIL sentences in which some literals and/or URIrefs have been replaced by variables. A query answer provides bindings of terms to some of these variables such that the conjunction[3] of the answer sentences - produced by applying the bindings to the query pattern and considering the remaining variables in the query pattern to be existentially quantified - is entailed by a knowledge base (KB) called the answer KB.

Each binding in a query answer is a URIref or a literal that either explicitly occurs as a term in the answer KB or is a term in DAML+OIL. That is, DQL is designed for answering queries of the form “What URIrefs and literals from the answer KB and DAML+OIL denote objects that make the query pattern true?”

Variables in queries may be designated as must-bind or may-bind variables (but not both). Answers are required to provide bindings for all must-bind variables, and may provide bindings for may-bind variables. Variables in queries that are not designated as either must-bind or may-bind variables are don’t-bind variables and are not to have bindings in query answers. Answers provided by the server must conform to the requirements in this DQL specification, but a server may restrict its answers to certain classes of query patterns, to certain classes of knowledge bases, or to a limited range of bindings.

A query may have zero or more answers. The set of all answers provided by the server in a query-answering dialogue is called the response collection of that dialogue. Not all the answers in the response collection need be produced at once: in general, answers will be delivered in groups, each of which is called an answer bundle. At each step of a query-answering dialogue, a querying agent may specify an upper bound on the number of answers to be delivered in the next answer bundle.

The set of DAML sentences that are used by the server in answering a query is referred to as the answer KB. This may be an actual knowledge base, a conjunction of actual knowledge bases, or a virtual entity representing the total information available to the server at the time of answering. However, all servers are required to be able to provide a URIref that is a reference to a resource representing the answer KB. In many cases, that URIref will be a URL that can be used to access the KB or to communicate with the server about the KB, but the URIref is not required to be a URL.

A DQL query contains an answer KB pattern that is a KB, a list of KB references, or a variable. If a query’s answer KB pattern is a KB or a reference to a KB, then the conjunction of the answer sentences in each query answer in the response collection must be entailed by that KB. If a query’s answer KB pattern is a list of KB references, then the conjunction of the answer sentences in each query answer in the response collection must be entailed by the conjunction of the KBs referenced in that list. If a query’s answer KB pattern is a variable, then the server is free to select or to generate an answer KB from which to answer the query, but if the variable is a must-bind variable, then the answer must provide a binding to the variable that is a reference to the answer KB.

DQL specifies a core set of protocol elements that are to be used by a querying agent (which we refer to as a client) to obtain query answers from an answering agent (which we refer to as a server). Specifically, DQL specifies that a client initiates a query-answering dialogue with a server by sending the server a DQL query. The server is expected to respond by sending the client an answer bundle containing either (a) at least one answer and a process handle that the client can use to request additional answers, or (b) zero answers and one or more termination tokens indicating that the server will not provide any more answers to the query and is terminating the query-answering dialogue. A client requests additional answers from the server by sending the server a server continuation containing the process handle from the most recently produced answer bundle. The server is expected to respond to a server continuation by sending the client that sent it another answer bundle, as before. A client terminates a query-answering dialogue by sending the server a server termination containing the process handle provided by the server in the most recently produced answer bundle.

A process handle is an atomic entity with no internal structure visible to the client; its role is only to allow the server to record the state of its answer search so that it can continue the answering dialogue even if it has been engaged in other activities since sending the previous answer bundle. A termination token can be 'none', meaning that the server is claiming that there are no further answers entailed by the answer KB, or 'end', meaning that the server is making no claims as to whether there are more answers entailed by the answer KB. Other termination tokens are allowed, but in all cases it is required that a termination token be clearly distinguishable from a process handle.

No attempt is made here to specify a complete inter-agent protocol (e.g., with provisions for time-outs, error handling, resource budgets, etc.). Query answering servers are required to support the specified core protocol elements and are not constrained by the DQL specification as to how additional protocol functionality is provided.

Detailed Specification

Query

A client initiates a query-answering dialogue with a server by sending the server a DQL query. A DQL query necessarily includes:

· A query pattern that represents a collection of DAML+OIL sentences in which some of the literals and URIrefs have been replaced by variables. Formally, a query pattern is a DAML+OIL KB in which some URIrefs and/or literals have been replaced by variables.

· An answer KB pattern that is a KB reference, a list of KB references, or a variable.

· A must-bind variables list and a may-bind variables list. Each item on these lists is a query variable (i.e., a variable that occurs either in the query pattern or in the answer KB pattern), and no query variable can be an item of both lists. Query variables that are on neither of these lists are considered to be don’t-bind variables.

A DQL query may also optionally include:

· An answer pattern that is a DAML+OIL KB in which some URIrefs and/or literals have been replaced by variables. The answer pattern must contain all of the must-bind and may-bind query variables. When there is no answer pattern in a query, the query pattern is considered to be the answer pattern.

· A query premise that is a DAML+OIL KB or a KB reference. When a query premise is specified, the sentences in the query premise are considered to be included in the answer KB. This option is intended to facilitate if-then queries while still remaining within the expressiveness of DAML+OIL. Omitting the query premise is equivalent to providing an empty query premise.

· A justification request. A DQL query can optionally include a request for a justification for each query answer. (This option is noted here for future reference, but no further details are provided, and servers implementing the August 2002 version of DQL may ignore this part of a query.[4])

· An answer bundle size bound that is a positive integer. The first answer bundle produced by the server is to contain no more than this number of answers. Omitting the answer bundle size bound effectively sets it to infinity.
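The required and optional parts of a query listed above can be pictured as a small record with a few consistency checks. The following Python sketch is illustrative only: this specification defines no external syntax, so the class name Query, its field names, the triple representation of patterns, and the convention that variables are strings beginning with "?" are all assumptions of the sketch. It also assumes that a variable used as the answer KB pattern need not occur in an explicit answer pattern.

```python
from dataclasses import dataclass, field
from typing import Optional


def is_variable(term):
    """Sketch convention (an assumption): variables are strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def variables_in(pattern):
    """Collect the variables in a list of (subject, predicate, object) triples."""
    return {term for triple in pattern for term in triple if is_variable(term)}


@dataclass
class Query:
    query_pattern: list                      # list of triples, some terms variables
    answer_kb_pattern: object                # KB reference, list of references, or variable
    must_bind: set = field(default_factory=set)
    may_bind: set = field(default_factory=set)
    answer_pattern: Optional[list] = None    # omitted: the query pattern is used
    premise: Optional[list] = None           # omitted: equivalent to an empty premise
    bundle_size_bound: Optional[int] = None  # omitted: no bound ("infinity")

    def query_variables(self):
        found = variables_in(self.query_pattern)
        if is_variable(self.answer_kb_pattern):
            found.add(self.answer_kb_pattern)
        return found

    def dont_bind(self):
        """Query variables on neither list are don't-bind variables."""
        return self.query_variables() - self.must_bind - self.may_bind

    def problems(self):
        """Return a list of violated constraints (empty if none were found)."""
        issues = []
        listed = self.must_bind | self.may_bind
        if self.must_bind & self.may_bind:
            issues.append("a variable is on both the must-bind and may-bind lists")
        if not listed <= self.query_variables():
            issues.append("a listed variable does not occur in the query")
        if self.answer_pattern is not None:
            # Assumption of this sketch: the answer KB variable is exempt here.
            kb_var = {self.answer_kb_pattern} if is_variable(self.answer_kb_pattern) else set()
            if not (listed - kb_var) <= variables_in(self.answer_pattern):
                issues.append("the answer pattern omits a must-bind or may-bind variable")
        if self.bundle_size_bound is not None and self.bundle_size_bound < 1:
            issues.append("the answer bundle size bound must be a positive integer")
        return issues
```

For instance, a query whose pattern is "?p ex:owns ?c" and "?c rdf:type ex:Car", with ?p as a must-bind variable and no may-bind variables, leaves ?c as a don’t-bind variable; the ex: names are hypothetical.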

Query Answer

An answer to a query necessarily includes:

· An answer pattern instance that is the answer pattern with each of the must-bind variables and zero or more of the may-bind variables (and none of any other variables that occur in the answer pattern) replaced by a URIref or literal. The answer pattern instance specifies a binding set that must satisfy the following conditions:

1. Each element of the binding set is a lexical mapping that associates a URIref or literal to a query variable.

2. The binding set contains a binding to each of the must-bind query variables, to zero or more of the may-bind query variables, and to none of the don’t-bind query variables.

3. If the binding set contains a binding to a variable that is the answer KB pattern, then the binding is to a reference to the answer KB.

4. All bindings in the binding set to variables in the query pattern are terms that occur in the DAML+OIL language or in the answer KB.

5. Suppose Q is the query pattern for the query of which this is an answer, and B is the subset of the binding set consisting of all the bindings to variables in Q. We write B(Q) to refer to the KB obtained by applying the bindings B to Q, i.e., by substituting the URIref or literal that is associated with v for every variable v that has a binding in B. B(Q) may contain some variables from Q that are not replaced by B; we refer to these variables as remaining variables. An interpretation I satisfies B(Q) if there is a mapping C from the remaining variables of B(Q) to the universe of I such that I+C satisfies B(Q); that is, if the interpretation can be extended to provide interpretations of the remaining variables in some way that makes B(Q) true. Then, in the usual way, we say that the answer KB entails B(Q) just in case B(Q) is true in every interpretation that makes the answer KB true. Intuitively, this means that the remaining variables are treated as existential 'blanks', which indicate that something exists without saying what it is. The condition, then, that must be satisfied by the binding set is that the answer KB entails B(Q).

· The query to which this is an answer.

· A reference to the server that produced the answer.

A query answer may also optionally include:

· An answer justification. (As noted above, this option is here for future reference, but no further details are provided, and servers implementing the August 2002 version of DQL may ignore a justification request in a query.)
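Conditions 2, 4, and 5 on the binding set can be illustrated with a short Python sketch. Everything named here (the function names, the triple representation, the "?" prefix for variables, and the ex: terms in the usage note) is an assumption made for illustration. In particular, holds_by_lookup is only a stand-in for entailment: it checks whether the remaining variables of B(Q) can be mapped to terms so that every resulting triple is explicitly asserted in a KB given as a set of ground triples. That is a sufficient condition for entailment in this toy setting, not an implementation of DAML+OIL entailment, and the sketch leaves the answer KB variable (condition 3) aside.

```python
def is_variable(term):
    """Sketch convention (an assumption): variables are strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def apply_bindings(pattern, bindings):
    """Compute B(Q): substitute the bound term for each bound variable."""
    return [tuple(bindings.get(term, term) for term in triple) for triple in pattern]


def binding_set_problems(bindings, must_bind, may_bind, known_terms):
    """Check conditions 2 and 4 for bindings to query pattern variables."""
    issues = []
    bound = set(bindings)
    if must_bind - bound:
        issues.append("a must-bind variable has no binding")
    if bound - must_bind - may_bind:
        issues.append("a variable that is neither must-bind nor may-bind is bound")
    if any(value not in known_terms for value in bindings.values()):
        issues.append("a binding is not a term of the answer KB or the language")
    return issues


def holds_by_lookup(pattern, kb_triples):
    """Treat remaining variables as existential blanks and try to match them.

    Returns True if some assignment of terms to the remaining variables makes
    every triple of the pattern an asserted triple of kb_triples.
    """
    def solve(remaining, env):
        if not remaining:
            return True
        first, rest = remaining[0], remaining[1:]
        for fact in kb_triples:
            candidate = dict(env)
            matched = True
            for p_term, f_term in zip(first, fact):
                if is_variable(p_term):
                    # A variable must denote the same term everywhere it occurs.
                    if candidate.setdefault(p_term, f_term) != f_term:
                        matched = False
                        break
                elif p_term != f_term:
                    matched = False
                    break
            if matched and solve(rest, candidate):
                return True
        return False

    return solve(list(pattern), {})
```

With a KB asserting "ex:joe ex:owns ex:car1" and "ex:car1 rdf:type ex:Car", the query pattern "?p ex:owns ?c", "?c rdf:type ex:Car", and the binding of ?p to ex:joe, B(Q) still contains the remaining variable ?c, and the lookup succeeds because ?c can be matched to ex:car1 without that term being reported in the answer.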

Answer Bundle

When a server receives a DQL query or a DQL server continuation from a client, it is expected to respond by sending to the client an answer bundle. An answer bundle necessarily includes:

· An answer collection each of whose elements is a query answer. The answer collection may be empty and must not contain more query answers than the answer bundle size bound that was most recently sent to the server in the query-answering dialogue.

· A process handle or a non-empty termination token set. A dialogue is said to have terminated when the server sends a bundle containing a non-empty termination token set. We note that the use of the 'none' termination token (which indicates that the answer KB does not entail any other answers not in the response collection) should be restricted to those cases where the server is able to make a positive affirmation that no other answers exist, i.e., to provide a guarantee that there are no other possible bindings to the query variables which would produce an answer sentence that would be entailed by the answer KB.
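The two required parts of an answer bundle can be captured in a small record whose constructor rejects ill-formed bundles. The Python sketch below is an illustration, not a normative encoding: the class and field names are hypothetical, process handles are represented as opaque strings, and the sketch reads "a process handle or a non-empty termination token set" as exactly one of the two.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnswerBundle:
    answers: tuple                            # the answer collection, possibly empty
    process_handle: Optional[str] = None      # opaque to the client
    termination_tokens: frozenset = frozenset()

    def __post_init__(self):
        has_handle = self.process_handle is not None
        has_tokens = len(self.termination_tokens) > 0
        # Assumption of this sketch: exactly one of the two must be present.
        if has_handle == has_tokens:
            raise ValueError("need either a process handle or termination tokens")

    @property
    def terminated(self):
        """A bundle with a non-empty termination token set ends the dialogue."""
        return len(self.termination_tokens) > 0


def respects_bound(bundle, bound):
    """Check the bundle against the most recently sent size bound (None: no bound)."""
    return bound is None or len(bundle.answers) <= bound
```

A bundle such as AnswerBundle(answers=(), termination_tokens=frozenset({"end"})) terminates the dialogue without claiming that no further answers exist.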

Server Continuation

A client requests additional answers to a query from the server by sending the server a server continuation. A server continuation necessarily includes:

· The process handle from the answer bundle most recently produced by the server in this query-answering dialogue. A process handle is an atomic entity with no internal structure visible to the client. Its role is only to allow the server to record the state of its answer search so that it can continue the answering dialogue even if it has been engaged in other activities since sending the previous answer bundle.

A server continuation optionally includes:

· An answer bundle size bound that is a positive integer. The next answer bundle produced by the server is to contain no more than this number of answers. Omitting the answer bundle size bound effectively sets it to infinity.

Server Termination

A client terminates a query-answering dialogue by sending the server a server termination. A server termination necessarily includes:

· The process handle from the answer bundle most recently produced by the server in this query-answering dialogue.
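The exchange of query, answer bundles, server continuations, and server termination can be sketched as a client loop running against a toy in-memory server. The sketch is illustrative only: ToyServer, its method names, the dictionary layout of bundles, and the max_answers cut-off are inventions of this example, and no transport is implied. The toy server already holds every answer, which is the only reason it may legitimately report 'none' at the end.

```python
import itertools


class ToyServer:
    """In-memory stand-in for a DQL server that already knows all answers."""

    def __init__(self, all_answers):
        self._all = list(all_answers)
        self._cursors = {}                 # process handle -> next answer index
        self._ids = itertools.count(1)

    def query(self, query, bound=None):
        handle = f"h{next(self._ids)}"     # opaque to the client
        self._cursors[handle] = 0
        return self._next_bundle(handle, bound)

    def continuation(self, handle, bound=None):
        return self._next_bundle(handle, bound)

    def termination(self, handle):
        self._cursors.pop(handle, None)    # discard the saved search state

    def _next_bundle(self, handle, bound):
        start = self._cursors[handle]
        if start >= len(self._all):
            # Nothing left: zero answers plus a termination token.
            del self._cursors[handle]
            return {"answers": [], "handle": None, "tokens": {"none"}}
        end = len(self._all) if bound is None else start + bound
        self._cursors[handle] = end
        return {"answers": self._all[start:end], "handle": handle, "tokens": set()}


def collect_answers(server, query, bound=None, max_answers=None):
    """Run one dialogue; return the response collection and termination tokens."""
    response = []
    bundle = server.query(query, bound)
    while True:
        response.extend(bundle["answers"])
        if bundle["tokens"]:
            return response, bundle["tokens"]          # server ended the dialogue
        if max_answers is not None and len(response) >= max_answers:
            server.termination(bundle["handle"])       # client ends the dialogue
            return response, set()
        # Ask for more, using the handle from the most recent bundle.
        bundle = server.continuation(bundle["handle"], bound)
```

Calling collect_answers(ToyServer(["a1", "a2", "a3"]), "q", bound=2) receives two bundles with answers and then an empty bundle carrying 'none'. Passing max_answers makes the client send a server termination once it has enough answers.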

Response Collection

While there are no global requirements on a response collection other than that all its members are correct answers, it is recommended that servers ensure that answer bundles do not contain duplicate or redundant answers, i.e., answers which are subsumed by other answers. One answer subsumes another if its bindings are a superset of the bindings in the other answer. Servers that are able to guarantee that their response collections contain no duplicate answers can be called non-repeating. Servers that are able to guarantee that their response collections contain no duplicate or redundant answers can be called terse. Servers that are able to guarantee that their response collections will be correctly terminated with 'none' can be called complete.
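The subsumption test and the removal of duplicate and redundant answers can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that an answer is represented by its binding set as a dictionary from variable names to terms; the function names and the ex: terms are hypothetical.

```python
def subsumes(answer, other):
    """True if the bindings of `answer` are a superset of those of `other`."""
    return other.items() <= answer.items()


def make_terse(answers):
    """Drop duplicates, then drop answers subsumed by a different answer."""
    unique = []
    for answer in answers:
        if answer not in unique:
            unique.append(answer)
    return [
        answer
        for answer in unique
        if not any(other != answer and subsumes(other, answer) for other in unique)
    ]


answers = [
    {"?x": "ex:a"},
    {"?x": "ex:a", "?y": "ex:b"},   # subsumes the first answer
    {"?x": "ex:a"},                 # duplicate of the first answer
    {"?x": "ex:c"},
]
terse = make_terse(answers)
```

Here terse keeps only the answer binding both ?x and ?y and the answer binding ?x to ex:c. Redundancy can only arise through may-bind variables, since every answer binds all must-bind variables.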

Answer Set

The answer set of a query is the largest set of query answers that are entailed by the answer KB and none of which are entailed by any other answer. Notice that this definition is semantic rather than operational. A complete server is one whose response collection contains the answer set of the query. A terse complete server is one whose response collection is precisely the answer set of the query.

Other Issues

Restricted Query Patterns

The specification of query patterns above allows for arbitrary patterns of variable replacement in DAML+OIL expressions. Particular servers, however, may restrict themselves to particular kinds of query patterns, or provide guarantees of giving meaningful answers only when given particular kinds of query patterns. To allow for such cases, we introduce the notion of a query class, defined simply as a class of query patterns. Server specifications may refer to any well-defined query pattern restriction and define their performance, in the terms of this specification, relative to the case where all references to a query pattern are understood to refer only to query patterns in that class.

If a server accepts only query patterns in a certain class, it is said to apply to that class. The notions of completeness and terseness may also be relativized to queries of a certain class when stating the conformance of a server to this specification.

For example, one class of query patterns might be those of one of the forms:

?x rdf:type CCC
?x daml:subClassOf ?y

where CCC is some DAML class expression and ?x and ?y are variables, or of the form

?x PPP ?y

where PPP is some DAML property expression other than those used in RDF(S) or DAML+OIL syntax.
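A server that applies only to this example class could screen incoming patterns with a membership test along the following lines. The Python sketch makes several simplifying assumptions that are not part of this specification: a pattern is a single (subject, predicate, object) triple, variables begin with "?", CCC is a named class rather than an arbitrary class expression, and a property counts as RDF(S) or DAML+OIL vocabulary exactly when its qualified name starts with one of the prefixes rdf:, rdfs:, or daml:.

```python
# Assumed prefixes for the RDF(S) and DAML+OIL vocabularies.
RESERVED_PREFIXES = ("rdf:", "rdfs:", "daml:")


def is_variable(term):
    """Sketch convention (an assumption): variables are strings starting with '?'."""
    return term.startswith("?")


def in_example_class(triple):
    """Membership test for the example query class, one triple at a time."""
    subject, predicate, obj = triple
    if predicate == "rdf:type":
        # Form: ?x rdf:type CCC
        return is_variable(subject) and not is_variable(obj)
    if predicate == "daml:subClassOf":
        # Form: ?x daml:subClassOf ?y
        return is_variable(subject) and is_variable(obj)
    # Form: ?x PPP ?y, with PPP outside the reserved vocabularies.
    return (
        is_variable(subject)
        and is_variable(obj)
        and not is_variable(predicate)
        and not predicate.startswith(RESERVED_PREFIXES)
    )
```

Under these assumptions, "?x ex:owns ?y" is in the class, while "?x rdfs:label ?y" and "ex:joe ex:owns ?y" are not; ex:owns and ex:joe are hypothetical names.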

When performance is defined relative to a query class, the termination token 'none' should not be used to mean 'none relative to the class'. To avoid confusion, the use of other termination tokens, each with a meaning defined relative to the particular class, is required.

Future versions of this specification may define particular query classes and corresponding termination tokens.

“How Many” Queries

The language and protocol contain no explicit constructs for asking how many (or how many more) answers there are to a given query. Defining what is meant by “how many” is problematic in that there can be multiple bindings for a given variable that all denote the same object in the domain of discourse, so that how many answer bindings there are for a given distinguished variable will in general differ from how many answer objects in the domain of discourse that variable can denote. The core protocol could reasonably be extended to support “how many” queries, where “how many” means how many answers containing distinct sets of bindings the server can produce. The difficulty of a server determining how many answers it can produce to a query without actually producing the answers has been the primary rationale for not including a “how many” construct in the query language.
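The gap between counting bindings and counting denoted objects can be made concrete with a small Python sketch. It assumes, for illustration only, that the equalities known to hold between terms are supplied as a list of pairs; the function name and the ex: terms are hypothetical. Terms are grouped with a union-find structure, and the groups are counted. Because terms not known to be equal may still denote the same object, the result is only an upper bound on the number of distinct objects.

```python
def count_known_distinct(terms, equal_pairs):
    """Count groups of terms after merging those known to denote the same object."""
    parent = {term: term for term in terms}

    def find(term):
        # Follow parent links to the group representative, halving paths as we go.
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for left, right in equal_pairs:
        if left in parent and right in parent:
            parent[find(left)] = find(right)

    return len({find(term) for term in terms})


bindings_for_x = ["ex:MorningStar", "ex:EveningStar", "ex:Mars"]
known_equalities = [("ex:MorningStar", "ex:EveningStar")]

distinct_bindings = len(set(bindings_for_x))
distinct_objects_at_most = count_known_distinct(bindings_for_x, known_equalities)
```

In this example there are three distinct bindings for the variable but at most two distinct objects.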

Inability to respond to queries

Servers are not required to deliver answers to queries. Under various circumstances, a query may be phrased in a form which makes it impossible for a server to respond with any answers; for example, where the query specifies an answer KB that the server is unable to access or use, or where no bindings are available for must-bind variables. Under these circumstances, the server should terminate the dialogue with a bundle containing no answers and an appropriate termination token or tokens, one of which should be 'end'.

[1] Although this specification is intended to provide sufficient information to enable use and analysis of DQL, it does not include a recommendation for a normative external syntax for the language.

[2] The DAML Joint Committee is chaired by Mike Dean, and the current members are Harold Boley, Dan Brickley, Stefan Decker, Richard Fikes, Benjamin Grosof, Frank van Harmelen, Pat Hayes, Jeff Heflin, Ian Horrocks, Ora Lassila, Deb McGuinness, Peter Patel-Schneider, and Lynn Andrea Stein.

[3] We use “conjunction” informally in this Overview section since DAML+OIL does not have a logical connective for conjoining sentences or for conjoining knowledge bases. We consider a conjunction of sentences to be a sentence that is true if and only if all of its conjuncts are true. We consider a conjunction of knowledge bases to be a knowledge base consisting of all the sentences in all the conjunct knowledge bases.

[4] The content and structure of a DQL justification for a query answer has not yet been determined. The intent of the DQL designers is for there to be various types of justifications that can be requested in a query. Examples of justifications range from the set of sentences used to derive the answer ('set of support') to a complete proof or derivation of the answer in some proof-theoretic framework.

Draft Updated DQL Specification

DAML Query Language (DQL)