August 2002

Abstract Specification[1]

DAML Joint Committee [2]

Richard Fikes, Pat Hayes, Ian Horrocks, editors

Overview

DAML Query Language (DQL) is a formal language and protocol for a querying agent and an answering agent to use in conducting a query-answering dialogue using knowledge represented in DAML+OIL. A DQL query contains a query pattern that is a collection of DAML+OIL sentences in which some literals and/or URIrefs have been replaced by variables. A query answer provides bindings of terms to some of these variables such that the conjunction[3] of the answer sentences - produced by applying the bindings to the query pattern and considering the remaining variables in the query pattern to be existentially quantified - is entailed by a knowledge base (KB) called the answer KB.

Each binding in a query answer is a URIref or a literal that either explicitly occurs as a term in the answer KB or is a term in DAML+OIL. That is, DQL is designed for answering queries of the form “What URIrefs and literals from the answer KB and DAML+OIL denote objects that make the query pattern true?”

Variables in queries may be designated as must-bind or may-bind variables (but not both). Answers are required to provide bindings for all must-bind variables, and may provide bindings for may-bind variables. Variables in queries that are not designated as either must-bind or may-bind variables are don’t-bind variables and are not to have bindings in query answers. Answers provided by the server must conform to the requirements in this DQL specification, but a server may restrict its answers to certain classes of query patterns, to certain classes of knowledge bases, or to a limited range of bindings.

A query may have zero or more answers. The set of all answers provided by the server in a query-answering dialogue is called the response set of that dialogue. Not all the answers in the response set need be produced at once: in general, answers will be delivered in groups, each of which is called an answer bundle. A query may specify an upper bound on the number of answers that are delivered in a single answer bundle.

The set of DAML sentences that are used by the server in answering a query is referred to as the answer KB. This may be an actual knowledge base, a conjunction of actual knowledge bases, or a virtual entity representing the total information available to the server at the time of answering. However, all servers are required to be able to provide a URIref that is a reference to a resource representing the answer KB. In many cases, that URIref will be a URL that can be used to access the KB or to communicate with the server about the KB, but the URIref is not required to be a URL.

A DQL query contains an answer KB pattern that is a KB, a list of KB references, or a variable. If a query’s answer KB pattern is a KB or a reference to a KB, then the conjunction of the answer sentences in each query answer in the response set must be entailed by that KB. If a query’s answer KB pattern is a list of KB references, then the conjunction of the answer sentences in each query answer in the response set must be entailed by the conjunction of the KBs referenced in that list. If a query’s answer KB pattern is a variable, then the server is free to select or to generate an answer KB from which to answer the query, but if the variable is a must-bind variable, then the answer must provide a binding to the variable that is a reference to the answer KB.

DQL specifies a core set of protocol elements that are to be used by a querying agent (which we refer to as the client) to obtain query answers from an answering agent (which we refer to as the server). Specifically, DQL specifies that a client initiates a query-answering dialogue with a server by sending the server a DQL query. The server is expected to respond by sending an answer bundle to the client containing zero or more answers and a server continuation that is either a process handle, which the client can use to request additional answers, or one or more termination tokens indicating that the server will not provide any more answers to the query. Upon receiving a server continuation from the client, the server is expected to respond similarly by sending the client another answer bundle containing zero or more additional query answers; this exchange repeats until the server continuation in an answer bundle is one or more termination tokens. If an answer bundle contains zero answers, its server continuation must be one or more termination tokens.

A process handle is an atomic entity with no internal structure visible to the client; its role is only to allow the server to record the state of its answer search so that it can continue the answering dialog even if it has been engaged in other activities since sending the previous answer bundle. A termination token can be 'none', meaning that the server is claiming that there are no further answers entailed by the answer KB, or 'end', meaning that the server is making no claims as to whether there are more answers entailed by the answer KB. Other termination tokens are allowed, but in all cases it is required that a termination token be clearly distinguishable from a process handle.
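The dialogue described above can be illustrated with a minimal Python sketch. It is not part of the specification: the names `ToyServer`, `ProcessHandle`, and `collect_all` are hypothetical, answers are simplified to dictionaries from variable names to bound terms, and the server simply pages through a precomputed list rather than reasoning over a KB.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class ProcessHandle:
    """Opaque to the client; only the server interprets `ident`."""
    ident: int


# A server continuation is a process handle or one or more termination tokens.
Continuation = Union[ProcessHandle, Tuple[str, ...]]
Answer = Dict[str, str]  # variable name -> bound URIref or literal (simplified)


class ToyServer:
    """Hypothetical in-memory server that pages through precomputed answers."""

    def __init__(self, answers: List[Answer], bundle_size: int) -> None:
        self._answers = answers
        self._size = bundle_size
        self._cursors: Dict[int, int] = {}  # handle ident -> next answer index
        self._next_id = 0

    def _bundle(self, start: int) -> Tuple[List[Answer], Continuation]:
        chunk = self._answers[start:start + self._size]
        end = start + len(chunk)
        if end >= len(self._answers):
            # This toy server knows its list is exhaustive, so it may claim 'none'.
            return chunk, ("none",)
        handle = ProcessHandle(self._next_id)
        self._next_id += 1
        self._cursors[handle.ident] = end
        return chunk, handle

    def query(self) -> Tuple[List[Answer], Continuation]:
        """Start a dialogue and return the first answer bundle."""
        return self._bundle(0)

    def resume(self, continuation: Continuation) -> Tuple[List[Answer], Continuation]:
        """Continue a dialogue from a continuation sent back by the client."""
        if isinstance(continuation, ProcessHandle):
            return self._bundle(self._cursors.pop(continuation.ident))
        # Termination tokens sent by the client are echoed back with no answers.
        return [], continuation


def collect_all(server: ToyServer) -> Tuple[List[Answer], Tuple[str, ...]]:
    """Client side: keep requesting bundles until termination tokens arrive."""
    answers, continuation = server.query()
    response_set = list(answers)
    while isinstance(continuation, ProcessHandle):
        answers, continuation = server.resume(continuation)
        response_set.extend(answers)
    return response_set, continuation
```

In this sketch the client distinguishes a process handle from termination tokens by type, which is one simple way to meet the requirement that the two be clearly distinguishable.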

No attempt is made here to specify a complete inter-agent protocol (e.g., with provisions for time-outs, error handling, resource budgets, etc.). Query answering servers are required to support the specified core protocol elements and are not constrained by the DQL specification as to how additional protocol functionality is provided.

Detailed Specification

Query

A DQL query necessarily includes:

· A query pattern that is a collection of DAML+OIL sentences in which some of the literals and URIrefs have been replaced by variables. Formally, a query pattern is a DAML+OIL KB in which some URIrefs and/or literals have been replaced by variables.

· An answer KB pattern that is a KB reference, a list of KB references, or a variable.

· A must-bind variables list and a may-bind variables list. Each item on these lists is a query variable, and no query variable can be an item of both lists. Query variables that are on neither of these lists are considered to be don’t-bind variables.

A DQL query may also optionally include:

· A query premise that is a DAML+OIL KB or a KB reference. When a query premise is specified, the sentences in the query premise are considered to be included in the answer KB. This option is intended to facilitate if-then queries while still remaining within the expressiveness of DAML+OIL. Omitting the query premise is equivalent to providing an empty query premise.

· A justification request. A DQL query can optionally include a request for a justification for each query answer. (This option is noted here for future reference, but no further details are provided, and servers implementing the August 2002 version of DQL may ignore this part of a query.[4])

· An answer bundle size bound that is a positive integer. Answer bundles are to contain no more than this number of answers. Omitting the answer bundle size bound effectively sets it to infinity.
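Since this specification defines no normative external syntax, the following Python sketch is only one possible in-memory representation of the query components listed above. All names (`DQLQuery`, `is_variable`, the field names) are assumptions made for illustration, as are the conventions that sentences are simplified to triples of strings and that variables are strings starting with `?`.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple, Union

Triple = Tuple[str, str, str]  # simplified stand-in for a DAML+OIL sentence


def is_variable(term: str) -> bool:
    """Assumed convention: variables are written with a leading '?'."""
    return term.startswith("?")


@dataclass(frozen=True)
class DQLQuery:
    query_pattern: Tuple[Triple, ...]
    # A KB reference, a list of KB references, or a variable.
    answer_kb_pattern: Union[str, Tuple[str, ...]]
    must_bind: FrozenSet[str] = frozenset()
    may_bind: FrozenSet[str] = frozenset()
    # Omitting the premise is equivalent to providing an empty one.
    query_premise: Tuple[Triple, ...] = ()
    justification_request: bool = False
    # None models an omitted bound, i.e. no limit on bundle size.
    bundle_size_bound: Optional[int] = None

    def __post_init__(self) -> None:
        overlap = self.must_bind & self.may_bind
        if overlap:
            raise ValueError(f"variables on both lists: {sorted(overlap)}")
        unknown = (self.must_bind | self.may_bind) - self.variables()
        if unknown:
            raise ValueError(f"not query variables: {sorted(unknown)}")
        if self.bundle_size_bound is not None and self.bundle_size_bound < 1:
            raise ValueError("answer bundle size bound must be a positive integer")

    def variables(self) -> FrozenSet[str]:
        """All variables in the query pattern and the answer KB pattern."""
        found = {t for triple in self.query_pattern for t in triple if is_variable(t)}
        kb = self.answer_kb_pattern
        if isinstance(kb, str) and is_variable(kb):
            found.add(kb)
        return frozenset(found)

    def dont_bind(self) -> FrozenSet[str]:
        """Variables on neither list are don't-bind variables."""
        return self.variables() - self.must_bind - self.may_bind
```

For example, a query whose pattern mentions `?x`, `?y`, and `?z`, with `?x` as must-bind and `?y` as may-bind, leaves `?z` as a don't-bind variable.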

Query Answer

An answer to a query necessarily includes:

· A (possibly empty) binding set that satisfies the following conditions:

1. Each element of the binding set is a lexical mapping that associates a URIref or literal to a query variable;

2. The binding set contains a binding to each of the must-bind query variables, to zero or more of the may-bind query variables, and to none of the don’t-bind query variables;

3. If the binding set contains a binding to a variable that is the answer KB pattern, then the binding is to a reference to the answer KB;

4. All bindings in the binding set to variables in the query pattern are terms that occur in the DAML+OIL language or in the answer KB;

5. Suppose Q is the query pattern for the query of which this is an answer, and B is the subset of the binding set consisting of all the bindings to variables in Q. We write B(Q) to refer to the KB obtained by applying the bindings B to Q, i.e., by substituting the URIref or literal that is associated with v for every variable v that has a binding in B. B(Q) may contain some variables from Q that are not replaced by B; we refer to these variables as remaining variables. An interpretation I satisfies B(Q) if there is a mapping C from the remaining variables of B(Q) to the universe of I such that I+C satisfies B(Q); that is, if the interpretation can be extended to provide interpretations of the remaining variables in some way that makes B(Q) true. Then, in the usual way, we say that the answer KB entails B(Q) just in case B(Q) is true in every interpretation that makes the answer KB true. Intuitively, this means that the remaining variables are treated as existential 'blanks', which indicate that something exists without saying what it is. The condition, then, that must be satisfied by the binding set is that the answer KB entails B(Q).

· The query to which this is an answer.

· A reference to the server that produced the answer.

A query answer may also optionally include:

· An answer justification. (As noted above, this option is here for future reference, but no further details are provided, and servers implementing the August 2002 version of DQL may ignore a justification request in a query.)
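The syntactic parts of the binding set conditions can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: sentences are simplified to triples of strings, variables are strings starting with `?`, and the function names are hypothetical. The sketch covers conditions 1 and 2 and the construction of B(Q) with its remaining variables; it does not attempt the entailment check of condition 5, which requires a DAML+OIL reasoner.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]  # simplified stand-in for a DAML+OIL sentence
Bindings = Dict[str, str]      # query variable -> URIref or literal


def is_variable(term: str) -> bool:
    """Assumed convention: variables are written with a leading '?'."""
    return term.startswith("?")


def binding_set_is_well_formed(
    bindings: Bindings, must_bind: FrozenSet[str], may_bind: FrozenSet[str]
) -> bool:
    """Check conditions 1 and 2: every must-bind variable is bound, nothing
    outside the must-bind and may-bind lists is bound, and no variable is
    bound to another variable."""
    bound = set(bindings)
    if any(is_variable(value) for value in bindings.values()):
        return False
    return must_bind <= bound and bound <= (must_bind | may_bind)


def apply_bindings(
    pattern: Tuple[Triple, ...], bindings: Bindings
) -> Tuple[Tuple[Triple, ...], FrozenSet[str]]:
    """Return B(Q) together with its remaining (unreplaced) variables."""
    instantiated = tuple(
        tuple(bindings.get(term, term) for term in triple) for triple in pattern
    )
    remaining = frozenset(
        term for triple in instantiated for term in triple if is_variable(term)
    )
    return instantiated, remaining
```

The remaining variables returned by `apply_bindings` are the ones that condition 5 treats as existentially quantified when deciding whether the answer KB entails B(Q).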

Answer Bundle

An answer bundle necessarily includes:

· An answer set each of whose elements is a query answer. The answer set may be empty and must not contain more query answers than the answer bundle size bound in the query, if present.

· A server continuation consisting either of a process handle or one or more termination tokens. A dialog is said to have terminated when the server sends a bundle containing one or more termination tokens. If a server is sent a termination token as a server continuation, the server is expected to reply with a bundle containing an empty answer set and the same termination token, thereby terminating the dialog. We note that the use of the 'none' termination token (which indicates that the answer KB does not entail any other answers not in the response set) should be restricted to those cases where the server is able to make a positive affirmation that no other answers exist, i.e., to provide a guarantee that there are no other possible bindings to the query variables which would produce an answer sentence that would be entailed by the answer KB. There is no provision in DQL for a query to indicate an upper bound on the total number of answers in a dialog, but a client can terminate a question-answering dialog at any time by sending the 'end' token as a server continuation, or simply by not requesting any further continuations.
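The structural constraints on an answer bundle can be collected into a small checker. The Python sketch below is illustrative only; `ProcessHandle` and `check_bundle` are hypothetical names, answers are simplified to dictionaries, and termination tokens are represented as a sequence of strings.

```python
from typing import Dict, List, Optional, Sequence, Union


class ProcessHandle:
    """Opaque marker type, so handles cannot be confused with tokens."""


def check_bundle(
    answers: Sequence[Dict[str, str]],
    continuation: Union[ProcessHandle, Sequence[str]],
    size_bound: Optional[int] = None,
) -> List[str]:
    """Return a list of violated constraints; an empty list means none found."""
    problems: List[str] = []
    if size_bound is not None and len(answers) > size_bound:
        problems.append("answer set exceeds the answer bundle size bound")
    is_handle = isinstance(continuation, ProcessHandle)
    if not is_handle and len(continuation) == 0:
        problems.append("termination requires one or more termination tokens")
    if is_handle and not answers:
        problems.append("an empty answer set requires termination tokens")
    return problems
```

A client could run such a check on each bundle it receives before merging the answers into its view of the response set.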

Response Set

While there are no global requirements on a response set other than that all its members are correct answers, it is recommended that servers ensure that answer bundles do not contain duplicate or redundant answers, i.e., answers which are subsumed by other answers. One answer subsumes another if its bindings are a superset of the bindings in the other answer. Servers that are able to guarantee that their response sets contain no duplicate answers can be called non-repeating. Servers that are able to guarantee that their response sets contain no duplicate or redundant answers can be called terse. Servers that are able to guarantee that their response sets will be correctly terminated with 'none' can be called complete.
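The subsumption relation defined above translates directly into code. The following Python sketch assumes that an answer is reduced to its binding set, represented as a dictionary from variable names to bound terms; the function names are hypothetical.

```python
from typing import Dict, List

Answer = Dict[str, str]  # variable name -> bound URIref or literal


def subsumes(a: Answer, b: Answer) -> bool:
    """True if the bindings of `a` are a superset of the bindings of `b`."""
    return all(var in a and a[var] == term for var, term in b.items())


def drop_duplicates(answers: List[Answer]) -> List[Answer]:
    """Filtering a non-repeating server could apply: remove exact repeats."""
    unique: List[Answer] = []
    for answer in answers:
        if answer not in unique:
            unique.append(answer)
    return unique


def make_terse(answers: List[Answer]) -> List[Answer]:
    """Filtering a terse server could apply: also remove redundant answers,
    i.e. answers subsumed by a different answer."""
    unique = drop_duplicates(answers)
    return [
        a for a in unique
        if not any(other != a and subsumes(other, a) for other in unique)
    ]
```

Note that a server delivering answers incrementally cannot retract an answer it has already sent, so a terse server would have to ensure that no later answer subsumes an earlier one, rather than filter after the fact as this sketch does.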

Answer Set

The answer set of a query is the largest set of query answers that are entailed by the answer KB and none of which are entailed by any other answer. Notice that this definition is semantic rather than operational. A complete server is one whose response set contains the answer set of the query. A terse complete server is one whose response set is precisely the answer set of the query.

Other Issues

Restricted Query Patterns

The specification of a query pattern above allows for arbitrary patterns of variable replacement in DAML+OIL expressions. Particular servers, however, may restrict themselves to particular such query patterns, or provide guarantees of giving meaningful answers only when given particular kinds of query patterns. To allow for such cases, we introduce the notion of a query class, defined simply as a class of query patterns. Server specifications may refer to any well-defined query pattern restriction and restrict the definition of their performance, in the terms of this specification, to the case where all references to a query pattern are understood to refer only to query patterns in that class.

If a server accepts only query patterns in a certain class, it is said to apply to that class. The notions of completeness and terseness may also be relativized to queries of a certain class when stating the conformance of a server to this specification.

For example, one class of query patterns might be those of one of the forms:

?x rdf:type CCC
?x daml:subClassOf ?y

where CCC is some DAML class expression and ?x and ?y are variables, or of the form

?x PPP ?y

where PPP is some DAML property expression other than those used in RDF(S) or DAML+OIL syntax.
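A membership test for this example class might look like the Python sketch below. It rests on several simplifying assumptions that are not part of this specification: a pattern is a single triple of strings, variables start with `?`, class and property expressions are treated as single non-variable terms, and the prefixes `rdf:`, `rdfs:`, and `daml:` stand in for the vocabulary used in RDF(S) and DAML+OIL syntax.

```python
from typing import Tuple

Triple = Tuple[str, str, str]

# Assumed stand-in for "properties used in RDF(S) or DAML+OIL syntax".
RESERVED_PREFIXES = ("rdf:", "rdfs:", "daml:")


def is_variable(term: str) -> bool:
    """Assumed convention: variables are written with a leading '?'."""
    return term.startswith("?")


def in_example_class(pattern: Triple) -> bool:
    """True if the triple matches one of the three example forms."""
    subject, prop, obj = pattern
    if not is_variable(subject):
        return False
    if prop == "rdf:type":
        return not is_variable(obj)  # ?x rdf:type CCC
    if prop == "daml:subClassOf":
        return is_variable(obj)      # ?x daml:subClassOf ?y
    # ?x PPP ?y, where PPP is a non-reserved, non-variable property
    return (
        not is_variable(prop)
        and not prop.startswith(RESERVED_PREFIXES)
        and is_variable(obj)
    )
```

A server that applies only to this class could use such a test to reject other query patterns before starting an answer search.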

When performance is defined relative to a query class, the termination token 'none' should not be used to mean 'none relative to the class'. To avoid confusion, the use of other termination tokens, each with a meaning defined relative to the particular class, is required.

Future versions of this specification may define particular query classes and corresponding termination tokens.

“How Many” Queries

The language and protocol contain no explicit constructs for asking how many (or how many more) answers there are to a given query. Defining what is meant by “how many” is problematic in that there can be multiple bindings for a given variable that all denote the same object in the domain of discourse, so that the number of answer bindings for a given distinguished variable will in general differ from the number of objects in the domain of discourse that the variable can denote. The core protocol could reasonably be extended to support “how many” queries, where “how many” means how many answers containing distinct sets of bindings the server can produce. The difficulty of a server determining how many answers it can produce to a query without actually producing the answers has been the primary rationale for not including a “how many” construct in the query language.

Inability to Respond to Queries

Servers are not required to deliver answers to queries. Under various circumstances, a query may be phrased in a form which makes it impossible for a server to respond with any answers; for example, where the query specifies an answer KB that the server is unable to access or use, or where no bindings are available for must-bind variables. Under these circumstances, the server should terminate the dialog with a bundle containing no answers and an appropriate termination token or tokens, one of which should be 'end'.

[1] Although this specification is intended to provide sufficient information to enable use and analysis of DQL, it does not include a recommendation for a normative external syntax for the language.

[2] The DAML Joint Committee is chaired by Mike Dean, and the current members are Harold Boley, Dan Brickley, Stefan Decker, Richard Fikes, Benjamin Grosof, Frank van Harmelen, Pat Hayes, Jeff Heflin, Ian Horrocks, Ora Lassila, Deb McGuinness, Peter Patel-Schneider, and Lynn Andrea Stein.

[3] We use “conjunction” informally in this Overview section since DAML+OIL does not have a logical connective for conjoining sentences or for conjoining knowledge bases. We consider a conjunction of sentences to be a sentence that is true if and only if all of its conjuncts are true. We consider a conjunction of knowledge bases to be a knowledge base consisting of all the sentences in all the conjunct knowledge bases.

[4] The content and structure of a DQL justification for a query answer has not yet been determined. The intent of the DQL designers is for there to be various types of justifications that can be requested in a query. Examples of justifications range from the set of sentences used to derive the answer ('set of support') to a complete proof or derivation of the answer in some proof-theoretic framework.
