Informal DQL Specification
DAML Joint Committee
Richard Fikes, Pat Hayes, Ian Horrocks, editors
June 12, 2002
1. Overview
DQL is a formal language and protocol for posing queries from a querying agent
(which we refer to as the "client") to an answering agent (which we refer
to as the "server"). A DQL query contains a "query pattern" that is
a collection of DAML+OIL sentences in which some literals and/or urirefs have
been replaced by variables. An answer to the query provides bindings of terms
to some of these variables such that the answer sentence - produced by applying
the bindings to the query pattern and considering the remaining variables
in the query pattern to be existentially quantified - is entailed by
a knowledge base called the "answer KB".
Each binding in a query answer is a uri-ref or a literal that either explicitly
occurs as a term in the answer KB or is a term in DAML+OIL. That is,
DQL is designed for answering queries of the form "What uri-refs and literals
from the answer KB and DAML+OIL denote objects that make the query pattern
true?" We note that this does not require servers to generate existential
conclusions from 'implicit' knowledge in order to answer queries.
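The construction of an answer sentence can be sketched in Python as follows. This is only an illustration under stated assumptions: a query pattern is modelled as a list of triples, variables are written with a leading "?" (as in the examples in section 3), and unbound variables are replaced by blank-node labels to stand for existential quantification. The names answer_sentence, is_variable and the "_:b" labels are hypothetical and not part of DQL.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def is_variable(term: str) -> bool:
    """Assumed convention: variables are written with a leading '?'."""
    return term.startswith("?")


def answer_sentence(pattern: List[Triple], bindings: Dict[str, str]) -> List[Triple]:
    """Apply bindings to the pattern; unbound variables become blank nodes."""
    blank: Dict[str, str] = {}

    def substitute(term: str) -> str:
        if not is_variable(term):
            return term
        if term in bindings:
            return bindings[term]
        # The same unbound variable always maps to the same blank node.
        if term not in blank:
            blank[term] = f"_:b{len(blank)}"
        return blank[term]

    return [(substitute(s), substitute(p), substitute(o)) for s, p, o in pattern]


# Hypothetical pattern: "?x is a person who owns something".
pattern = [("?x", "rdf:type", "ex:Person"), ("?x", "ex:owns", "?y")]
sentence = answer_sentence(pattern, {"?x": "ex:Joe"})
print(sentence)
```

Whether the resulting sentence is actually entailed by the answer KB is a separate question that this sketch does not address.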
Variables in queries may be designated as "must bind" or "may bind" variables.
Answers are required to provide bindings for all "must bind" variables, and may
provide bindings for "may bind" variables. Queries may optionally provide
or require information about the knowledge base used to answer the query
and impose constraints on the dynamics of the answering process. Answers
provided by the server must conform to these requirements, but a server may
restrict its answers to certain classes of query pattern, to a certain
class of knowledge bases, or to a limited range of bindings.
A single query may have one, none or several answers. The set of all answers
provided by the server in response to a query is called the "response set"
of that query. Not all the answers in the response set need be produced at
once: in general, answers will be delivered in groups. A query may specify
an upper bound on the number of answers that are delivered in a single group.
The set of DAML assertions which are used by the server in answering a query
is referred to as the "answer KB". This may be an actual knowledge base (or
a finite set of knowledge bases) or it may be a virtual entity representing
the total information available to the server at the time of
answering; however,
all servers are required to be able to provide a reference to a
resource representing
the answer KB. We will assume that such a reference to an answer KB has the
form of a uriref; in many cases this may be a URL which can be used to access
the KB or communicate with the server, but this is not required. A DQL query
contains an "answer KB expression" which is either a variable or a reference
to a KB. If the answer KB expression in a query is a reference to a KB, then
all answer sentences of answers in the response set must be entailed by that
KB. If it is a variable, then the server is free to select or to generate
an answer KB, but if the variable is "must bind" then the answer must provide
a binding to this variable which references the answer KB.
DQL specifies a core set of protocol elements that are to be used by a client
to obtain query answers from a server. Specifically, DQL specifies that
a client initiates a query-answering dialogue with a server by sending the
server a DQL query. The server is expected to respond by sending answers
to the client one or more at a time along with a server continuation that is
either a process
handle which the client can use to request additional answers or a token
indicating that the server will not provide any more answers to the query.
A process handle is an atomic entity with no internal structure visible to
the client; its role is only to allow the server to record the state of its
answer search. The token can be 'none',
meaning that the server is claiming that there are no further answers entailed
by the answer KB, or 'unknown', meaning
that the server is making no claims as to whether there are more
answers entailed
by the answer KB. Other token values may be allowed, but in all cases it
is required that a token be clearly distinguishable from a process
handle.
No attempt is made here to specify a complete inter-agent protocol (e.g.,
with provisions for time-outs, error handling, resource budgets, etc.).
Query answering servers are required to support the specified core protocol
elements and are not constrained by the DQL specification as to how additional
protocol functionality is provided.
2. Detailed specification
The client initiates a dialog with the server by sending a query. The typical response is a bundle of answers plus a server continuation which can be sent
back by the client to the server. On receiving a server continuation, the
server responds similarly until the continuation in the response is a termination token. The set of all answers
in all groups sent from the server to the client between the query and the
termination token is the response set
of the query.
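From the client's side, the exchange described above might be sketched as the loop below. This is a minimal Python illustration, not a prescribed interface: the classes ProcessHandle and TerminationToken, the function collect_response_set and the canned stub server are all assumptions introduced for the example. Using two distinct types is one simple way of keeping termination tokens distinguishable from process handles.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Answer = Dict[str, str]


@dataclass(frozen=True)
class ProcessHandle:
    """Opaque to the client; only the server interprets it."""
    ident: int


@dataclass(frozen=True)
class TerminationToken:
    value: str  # for example "end" or "none"


Tag = Union[ProcessHandle, TerminationToken]
Bundle = Tuple[List[Answer], Tag]


def collect_response_set(
    first_bundle: Bundle,
    continue_with: Callable[[ProcessHandle], Bundle],
) -> Tuple[List[Answer], Tag]:
    """Keep sending back continuations until a termination token arrives."""
    answers, tag = first_bundle
    response_set = list(answers)
    while isinstance(tag, ProcessHandle):
        answers, tag = continue_with(tag)
        response_set.extend(answers)
    return response_set, tag


# Stub server: a table of prepared bundles keyed by handle (purely illustrative).
canned: Dict[int, Bundle] = {
    0: ([{"?x": "ex:b"}], ProcessHandle(1)),
    1: ([{"?x": "ex:c"}], TerminationToken("end")),
}
first: Bundle = ([{"?x": "ex:a"}], ProcessHandle(0))
response_set, token = collect_response_set(first, lambda h: canned[h.ident])
print(response_set, token)
```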
Query
A DQL query necessarily includes:
- a query pattern, which
is a collection of DAML+OIL sentences in which some of the literals and urirefs
have been replaced by variables;
- an answer KB pattern, which
is either a single variable or a reference to a KB;
- an indication of which of the
variables in the patterns are "must bind" or "may bind" variables. No variable
can be both "must bind" and "may bind".
A DQL query may also include:
- a query premise, which is either
a DAML+OIL KB or a reference to a KB. When a query premise is specified,
the sentences in the query premise are considered to be included in the answer
KB. This option is intended to facilitate if-then queries while still remaining
within the expressiveness of DAML+OIL. Omitting the query premise is
equivalent to providing an empty query premise.
- a justification request. A DQL
query can optionally include a request for a justification for each query answer.
(This option is noted here for future reference but no further details are
provided, and servers may ignore this part of a query. The content and
structure of a justification for a query answer have not yet been designed.
The intent is to specify various types of justifications that can be requested
in a query. Examples of justifications range from the set of sentences used
to derive the answer ('set of support') to a complete proof or derivation
of the answer in some proof-theoretic framework.)
- an answer bundle size bound,
which is a positive integer. Omitting the answer bundle size bound
effectively sets it to infinity.
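The required and optional parts of a query listed above could be held in a record such as the following Python sketch. The class name Query, its field names and the use of None for an omitted premise or bound are assumptions made for illustration; DQL itself does not prescribe any concrete syntax or data structure.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Query:
    query_pattern: List[Triple]
    answer_kb_pattern: str                    # a variable such as "?kb", or a KB reference
    must_bind: FrozenSet[str] = frozenset()
    may_bind: FrozenSet[str] = frozenset()
    premise: Optional[List[Triple]] = None    # None is treated as an empty premise
    justification_requested: bool = False     # servers may ignore this
    bundle_size_bound: Optional[int] = None   # None means no bound ("infinity")

    def __post_init__(self) -> None:
        # No variable can be both "must bind" and "may bind".
        both = self.must_bind & self.may_bind
        if both:
            raise ValueError(f"variables marked both must-bind and may-bind: {sorted(both)}")
        # The bound, when present, must be a positive integer.
        if self.bundle_size_bound is not None and self.bundle_size_bound < 1:
            raise ValueError("answer bundle size bound must be a positive integer")


query = Query(
    query_pattern=[("?x", "rdf:type", "ex:Person")],
    answer_kb_pattern="?kb",
    must_bind=frozenset({"?x"}),
    may_bind=frozenset({"?kb"}),
    bundle_size_bound=10,
)
print(query)
```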
Answer
An answer to a query must contain:
- a binding of a uriref
or a literal
to each of the "must bind" and zero or more of the "may bind" variables which
satisfies the following:
- 1. A variable in the answer KB pattern is bound to a reference
to the answer KB;
- 2. All variables in the query pattern are bound to terms which occur
in the DAML+OIL language or in the answer KB;
- 3. The answer KB entails the answer sentence obtained by replacing all
variables in the query pattern which are bound in the answer by their bindings,
and replacing all other variables by new RDF blank nodes.
- the query to which it is the answer;
- a reference to the server which produced the answer.
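Two of the conditions above lend themselves to a simple mechanical check: that every "must bind" variable is bound, and that each bound term occurs in the answer KB or in the language. The Python sketch below illustrates this for query-pattern variables only. The function names, the triple representation of the KB and the small LANGUAGE_TERMS sample are assumptions; the real DAML+OIL vocabulary is larger, and the entailment condition is not checked here.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Tiny illustrative sample of language terms, not the full DAML+OIL vocabulary.
LANGUAGE_TERMS = {"rdf:type", "daml:Class", "daml:Thing", "daml:Nothing"}


def kb_terms(kb: Iterable[Triple]) -> Set[str]:
    """All terms that explicitly occur in the KB."""
    return {term for triple in kb for term in triple}


def binding_problems(
    bindings: Dict[str, str], must_bind: Set[str], kb: List[Triple]
) -> List[str]:
    """Return a list of human-readable problems; empty if none were found."""
    allowed = kb_terms(kb) | LANGUAGE_TERMS
    problems = [
        f"must-bind variable {var} is unbound"
        for var in sorted(must_bind - set(bindings))
    ]
    problems += [
        f"{var} is bound to {term}, which occurs in neither the KB nor the language"
        for var, term in sorted(bindings.items())
        if term not in allowed
    ]
    return problems


kb = [("ex:Joe", "rdf:type", "ex:Person"), ("ex:Joe", "ex:owns", "ex:Fido")]
ok = binding_problems({"?x": "ex:Joe"}, {"?x"}, kb)
bad = binding_problems({"?y": "ex:Rex"}, {"?x"}, kb)
print(ok, bad)
```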
In addition, an answer may contain:
Answer bundle
An answer bundle is a finite set of answers plus a tag consisting
of either a server continuation or one or more termination tokens.
The number of answers in an answer bundle given in response to a query must
not exceed the answer bundle size bound in the query, if present.
Server continuations are atomic entities with no internal structure visible
to the client. A server must continue the answering process when sent a server
continuation by sending back another answer bundle. Servers should encode
sufficient information in the server continuation to enable them to continue
the answering dialog even if they have been engaged in other activities since
sending the previous answer bundle. Every dialog should terminate after a
finite number of exchanges of server continuations and answer bundles between
the client and server. A dialog is said to have terminated when the server sends a bundle
containing a termination token. If a server is sent a termination token as
a server continuation, the server should reply with a bundle containing no
answers and the same termination token, thereby terminating the dialog.
Clients must be able to distinguish termination tokens from server
continuations.
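On the server side, the bundling behaviour described above might look like the following Python sketch. It assumes a toy server that already holds all of its answers in a list, represents continuations as strings of the form "handle-N", and keeps the resumption offset in a table; a real server could equally encode that state inside the continuation itself. ToyServer and its method names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Answer = Dict[str, str]
TERMINATION_TOKENS = {"end", "none"}  # anything else is treated as a process handle


class ToyServer:
    """Hypothetical in-memory server that pages through precomputed answers."""

    def __init__(self, answers: List[Answer], bound: Optional[int] = None) -> None:
        self._answers = answers
        # An omitted bound means "no limit": everything fits in one bundle.
        self._bound = bound if bound is not None else max(len(answers), 1)
        self._offsets: Dict[str, int] = {}  # continuation -> next offset
        self._next_id = 0

    def _bundle(self, start: int) -> Tuple[List[Answer], str]:
        chunk = self._answers[start:start + self._bound]
        end = start + len(chunk)
        if end >= len(self._answers):
            return chunk, "end"  # no further answers, no completeness claim
        handle = f"handle-{self._next_id}"
        self._next_id += 1
        self._offsets[handle] = end
        return chunk, handle

    def query(self) -> Tuple[List[Answer], str]:
        return self._bundle(0)

    def resume(self, continuation: str) -> Tuple[List[Answer], str]:
        # A termination token sent as a continuation is echoed with no answers.
        if continuation in TERMINATION_TOKENS:
            return [], continuation
        return self._bundle(self._offsets.pop(continuation))


server = ToyServer([{"?x": "ex:a"}, {"?x": "ex:b"}, {"?x": "ex:c"}], bound=2)
first, tag = server.query()
second, final = server.resume(tag)
print(first, tag, second, final)
```

Replying "end" rather than "none" reflects the fact that this toy server makes no claim that its list of answers is complete with respect to the answer KB.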
Termination tokens may be used to convey information about the response set;
in particular, two termination tokens have fixed meanings. The token "end" simply means that the server is unable
to deliver any further answers, but makes no claim of completeness. The token
"none" indicates that the answer KB
does not entail any other answers not in the response set. We note that the
use of the "none" token should be
restricted to those cases where the server is able to make a positive
affirmation
that no other answers exist, i.e. to provide a guarantee that there are no
other possible bindings to the query variables which would produce an answer
sentence that would be entailed by the answer KB. Other termination tags
may also be used, but the "end" tag
is the recommended way to indicate termination of a question-answering dialog.
There is no provision in DQL for a query to indicate an upper bound on the
total number of answers in a dialog, but a client can terminate a
question-answering
dialog at any time by sending the "end"
token as a server continuation, or simply by not requesting any
further continuations.
Response Set
While there are no global requirements on a response set other than that
all its members are correct answers, it is recommended that servers ensure that answer
bundles do not contain duplicate or redundant answers, i.e. answers which
are subsumed by other answers. One answer subsumes another if it gives the same
bindings as the second answer and also binds additional variables. Servers which are
able to guarantee that each answer bundle in a dialog contains no answers
that are redundant with any answers given earlier in the dialog can be called
"non-repeating". Servers which are able to guarantee that their response
sets contain no redundant or duplicate answers can be called "terse"
or "non-verbose".
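The notions of duplicate and redundant answers can be illustrated with a short Python sketch. It reads subsumption as "contains all the bindings of the other answer plus at least one more", which is an interpretation of the informal definition above, and it represents an answer simply as a dictionary of bindings. The function names subsumes and make_terse are hypothetical.

```python
from typing import Dict, List

Answer = Dict[str, str]


def subsumes(a: Answer, b: Answer) -> bool:
    """True if a repeats every binding of b and binds at least one more variable."""
    return b.items() < a.items()  # proper-subset test on (variable, term) pairs


def make_terse(answers: List[Answer]) -> List[Answer]:
    """Drop duplicates, then drop answers subsumed by another answer."""
    unique: List[Answer] = []
    for answer in answers:
        if answer not in unique:
            unique.append(answer)
    return [a for a in unique if not any(subsumes(other, a) for other in unique)]


answers = [
    {"?x": "ex:a"},
    {"?x": "ex:a", "?y": "ex:b"},  # subsumes the first answer
    {"?x": "ex:a"},                # duplicate of the first answer
    {"?x": "ex:c"},
]
terse = make_terse(answers)
print(terse)
```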
Servers which are able to guarantee that their response sets will be
correctly terminated with "none"
can be called "complete".
The answer set of a query is the
largest set of answers which are entailed by the answer KB and none of which
are entailed by any other answer. Notice that this definition is semantic
rather than operational. A complete server is one whose response set contains
the answer set of the query. A terse complete server is one whose response
set is precisely the answer set of the query.
For some KBs it may be impossible to implement a terse complete server for
all query patterns.
3. Other Issues
Restricted query patterns
The specification of query pattern above allows for arbitrary patterns of
variable replacement in DAML expressions. Particular servers, however, may
restrict themselves to particular kinds of query pattern, or provide guarantees
of giving meaningful answers only when given particular kinds of query pattern.
To allow for such cases we introduce the notion of a query class, defined simply as a class
of patterns. Server specifications may refer to any well-defined query pattern
restriction and define their performance, in the terms of this standard, relative to
the case where all references to a query pattern are understood to refer
only to query patterns in that class.
If a server accepts only query patterns in a certain class, it is said to
"apply to" that class; the notions of completeness and terseness may also
be relativized to queries of a certain class, when stating the conformance
of a server to this specification.
For example, one class of query patterns might be those of one of the
forms:
?x rdf:type CCC .
?x daml:subClassOf ?y .
where CCC is some DAML class expression, or of the form
?x PPP ?y .
where PPP is some DAML property expression other than those used in RDF(S)
or DAML+OIL syntax.
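A membership test for the example class above might be sketched in Python as follows. The sketch makes several simplifying assumptions that are not part of this specification: it tests a single triple, treats any non-variable term as an acceptable class expression CCC, and identifies RDF(S) and DAML+OIL syntax properties purely by the prefixes "rdf:", "rdfs:" and "daml:".

```python
from typing import Tuple

Triple = Tuple[str, str, str]

# Assumed prefixes for properties belonging to RDF(S) or DAML+OIL syntax.
RESERVED_PREFIXES = ("rdf:", "rdfs:", "daml:")


def is_variable(term: str) -> bool:
    return term.startswith("?")


def in_example_class(triple: Triple) -> bool:
    """Check a single triple against the three example forms."""
    subject, predicate, obj = triple
    if not is_variable(subject):
        return False
    if predicate == "rdf:type":
        return not is_variable(obj)        # ?x rdf:type CCC .
    if predicate == "daml:subClassOf":
        return is_variable(obj)            # ?x daml:subClassOf ?y .
    return (                               # ?x PPP ?y .
        not is_variable(predicate)
        and not predicate.startswith(RESERVED_PREFIXES)
        and is_variable(obj)
    )


print(in_example_class(("?x", "rdf:type", "ex:Person")))
print(in_example_class(("?x", "rdfs:domain", "?y")))
```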
When performance is defined relative to a query class, the termination token
'none' should not be used to mean 'none relative to
the class'. To avoid confusion, the use of other termination tokens, each
with a meaning defined relative to the particular class, is required.
Future versions of this specification may define particular query classes
and corresponding termination tokens.
“How Many” Queries
The language and protocol contain no explicit constructs for asking how many
(or how many more) answers there are to a given query. Defining what
is meant by “how many” is problematic in that there can
be multiple bindings
for a given distinguished variable that all denote the same object in the
domain of discourse, so that how many answer bindings there are for a given
distinguished variable will in general differ from how many answer objects
in the domain of discourse that variable can denote. The core protocol
could reasonably be extended to support “how many” queries, where “how many”
means how many answers containing distinct sets of bindings the server can
produce. The difficulty of a server determining how many answers it
can produce to a query without actually producing the answers has been the
primary rationale for not including a “how many”
construct in the query language.
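The gap between counting bindings and counting objects can be made concrete with a small Python sketch. It assumes that the server has a list of pairs of terms known to denote the same object (the same_as argument, a hypothetical input) and groups the bound terms accordingly. The second count is only the number of groups of terms known to co-denote; terms not known to be equal may still denote the same object.

```python
from typing import Dict, Iterable, List, Tuple


def count_bindings_and_groups(
    terms: List[str], same_as: Iterable[Tuple[str, str]]
) -> Tuple[int, int]:
    """Return (distinct bound terms, groups of terms known to co-denote)."""
    parent: Dict[str, str] = {term: term for term in terms}

    def find(term: str) -> str:
        # Follow parent links to the representative of the term's group.
        while parent[term] != term:
            term = parent[term]
        return term

    for left, right in same_as:
        if left in parent and right in parent:
            parent[find(left)] = find(right)

    return len(parent), len({find(term) for term in parent})


# Three distinct bindings for a variable, two of which name the same individual.
counts = count_bindings_and_groups(
    ["ex:Joe", "ex:Joseph", "ex:Ann"], [("ex:Joe", "ex:Joseph")]
)
print(counts)
```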
Inability to respond to queries
Servers are not required to deliver answers to queries. Under various
circumstances,
a query may be phrased in a form which makes it impossible for a server to
respond with any answers; for example, if the query specifies an answer KB
which the server is unable to access or use, or where no bindings are available
for "must bind" variables. Under these circumstances, the server
should terminate
the dialog with a bundle containing no answers and an appropriate termination
tag or tags, one of which should be 'end'.