From: Richard Fikes
Date: 08/23/02
Attached is an updated DQL spec for your consideration. Neither Pat nor Ian has seen this version, so I take responsibility for any screw-ups. As of now, this is the version of the spec to be considered for committee approval on Tuesday. Richard
Richard Fikes, Pat Hayes, Ian Horrocks, editors
DAML Query Language (DQL) is a formal language and protocol
for posing queries from a querying agent (which we refer to as the client) to an
answering agent (which we refer to as the server). A DQL query contains a query
pattern, which is a collection of DAML+OIL sentences in which some literals
and/or URIrefs have been replaced by variables. An answer to the query provides
bindings of terms to some of these variables such that the conjunction[1] of
the answer sentences (produced by applying the bindings to the query pattern
and treating the remaining variables in the query pattern as existentially
quantified) is entailed by a knowledge base called the answer KB.
Each binding in a query answer is a URIref or a literal that either explicitly
occurs as a term in the answer KB or is a term in DAML+OIL. That is, DQL is
designed for answering queries of the form "What URIrefs and literals from the
answer KB and DAML+OIL denote objects that make the query pattern true?"
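A minimal Python sketch of how answer sentences are produced from a query pattern and a binding. The triple representation, the "?" prefix for variables, the function names, and the ex: names are illustrative assumptions rather than DQL syntax; the entailment check against the answer KB is not shown.

```python
from typing import Dict, List, Tuple

# Illustrative representation (an assumption, not DQL syntax): a sentence is a
# (subject, predicate, object) triple of strings, and variables start with "?".
Triple = Tuple[str, str, str]


def is_variable(term: str) -> bool:
    return term.startswith("?")


def apply_binding(pattern: List[Triple], binding: Dict[str, str]) -> List[Triple]:
    """Produce the answer sentences by substituting bound variables.

    Variables that the binding leaves unbound stay in the result; they are
    read as existentially quantified when checking entailment by the answer KB.
    """
    return [
        tuple(binding.get(term, term) if is_variable(term) else term for term in triple)
        for triple in pattern
    ]


pattern = [("?p", "rdf:type", "ex:Person"), ("?p", "ex:worksFor", "?org")]
print(apply_binding(pattern, {"?p": "ex:alice"}))
# [('ex:alice', 'rdf:type', 'ex:Person'), ('ex:alice', 'ex:worksFor', '?org')]
```

Here ?org is left unbound, so the second answer sentence only asserts that alice works for something.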
Variables in queries may be designated as must-bind
or may-bind variables. Answers are required to provide bindings for all must-bind variables,
and may provide bindings
for may-bind variables. Queries may
optionally provide or require information about the knowledge base used to
answer the query and impose constraints on the dynamics of the answering
process. Answers provided by the server
must conform to these requirements, but a server may restrict its answers to
certain classes of query patterns, to a certain class of knowledge bases, or to
a limited range of bindings.
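The must-bind/may-bind distinction amounts to a simple check on which variables an answer binds. The sketch below assumes, for illustration, that a binding is a Python dict from variable names to terms and that every query variable not designated must-bind is may-bind; the function name is hypothetical.

```python
from typing import Dict, Set


def meets_binding_requirements(binding: Dict[str, str],
                               query_vars: Set[str],
                               must_bind: Set[str]) -> bool:
    """Syntactic check only: every must-bind variable is bound, and only
    variables occurring in the query pattern are bound. Whether the resulting
    answer sentences are entailed by the answer KB is a separate check."""
    bound = set(binding)
    return must_bind <= bound <= query_vars


query_vars = {"?p", "?org"}
must_bind = {"?p"}  # so ?org is may-bind
print(meets_binding_requirements({"?p": "ex:alice"}, query_vars, must_bind))    # True
print(meets_binding_requirements({"?org": "ex:acme"}, query_vars, must_bind))   # False
```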
A single query may have zero or more answers. The set of all answers provided by the server
in response to a query is called the response
set of that query. Not all the
answers in the response set need be produced at once: in general, answers will
be delivered in groups, each of which is called an answer bundle. A query may
specify an upper bound on the number of answers that are delivered in a single answer
bundle.
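A sketch of how a server might group answers so that no bundle exceeds the query's bound. The function name is hypothetical, and sending everything in one bundle when no bound is given is just one choice a server is free to make.

```python
from typing import List, Optional, TypeVar

Answer = TypeVar("Answer")


def split_into_bundles(answers: List[Answer], bound: Optional[int] = None) -> List[List[Answer]]:
    """Group answers into bundles, none larger than the query's bound (if any)."""
    # With no bound the server is free to choose bundle sizes; this sketch
    # then simply sends everything in one bundle.
    size = bound if bound is not None else max(len(answers), 1)
    if size < 1:
        raise ValueError("answer bundle size bound must be a positive integer")
    return [answers[i:i + size] for i in range(0, len(answers), size)]


print(split_into_bundles(["a1", "a2", "a3", "a4", "a5"], bound=2))
# [['a1', 'a2'], ['a3', 'a4'], ['a5']]
```

Concatenating the bundles gives back the response set. An empty answer list yields no bundles here; in the protocol a server would still send one (empty) bundle carrying a termination token.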
The set of DAML sentences that are used by the server in answering a query is
referred to as the answer KB. This may be an actual knowledge base or a conjunction
of actual knowledge bases, or it may be a virtual entity representing the total
information available to the server at the time of answering. However, all servers are required to be able
to provide a reference to a resource representing the answer KB. We will assume that such a reference to an
answer KB has the form of a URIref. In many cases, that URIref will be a URL that
can be used to access the KB or to communicate with the server about the KB,
but the URIref is not required to be a URL. A DQL query contains an answer KB pattern that is either a KB, a list each of whose items
is a reference to a KB, or a variable. If
a query’s answer KB pattern is a reference to a KB, then the conjunction
of the answer sentences in each query answer in the response set must be
entailed by that KB. If the answer KB
pattern in a query is a variable, then the server is free to select or to
generate an answer KB from which to answer the query, but if the variable is a must-bind
variable, then the answer must provide a binding to this variable which
references the answer KB.
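The three forms of answer KB pattern can be modelled as a small tagged union. In this hypothetical sketch a KB is just a set of triples held in an in-memory store keyed by URIref, a KB reference is taken to be a URIref, and a server given a variable always picks a fixed default KB; real servers are free to select or generate the answer KB however they like.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple, Union

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]


@dataclass(frozen=True)
class KBRef:
    uri: str  # a URIref naming a KB; not necessarily a dereferenceable URL


@dataclass(frozen=True)
class KBList:
    uris: Tuple[str, ...]


@dataclass(frozen=True)
class KBVar:
    name: str
    must_bind: bool


AnswerKBPattern = Union[KBRef, KBList, KBVar]


def select_answer_kb(pattern: AnswerKBPattern,
                     store: Dict[str, KB],
                     default_uri: str) -> Tuple[KB, Dict[str, str]]:
    """Return the sentences to answer from, plus any binding for a KB variable."""
    if isinstance(pattern, KBRef):
        return store[pattern.uri], {}
    if isinstance(pattern, KBList):
        # Conjunction of KBs, read as in footnote [1]: all sentences of all listed KBs.
        return frozenset().union(*(store[u] for u in pattern.uris)), {}
    # A variable: the server chooses the answer KB (here, a fixed default).
    # A must-bind KB variable must be bound to a reference to that KB; this
    # sketch leaves a may-bind KB variable unbound.
    binding = {pattern.name: default_uri} if pattern.must_bind else {}
    return store[default_uri], binding
```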
DQL specifies a core set of protocol elements that are to be used by a client
to obtain query answers from a server. Specifically, DQL specifies that a
client initiates a query-answering dialogue with a server by sending the server
a DQL query. The server is expected to respond by sending answers to the
client one or more at a time along with a server continuation that is
either a process handle which the client can use to request additional answers
or a token indicating that the server will not provide any more answers to the
query. A process handle is an atomic entity with no internal structure visible
to the client; its role is only to allow the server to record the state of its
answer search. The token can be 'none', meaning that the server is
claiming that there are no further answers entailed by the answer KB, or 'end', meaning that the server is making no claims as
to whether there are more answers entailed by the answer KB. Other token values
may be allowed, but in all cases it is required that a token be clearly
distinguishable from a process handle. No attempt is made here to specify
a complete inter-agent protocol (e.g., with provisions for time-outs, error
handling, resource budgets, etc.). Query answering servers are required
to support the specified core protocol elements and are not constrained by the
DQL specification as to how additional protocol functionality is provided.
The client initiates a dialog with the server by sending a query. The typical response is an answer bundle plus a server continuation, which the client can send back to the server to request more answers. On receiving a server continuation, the server responds in the same way, and the exchange continues until the continuation in a response is a termination token. The set of all answers in all bundles sent from the server to the client between the query and the termination token is the response set of the query.
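From the client's side, the dialog is a loop that keeps sending continuations back until a termination token arrives. The Python sketch below assumes a hypothetical transport given as two callables (send_query and send_continuation) and represents process handles and termination tokens as distinct types, which is one way to keep them clearly distinguishable as the protocol requires. The toy server functions exist only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class ProcessHandle:
    """Opaque to the client: only the server interprets `state`."""
    state: Any


@dataclass(frozen=True)
class TerminationToken:
    value: str  # e.g. "end" or "none"


Continuation = Union[ProcessHandle, TerminationToken]
Bundle = Tuple[List[Dict[str, str]], Continuation]


def run_dialog(send_query: Callable[[Any], Bundle],
               send_continuation: Callable[[ProcessHandle], Bundle],
               query: Any) -> Tuple[List[Dict[str, str]], str]:
    """Collect the response set: every answer received before a termination token."""
    response_set: List[Dict[str, str]] = []
    answers, cont = send_query(query)
    response_set.extend(answers)
    while isinstance(cont, ProcessHandle):
        answers, cont = send_continuation(cont)
        response_set.extend(answers)
    return response_set, cont.value


# Toy stand-in for a server: three answers in bundles of two, then "none".
ALL_ANSWERS = [{"?x": "ex:a"}, {"?x": "ex:b"}, {"?x": "ex:c"}]


def toy_query(query: Any) -> Bundle:
    return ALL_ANSWERS[:2], ProcessHandle(2)


def toy_continue(handle: ProcessHandle) -> Bundle:
    start = handle.state
    return ALL_ANSWERS[start:start + 2], TerminationToken("none")


answers, token = run_dialog(toy_query, toy_continue, "?x rdf:type ex:Thing")
print(len(answers), token)  # 3 none
```

A client that wants fewer answers can simply stop looping, which corresponds to not requesting any further continuations.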
A DQL query necessarily includes:
A DQL query may also optionally include:
An answer to a query must contain:
Again, we went from a collection of sentences in the query to a single
sentence in the specification of the answer. I'm also rather concerned about
the use of RDF blank nodes in this context. This may be OK if we are thinking
of the query as an RDF graph (which we haven't made clear up to now), so that a
single variable gets replaced by a single blank node, but if it is a collection
of XML-serialised triples, say, we need to be sure
that a given variable is always replaced with the same new blank node, or we
will lose the co-reference constraint on answers. (Also, by using RDF blank
nodes, aren't we precluding the case where a variable corresponds to a
property? Is this deliberate?) All in all, I think we need to be rather more
precise at this point.
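The co-reference point can be made concrete: the replacement must allocate one blank node per variable, not one per occurrence. This is a hypothetical Python sketch that assumes variables are written "?name" and that the input contains no blank nodes already labelled "_:bN".

```python
import itertools
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def variables_to_blank_nodes(triples: List[Triple]) -> List[Triple]:
    """Replace variables ("?name", an illustrative convention) by blank nodes.

    One fresh blank node is allocated per *variable*, not per occurrence, so
    triples that share a variable still share a node and co-reference survives.
    Assumes the input contains no blank nodes already labelled "_:bN".
    """
    counter = itertools.count()
    mapping: Dict[str, str] = {}

    def convert(term: str) -> str:
        if not term.startswith("?"):
            return term
        if term not in mapping:
            mapping[term] = f"_:b{next(counter)}"
        return mapping[term]

    return [tuple(convert(t) for t in triple) for triple in triples]


print(variables_to_blank_nodes([("?x", "ex:knows", "?y"), ("?y", "ex:knows", "?x")]))
# [('_:b0', 'ex:knows', '_:b1'), ('_:b1', 'ex:knows', '_:b0')]
```

A variable in predicate position, e.g. ("?x", "?p", "?y"), would come out with a blank node as the property, which RDF's abstract syntax does not allow; that is exactly the case raised in the parenthetical question above.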
Here is a rough cut at an MT-style definition of what
constitutes a valid answer binding if we consider a query to be a KB:
Let K be the answer KB, U the set of URIrefs
and literals occurring in K, Q a query KB in which some URIrefs
and literals have been replaced with variables, V the set of variables in Q, Vm (a subset of V) the set of must-bind variables in Q, B a
binding that maps every element of Vm to an element
of U and zero or more elements of V-Vm to elements of
U, and B(Q) the KB that results from applying the binding B to the KB Q. A
model I of K satisfies B(Q) if the interpretation
function can be extended to any remaining variables in B(Q) in such a way that
I is a model of B(Q). K entails B(Q) if every model I
of K satisfies B(Q).
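For reference, the same definition in compact notation. This is only a transcription of the prose above; writing dom(B) for the set of variables that B binds, and I' for the extension of I to the variables left unbound, are notational choices, not part of the draft.

```latex
\begin{aligned}
&B : V \rightharpoonup U, \qquad V_m \subseteq \mathrm{dom}(B) \subseteq V, \\
&I \text{ satisfies } B(Q) \iff \exists\, I' \text{ extending } I \text{ to } V \setminus \mathrm{dom}(B) \text{ with } I' \models B(Q), \\
&K \models B(Q) \iff \text{every model } I \text{ of } K \text{ satisfies } B(Q).
\end{aligned}
```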
In addition, an answer may contain:
An answer bundle is a finite set of answers plus a tag
consisting of either a server continuation or one or more termination tokens.
The number of answers in an answer bundle given in response to a query must not
exceed the answer bundle size bound in the query, if present.
Server continuations are atomic entities with no internal structure visible to
the client. A server must continue the answering process when sent a server
continuation by sending back another answer bundle. Servers should encode
sufficient information in the server continuation to enable them to continue
the answering dialog even if they have been engaged in other activities since
sending the previous answer bundle. Every dialog should terminate after a
finite number of exchanges of server continuations and answer bundles between
the client and server. A dialog is said to have terminated when the
server sends a bundle containing a termination token. If a server is sent a
termination token as a server continuation, the server should reply with a
bundle containing no answers and the same termination token, thereby
terminating the dialog.
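One way for a server to be able to continue after other activities is to make the continuation carry all of the state it needs. In the hypothetical sketch below the server has already computed its complete list of answers (an assumption that lets it honestly end with "none"), serializes its position into an opaque string, and answers a returned termination token with an empty bundle carrying the same token. ToyServer and its methods are illustrative names only.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union


@dataclass(frozen=True)
class TerminationToken:
    value: str  # e.g. "end" or "none"


# To the client a continuation is opaque; this server happens to encode its
# search position as a JSON string that only it interprets.
Continuation = Union[str, TerminationToken]
Bundle = Tuple[List[Dict[str, str]], Continuation]


class ToyServer:
    """Hypothetical server whose complete answer list is computed up front."""

    def __init__(self, answers: List[Dict[str, str]], bound: int):
        self.answers = answers
        self.bound = bound

    def _bundle_from(self, start: int) -> Bundle:
        end = start + self.bound
        chunk = self.answers[start:end]
        if end >= len(self.answers):
            # The full answer list is known, so claiming "none" is justified here.
            return chunk, TerminationToken("none")
        return chunk, json.dumps({"next": end})

    def query(self, pattern: Any) -> Bundle:
        return self._bundle_from(0)

    def resume(self, cont: Continuation) -> Bundle:
        if isinstance(cont, TerminationToken):
            # A token sent back as a continuation: reply with no answers and the same token.
            return [], cont
        # Everything needed to continue is carried in the continuation itself.
        return self._bundle_from(json.loads(cont)["next"])


server = ToyServer([{"?x": "ex:a"}, {"?x": "ex:b"}, {"?x": "ex:c"}], bound=2)
first, cont = server.query("?x rdf:type ex:Thing")
# A fresh instance can resume the dialog, since the continuation holds the state.
rest, token = ToyServer(server.answers, bound=2).resume(cont)
print(len(first), len(rest), token.value)  # 2 1 none
```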
Clients must be able to distinguish termination tokens from server
continuations. Termination tokens may be used to convey information about the
response set; in particular, two termination tokens have fixed meanings. The
token "end" simply means that the server is unable to deliver
any further answers, but makes no claim of completeness. The token "none"
indicates that the answer KB entails no answers other than those already in the
response set. We note that the use of the "none" token should
be restricted to those cases where the server is able to make a positive
affirmation that no other answers exist, i.e., to provide a guarantee that
there are no other possible bindings to the query variables which would produce
an answer sentence that would be entailed by the answer KB. Other termination
tags may also be used, but the "end" tag is the recommended
way to indicate termination of a question-answering dialog.
We said above that a bundle ends with a single token; here
with one or more tokens. Which do we mean? I would say just one token, but
maybe you are thinking of cases where you want/need more. If only one, then we
can't say that "end" is the recommended terminator. If we allow more
than one, what would it mean if I say "none end" as opposed to
"end none"?
There is no provision in DQL for a query to indicate an upper bound on the
total number of answers in a dialog, but a client can terminate a question-answering
dialog at any time by sending the "end" token as a server
continuation, or simply by not requesting any further continuations.
While there are no global
requirements on a response set other than that all its members are correct
answers, it is recommended that servers ensure that answer bundles do
not contain duplicate or redundant answers, i.e. answers which are subsumed by
other answers. One answer subsumes another if its
bindings are a superset of the bindings in the other answer. Servers which are able to guarantee that their response sets
contain no duplicate answers can be called "non-repeating".
Servers which are able to guarantee that their response sets contain no duplicate or redundant answers can be called
"terse" or "non-verbose". Servers which are able to
guarantee that their response sets will be correctly terminated with "none"
can be called "complete".
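Following the syntactic definition of subsumption given above (one answer's bindings are a superset of another's), a server could filter a list of answers to make it terse. This is a hypothetical sketch with quadratic cost; a server that streams bundles would also have to remember earlier answers to stay terse across bundles. Removing duplicates alone corresponds to being non-repeating.

```python
from typing import Dict, List

Binding = Dict[str, str]


def subsumes(a: Binding, b: Binding) -> bool:
    """a subsumes b if a's bindings are a superset of b's (same variables, same terms)."""
    return a.items() >= b.items()


def make_terse(answers: List[Binding]) -> List[Binding]:
    """Drop duplicate answers, then drop answers strictly subsumed by another answer."""
    unique: List[Binding] = []
    for ans in answers:
        if ans not in unique:
            unique.append(ans)
    return [a for a in unique
            if not any(other != a and subsumes(other, a) for other in unique)]


answers = [{"?x": "ex:a"}, {"?x": "ex:a", "?y": "ex:b"}, {"?x": "ex:a"}, {"?x": "ex:c"}]
print(make_terse(answers))
# [{'?x': 'ex:a', '?y': 'ex:b'}, {'?x': 'ex:c'}]
```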
The answer set of a query is the largest set of answers which are
entailed by the answer KB and none of which are entailed by any other answer.
Notice that this definition is semantic rather than operational. A complete
server is one whose response set contains the answer set of the query. A terse
complete server is one whose response set is precisely the answer set of the
query.
It may be impossible to implement a server that can guarantee
to be terse and complete for all KBs and query
patterns.
The specification of query
pattern above allows for arbitrary patterns of variable replacement in DAML
expressions. Particular servers, however, may restrict themselves to particular
kinds of query pattern, or guarantee meaningful answers only
when given particular kinds of query pattern. To allow for such cases we
introduce the notion of a query class, defined simply as a class of query
patterns. A server specification may refer to any well-defined restriction on
query patterns and state the server's performance, in terms of this standard,
for the case where all references to a query pattern are understood to refer
only to query patterns in that class.
If a server accepts only query patterns in a certain class, it is said to
"apply to" that class; the notions of completeness and terseness may
also be relativized to queries of a certain class,
when stating the conformance of a server to this specification.
For example, one class of query patterns might be those of one of the forms:
?x rdf:type CCC .
?x daml:subClassOf ?y .
where CCC is some DAML class expression, or of the
form
?x PPP ?y .
where PPP is some DAML property expression other than
those used in RDF(S) or DAML+OIL syntax.
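A membership test for this example class, as a hedged Python sketch. It assumes each pattern is a single triple in prefixed-name form and that CCC and PPP are named classes and properties (a full class or property expression would need more structure). SYNTAX_PROPERTIES is a hypothetical, deliberately incomplete set of RDF(S) and DAML+OIL vocabulary terms.

```python
from typing import Tuple

Triple = Tuple[str, str, str]

# Deliberately incomplete, illustrative list of RDF(S) and DAML+OIL vocabulary
# properties that are excluded from the "?x PPP ?y" form.
SYNTAX_PROPERTIES = {
    "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf", "rdfs:domain", "rdfs:range",
    "daml:subClassOf", "daml:sameClassAs", "daml:samePropertyAs", "daml:disjointWith",
    "daml:onProperty", "daml:toClass", "daml:hasValue", "daml:first", "daml:rest",
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def in_example_class(triple: Triple) -> bool:
    """True if a single-triple pattern has one of the three example forms."""
    s, p, o = triple
    if p == "rdf:type":
        return is_var(s) and not is_var(o)      # ?x rdf:type CCC
    if p == "daml:subClassOf":
        return is_var(s) and is_var(o)          # ?x daml:subClassOf ?y
    return (is_var(s) and is_var(o)             # ?x PPP ?y
            and not is_var(p) and p not in SYNTAX_PROPERTIES)
```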
When performance is defined relative to a query class, the termination token 'none'
should not be used to mean 'none relative to the class'. To avoid
confusion, the use of other termination tokens, each with a meaning defined
relative to the particular class, is required.
Future versions of this specification may define particular query classes and
corresponding termination tokens.
The language and protocol contain no explicit constructs for asking how many (or how many more) answers there are to a given query. Defining what is meant by “how many” is problematic because there can be multiple bindings for a given distinguished variable that all denote the same object in the domain of discourse, so the number of answer bindings for a given distinguished variable will in general differ from the number of objects in the domain of discourse that the variable can denote. The core protocol could reasonably be extended to support “how many” queries, where “how many” means how many answers with distinct sets of bindings the server can produce. The difficulty for a server of determining how many answers it can produce to a query without actually producing them has been the primary rationale for not including a “how many” construct in the query language.
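A small sketch of why the two counts diverge. It assumes, hypothetically, that the server knows which terms co-denote, here via a hand-written map standing in for, say, daml:sameIndividualAs statements in the KB; in general a server cannot know this completely, which is part of the difficulty described above. The function names are illustrative.

```python
from typing import Dict, List

Binding = Dict[str, str]


def count_binding_sets(answers: List[Binding]) -> int:
    """Answers with distinct sets of bindings (the sense of "how many" suggested above)."""
    return len({frozenset(a.items()) for a in answers})


def count_denoted_objects(answers: List[Binding], var: str,
                          canonical: Dict[str, str]) -> int:
    """Distinct objects denoted by `var`, given a map from each term to a chosen
    representative of the terms known to denote the same object."""
    return len({canonical.get(a[var], a[var]) for a in answers if var in a})


answers = [{"?author": "ex:MarkTwain"}, {"?author": "ex:SamuelClemens"}]
# Hypothetical co-denotation knowledge, e.g. from a daml:sameIndividualAs statement.
same_as = {"ex:SamuelClemens": "ex:MarkTwain"}
print(count_binding_sets(answers), count_denoted_objects(answers, "?author", same_as))  # 2 1
```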
Servers are not required to
deliver answers to queries. Under various circumstances, a query may be phrased
in a form which makes it impossible for a server to respond with any answers;
for example, if the query specifies an answer KB which the server is unable to
access or use, or where no bindings are available for must-bind variables.
Under these circumstances, the server should terminate the dialog with a bundle
containing no answers and an appropriate termination tag or tags, one of which
should be 'end'.
-----
[1] We use “conjunction” informally in this Overview section since DAML+OIL does not have a logical connective for conjoining sentences or for conjoining knowledge bases. We consider a conjunction of sentences to be a sentence that is true if and only if all of its conjuncts are true. We consider a conjunction of knowledge bases to be a knowledge base that is logically equivalent to the knowledge base consisting of all the sentences in all the conjunct knowledge bases.