Query Language Issues

From: Richard Fikes (fikes@ksl.stanford.edu)
Date: 11/06/01


I said last week that I would send out a message articulating the issues
we have been discussing regarding query-answering.  This is that
message.  My apologies for not sending the message out sooner.  I am
back at home (and my father is recuperating well from his surgery), but
I did not have time to put this together sooner.

As per my earlier messages, my student Yulin Li has made significant
contributions to the material in this message.

Richard

-----------------------

This message is to articulate issues involved in specifying and
formalizing the information content of a "query-answering discourse"
between two agents in which one agent seeks information from a second
agent by sending a "query" to the second agent.

I will refer to the agent sending the query as the "client" and the
agent receiving the query as the "server".  I will refer to the response
sent by the server to the client as a "query result" and will assume
that a query result may contain one or more "query answers".


Knowledge Base

I assume a query is posed with respect to a knowledge base that is a
DAML+OIL representation of a logical theory.  Thus, a query needs to
include a reference to a DAML+OIL knowledge base.  I will refer to that
knowledge base as the "query KB".


Query Premise

We have discussed enabling a query to include a premise that is to be
added to the query KB so that the query is being asked of the query KB
unioned with the premise.  A premise essentially facilitates queries of
the form if-then while still remaining within the expressiveness of
DAML+OIL.

ISSUE: Do we want to enable the inclusion of a premise in a query, and
if so what is the form of a premise?  I recommend that we do allow
premises and that they be an arbitrary DAML+OIL knowledge base.


Query Pattern

I assume a query contains a "query pattern" that specifies relationships
among unknown sets of objects in a domain of discourse.  Each unknown
object is represented in the query pattern by a "query variable". 
Answering a query with query variables x1,...,xn involves identifying
tuples of object constants such that for any such tuple <c1,...,cn>, if
ci is substituted for xi in the query pattern for i=1,...,n, then the
resulting "query pattern instance" specifies a sentence that is entailed
by the query KB.  For each such ci, the pair [xi, ci] is called a "query
variable binding" for the corresponding query variable xi.  Each set of
query variable bindings specifies a candidate answer to the query.
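
As a rough illustration of the definitions above, here is a minimal
Python sketch that enumerates candidate answers for a conjunctive query
pattern.  It rests on assumptions that are not part of this message:
statements are plain (subject, property, object) triples, query
variables are strings starting with "?", and membership in a set of
triples stands in for entailment.  A real server would call a DAML+OIL
reasoner at that point.  The names (candidate_answers, unify) and the
sample data are hypothetical.  The optional premise argument shows the
"query KB unioned with the premise" reading discussed earlier.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_variable(term: str) -> bool:
    """Assumed convention: query variables are written like '?x'."""
    return term.startswith("?")


def unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `fact`; None on a clash."""
    extended = dict(binding)
    for p_term, f_term in zip(pattern, fact):
        if is_variable(p_term):
            # setdefault binds an unbound variable, else returns the old value.
            if extended.setdefault(p_term, f_term) != f_term:
                return None
        elif p_term != f_term:
            return None
    return extended


def candidate_answers(
    pattern: List[Triple],
    query_kb: Iterable[Triple],
    premise: Iterable[Triple] = (),
) -> Iterator[Binding]:
    """Yield bindings under which every pattern triple is in KB + premise.

    Set membership is only a stand-in for entailment by the query KB.
    """
    facts = set(query_kb) | set(premise)

    def extend(i: int, binding: Binding) -> Iterator[Binding]:
        if i == len(pattern):
            yield binding
            return
        for fact in facts:
            extended = unify(pattern[i], fact, binding)
            if extended is not None:
                yield from extend(i + 1, extended)

    yield from extend(0, {})


# Hypothetical sample data.
kb = {("Joe", "owns", "Car1"), ("Car1", "type", "Car"), ("Sue", "owns", "Bike1")}
pattern = [("?x", "owns", "?y"), ("?y", "type", "Car")]
print(list(candidate_answers(pattern, kb)))
```

Passing premise={("Bike1", "type", "Car")} to the same call would add a
second candidate answer binding ?x to Sue, without changing the query KB
itself.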

ISSUE: What is the expressive power of the query pattern language?  For
example, we might decide that a query pattern instance can specify a
sentence that is a disjunction of a conjunction of RDF statements or is
a negation of a conjunction of RDF sentences, or ...  My recommendation
is that we define the query pattern language so that a query pattern
instance specifies a conjunction of sentences each of which is
representable in DAML+OIL (i.e., a conjunction of RDF statements).


Answer Mapping Function

A query needs to specify a mapping of query variable bindings into query
answers.  In particular, a query answer may include or make use of
bindings to only a subset of the query variables.

ISSUE:  What is the nature of the answer mapping function language?  For
example, a query answer might be allowed to be any s-expression whose
atomic elements are bindings to specified variables.  (E.g., map the
bindings [x1, c1], [x2,c2], [x3,c3], and [x4,c4] to the s-expression
"(c1 (c2 c3) c4)".)  I think all that matters for our formalization and
core design work is that a query answer may make use of the bindings to
only a subset of the query variables.  So, my recommendation is that for
now we consider a query answer to consist of a set of bindings for a
subset of the query variables, and that the query specifies which query
variables are in that subset.  I will make that assumption in the
remainder of this document.
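
To make the two readings of an answer mapping concrete, here is a small
Python sketch.  The first function implements the recommended reading (a
query answer is the set of bindings for a named subset of the query
variables); the second implements the more general s-expression reading,
using nested Python lists in place of s-expressions.  The function
names, the "?" variable prefix, and the sample bindings are assumptions
made only for this illustration.

```python
from typing import Dict, List, Sequence, Union

Binding = Dict[str, str]
Template = Union[str, List["Template"]]


def project(binding: Binding, answer_vars: Sequence[str]) -> Binding:
    """Keep only the bindings for the variables the query asked for."""
    return {var: binding[var] for var in answer_vars}


def instantiate(template: Template, binding: Binding) -> Template:
    """Replace each variable in a nested-list template by its constant.

    Terms with no binding (e.g. literal symbols) are left unchanged.
    """
    if isinstance(template, list):
        return [instantiate(item, binding) for item in template]
    return binding.get(template, template)


bindings = {"?x1": "c1", "?x2": "c2", "?x3": "c3", "?x4": "c4"}
print(project(bindings, ["?x1", "?x4"]))
print(instantiate(["?x1", ["?x2", "?x3"], "?x4"], bindings))
```

Note that projection can map two different sets of candidate bindings to
the same query answer, which is one reason the distinctness of answers
has to be addressed separately.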

ISSUE:  What constants can be in query answer bindings?  In particular,
can a query variable be bound in a query answer to an anonymous node in
the RDF graph?  If so, what is the form and semantics of that binding? 
Also, can a query variable be bound in a query answer to an object that
is entailed in the knowledge base (e.g., by a cardinality constraint)
but whose identity is not known by the server?  If so, what is the form
and semantics of that binding?  


Uniqueness of Answers

In general, a query result may contain multiple query answers.

ISSUE: What guarantees do we want to make about the distinctiveness of
multiple query answers in a query result?  We may want to guarantee that
no two answers consist of identical sets of bindings.  That would say
that if A1 and A2 are query answers in a query result, then there exists
a query variable xi for which the binding for xi in A1 is not the same
constant as the binding for xi in A2.  A more difficult guarantee would
be that no two answers consist of equal sets of bindings.  That would
say that if A1 and A2 are query answers in a query result, then there
exists a query variable xi for which the binding for xi in A1 is not
equal to (i.e., does not denote the same object as) the binding for xi
in A2.  My recommendation is that we guarantee there are no identical
sets of bindings and that we enable a query to include an indicator as
to whether equal sets of bindings are acceptable.
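
The difference between the two guarantees can be sketched in Python as
follows.  The sketch assumes the server has a list of known equalities
between constants (pairs that denote the same object, however they were
derived); that input, the allow_equal flag, and all names here are
hypothetical.  With allow_equal=True only identical sets of bindings are
removed; with allow_equal=False, answers whose bindings are known to be
equal are also collapsed.  It cannot detect equalities the server has
not derived, which is what makes the second guarantee harder.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Binding = Dict[str, str]


def make_canonicalizer(equalities: Iterable[Tuple[str, str]]) -> Callable[[str], str]:
    """Return a function mapping each constant to one representative
    of its equivalence class (a small union-find)."""
    parent: Dict[str, str] = {}

    def find(c: str) -> str:
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in equalities:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    return find


def distinct_answers(
    answers: Iterable[Binding],
    equalities: Iterable[Tuple[str, str]] = (),
    allow_equal: bool = True,
) -> List[Binding]:
    """Drop identical answers; if allow_equal is False, also drop
    answers that are equal under the known equalities."""
    find = make_canonicalizer(equalities)
    seen = set()
    kept: List[Binding] = []
    for answer in answers:
        if allow_equal:
            key = frozenset(answer.items())
        else:
            key = frozenset((var, find(const)) for var, const in answer.items())
        if key not in seen:
            seen.add(key)
            kept.append(answer)
    return kept


answers = [{"?x": "MorningStar"}, {"?x": "EveningStar"}, {"?x": "MorningStar"}]
same = [("MorningStar", "EveningStar")]
print(distinct_answers(answers))                           # no identical sets
print(distinct_answers(answers, same, allow_equal=False))  # no equal sets
```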


Number of Answers

I assume our query language needs to enable a query to specify what is
being asked for and our query result language needs to enable a query
result to contain the information requested in a query.  That
specification in a query would include how many query answers are being
requested and what information is being requested about how many query
answers there are.  Also, the query result language needs to enable
including in a query result the information requested about how many
query answers there are.

ISSUE: Whether and how to include in a query the number of query answers
being requested.  For example, we might enable a query to include a "#
answers requested" that could be either a non-negative integer, the
constant "All", or the constant "Enumerator".  If the number of answers
requested is an integer n, the query is a request for as many query
answers as the server can deduce up to n.  If the number of answers
requested is "All", the request is for as many query answers as the
server can deduce.  If the number of answers requested is "Enumerator",
the request is for a process handle (continuation) that can be sent back
to the server in a subsequent message to request that the server produce
a "batch" of k query answers.  Some convention is needed for the case
where the number of answers requested is "All" and the server can deduce
an infinite number of answers.  Perhaps the convention would be that the
server returns a process handle in that case.
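
One way to picture the three values of "# answers requested" is the
Python sketch below, in which the server's answers arrive as a lazy
iterator and the process handle is an object wrapping that iterator.
The class and function names are hypothetical, and in a real protocol
the handle would be an identifier sent over the wire rather than an
in-memory object.  The sketch does not solve the infinite "All" case: it
has no way to know the iterator is infinite, so that branch would simply
not terminate.

```python
from itertools import count, islice
from typing import Dict, Iterable, Iterator, List, Union

Binding = Dict[str, str]


class Enumerator:
    """Stand-in for a process handle (continuation) held by the client."""

    def __init__(self, answers: Iterable[Binding]) -> None:
        self._answers = iter(answers)

    def next_batch(self, k: int) -> List[Binding]:
        """Return up to k further answers; fewer (or none) when exhausted."""
        return list(islice(self._answers, k))


def respond(
    answers: Iterable[Binding], requested: Union[int, str]
) -> Union[List[Binding], Enumerator]:
    """Build the answer part of a query result for the given request."""
    if requested == "Enumerator":
        return Enumerator(answers)
    if requested == "All":
        # Does not terminate if the server can deduce infinitely many answers.
        return list(answers)
    if isinstance(requested, int) and requested >= 0:
        return list(islice(answers, requested))  # as many as possible, up to n
    raise ValueError(f"unsupported '# answers requested': {requested!r}")


def even_numbers() -> Iterator[Binding]:
    """Hypothetical query with infinitely many answers."""
    for i in count():
        yield {"?x": str(2 * i)}


print(respond(even_numbers(), 3))
handle = respond(even_numbers(), "Enumerator")
print(handle.next_batch(2))
print(handle.next_batch(2))
```

The second call to next_batch resumes where the first stopped, which is
the behavior the "Enumerator" option is meant to provide.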

ISSUE: Whether and how to include in a query what information is being
requested about how many query answers there are and in a query result
the information requested.  For example, we might enable a query to
include a request for information as to how many query answers there are
("# answers?") and for a query result to include a specification of how
many query answers are entailed ("# answers entailed") in the form of an
ordered pair denoting a closed interval.  For example, a result
containing the ordered pair [4,0-0] as the "# answers entailed" means
that the server has determined that there are at least 4 answers to the
query.  (I am using "0-0" here to stand for the "infinity" symbol.) 
Note that a value of [0,0-0] for "# answers entailed" is always true. 
When a query includes a request for "# answers?", the server is being
asked to deduce whatever it can about how many answers there are.   Note
that a query can specify that "# answers requested" is zero so that the
only information being requested is the number of answers.  Also, note
that a query result could contain a value for "# answers entailed" even when that
information was not requested in the query.
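
The "# answers entailed" interval can be sketched in Python as a pair of
bounds, with math.inf playing the role of "0-0".  The sketch assumes the
server knows two things: how many distinct answers it has found so far,
and whether its search was exhaustive.  Both inputs and all names are
hypothetical; a smarter server might also derive bounds by other means
(e.g., from cardinality constraints), which this sketch does not
attempt.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]


def answers_entailed(found: int, search_exhausted: bool) -> Interval:
    """Closed interval [lower, upper] on the number of query answers.

    `found` must count distinct answers.  Without an exhaustive search
    the server can only assert a lower bound.
    """
    if found < 0:
        raise ValueError("found must be non-negative")
    upper = found if search_exhausted else math.inf
    return (found, upper)


def show(interval: Interval) -> str:
    """Render an interval in this message's notation, e.g. [4,0-0]."""
    lower, upper = interval
    upper_text = "0-0" if upper == math.inf else str(int(upper))
    return f"[{int(lower)},{upper_text}]"


print(show(answers_entailed(4, search_exhausted=False)))  # at least 4 answers
print(show(answers_entailed(0, search_exhausted=False)))  # always true
print(show(answers_entailed(4, search_exhausted=True)))   # exactly 4 answers
```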


Justification For Answers

Under the assumption that our query language needs to enable a query to
specify what is being asked for and our query result language needs to
enable a query result to contain the information requested in a query,
the query language needs to enable inclusion in a query of a request for a
justification for each query answer and the query result language needs
to enable inclusion of such justifications in a query result.

ISSUE: Whether and how to include provisions for justifications to be
requested in a query and be included in a query result.  What kind of
justification language(s) do we include in our query results language?


Knowledge Base Structure Queries

There seems to be a clear need to ask queries about the knowledge base
itself as an artifact.  The most compelling example involves determining
the "direct" subclasses of a class or subproperties of a property.
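
To show why "direct" subclass is a question about the knowledge base
rather than about entailment alone, here is a Python sketch that derives
direct subclasses from the entailed subclass relation.  It assumes the
server already has the full (transitively closed) set of entailed
(subclass, superclass) pairs between named classes and that there are no
cycles of equivalent classes; the function name and sample classes are
hypothetical.  A class is treated as a direct subclass when no other
named subclass sits between it and the given class.

```python
from typing import Set, Tuple


def direct_subclasses(cls: str, subclass_of: Set[Tuple[str, str]]) -> Set[str]:
    """Direct subclasses of `cls`, given the transitively closed set of
    entailed (sub, super) pairs.  Assumes no equivalent-class cycles."""
    subs = {sub for sub, sup in subclass_of if sup == cls and sub != cls}
    # Because the relation is closed, any class lying between `s` and
    # `cls` is itself in `subs`, so it is enough to look there.
    return {
        s for s in subs
        if not any(m != s and (s, m) in subclass_of for m in subs)
    }


# Hypothetical closed subclass relation.
closure = {
    ("Dog", "Mammal"),
    ("Mammal", "Animal"),
    ("Dog", "Animal"),
    ("Bird", "Animal"),
}
print(sorted(direct_subclasses("Animal", closure)))
```

Here "Dog" is an entailed subclass of "Animal" but not a direct one,
because "Mammal" lies in between; adding a new named class to the KB can
therefore change the answer without changing what is entailed about the
original classes.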

ISSUE:  Whether and how to include such queries in our query language
and query results language.  The primary issue seems to be to what
extent we allow intermingling of "structural queries" with "entailment
queries".  That is, we probably don't want to allow a request for direct
subclasses to appear anywhere in a query pattern that an RDF statement
(with variables) could appear.

