Re: Query Language Issues

From: Richard Fikes (fikes@ksl.stanford.edu)
Date: 11/12/01


I have been doing other things for a few days and am now getting back to
the query language discussion.

In this message, I am attaching at the end an updated version of the
"Query Language Issues" message I sent out last Tuesday and am
presenting a proposal for how to handle query answers that include
bindings to constants created by the server (i.e., the query answering
agent).

The updated issues message primarily separates the statements of issues
from the proposals I made to deal with some of the issues.  It also
contains some minor updates in response to some of Pat's concerns about
the original message.

What follows is a proposal for discussion for how to handle query
answers that include bindings to constants created by the server.  Such
constants would either be the labels of anonymous RDF nodes or Skolem
constants that denote objects whose existence is entailed (e.g., by
cardinality restrictions).

In an e-mail discussion involving Pat and Ian, an example query ("What
person owns which red car?") was being considered whose query pattern
is, using their notation, ?[x,y](Px & RCy & Oxy).  I will assume here
that the request is for all answers to the query.

There was discussion of using a binding to "<blank>" when the binding
would be to a constant created by the server.  So, for example, an
answer might be [x/Ian, y/<blank>], meaning that Ian owns at least one
red car but that no names are available for the cars.  The following
examples were also given:

>  [[x/blank y/blank]] -Yes, someone owns a red car
>  [[x/Ian y/blank]] - Ian owns (at least one) red car
>  [[x/Ian y/Car4][x/Ian y/Car5]] - Ian owns at least two red cars
>  [[x/blank y/Car4]] - Somebody owns this red car

>  [[x/blank y/blank][x/blank y/blank]] - at least 2 different people own
>      red cars (warning, this may lead to infinite answers)
>  [[x/Ian y/Car4][x/Ian y/Car5][x/blank y/blank]] - Ian owns at least
>      two red cars, and someone else who is not Ian also owns a red car 

The notion of a binding to <blank> seems like a useful one.  However, it
also seems to need refinement.  Problems that I see involve the semantic
relationships among multiple occurrences of <blank> in a query answer
(e.g., in [[x/blank y/blank]], do the bindings to x and y necessarily
denote different objects?), semantic relationships between <blank> in
one query answer and bindings to both constants and to <blank> in other
answers to the same query (e.g., does the set of answers [[x/blank
y/blank][x/blank y/blank]] mean what the comment above partially
suggests that at least 2 different people own 2 different red cars"? and
does the set of answers [[x/Ian y/Car4][x/Ian y/Car5][x/blank y/blank]]
mean as the comment above suggests that ian owns at least two read cars
and someone else who is not Ian own yet a third red car?).

And how would the server communicate a conclusion that one unnamed
person owns at least two unnamed red cars?

It seems to me that the use of a single <blank> binding in query answers
is inadequate.  What I would suggest is needed is a vocabulary of blank
bindings, say <blank-1>, <blank-2>, ..., that can be used as bindings in
query answers and that have the semantics of existentially quantified
variables.  Such a vocabulary would be used in our example as follows:

A query answer of [x/Ian, y/blank-1] would mean that the following
sentence is entailed by the query KB:

   (exists b1) (P Ian & RC b1 & O Ian b1)

A query result of [[x/blank-1 y/blank-2]] would mean that the following
sentence is entailed by the query KB:

   (exists b1 b2) (P b1 & RC b2 & O b1 b2)

A query result of [[x/blank-1 y/blank-1]] would mean that the following
sentence is entailed by the query KB:

   (exists b1) (P b1 & RC b1 & O b1 b1)

That answer doesn't make sense in the intended interpretation of our
example query, but I assume it is clear that one could generate an
example query where such an answer would make sense. 

What about a query result of [[x/blank-1 y/blank-2][x/blank-3
y/blank-4]]?  Such a result seems to make sense only when the query asks
for answers each of which is known to denote a distinct set of objects. 
(See my proposal in the attachment regarding "Uniqueness of Answers".) 
If the only requirement in the query is that each set of bindings is
unique (rather than each set of objects denoted by the bindings), then
the server could generate any number of answers by simply using
different blank-i bindings.  If the query does indeed ask for answers
each of which is known to denote a distinct set of objects, then the
query result means that the following sentence is (sentences are?)
entailed by the query KB:

   (exists b1 b2 b3 b4) (P b1 & RC b2 & O b1 b2)
                        (P b3 & RC b4 & O b3 b4)

and that not both "(equivalentTo b1 b3)" and "(equivalentTo b2 b4)" are
entailed by the query KB.

We might decide that results such as [[x/blank-1 y/blank-2][x/blank-3
y/blank-4]] are beyond the scope of our query language.  However, there
seem to be cases that are more compelling, such as [[x/Ian
y/blank-1][x/Ian y/blank-2], meaning Ian owns two red cars, or
[[x/blank-1 y/blank-2][x/blank-1 y/blank-3], meaning someone owns two
red cars.

What do you think of this proposal?

Richard

-----------------------------------

The following is an articulation of open issues and proposals for
specifying and formalizing the information content of a "query-answering
discourse" between two agents in which one agent seeks information from
a second agent by sending a "query" to the second agent.

Refer to the agent sending the query as the "client" and the agent
receiving the query as the "server".  Refer to the response sent by the
server to the client as a "query result" and assume that a query result
may contain one or more "query answers".


Knowledge Base

A query is posed with respect to a knowledge base that is a DAML+OIL
representation of a logical theory.  Thus, a query needs to include a
reference to a DAML+OIL knowledge base.  Refer to that knowledge base as
the "query KB".


Query Premise

We have discussed enabling a query to include a premise that is to be
added to the query KB so that the query is being asked of the query KB
unioned with the premise.  A premise essentially facilitates queries of
the form if-then while still remaining within the expressiveness of
DAML+OIL.

ISSUE: Do we want to enable the inclusion of a premise in a query, and
if so what is the form of a premise?

PROPOSAL: A query may include a premise that is an arbitrary DAML+OIL
knowledge base.


Query Pattern

Assume a query contains a "query pattern" that specifies relationships
among unknown sets of objects in a domain of discourse.  Each unknown
object is represented in the query pattern by a variable.  Answering a
query with variables x1,,xn involves identifying tuples of object
constants such that for any such tuple  <c1,,cn>, if ci is substituted
for xi in the query pattern for i=1,,n, then the resulting "query
pattern instance" specifies a sentence that is entailed by the query
KB.  For each such ci, the pair [xi, ci] is a binding for the
corresponding variable xi.  Each set of such variable bindings specifies
a candidate answer to the query.

ISSUE: What is the expressive power of the query pattern language?  For
example, we might decide that a query pattern instance can specify a
sentence that is a disjunction of a conjunction of RDF statements or is
a negation of a conjunction of RDF sentences, or .

PROPOSAL: Define the query pattern language so that a query pattern
specifies a conjunction of RDF statements containing variables.  A query
pattern instance therefore specifies a collection of RDF statements that
are entailed by the query KB.


Answer Mapping Function

A query may specify a mapping of variable bindings into query answers. 
In particular, a query answer may include or make use of bindings to
only a subset of the variables in the query pattern.

ISSUE:  What is the nature of the answer mapping function language?

PROPOSAL: I think all that matters for our formalization and core design
work is that a query answer may make use of the bindings to only a
subset of the variables in the query pattern.  So, my recommendation is
that for now we consider a query answer to consist of a set of bindings
for a subset of the variables, call them "query variables", and that the
query specifies which variables are in that subset.  I will make that
assumption in the remainder of this document.

ISSUE:  What constants can be in query answer bindings?  In particular,
can a query variable be bound in a query answer to an anonymous node in
the RDF graph?  If so, what is the form and semantics of that binding? 
Also, can a query variable be bound in a query answer to an object that
is entailed in the knowledge base (e.g., by a cardinality constraint)
but whose identity is not known by the server?  If so, what is the form
and semantics of that binding?


Uniqueness of Answers

In general, a query result may contain multiple query answers.

ISSUE: What guarantees do we want to make about the distinctiveness of
multiple query answers in a query result?  We may want to guarantee that
no two answers consist of identical sets of bindings.  That would say
that if A1 and A2 are query answers in a query result, then there exists
a query variable xi for which the binding for xi in A1 is not the same
constant as the binding for xi in A2.  A more difficult guarantee would
be that no two answers consist of equal sets of bindings.  That would
say that if A1 and A2 are query answers in a query result, then there
exists a query variable xi for which the binding for xi in A1 is not
equal to (i.e., does not denote the same object as) the binding for xi
in A2.

PROPOSAL:  We guarantee there are no identical sets of bindings and that
we enable a query to include an indicator as to whether equal sets of
bindings are acceptable.


Number of Answers

I assume our query language needs to enable a query to specify what is
being asked for and our query result language needs to enable a query
result to contain the information requested in a query.  That
specification in a query would include how many query answers are being
requested and what information is being requested about how many query
answers there are (i.e., how many answers are entailed by the KB). 
Also, the query result language needs to enable including in a query
result the information requested about how many query answers there are.

ISSUE: Whether and how to include in a query the number of query answers
being requested.

PROPOSAL: Enable a query to include a "# answers requested" that could
be either a non-negative integer, the constant "All", or the constant
"Enumerator".  If the number of answers requested is an integer n, the
query is a request for as many query answers as the server can deduce up
to n.  If the number of answers requested is "All", the request is for
as many query answers as the server can deduce.  If the number of
answers requested is "Enumerator", the request is for a process handle
(continuation) that can be sent back to the server in a subsequent
message to request that the server produce a "batch" of k query
answers.  Note that if the query indicates that equal sets of bindings
are not acceptable, then "# answers requested" refers to the number of
sets of bindings no two of which are equal.  If the query indicates that
equal sets of bindings are acceptable, then "# answers requested" refers
to the number of distinct sets of bindings.  Some convention is needed
for the case where the number of answers requested is "All" and the
server can deduce an infinite number of answers.  Perhaps the convention
would be that the server returns a process handle in that case.

ISSUE: Whether and how to include in a query what information is being
requested about how many query answers there are and in a query result
the information requested.  

PROPOSAL: Enable a query to include a request for information as to how
many query answers there are ("# answers?") and for a query result to
include a specification of how many query answers are entailed ("#
answers entailed") in the form of an ordered paired denoting a closed
interval.  For example, a result containing the ordered pair [4,0-0] as
the "# answers entailed" means that the server has determined that there
are at least 4 answers to the query.  (I am using "0-0" here to stand
for the "infinity" symbol.)  Note that a value of [0,0-0] for "# answers
entailed" is always true.  When a query includes a request of "#
answers", the server is being asked to deduce whatever it can about how
many answers there are.   Note that a query can specify that "# answers
requested" is zero so that the only information being requested is the
number of answers.  Also, note that a query result could contain a value
for "# answers" even when that information was not requested in the
query.


Justification For Answers

Under the assumption that our query language needs to enable a query to
specify what is being asked for and our query result language needs to
enable a query result to contain the information requested in a query,
the query language needs to enable inclusion in a query a request for a
justification for each query answer and the query result language needs
to enable inclusion of such justifications in a query result.

ISSUE: Whether and how to include provisions for justifications to be
requested in a query and be included in a query result.  What kind of
justification language(s) do we include in our query results language?


Knowledge Base Structure Queries

There seems to be a clear need to ask queries about the knowledge base
itself as an artifact.  The most compelling example involves determining
the "direct" subclasses of a class or subproperties of a property.

ISSUE:  Whether and how to include such queries in our query language
and query results language.  The primary issue seems to be to what
extent we allow intermingling of "structural queries" with "entailment
queries".  That is, we probably don't want to allow a request for direct
subclasses to appear anywhere in a query pattern that an RDF statement
(with variables) could appear.


This archive was generated by hypermail 2.1.4 : 04/02/02 EST