(DQL) Expressiveness of Query Patterns

From: Richard Fikes (fikes@ksl.stanford.edu)
Date: 11/27/01


Here is a message we may want to discuss in today's telecon.

Richard



QUERY PATTERN

We seem to agree on the following: a query contains a "query pattern"
that specifies relationships among unknown sets of objects in a domain
of discourse.  Each unknown object is represented in the query pattern
by a variable.  Answering a query with variables x1,,xn involves
identifying tuples of term expressions such that for any such tuple 
<c1,,cn>, if ci is substituted for xi in the query pattern for i=1,,n,
then the resulting "query pattern instance" specifies a sentence that is
entailed by the query KB.  For each such ci, the pair [xi, ci] is a
binding for the corresponding variable xi.  Each set of such variable
bindings specifies a candidate answer to the query.

We need to decide on the expressive power of the query pattern language
and on what "term expressions" are acceptable as bindings.

Let's focus here only on issues related to the expressive power of the
query pattern.  I proposed that we define the query pattern language so
that a query pattern specifies a conjunction of RDF statements
containing variables.  A query pattern instance would then specify a
collection of RDF statements that are entailed by the query KB.

I raised the issue of whether DQL is going to support the notion that a
query answer could be some structure produced by a mapping of variable
bindings rather than the bindings themselves.  For example, a query
answer might simply be a list of the term expressions to which the
variables were bound.  That seems like an auxiliary issue that we can
ignore for now.  So, I proposed that at least for now we consider a
query answer to be a set of bindings for a subset of the variables and
that the query specifies which variables are in that subset.  Following
Ian's suggestion, let's call the variables in that subset the
"distinguished variables" and the remaining variables "non-distinguished
variables".

So, my working assumptions regarding the query pattern are as follows:

* A query contains a query pattern that specifies a conjunction of RDF
statements containing variables.

* A query contains a specification of which variables in the query
pattern are the distinguished variables.

* A query answer is a set consisting of bindings for each of the
distinguished variables.

Ian discussed the difficulties of answering queries containing cycles in
the non-distinguished variables and raised the issue of whether we
should restrict the expressiveness of the query pattern language to
prevent such cycles.  I do not think we should impose such a
restriction.  That is, I think we should stick with the straightforward
notion that a query pattern can specify any conjunction of RDF
statements containing either distinguished or non-distinguished
variables in any of the three positions of any of the statements, so
that a query pattern instance can specify any collection of RDF
statements that are entailed by the query KB.

In general, I don't think we should restrict the expressiveness of our
query language based on the capabilities of our reasoning algorithms. 
That is, I don't think we should try to prevent the representation of
queries that we don't currently know how to answer.  Such an expressive
query language would not prevent anyone from producing algorithms and
formal results that apply to a restricted subset of the query language,
or of producing query-answering systems that produce answers to only a
restricted subset of the query language.

Richard


This archive was generated by hypermail 2.1.4 : 04/02/02 EST