ON DQL AND RDF

From: Brandon Amundson (bamundson@bbn.com)
Date: 11/02/01


Message-Id: <200111021626.SAA06577@athena.ics.forth.gr>
From: Vassilis Christophides <christop@ics.forth.gr>
To: mailto:joint-committee@daml.org
CC: dp@ics.forth.gr, christop@ics.forth.gr
Subject: ON DQL AND RDF
Reply-to: christop@ics.forth.gr (Vassilis Christophides)

Dear All

Comments on Richard Fikes' comparison of RQL with DQL0.1

1. From the short description provided for DQL 0.1, it seems that
its expressive power leaves a lot to be desired from a language for
querying knowledge bases: only existential quantification is supported,
(safe) negation is not supported and disjunction is expressible only
through the implicit existential quantification.

>An instance of class Query represents a question posed to a reasoner.  A
>query instance consists of two parts: a query premise and a query
>pattern.  A query premise is a DAML+OIL KB that is effectively asserted
>to the queried KB for the duration of the query.  It is to contain
>assumptions particular to the current query.  The query premise can be
>empty to indicate the absence of such assumptions.  A query pattern is
>the question itself.  It is in effect a conjunction of one or more
>triples.  Each triple corresponds to an RDF Statement except that its
>predicate, subject, and/or object can be a variable.  Variables present
>in a query pattern, if any, are implicitly quantified existentially at
>the beginning of the pattern.  Syntactically, a query pattern is in xml
>markup.

2. Regarding the BNF definition of RQL and the comment about RQL
being too liberal or too unconstrained to be useful, one must keep in
mind that functional composition is NOT unconstrained, due to RDF/RQL
typing. Also, something that may not be apparent from RQL's BNF
definition is that RQL has its roots in OQL and the ODMG-93
standard. RQL has a formal model as well as a well-specified set of
constraints and typing rules that define precisely the valid compositions
of RQL queries (e.g., grouping using nested queries in the select clause).
Note that 98% of RQL functionality (everything except universal
quantifiers) has been implemented at our institution, and a Web interface
to the ICS-FORTH RQL Interpreter is available at:
http://139.91.183.30:8999/RQLdemo/.
In this demo you can execute queries of varying complexity that are
required by the Semantic Web applications we are involved in.

>The full BNF for RQL, as given at
>http://139.91.183.30:9090/RDF/RQL/bnf.html, seems far too unconstrained
>to be useful, and I cannot determine how much of the full language is
>being implemented in the systems under development.   Given that caveat,
>................
>

3. Absence of "query premises" in RQL: Obviously the goals of languages
like RQL and DQL 0.1 are different.  DQL would probably be very well-suited
for some types of reasoning tasks (such as hypothetical reasoning), but what
is new here? Aren't description logics any good for that?

> * Query Premise: a DAML+OIL KB that is effectively asserted to the
>queried KB for the duration of the query.  It is to contain assumptions
>particular to the current query.  There is nothing corresponding to a
>query premise in RQL.  A query premise seems important in that it allows
>a query to hypothesize an object (e.g., "if Foo is a Person with two
>male siblings ?") and then ask questions about that hypothesized object.

4. Absence of nesting in DQL: even if Boolean constraints are included
in a DQL query pattern, nested expressions are excluded, and aggregates
along with them.

>There is nothing in DQL 0.1 corresponding to the RQL Select clause.  A
>binding for each query variable is included in an answer and the
>bindings are in an unspecified order.  Adding a Select clause to DQL
>that, for example, specifies a pattern in the form of an s-expression
>containing some or all of the query variables so that each answer is an
>instance of that pattern, appears to be a nonproblematic modular
>addition.

5. One can argue for the usefulness of being able to distinguish
"proper" instances or "direct" subclasses, but what seems to be more
important is the ability to express transitive properties in a closed
query form. BTW, what do you mean by non-monotonic queries?

>A significant difference between RQL and DQL 0.1 is the difference
>between the expressive power of DQL's Query Pattern and RQL's From
>clause.  DQL allows only conjunctions of RDF statements, whereas RQL
>allows disjunctions and negations in addition to conjunctions.  In
>addition, RQL's From clause enables asking non-monotonic queries whose
>answers depend on the explicit sentences in the source knowledge base.
>In particular, one can ask for the "Proper" instances of a class C,
>meaning instances that are not also instances of any subclass of C, and
>the "Direct" subclasses of a class C, meaning those subclasses that are
>not also subclasses of any known subclass of C.  Analogously, one can
>also ask for the "Proper" values of a property at a subject and the
>"Direct" subproperties of a property.  My opinion is that we do not want
>to include these notions of "Proper" and "Direct" in DQL, but that we
>may want to expand the query pattern to include specification of
>disjunctions and negations of RDF statements.

6. Last but not least, the Sesame implementation is not the only RQL
implementation available. Sesame actually implements a subset of RQL
(no nested queries, set operations, or typing).  RDFSuite is a suite
of tools for RDF metadata management including VRP (validating RDF
parser), RSSDB (a performant and scalable RDF store) and an RQL
interpreter. Online demo and more information (including papers on RQL
and RDFSuite (model, typing system, bnf, performance comparisons
etc.)) can be accessed at: http://139.91.183.30:9090/RDF.

Best Regards

Vassilis Christophides
and
Dimitris Plexousakis

PS: For a comparison between RQL and XQuery see the slides of our
presentation in the NSF-EU Workshop on the Semantic Web
(http://barbara.inrialpes.fr/swsw/slides/christophides/)

- -------------------------------------------------------------------
From: Richard Fikes (fikes@ksl.stanford.edu)
Date: 09/24/01




I agreed to provide to the committee a more easily understandable
description of the straw man proposal for representing queries to
DAML+OIL knowledge bases and the results of such queries, and to compare
the proposal to RQL.  Doing a detailed and complete comparison with RQL
would take substantial time, and I am not convinced such a comparison
would be worthwhile.  What Yulin Li (the graduate student who is working
with me on DAML+OIL query answering) and I have done in this message is
to provide a summary comparison with RQL and to make what will hopefully
be sufficient additional commentary on the straw man proposal to enable
the committee to move forward.

First of all, let's say that we are designing DQL, a DAML+OIL Query
Language, and that the straw man proposal is version 0.1 of DQL.

The text at the beginning of my initial message on DQL 0.1 was intended
to be an informal description of the language.  I don't think I can do
much better than that and so will include it again here:

- -----------------------------------

An instance of class Query represents a question posed to a reasoner.  A
query instance consists of two parts: a query premise and a query
pattern.  A query premise is a DAML+OIL KB that is effectively asserted
to the queried KB for the duration of the query.  It is to contain
assumptions particular to the current query.  The query premise can be
empty to indicate the absence of such assumptions.  A query pattern is
the question itself.  It is in effect a conjunction of one or more
triples.  Each triple corresponds to an RDF Statement except that its
predicate, subject, and/or object can be a variable.  Variables present
in a query pattern, if any, are implicitly quantified existentially at
the beginning of the pattern.  Syntactically, a query pattern is in xml
markup.

An answer to a query specifies an instance of the query pattern all of
whose RDF statements are entailed by the KB being queried conjoined with
the query premise KB.  An instance of class QueryAnswer represents one
answer to a query.  A query answer instance consists of two principal
parts: the query posed and a set of bindings to the query variables
representing an instantiation of those query variables.

- -----------------------------------
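To make the informal description above concrete, here is a minimal Python
sketch of answering a conjunctive query pattern. It is an illustration
only: the tuple representation of triples, the "?" prefix for variables,
and the names is_var, match_triple and answers are assumptions of this
sketch, not part of DQL 0.1 (whose patterns are XML markup). It also
matches only triples stated explicitly in the KB, whereas a real answer
must hold for everything the KB entails.

```python
# Illustrative only: a toy matcher, not the DQL 0.1 syntax or semantics.
# A KB is a set of (subject, predicate, object) tuples; variables are
# strings that start with "?".

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result

def answers(kb, pattern):
    """Return every binding under which all triples of `pattern` occur in `kb`."""
    bindings = [{}]
    for triple_pattern in pattern:
        extended = []
        for binding in bindings:
            for triple in kb:
                candidate = match_triple(triple_pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings

kb = {
    ("paper1", "rdf:type", "Publication"),
    ("paper1", "author", "Frank"),
    ("paper2", "rdf:type", "Publication"),
    ("paper2", "author", "Richard"),
}
pattern = [("?r", "rdf:type", "Publication"), ("?r", "author", "Frank")]
result = answers(kb, pattern)
print(result)  # [{'?r': 'paper1'}]
```

Each answer is a set of bindings for the query variables, in no
particular order, which mirrors the QueryAnswer description above.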

To compare DQL 0.1 with RQL, consider the following summary description
of RQL:

The full BNF for RQL, as given at
http://139.91.183.30:9090/RDF/RQL/bnf.html, seems far too unconstrained
to be useful, and I cannot determine how much of the full language is
being implemented in the systems under development.   Given that caveat,
we can consider an RQL query to consist of three clauses: Select, From,
and Where, as follows:

 * Select Clause (required): an ordered list of query variables or
functions of a query variable the values of which are to be included in
a query answer in the order given.

 * From Clause (required): a specification of a collection of RDF
statements that constitute a candidate query answer.  The specification
contains query variables.  For each candidate answer, each query
variable is bound to the predicate, subject, or object of an RDF
statement in the collection of statements that constitutes the candidate
answer.

 * Where Clause (optional): a specification of additional boolean
constraints on variables previously bound in the From clause.  The
constraints can use the following operators: "<", "<=", "=", ">=", ">",
"!=" (meaning "not equal"), and "like" (a comparison operator on
strings).

Each RQL query answer is in effect a tuple of variable bindings, as
specified in the Select clause.

A tutorial introduction to RQL can be linked to from
http://sesame.aidministrator.nl/.

We can compare DQL 0.1 with the summary description of RQL given above
as follows.  A DQL 0.1 query consists of two parts, a Query Premise and
a Query Pattern:

 * Query Premise: a DAML+OIL KB that is effectively asserted to the
queried KB for the duration of the query.  It is to contain assumptions
particular to the current query.  There is nothing corresponding to a
query premise in RQL.  A query premise seems important in that it allows
a query to hypothesize an object (e.g., "if Foo is a Person with two
male siblings ?") and then ask questions about that hypothesized object.

 * Query Pattern: the Query Pattern corresponds to the From clause in
RQL.  It is a specification of a conjunction of RDF statements in the
form of a collection of triples, each of which corresponds to an RDF
statement except that its predicate, subject, and/or object can be a
variable.
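The role of the Query Premise can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: triples are plain
tuples, None acts as a wildcard in a pattern, the names matches and ask
are made up for this sketch, and no DAML+OIL reasoning is performed. The
point it shows is that the premise is combined with the KB only for the
duration of one query and the KB itself is not changed.

```python
# Illustrative sketch: triples are (subject, predicate, object) tuples and
# None in a pattern position acts as a wildcard. No inference is done.

def matches(kb, pattern):
    """Return the triples in `kb` that agree with `pattern` on non-None positions."""
    return sorted(
        t for t in kb
        if all(p is None or p == v for p, v in zip(pattern, t))
    )

def ask(kb, premise, pattern):
    """Answer `pattern` against kb plus premise; `kb` itself is left untouched."""
    return matches(kb | premise, pattern)

kb = frozenset({("Bar", "rdf:type", "Person")})
# Hypothetical premise: assume, for this query only, that Foo is a Person.
premise = frozenset({("Foo", "rdf:type", "Person")})

with_premise = ask(kb, premise, (None, "rdf:type", "Person"))
without_premise = ask(kb, frozenset(), (None, "rdf:type", "Person"))
print(with_premise)     # both Bar and Foo are reported
print(without_premise)  # only Bar is reported
```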

There is nothing in DQL 0.1 corresponding to the RQL Select clause.  A
binding for each query variable is included in an answer and the
bindings are in an unspecified order.  Adding a Select clause to DQL
that, for example, specifies a pattern in the form of an s-expression
containing some or all of the query variables so that each answer is an
instance of that pattern, appears to be a nonproblematic modular
addition.

There is nothing in DQL 0.1 corresponding to the RQL Where clause.  All
bindings for the query variables produced from the Query Pattern are
considered to be results.  Since DAML+OIL now includes datatype
properties, the Boolean constraints on variables stated in an RQL Where
clause can be included in the DQL 0.1 Query Pattern (except for RQL's
"like" string comparison operator).  Therefore, a Where clause may not
be needed in DQL.

A significant difference between RQL and DQL 0.1 is the difference
between the expressive power of DQL's Query Pattern and RQL's From
clause.  DQL allows only conjunctions of RDF statements, whereas RQL
allows disjunctions and negations in addition to conjunctions.  In
addition, RQL's From clause enables asking non-monotonic queries whose
answers depend on the explicit sentences in the source knowledge base.
In particular, one can ask for the "Proper" instances of a class C,
meaning instances that are not also instances of any subclass of C, and
the "Direct" subclasses of a class C, meaning those subclasses that are
not also subclasses of any known subclass of C.  Analogously, one can
also ask for the "Proper" values of a property at a subject and the
"Direct" subproperties of a property.  My opinion is that we do not want
to include these notions of "Proper" and "Direct" in DQL, but that we
may want to expand the query pattern to include specification of
disjunctions and negations of RDF statements.
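To illustrate why asking for "Proper" instances is non-monotonic, here is
a small Python sketch. The tuple representation and the function names
(subclasses, instances, proper_instances) are assumptions of this sketch
and are not RQL syntax. It shows that adding sentences to the knowledge
base can remove an answer, which cannot happen with a monotonic query.

```python
# Illustrative sketch over explicit triples only; names are hypothetical.

def subclasses(kb, cls):
    """All strict subclasses of `cls` reachable via rdfs:subClassOf triples."""
    found, frontier = set(), {cls}
    while frontier:
        frontier = {
            s for (s, p, o) in kb
            if p == "rdfs:subClassOf" and o in frontier
        } - found
        found |= frontier
    return found

def instances(kb, cls):
    """Instances of `cls` or of any of its subclasses."""
    classes = {cls} | subclasses(kb, cls)
    return {s for (s, p, o) in kb if p == "rdf:type" and o in classes}

def proper_instances(kb, cls):
    """Instances of `cls` that are not instances of any subclass of `cls`."""
    below = set()
    for sub in subclasses(kb, cls):
        below |= instances(kb, sub)
    return instances(kb, cls) - below

kb = {
    ("Painter", "rdfs:subClassOf", "Artist"),
    ("rodin", "rdf:type", "Artist"),
}
# Adding two more sentences to the same KB:
bigger_kb = kb | {
    ("Sculptor", "rdfs:subClassOf", "Artist"),
    ("rodin", "rdf:type", "Sculptor"),
}

print(proper_instances(kb, "Artist"))         # {'rodin'}
print(proper_instances(bigger_kb, "Artist"))  # set()
```

The ordinary (monotonic) query for instances of Artist still returns
rodin against the larger KB; only the "Proper" variant loses the answer.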

Hope this helps.

Richard
(with substantial contributions from Yulin Li)
- -----------------------------------------------------------------------
From: Frank van Harmelen (Frank.van.Harmelen@cs.vu.nl)
Date: 09/25/01




Richard and Yulin, thanks for your remarks on comparing DQL and
RQL. This is exactly the kind of analysis we'll need more of in the
course of trying to converge the good parts of the various
query-language proposals!

Below is a quick reaction. It is the result of a quick discussion by
email with Jeen Broekstra and Arjohn Kampman, the guys behind Sesame,
and therefore of necessity RQL experts.

1. We have the same doubt about the usefulness of DAML+OIL being the
   syntax for its own query language. But this point has been argued
   before, so we'll leave it at this.

2. We agree with you that the ICS-FORTH grammar for RQL is too
   loose. The grammar can be significantly tightened without losing
   much expressiveness. That's just a matter of language engineering;
   there are no deep issues here.

3. We think your description of the RQL "from" clause is a bit too
   ungenerous. You wrote:

    "A specification of a collection of RDF statements that constitute
     a candidate query answer."

It would be more appropriate to say that the "from"-clause is a
regular path expression through the RDF graph, taking into account the
semantics of the RDF Schema primitives. We're still somewhat unclear
as to whether all of that can be done in DQL. For example:

    Q1: "Return all resources of type Publication with author Frank."

    RQL:
        select R
        from   Publication{R}.author{N}
        where  N = "Frank"

QUESTION: what would this look like in DQL?

3. The notion of a Query Premise is a difference. The meaning is
   clear, but we are somewhat unclear as to the practical usage of this
   idea.

QUESTION: Can you give us some examples of where that would be useful?

4. You mention the possibility in RQL to ask for the direct
   descendants of a class/property (subClassOf^) instead of any
   descendant (subClassOf). You call this "nonmonotonic". The operator
   does not really add anything interesting to the language. It could
   be rephrased as a more complicated query without the "^" operator,
   because RQL contains negation of queries. So the issue is really
   negation of queries (and their interpretation), not the "^"
   operator.
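The reduction described in point 4 can be sketched as a set difference
between two ordinary queries. This is a rough Python illustration, not
RQL: the (subclass, superclass) pair representation and the names
strict_subclasses and direct_subclasses are assumptions made for this
sketch.

```python
# Illustrative sketch: `hierarchy` is a set of (subclass, superclass) pairs.

def strict_subclasses(hierarchy, cls):
    """Every class below `cls`, at any depth."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for sub, sup in hierarchy:
            if sup == current and sub not in found:
                found.add(sub)
                todo.append(sub)
    return found

def direct_subclasses(hierarchy, cls):
    """All subclasses of `cls` minus those that sit below another subclass."""
    all_subs = strict_subclasses(hierarchy, cls)
    indirect = set()
    for middle in all_subs:
        indirect |= strict_subclasses(hierarchy, middle)
    # The "negation" is the set difference between the two query results.
    return all_subs - indirect

hierarchy = {
    ("Painter", "Artist"),
    ("Cubist", "Painter"),
    ("Sculptor", "Artist"),
}
direct = direct_subclasses(hierarchy, "Artist")
print(sorted(direct))  # ['Painter', 'Sculptor']
```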

We've worked on some applications where the notion of "direct
descendant" was crucial, for example semantic navigation through
web-sites, where you really wanted to know the most closely related
classes, not just all related classes. Another application was
query-refinement, where again you wanted to know the smallest possible
way to relax/narrow a query, not just all possible ways.

QUESTION: would you agree that such an operator (or any other way to
obtain the same effect) is crucial in practical DAML+OIL use?

5. You speak about "RQL allowing disjunctions and negations of RDF
statements".  We don't understand what you mean. It is true (and
useful) that RQL allows disjunctions and negations of >*queries*< (and
it is this that reduces "^" to syntactic sugar), but that's different
from "disjunctions and negations of RDF statements".

As you see, there is more in your analysis with which we agree than
disagree. Let's hope this leads to a useful integration of features
(you already pointed at possibilities to integrate a select clause in
DQL, which will certainly also help to make the database-folk
happier!).

Frank
   ----
(with significant input from Jeen Broekstra and Arjohn Kampman)
- ----------------------------------------------------------------------------

From: Frank van Harmelen (Frank.van.Harmelen@cs.vu.nl)
Date: 09/18/01




During the last teleconf, we discussed extensively the point that any
query language for DAML+OIL should take into account work on query
languages for RDF Schema.

For more or less the same reasons, we should also take notice of query
languages for RDF proper (i.e., not implementing any of the RDF Schema
semantics).

Just yesterday, the rdf-interest list contained an announcement of an
implementation of the Squish query language for the Jena RDF API by
the folks from HP labs:

http://lists.w3.org/Archives/Public/www-rdf-interest/2001Sep/0048.html

See that msg for more details on the query language.

Below follow some notes from a PhD student of mine who had a quick
look at the relationship between this RDF query language and RQL
(which is as you know an RDF Schema query language).

Food for discussion tonight?

Frank.
   ----

RDQL vs. RQL

RQL:  http://sesame.aidministrator.nl/
RDQL: http://www-uk.hpl.hp.com/people/afs/RDQL/

1. Query clauses compared
-------------------------
While both languages are loosely based on the familiar
SELECT-(FROM)-WHERE syntax from SQL/OQL, RQL and RDQL have different
views on what clause specifies what.

- RQL uses three clauses: SELECT-FROM-WHERE:
    - SELECT clause (required): a projection over the bound variables, thus
      formatting the order and size of the result set.
    - FROM clause (required): a specification of the relevant part of the
      graph model being queried, via the use of (regular) path expressions.
      Variable binding takes place in this clause.
    - WHERE clause (optional): specification of additional boolean
      constraints (such as string comparison or cardinal binary operators)
      on variables previously bound in the FROM clause.

- RDQL uses five clauses: SELECT-SOURCE-WHERE-AND-USING:
    - SELECT clause (required): a projection over the bound variables, thus
      formatting the order and size of the result set.
    - SOURCE clause (optional): a specification of a source URI for
      identifying the model that is to be queried. RQL has
      no equivalent; it rather assumes that the query is
      being sent to a specific repository/model.
    - WHERE clause (required): specification of which variables are to be
      bound by means of triple template matching. This roughly corresponds
      to the FROM clause in RQL.
    - AND clause (optional): specification of boolean constraints
      on previously bound variables. This corresponds to the WHERE
      clause in RQL.
    - USING clause (optional): specification of namespace
      prefix/identifier pairs. RQL currently has no equivalent for this.

2. RDF querying vs. RDF Schema querying
---------------------------------------

RQL views the RDF model/schema as a set of superimposed graphs and
offers native support for RDF Schema constructs (typing of variables,
class and property subsumption, domain and range restrictions, etc).
RDQL views the RDF model as a set of statements: it strictly adheres to
the RDF model and only understands triples.

For strictly querying RDF, RQL and RDQL offer about equal expressivity.
But when RDF Schema information is being queried, RQL is at an
advantage. For example, even in this relatively simple query:

Q1: "give me all resources of type Painter that have a first_name
     property with the value `Pablo'"

(note: in these examples I'm being rather sloppy with URIs and
       namespaces deliberately, to improve human readability. The
       essence of the query is correct).

RQL:
    select X
    from Painter{X}.first_name{Y}
    where Y like "Pablo"

In RDQL, because of the transitivity of subsumption relations between
classes, this query is not even fully expressible, because it may be
that Painter has an arbitrary number of subclasses that have painter
resources assigned to them. Upwards inheritance of instances in RQL
makes sure these resources are retrieved, but RDQL does not have this
support. The best RDQL can do is assume all Painters are explicitly made
a member of that class, or the query composer can explicitly query a
fixed number of subclasses (in this example, we go one subclass down):

RDQL:
    SELECT ?x
    WHERE (?x rdf:type ?t1), (?t2 rdfs:subClassOf Painter), (?x first_name ?y)
    AND ( ?t1 eq Painter || (?t2 eq ?t1) ) && (?y eq "Pablo")
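
To show what "upwards inheritance of instances" amounts to, here is a
small Python sketch that follows rdfs:subClassOf to any depth before
selecting resources. It is only an illustration: the tuple
representation, the sample data, and the names subclass_closure and
painters_named are assumptions of this sketch, not part of RQL, RDQL,
Sesame or Jena.

```python
# Illustrative sketch; URIs and namespaces are simplified as in the examples above.

def subclass_closure(triples, cls):
    """Return `cls` plus every class below it via rdfs:subClassOf, at any depth."""
    closure, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if p == "rdfs:subClassOf" and o == current and s not in closure:
                closure.add(s)
                stack.append(s)
    return closure

def painters_named(triples, name):
    """Resources typed as Painter (or any subclass) whose first_name is `name`."""
    classes = subclass_closure(triples, "Painter")
    typed = {s for s, p, o in triples if p == "rdf:type" and o in classes}
    return sorted(
        s for s, p, o in triples
        if p == "first_name" and o == name and s in typed
    )

triples = {
    ("Cubist", "rdfs:subClassOf", "Painter"),
    ("AnalyticCubist", "rdfs:subClassOf", "Cubist"),
    ("picasso", "rdf:type", "AnalyticCubist"),
    ("picasso", "first_name", "Pablo"),
    ("rodin", "rdf:type", "Sculptor"),
    ("rodin", "first_name", "Auguste"),
}
found = painters_named(triples, "Pablo")
print(found)  # ['picasso']
```

In this sample data picasso sits two levels below Painter, so a query
that looks only one subclass down would miss it.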


3. Implementational aspects
---------------------------

The query engine for RQL as offered by the Sesame system is built on the
premise that result sets for queries can be arbitrarily large. To this
end, the API on which the query engine operates has been designed to
allow streaming evaluation of queries: the query engine breaks the
original RQL query down into elementary queries on the API, which in turn
translates these into queries to the underlying repository. The result
set of each of these subqueries is fed back in a streaming fashion
using iterators, thus minimizing memory load.
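
The general idea of streaming evaluation can be sketched with Python
generators. This is not the Sesame API: the names scan and
typed_with_property, and the list-of-tuples repository, are assumptions
made purely for illustration. What the sketch shows is that results are
produced one at a time, so a consumer can stop early without the full
result set ever being built in memory.

```python
from itertools import islice

def scan(repository, predicate):
    """Elementary query: lazily yield (subject, object) pairs for one predicate."""
    for s, p, o in repository:
        if p == predicate:
            yield s, o

def typed_with_property(repository, cls, prop):
    """Lazily combine two elementary queries: instances of `cls` that have `prop`."""
    for subject, klass in scan(repository, "rdf:type"):
        if klass != cls:
            continue
        # Re-scan for the property instead of caching results in memory.
        for other, value in scan(repository, prop):
            if other == subject:
                yield subject, value

repository = [
    ("picasso", "rdf:type", "Painter"),
    ("picasso", "first_name", "Pablo"),
    ("braque", "rdf:type", "Painter"),
    ("braque", "first_name", "Georges"),
]
results = typed_with_property(repository, "Painter", "first_name")
first = list(islice(results, 1))  # pull only the first result from the stream
print(first)  # [('picasso', 'Pablo')]
```

The nested re-scan trades time for memory; a real engine would push such
joins down to the repository where possible.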

It is not clear to me whether Jena and thus RDQL offer the same type of
functionality.

- --
                               Vrije Universiteit, Faculty of Sciences
Jeen Broekstra              Division of Mathematics & Computer Science
jbroeks@cs.vu.nl                                    de Boelelaan 1081a
http://www.cs.vu.nl/~jbroeks        1081 HV Amsterdam, the Netherlands
- -------------------------------------------------------------------------
Re: a map to acronym space on query languages & storage devices

From: Frank van Harmelen (Frank.van.Harmelen@cs.vu.nl)
Date: 09/25/01




After a question from me, Mike pointed out a misconception on my part:
DAMLDB is not a storage device for DAML+OIL, but is an RDF store (not
aimed at supporting any of the DAML+OIL primitives). As Mike
explained: the name reflects the project, not the language.

So I have to redraw my map:

STORAGE DEVICES

- RDF: There are already quite a number of RDF storage devices out there.
  I will not bother to repeat them, see the RDF resource page at [4]
  (which includes Frodo by Stefan and others),
  plus the recently announced DAMLDB by Mike at [6].
- RDF Schema: I am aware of only one storage device that is tailored
  towards RDF Schema, namely Sesame, at [5].
- DAML+OIL: no known storage devices yet (or should we count FaCT as
  such?).

CONCLUSION: much work on RDF storage, only one (!) attempt at RDF Schema
storage, no (!!) work yet on DAML+OIL storage (besides FaCT, which predates
DAML+OIL).

The rest of my map remains unchanged, and follows for completeness again.

Frank.
   ----

QUERY LANGUAGES

- DQL: query-language for DAML+OIL, proposed by Richard Fikes et al in [1]
- RQL: query-language for RDF Schema, proposed by the people from
  Heraklion [2]
- RDQL: query-language for RDF, from the folks at HP Labs Bristol [3]

Remark 1:

By saying that "X is a query language for Y", I mean that both the
syntax and the semantics of X provide facilities to deal with
modelling primitives from Y. So, in a trivial way, any RDF query
language is also an RDF Schema query language, but only in a trivial
way. Any proper RDF Schema query language should support (for
instance) querying the subClassOf relation, taking into account its
transitivity.

Remark 2:

Since there are such clear containment relations between the languages
RDF, RDF Schema, DAML+OIL, I would very much hope that it will turn
out to be possible to reflect this stacking of languages in the
corresponding query languages.

Remark 3:

In general, the storage devices are independent from a particular
query language, but of course a storage device for language X will
most likely have support for a query language for X. Eg: Sesame stores
RDF Schema, and supports RQL, but support for DQL (or RDQL) could well
be built on top of the same storage device.

Frank.
   ---

[1] DQL proposal: http://www.daml.org/listarchive/joint-committee/0572.html
[2] RQL tutorial: http://sesame.aidministrator.nl/doc/rql-babysteps.html
[3] RDQL home page: http://www-uk.hpl.hp.com/people/afs/RDQL/
[4] RDF storage tools: http://www.ilrt.bris.ac.uk/discovery/rdf/resources/
[5] Sesame home page: http://sesame.aidministrator.nl
[6] DAMLDB page: http://www.daml.org/2001/09/damldb/
- ------------------------------------------------------------------------------
Re: a map to acronym space on query languages & storage devices

From: Mike Dean (mdean@bbn.com)
Date: 09/25/01




Frank,

Another addition is the DAML search/query service that Adam
Pease and his group at Teknowledge Palo Alto have been
developing, which recently became available at [1].  It's a
multi-threaded servlet that uses XSB for execution, so it
presumably includes some inference capabilities at the RDFS
and/or DAML+OIL level.

        Mike

[1] http://plucky.teknowledge.com/daml/damlquery.jsp
- ----------------------------------------------------------------
Frank,

thanks for your summary.
You forgot the TRIPLE homepage, an RDF Query and Transformation language :-)
See: http://www.dfki.uni-kl.de/frodo/triple/

All the best,

         Stefan
------- End of forwarded message -------


This archive was generated by hypermail 2.1.4 : 04/02/02 EST