RDF query languages

From: Frank van Harmelen ([email protected])
Date: 09/18/01


During the last teleconf, we discussed extensively the point that any query language for DAML+OIL should take into account work on query languages for RDF Schema. 

For more or less the same reasons, we should also take notice of query languages for RDF proper (i.e. not implementing any of the RDF Schema semantics). 

Just yesterday, the rdf-interest list contained an announcement of an implementation of the Squish query language for the Jena RDF API by the folks from HP labs:

http://lists.w3.org/Archives/Public/www-rdf-interest/2001Sep/0048.html

See that msg for more details on the query language. 

Below follow some notes from a PhD student of mine who had a quick look at the relationship between this RDF query language and RQL (which is as you know an RDF Schema query language). 

Food for discussion tonight?

Frank.
   ----


RDQL vs. RQL

RQL:  http://sesame.aidministrator.nl/
RDQL: http://www-uk.hpl.hp.com/people/afs/RDQL/

1. Query clauses compared
-------------------------
While both languages are loosely based on the familiar
SELECT-(FROM)-WHERE syntax from SQL/OQL, RQL and RDQL have different
views on what clause specifies what.

- RQL uses three clauses: SELECT-FROM-WHERE:
    - SELECT clause (required): a projection over the bound variables, thus
      formatting the order and size of the result set.
    - FROM clause (required): a specification of the relevant part of the graph
      model being queried, via the use of (regular) path expressions.
      Variable binding takes place in this clause.
    - WHERE clause (optional): specification of additional boolean constraints
      (such as string comparison or cardinal binary operators) on
      variables previously bound in the FROM clause.

- RDQL uses five clauses: SELECT-SOURCE-WHERE-AND-USING:
    - SELECT clause (required): a projection over the bound variables, thus
      formatting the order and size of the result set.
    - SOURCE clause (optional): a specification of a source URI for
      identifying the model that is to be queried. RQL has
      no equivalent; instead, it assumes that the query is
      being sent to a specific repository/model.
    - WHERE clause (required): specification of which variables are bound by
      means of triple template matching. This roughly corresponds to the
      FROM clause in RQL.
    - AND clause (optional): specification of boolean constraints
      on previously bound variables. This corresponds to the WHERE
      clause in RQL.
    - USING clause (optional): specification of namespace
      prefix/identifier pairs. RQL currently has no equivalent for this.
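
To make the RDQL clause roles concrete, here is a minimal Python sketch of
triple template matching. It is not Jena's or RDQL's implementation: the
data, the function names (match_pattern, where, select) and the use of a
Python callable for the AND constraint are all assumptions made for
illustration. WHERE binds variables by matching templates against triples,
AND filters the bindings, and SELECT projects them.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

# Hypothetical toy data; the names are illustrative, not real URIs.
TRIPLES: List[Triple] = [
    ("picasso", "rdf:type", "Painter"),
    ("picasso", "first_name", "Pablo"),
    ("rembrandt", "rdf:type", "Painter"),
    ("rembrandt", "first_name", "Rembrandt"),
]


def match_pattern(pattern: Triple, triple: Triple, env: Bindings) -> Optional[Bindings]:
    """Extend env so that pattern matches triple, or return None on a clash."""
    new_env = dict(env)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable either gets bound now or must agree with its binding.
            if new_env.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new_env


def where(patterns: List[Triple], triples: List[Triple]) -> List[Bindings]:
    """WHERE: bind variables by matching every template in turn."""
    envs: List[Bindings] = [{}]
    for pattern in patterns:
        next_envs: List[Bindings] = []
        for env in envs:
            for triple in triples:
                extended = match_pattern(pattern, triple, env)
                if extended is not None:
                    next_envs.append(extended)
        envs = next_envs
    return envs


def select(variables: List[str], patterns: List[Triple],
           constraint: Callable[[Bindings], bool],
           triples: List[Triple]) -> List[Tuple[str, ...]]:
    """SELECT projects, WHERE binds, AND (constraint) filters."""
    return [tuple(env[v] for v in variables)
            for env in where(patterns, triples)
            if constraint(env)]


RESULT = select(
    ["?x"],
    [("?x", "rdf:type", "Painter"), ("?x", "first_name", "?y")],
    lambda env: env["?y"] == "Pablo",
    TRIPLES,
)
```

In RQL terms, the two templates would sit in the FROM clause as a path
expression and the lambda would be the WHERE clause.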

2. RDF querying vs. RDF Schema querying
---------------------------------------

RQL views the RDF model/schema as a set of superimposed graphs and
offers native support for RDF Schema constructs (typing of variables,
class and property subsumption, domain and range restrictions, etc).
RDQL views the RDF model as a set of statements: it strictly adheres to
the RDF model and only understands triples.

For strictly querying RDF, RQL and RDQL offer about equal expressivity.
But when RDF Schema information is being queried, RQL is at an
advantage. For example, even in this relatively simple query:

Q1: "give me all resources of type Painter that have a first_name
     property with the value `Pablo'"

(note: in these examples I'm being rather sloppy with URIs and
       namespaces deliberately, to improve human readability. The
       essence of the query is correct).

RQL:
    select X
    from Painter{X}.first_name{Y}
    where Y like "Pablo"

In RDQL, because of the transitivity of subsumption relations between
classes, this query is not even fully expressible, because it may be
that Painter has an arbitrary number of subclasses that have painter
resources assigned to them. Upwards inheritance of instances in RQL
makes sure these resources are retrieved, but RDQL does not have this
support. The best RDQL can do is assume all Painters are explicitly made
a member of that class, or the query composer can explicitly query a
fixed number of subclasses (in this example, we go one subclass down):

RDQL:
    SELECT ?x
    WHERE (?x rdf:type ?t1), (?t2 rdfs:subClassOf Painter), (?x first_name ?y)
    AND ( ?t1 eq Painter || (?t2 eq ?t1) ) && (?y eq "Pablo")
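
The limitation can be illustrated with a small Python sketch that contrasts
a transitive closure over subClassOf with a fixed one-level expansion. The
class hierarchy, the instance data and the function names are hypothetical,
and neither function reflects how Sesame or Jena actually evaluate queries.

```python
from typing import List, Set, Tuple

# Hypothetical toy schema: AnalyticCubist < Cubist < Painter.
SUBCLASS_OF: List[Tuple[str, str]] = [
    ("Cubist", "Painter"),
    ("AnalyticCubist", "Cubist"),
]

# Hypothetical instance data as (resource, class) pairs.
TYPE_OF: List[Tuple[str, str]] = [
    ("rembrandt", "Painter"),
    ("picasso", "Cubist"),
    ("braque", "AnalyticCubist"),
]


def subclass_closure(root: str) -> Set[str]:
    """Return root plus all direct and indirect subclasses of root."""
    found = {root}
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        for child, sup in SUBCLASS_OF:
            if sup == parent and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def instances_transitive(root: str) -> List[str]:
    """Instances of root under transitive subsumption (the RQL-style view)."""
    classes = subclass_closure(root)
    return sorted(x for x, c in TYPE_OF if c in classes)


def instances_one_level(root: str) -> List[str]:
    """Instances of root or of its direct subclasses only (fixed depth)."""
    classes = {root} | {child for child, sup in SUBCLASS_OF if sup == root}
    return sorted(x for x, c in TYPE_OF if c in classes)


FULL = instances_transitive("Painter")
ONE_LEVEL = instances_one_level("Painter")
```

With this data the one-level expansion misses braque, whose class sits two
levels below Painter, while the closure finds all three resources.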


3. Implementational aspects
---------------------------

The query engine for RQL as offered by the Sesame system is built on the
premise that result sets for queries can be arbitrarily large. To this
end, the API on which the query engine operates has been designed to
allow streaming evaluation of queries: the query engine breaks the
original RQL query down into elemental queries on the API, which in turn
translates these into queries to the underlying repository. The result
set of each of these subqueries is fed back in a streaming fashion
using iterators, thus minimizing memory load.
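
As a rough illustration of the iterator idea (not Sesame's actual API), the
Python sketch below uses generators so that results are produced one at a
time instead of being materialized. The Repository class, its statements
method and the rows_read counter are hypothetical names introduced only for
this example.

```python
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


class Repository:
    """Hypothetical stand-in for a storage API that streams statements."""

    def __init__(self, triples: List[Triple]) -> None:
        self._triples = triples
        self.rows_read = 0  # counts how many stored triples were touched

    def statements(self, predicate: str) -> Iterator[Triple]:
        """Elemental query: lazily yield triples with the given predicate."""
        for triple in self._triples:
            self.rows_read += 1
            if triple[1] == predicate:
                yield triple


def subjects_with_value(repo: Repository, predicate: str, value: str) -> Iterator[str]:
    """Compose on top of the elemental query without buffering results."""
    for subject, _, obj in repo.statements(predicate):
        if obj == value:
            yield subject


repo = Repository([
    ("picasso", "first_name", "Pablo"),
    ("rembrandt", "first_name", "Rembrandt"),
    ("neruda", "first_name", "Pablo"),
])

stream = subjects_with_value(repo, "first_name", "Pablo")
first = next(stream)                # only the first triple has been read
rows_after_first = repo.rows_read
rest = list(stream)                 # draining the stream reads the remainder
rows_after_all = repo.rows_read
```

A consumer that stops after the first answer never forces the rest of the
repository to be scanned, which is the point of the streaming design.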

It is not clear to me whether Jena and thus RDQL offer the same type of
functionality.

-- 
                               Vrije Universiteit, Faculty of Sciences
Jeen Broekstra              Division of Mathematics & Computer Science
[email protected]                                    de Boelelaan 1081a
http://www.cs.vu.nl/~jbroeks        1081 HV Amsterdam, the Netherlands

