From: Frank van Harmelen (Frank.van.Harmelen@cs.vu.nl)
Date: 09/18/01
During the last teleconf, we discussed extensively the point that any
query language for DAML+OIL should take into account work on query
languages for RDF Schema.

For more or less the same reasons, we should also take notice of query
languages for RDF proper (i.e. not implementing any of the RDF Schema
semantics).

Just yesterday, the rdf-interest list contained an announcement of an
implementation of the Squish query language for the Jena RDF API by the
folks from HP labs:

  http://lists.w3.org/Archives/Public/www-rdf-interest/2001Sep/0048.html

See that msg for more details on the query language.

Below follow some notes from a PhD student of mine who had a quick look
at the relationship between this RDF query language and RQL (which is,
as you know, an RDF Schema query language).

Food for discussion tonight?

Frank.

----

RDQL vs. RQL

RQL:  http://sesame.aidministrator.nl/
RDQL: http://www-uk.hpl.hp.com/people/afs/RDQL/

1. Query clauses compared
-------------------------

While both languages are loosely based on the familiar
SELECT-(FROM)-WHERE syntax from SQL/OQL, RQL and RDQL have different
views on which clause specifies what.

- RQL uses three clauses, SELECT-FROM-WHERE:

  - SELECT clause (required): a projection over the bound variables,
    thus formatting the order and size of the result set.

  - FROM clause (required): a specification of the relevant part of the
    graph model being queried, via the use of (regular) path
    expressions. Variable binding takes place in this clause.

  - WHERE clause (optional): specification of additional boolean
    constraints (such as string comparison or cardinal binary
    operators) on variables previously bound in the FROM clause.

- RDQL uses five clauses, SELECT-SOURCE-WHERE-AND-USING:

  - SELECT clause (required): a projection over the bound variables,
    thus formatting the order and size of the result set.

  - SOURCE clause (optional): a specification of a source URI for
    identifying the model that is to be queried.
    RQL has no equivalent; it instead assumes that the query is being
    sent to a specific repository/model.

  - WHERE clause (required): specification of which variables are to be
    bound, by means of triple template matching. This roughly
    corresponds to the FROM clause in RQL.

  - AND clause (optional): specification of boolean constraints on
    previously bound variables. This corresponds to the WHERE clause in
    RQL.

  - USING clause (optional): specification of namespace
    prefix/identifier pairs. RQL currently has no equivalent for this.

2. RDF querying vs. RDF Schema querying
---------------------------------------

RQL views the RDF model/schema as a set of superimposed graphs and
offers native support for RDF Schema constructs (typing of variables,
class and property subsumption, domain and range restrictions, etc).

RDQL views the RDF model as a set of statements: it strictly adheres to
the RDF model and only understands triples.

For strictly querying RDF, RQL and RDQL offer about equal expressivity.
But when RDF Schema information is being queried, RQL is at an
advantage. For example, even in this relatively simple query:

  Q1: "give me all resources of type Painter that have a first_name
       property with the value `Pablo'"

(note: in these examples I'm deliberately being rather sloppy with URIs
and namespaces, to improve human readability. The essence of the query
is correct).

  RQL:

    select X
    from Painter{X}.first_name{Y}
    where Y like "Pablo"

In RDQL this query is not even fully expressible. Because subsumption
relations between classes are transitive, Painter may have an arbitrary
number of (direct and indirect) subclasses that have painter resources
assigned to them. Upwards inheritance of instances in RQL makes sure
these resources are retrieved, but RDQL does not have this support.
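
To make the transitivity point concrete, here is a minimal Python
sketch over a toy set of triples. It is neither RQL nor RDQL, only the
set arithmetic behind them, and every class and resource name in it
(Cubist, AnalyticCubist, picasso, braque, gris) is made up for
illustration.

```python
# Toy RDF graph as (subject, predicate, object) triples.
# All class and resource names here are hypothetical.
TRIPLES = {
    ("Cubist", "rdfs:subClassOf", "Painter"),
    ("AnalyticCubist", "rdfs:subClassOf", "Cubist"),
    ("picasso", "rdf:type", "Painter"),
    ("braque", "rdf:type", "Cubist"),
    ("gris", "rdf:type", "AnalyticCubist"),
}


def direct_subclasses(triples, cls):
    """Classes declared as an immediate rdfs:subClassOf of cls."""
    return {s for (s, p, o) in triples if p == "rdfs:subClassOf" and o == cls}


def all_subclasses(triples, cls):
    """cls itself plus every class reachable by following rdfs:subClassOf downwards."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub in direct_subclasses(triples, current):
            if sub not in seen:
                seen.add(sub)
                frontier.append(sub)
    return seen


def instances_of(triples, classes):
    """Resources with an explicit rdf:type statement for any of the given classes."""
    return {s for (s, p, o) in triples if p == "rdf:type" and o in classes}


# Plain triple matching: only explicitly typed Painters.
explicit_only = instances_of(TRIPLES, {"Painter"})
# Hand-unrolled query that goes exactly one subclass level down.
one_level = instances_of(TRIPLES, {"Painter"} | direct_subclasses(TRIPLES, "Painter"))
# Schema-aware answer: follow subsumption to arbitrary depth.
full = instances_of(TRIPLES, all_subclasses(TRIPLES, "Painter"))

print(sorted(explicit_only))  # ['picasso']
print(sorted(one_level))      # ['braque', 'picasso']
print(sorted(full))           # ['braque', 'gris', 'picasso']
```

Any query unrolled to a fixed depth misses instances that sit one level
deeper in the class hierarchy (gris in this toy data), which is why the
closure has to come from the query engine rather than from the query
text.
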
The best RDQL can do is assume that all Painters are explicitly made a
member of that class, or the query composer can explicitly query a
fixed number of subclass levels (in this example, we go one subclass
down):

  RDQL:

    SELECT ?x
    WHERE (?x rdf:type ?t1),
          (?t2 rdfs:subClassOf Painter),
          (?x first_name ?y)
    AND ((?t1 eq Painter) || (?t2 eq ?t1)) && (?y eq "Pablo")

3. Implementational aspects
---------------------------

The query engine for RQL as offered by the Sesame system is built on
the premise that result sets for queries can be arbitrarily large. To
this end, the API on which the query engine operates has been designed
to allow streaming evaluation of queries: the query engine breaks the
original RQL query down into elementary queries on the API, which in
turn translates these into queries on the underlying repository. The
result set of each of these subqueries is fed back in a streaming
fashion using iterators, thus minimizing memory load.

It is not clear to me whether Jena, and thus RDQL, offers the same type
of functionality.

--
                               Vrije Universiteit, Faculty of Sciences
Jeen Broekstra              Division of Mathematics & Computer Science
jbroeks@cs.vu.nl                                    de Boelelaan 1081a
http://www.cs.vu.nl/~jbroeks        1081 HV Amsterdam, the Netherlands