Package com.hp.hpl.mesa.rdf.jena.rdb

A general relational database backend for persistent storage of jena models.

Interface Summary
HasRDBID Signature of classes that cache access to RDB identifiers.
ICache Interface signature for cache implementations.
IConstraints Interface for objects which apply a set of additional constraints to a listStatements operation.
IDBConnection Encapsulate the specification of a jdbc connection, mostly used to simplify the calling pattern for ModelRDB factory methods.
IDBID Interface for database identifiers.
IRDBDriver Generic database interface used for implementing RDF Stores.
 

Class Summary
ConstraintsGeneric Implementation of the IConstraints interface used to specify search constraints on Jena statements.
DBConnection Encapsulate the specification of a jdbc connection, mostly used to simplify the calling pattern for ModelRDB factory methods.
DBIDHash Interface for database identifiers.
DBIDInt Interface for database identifiers.
DriverGenericAttribute Adaption of the base generic layout driver to use separate tables for specific predicates (called attributes).
DriverGenericGeneric Base database driver for implementing ModelRDB and StoreRDB.
DriverGenericGenericProc Adaption of the base generic layout driver to support databases with stored procedures that do the id allocation and duplication checking inline with the actual insert (for each of Resource, Namespace, Literal and Statement).
DriverGenericHash Adaption of the base generic layout driver to use unique hashes as the index terms instead of relying on database sequences.
DriverGenericMMGeneric Adaption of the base generic layout driver to support multiple models in one database.
DriverGenericMMGenericProc Adaption of the base generic layout driver to support databases with stored procedures that do the id allocation and duplication checking inline with the actual insert (for each of Resource, Namespace, Literal and Statement).
DriverGenericMMHash Adaption of the base generic layout driver to support multiple models in one database.
DriverGenericProc Adaption of the base generic layout driver to support databases with stored procedures that do the id allocation and duplication checking inline with the actual insert (for each of Resource, Namespace, Literal and Statement).
DriverInterbaseHash Adaption of the base generic layout driver to the limitations of InterBase SQL.
DriverInterbaseMMHash Adaption of the base generic layout driver to the limitations of InterBase SQL.
DriverOracleMMGeneric Customize the MMGeneric driver for use with Oracle.
ModelRDB This implementation of the Model interface uses a relational database to hold the model statements.
PropertyImplRDB A variation on the default Property implementation that adds a unique database ID field.
ResourceImplRDB A variation on the default Resource implementation that adds a unique database ID field.
ResultSetIterator Iterates over an SQL result set returning each row as an ArrayList of objects.
ResultSetResourceIterator Version of ResultSetIterator that extracts database rows as resources, assuming that the SQL returns rows of form [id, localname, namespaceid].
ResultSetStatementIterator Version of ResultSetIterator that extracts database rows as statements assuming that the SQL returns rows of form:
ResultSetStringIterator Version of ResultSetIterator that extracts database rows as single strings.
SimpleCache Trivial implementation of the generic cache interface used to cache literals and resources.
SQLCache Stores a set of sql statements loaded from a resource file.
StatementImplRDB A variation on the default Statement implementation that supports reference to a reified version via an RDB resource ID.
StoreRDB Generic store implementation for RDB backed RDF storage.
TestDBConnection  
TestGenericDriver Unit tests for the generic database driver.
TestJenaRegression Run jena regression tests on an RDB-backed model.
TestRDB Overall test harness for running all the current rdb unit tests.
TestStoreRDB Test harness for StoreRDB and supporting classes.
 

Exception Summary
RDFRDBException Used to signal most errors with RDB access.
 

Package com.hp.hpl.mesa.rdf.jena.rdb Description

A general relational database backend for persistent storage of jena models.

Overview

The jena/rdb module provides an implementation of the jena model interface which stores the RDF statement information in a relational database. The implementation can support a variety of database table layouts and can customize the SQL code to cope with the vagaries of different database implementations.

Creating and accessing database instances

Database-backed RDF models are instances of the class jena.rdb.ModelRDB. As well as implementing the full jena.model.Model interface, ModelRDB provides static methods to create, extend and reopen database instances.

First consider the situation where we have an available database but as yet it has no RDF models stored in it and we want to format it for holding RDF statements. In that case we would use:

    DBConnection dbcon = new DBConnection(DATABASE_URI, user, password);
    ModelRDB model = ModelRDB.create(dbcon, LAYOUT_STYLE, DATABASE_TYPE);
The DBConnection class provides different methods for specifying the underlying database. In particular it can be specified, as in the example above, as a jdbc uri (e.g. jdbc:interbase:\\localhost:\databases\test.gdb) along with any required user name and password. Alternatively, the database connection can be opened using the standard jdbc calls and the resulting jdbc Connection object can be wrapped up as a DBConnection for passing on to ModelRDB.create.

The ModelRDB.create call takes two arguments in addition to the database connection itself. Firstly, the LAYOUT_STYLE is a string defining the type of database table structure to be used. Typical values for this include:
Generic General layout, all statements are stored in a single table. Resources and literals are indexed using integer id's generated by database sequence generators.
GenericProc Variant on the generic layout that uses stored procedures for all model updates; this can have a 30-50% performance advantage in some cases.
MMGeneric Similar layout to "Generic" but can support more than one jena model in a single database.
Hash Similar layout to "Generic" but uses MD5 hashes to generate the id's for resources and literals - this avoids relying on the database generators, so it is more portable, and gives very similar performance.
MMHash Similar layout to "Hash" but can support more than one jena model in a single database.
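
The idea behind the hash-based layouts can be illustrated with a short sketch. This is not the code used by the Hash drivers: the class name HashIdSketch, the UTF-8 encoding and the hex representation are all assumptions made for illustration. It only shows why a hash removes the need for a database sequence: the identifier is computed from the value itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: derives a fixed-length key from a URI or literal using MD5. */
class HashIdSketch {

    /** Returns the MD5 digest of the UTF-8 bytes of value as 32 lowercase hex digits. */
    static String md5Hex(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical URI; the same input always produces the same id.
        System.out.println(md5Hex("http://example.org/resource/a"));
    }
}
```

Because the id depends only on the value, two clients inserting the same resource arrive at the same key without consulting the database first.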

The second argument DATABASE_TYPE is a string defining the type of the database. Whilst jdbc offers good database independence, much SQL code remains database-dependent - for example sequence generators, stored procedures and limitations on table indexes all vary across databases. The jena RDB modules cope with this by allowing implementors to customise the SQL code to suit the database server to be used. If using a portable layout such as "Generic" or "Hash" then the DATABASE_TYPE of "Generic" may work; otherwise use a specific database name here. The distribution includes configuration files for "interbase", "mysql" and "postgresql". Others can be created.

The call to ModelRDB.create will create the appropriate database tables and record within the database a note of the layout chosen. This means that a previously created database can be reopened using:

    DBConnection dbcon = new DBConnection(DATABASE_URI, user, password);
    ModelRDB model = ModelRDB.open(dbcon);
Note that no layout or database type information is needed this time - it is retrieved from the pre-formatted database.

Multiple models per database

Some database formats only support one jena model per database. Other layouts can support multiple models within a single database - these have slightly lower performance but can be more convenient. Thus if dbConnection is a connection to an already formatted database whose layout supports multiple models then the call:

    ModelRDB model = ModelRDB.createModel(dbConnection, modelName);
will create an additional model within the same database. The modelName can be used to reopen the same model in the future using:
    ModelRDB model = ModelRDB.open(dbConnection, modelName);
and
    Iterator it = ModelRDB.listModels(dbConnection);
will list the names of all the models stored in the database.

Constraints

The ModelRDB interface supports all the standard jena facilities for navigating the model. This allows us to, for example, find all statements with a given pattern of subject, property and object values. If we wish to perform partial matching on object literal values (e.g. finding all statements whose literal object value starts with "foo" or is an integer in the range [2,8), say) then we have to use the Selector mechanism. Unfortunately in this case all candidate statements with matching subject and property values will be retrieved and then filtered by the supplied Selector.test() code.
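
The cost of filtering after retrieval can be seen in a small stand-alone sketch. It does not use the jena Selector interface: plain strings stand in for the object literals of the candidate statements, and the class ClientSideFilterSketch and its method names are hypothetical. The matches method plays the role that a Selector.test() implementation would play.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: filtering candidates on the client, after they have been fetched. */
class ClientSideFilterSketch {

    /** Stand-in for the test a Selector would apply to each candidate's object literal. */
    static boolean matches(String literal) {
        if (literal.startsWith("foo")) {
            return true;
        }
        try {
            int n = Integer.parseInt(literal);
            return n >= 2 && n < 8;   // the half-open range [2,8)
        } catch (NumberFormatException e) {
            return false;             // neither a "foo" prefix nor an integer
        }
    }

    /** Every candidate has already been retrieved by the time this loop runs. */
    static List<String> filter(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String literal : candidates) {
            if (matches(literal)) {
                kept.add(literal);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Six candidates are fetched; only "foobar", "2" and "7" survive the filter.
        List<String> candidates = Arrays.asList("foobar", "bar", "2", "8", "7", "x7");
        System.out.println(filter(candidates));
    }
}
```

Half of the candidates in this example are transferred only to be discarded, which is the overhead the paragraph above describes.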

The RDB package allows us to use the underlying database implementation by providing an alternative mechanism for listing statements - that of constraints. For example,

    IConstraints constraints = modelrdb.createConstraints();
    constraints.addSubjectConstraint(foo)
               .addPropertyConstraint(prop);
    Iterator statements = modelrdb.listStatements(constraints);
will return an iterator over all statements in the model with subject foo and property prop. More interestingly the code:
    IConstraints constraints = modelrdb.createConstraints();
    constraints.addSubjectConstraint(foo)
               .addPropertyConstraint(prop)
               .addStringConstraint("NOT LIKE", "%bar%");
    Iterator statements = modelrdb.listStatements(constraints);
will list just that subset of the above statements whose object value is a literal string which does not contain the substring "bar". The first argument of the addStringConstraint call can be any standard SQL string match operation.
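
To make the benefit concrete, the following sketch shows how a set of constraints might be turned into a single parameterised SQL WHERE clause so that the database does the filtering. It is not the ConstraintsGeneric implementation: the class WhereClauseSketch, its methods and the column names (subject, predicate, object_text) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: accumulates constraints into one parameterised WHERE clause. */
class WhereClauseSketch {
    private final List<String> clauses = new ArrayList<>();
    private final List<Object> params = new ArrayList<>();

    /** Adds an equality test; the value is bound later as a parameter. */
    WhereClauseSketch addEquals(String column, Object value) {
        clauses.add(column + " = ?");
        params.add(value);
        return this;
    }

    /**
     * Adds a string match such as LIKE or NOT LIKE. The operator is placed in the
     * SQL text itself, so it must come from trusted code, never from user input.
     */
    WhereClauseSketch addStringMatch(String column, String sqlOperator, String pattern) {
        clauses.add(column + " " + sqlOperator + " ?");
        params.add(pattern);
        return this;
    }

    /** Joins the accumulated tests with AND; returns an empty string if there are none. */
    String toSql() {
        return clauses.isEmpty() ? "" : "WHERE " + String.join(" AND ", clauses);
    }

    /** Values to bind, in the same order as the ? placeholders. */
    List<Object> parameters() {
        return params;
    }

    public static void main(String[] args) {
        WhereClauseSketch where = new WhereClauseSketch()
                .addEquals("subject", 17)       // hypothetical resource id
                .addEquals("predicate", 3)      // hypothetical property id
                .addStringMatch("object_text", "NOT LIKE", "%bar%");
        System.out.println(where.toSql());
    }
}
```

Keeping the values as bound parameters rather than splicing them into the SQL string is what would allow the same statement text to be prepared once and reused.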

As well as string matching there is some experimental support for integer-valued literals. When and if jena is extended to support true typed literals a fuller match constraint mechanism might be possible. In the meantime, to support the common case of integer literals we note any literal in the database which could be interpreted as an integer. In this way we can support code such as:
    IConstraints constraints = modelrdb.createConstraints();
    constraints.addSubjectConstraint(foo)
               .addIntConstraint("<=", 42)
               .addIntConstraint(">", 4);
    Iterator statements = modelrdb.listStatements(constraints);
Note that in all these cases the constraints object can be reused which may avoid the overhead of generating and parsing the required SQL code (depending on the nature of the jdbc driver in use).
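
The integer support described above depends on recognising which literals can be read as integers. A minimal sketch of that check follows. It assumes that "could be interpreted as an integer" means the literal parses with Integer.parseInt; the class IntLiteralSketch and the exact rule are assumptions, and the package may apply a different test.

```java
import java.util.OptionalInt;

/** Illustrative only: decides whether a literal's text can be treated as an integer. */
class IntLiteralSketch {

    /** Returns the integer value of the literal, or empty if it does not parse. */
    static OptionalInt asInt(String literal) {
        try {
            return OptionalInt.of(Integer.parseInt(literal));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    /** Mirrors the pair of constraints in the example above: value > 4 and value <= 42. */
    static boolean inExampleRange(String literal) {
        OptionalInt value = asInt(literal);
        return value.isPresent() && value.getAsInt() > 4 && value.getAsInt() <= 42;
    }

    public static void main(String[] args) {
        for (String literal : new String[] {"42", "4", "4.2", "forty"}) {
            System.out.println(literal + " -> " + inExampleRange(literal));
        }
    }
}
```

Literals that fail the check are simply never candidates for an integer constraint; they remain available to string constraints.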

Implementing additional drivers

When a model is created with a given layout style (say Layout) and database type (say Dbtype) then the implementation attempts to find a class called DriverLayoutDbtype of type IRDBDriver which implements all the required storage operations. In this way additional layouts and database types can be supported by extending the existing implementations. Most such implementations would extend DriverGenericGeneric directly or perhaps the slightly more specialized DriverGenericMMGeneric for multimodel layouts.
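
A name-based lookup of this kind can be sketched with standard Java reflection. This is not the code in ModelRDB: the class DriverLookupSketch and its methods are hypothetical, and the order in which the two strings are concatenated simply follows the description above and should be checked against the driver classes actually shipped.

```java
import java.util.Optional;

/** Illustrative only: resolves a driver class from a layout style and database type. */
class DriverLookupSketch {
    static final String PACKAGE = "com.hp.hpl.mesa.rdf.jena.rdb";

    /** Builds the fully qualified class name; the concatenation order is an assumption. */
    static String driverClassName(String layout, String dbtype) {
        return PACKAGE + ".Driver" + layout + dbtype;
    }

    /** Tries to load the class, returning empty if no such driver is on the classpath. */
    static Optional<Class<?>> findDriver(String layout, String dbtype) {
        try {
            Class<?> driver = Class.forName(driverClassName(layout, dbtype));
            return Optional.of(driver);
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(driverClassName("Generic", "Generic"));
        // Prints false unless the jena rdb classes are on the classpath.
        System.out.println(findDriver("Generic", "Generic").isPresent());
    }
}
```

With this style of lookup, supporting a new layout or database needs no change to the calling code: adding a suitably named class to the classpath is enough.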

The existing implementations gain extra modularity by moving most of the raw SQL code out into a separate file. Thus tailoring an implementation to a new SQL dialect often just means generating a new SQL definition file and the java driver class itself need only point to the driver file (see DriverInterbaseGeneric for example). The existing implementations store these SQL definition files in the classpath in a subdirectory called etc. See the javadoc for the SQLCache class for information on the format of these SQL definition files and see the included code for example usage.

Notes

Some databases don't properly map java strings onto database strings. In particular, a Postgresql database must be created with "ENCODING = unicode" to cope with multibyte strings. If you create literals with multibyte characters and store them in a default SQL_ASCII postgresql database then some strange behaviour can result - statements involving that literal may be hard to list or delete correctly.

The classes Test* in this package are just used for unit testing. They are polluting the package namespace instead of being hidden away in a separate test package because some of them access protected methods which are not accessible outside of the package.




Copyright © 2001 Hewlett-Packard. All Rights Reserved.