Review

Many papers have been written about federated and heterogeneous databases, proposing a plethora of ideas for schema integration, query optimization and transaction processing. But what is possible today (1999)? The Garlic project at IBM shows us a working system that is one answer to this question. I like this system because it seems clean and practical.

The Garlic query language is an extension of SQL with support for path expressions, nested collections and methods. (So the query language is similar to what is now offered by the "object-relational" systems from IBM, Oracle and Informix.) Wrapper writers must model the contents of their repository (source systems are called "repositories" in this paper) as Garlic objects with attributes and methods. Object types are called "interfaces". The wrapper must declare which predicates and projections its repository can handle, and it must assign unique identifiers ("keys") to objects. As an example, if the repository is a relational database, the interfaces correspond to tables, the attributes to columns, and the primary key becomes the key part of the Garlic OID. Methods include "get", which obtains attribute values given an OID, and "set", which changes values.
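The wrapper's modeling job for a relational repository can be pictured with a small Python sketch. It is an illustration only, not Garlic's actual wrapper API: the names `OID`, `RelationalWrapper`, `oids`, and the in-memory dictionary standing in for a database are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OID:
    """Hypothetical object identifier: repository, interface, and a wrapper-assigned key."""
    repository: str
    interface: str
    key: tuple


class RelationalWrapper:
    """Toy wrapper over an in-memory stand-in for a relational repository."""

    def __init__(self, name, tables, primary_keys):
        self.name = name
        self.tables = tables              # interface (table) name -> list of row dicts
        self.primary_keys = primary_keys  # interface name -> tuple of key column names

    def oids(self, interface):
        """Build one OID per row; the primary key becomes the key part of the OID."""
        pk = self.primary_keys[interface]
        return [
            OID(self.name, interface, tuple(row[col] for col in pk))
            for row in self.tables[interface]
        ]

    def _row(self, oid):
        """Find the row that an OID refers to (linear scan, fine for a sketch)."""
        pk = self.primary_keys[oid.interface]
        for row in self.tables[oid.interface]:
            if tuple(row[col] for col in pk) == oid.key:
                return row
        raise KeyError(oid)

    def get(self, oid, attribute):
        """The "get" method: obtain an attribute (column) value given an OID."""
        return self._row(oid)[attribute]

    def set(self, oid, attribute, value):
        """The "set" method: change an attribute value given an OID."""
        self._row(oid)[attribute] = value
```

A caller would first ask for the OIDs of an interface, for example `wrapper.oids("employee")`, and then use `get` and `set` on individual objects.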

The Garlic optimizer identifies the largest query fragment involving only one repository and sends it to that repository's wrapper. The wrapper sends back a plan for all or part of the fragment, whatever it agrees to handle. Wrappers model as little or as much of the repository's capabilities as makes sense. In the worst case, the wrapper returns only the OIDs of all the objects in a collection and the Garlic query engine does all the processing. As another example, the Garlic optimizer might request that the wrapper provide a plan for a join of two tables. If the repository can't or won't join two of its own tables, Garlic must perform the join itself. Repositories are not required to have the same capabilities.
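The division of labor between the wrapper and the query engine can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: `Fragment`, `Plan`, `Wrapper`, and `run_fragment` are hypothetical names, the fragment is limited to simple predicates and a projection on one collection, and the wrapper returns whole rows rather than OIDs. The point it shows is that the engine compensates for whatever the wrapper declines to handle, so a capable and a minimal wrapper produce the same answer.

```python
import operator
from dataclasses import dataclass

# Comparison operators a predicate may use in this sketch.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


@dataclass
class Fragment:
    """Hypothetical single-repository query fragment."""
    collection: str
    predicates: list   # list of (attribute, op, value) tuples
    projection: list   # attribute names to return


@dataclass
class Plan:
    """What the wrapper agrees to handle for a fragment."""
    predicates: list   # the subset of predicates the wrapper will evaluate
    projects: bool     # whether the wrapper will also apply the projection


def apply_predicates(rows, predicates):
    for attr, op, value in predicates:
        rows = [r for r in rows if OPS[op](r[attr], value)]
    return rows


def apply_projection(rows, attributes):
    return [{a: r[a] for a in attributes} for r in rows]


class Wrapper:
    def __init__(self, rows, supported_ops, can_project):
        self.rows = rows
        self.supported_ops = supported_ops
        self.can_project = can_project

    def plan(self, fragment):
        accepted = [p for p in fragment.predicates if p[1] in self.supported_ops]
        # Project only if every predicate is handled here; otherwise the
        # engine still needs the unprojected attributes to filter on.
        projects = self.can_project and len(accepted) == len(fragment.predicates)
        return Plan(accepted, projects)

    def execute(self, fragment, plan):
        rows = apply_predicates(self.rows, plan.predicates)
        return apply_projection(rows, fragment.projection) if plan.projects else rows


def run_fragment(fragment, wrapper):
    """Engine side: ask for a plan, run it, then finish the leftover work."""
    plan = wrapper.plan(fragment)
    rows = wrapper.execute(fragment, plan)
    leftover = [p for p in fragment.predicates if p not in plan.predicates]
    rows = apply_predicates(rows, leftover)
    if not plan.projects:
        rows = apply_projection(rows, fragment.projection)
    return rows
```

A wrapper built with an empty `supported_ops` set corresponds to the worst case described above, where the engine does all the processing.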

The Garlic project has been developing tools to make it easier for developers to write wrappers. Wrappers have been written for DB2, Oracle, web site databases, chemistry molecules, image databases, text databases and Lotus Notes.

This is ongoing work, but a reading of this 1997 paper will give a good introduction to this project. Keep an eye out for further reports from Garlic.

