An RDF Query Language Based On DAML

Arnold deVos
adv@langdale.com.au

Revision 1.0
Date 2002/02/15

Introduction

This note describes a query language for RDF that has a simple basis but can handle very general queries by incorporating DAML [1] class expressions.

The design arose out of a need for a query language that could be expressed in RDF and be implemented over various simple, conventional (i.e. non-RDF) data sources. This is also the goal of the Object Management Group's CORBA interface for RDF queries [2,3]. The present proposal is the first step in creating a SOAP equivalent for that.

The requirements seem to eliminate many existing systems [4,5,6,7,8], which assume a flexible RDF repository and/or a generalised inference engine. Those systems can always be brought to bear once the RDF is extracted.

While simple, the solution should not be excessively limited. It is desirable to scale up to support more sophisticated data sources. But it is undesirable to invent yet another RDF query language from scratch. It would be better to scale up by incorporating some existing work. This is where DAML comes in. It provides a powerful set definition language that is expressed in RDF.

Simple Starting Point

The basis of the query language is the following form, expressed in RDF. (An informal syntax is used here; see the appendix for a DAML definition of the properties and classes that make up a query.)

	<Query>
		<select rdf:resource="property1"/>
		<select rdf:resource="property2"/>
		...
		<from rdf:resource="class"/>
	</Query>

The result of this query is the set of all statements whose subject is of type class and whose predicate is property1, property2, and so on.
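
The selection rule above can be sketched in Python. This is only an illustration under stated assumptions: the triples, the names (Item, price, colour) and the function simple_query are hypothetical, statements are modelled as (subject, predicate, object) tuples, and no subclass inference is performed.

```python
# Hypothetical in-memory data source: (subject, predicate, object) statements.
TRIPLES = [
    ("item1", "rdf:type", "Item"),
    ("item1", "price", "10"),
    ("item1", "colour", "red"),
    ("item2", "rdf:type", "Item"),
    ("item2", "price", "25"),
    ("dept1", "rdf:type", "Department"),
    ("dept1", "name", "cosmetics"),
]


def simple_query(triples, select, from_class):
    """Return statements whose subject has type from_class
    and whose predicate is one of the selected properties."""
    # The "from" clause: collect every subject typed as from_class.
    instances = {s for s, p, o in triples if p == "rdf:type" and o == from_class}
    # The "select" clauses: keep only statements for the chosen properties.
    return [(s, p, o) for s, p, o in triples if s in instances and p in select]


result = simple_query(TRIPLES, select={"price"}, from_class="Item")
print(result)
```

A conventional data source would typically answer the same query with something like SELECT price FROM Item, which is the correspondence the query form is meant to suggest.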

The form of the query resembles a simple SQL SELECT statement rather than something that belongs in the Knowledge Representation world. This is deliberate: the goal is to build a bridge to non-RDF data sources that are not equipped with reasoners.

Even though it is simple, this query would be sufficient for extracting statements from many data sources.

Form of Results

The result of a query is a set of RDF statements. In a sense, the query is an expression that represents these statements. However, the form of the result statements is left unspecified. For example:

Since no facility is provided for structuring the results, sometimes the user of the results must do some processing to sort and group them. Note that a language that both selects statements and organises them as an XML hierarchy has previously been proposed [9].

Restrictions

In general, the from clause in the query can accept any RDF class. This includes classes constructed with RDF schema statements or, more powerfully, DAML class expressions. As a first step, consider a restriction query:

	<Query>
		<select rdf:resource="property1"/>
		<select rdf:resource="property2"/>
		...
		<from>
			<daml:Restriction>
				<daml:onProperty rdf:resource="test-property"/>
				<daml:hasValue>test-value</daml:hasValue>
			</daml:Restriction>
		</from>
	</Query>

The result of this query is the set of statements for property1, property2, and so on, about any subject that has a test-property equal to test-value. The test-value could be a literal or a reference to a resource (an rdf:Description element).
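
As a rough Python sketch of this behaviour, assuming statements are (subject, predicate, object) tuples and the test value is a literal: the data and the function names has_value and restriction_query are hypothetical and are not part of the proposal.

```python
# Hypothetical statements about three items.
TRIPLES = [
    ("item1", "colour", "red"),
    ("item1", "price", "10"),
    ("item2", "colour", "blue"),
    ("item2", "price", "25"),
    ("item3", "colour", "red"),
    ("item3", "price", "40"),
]


def has_value(triples, on_property, value):
    """Members of a hasValue restriction: subjects with on_property equal to value."""
    return {s for s, p, o in triples if p == on_property and o == value}


def restriction_query(triples, select, on_property, value):
    """Return the selected statements about every member of the restriction."""
    members = has_value(triples, on_property, value)
    return [(s, p, o) for s, p, o in triples if s in members and p in select]


result = restriction_query(TRIPLES, select={"price"}, on_property="colour", value="red")
print(result)
```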

Here is a more complicated restriction using a DAML class expression:

	<Query>
		<select rdf:resource="property1"/>
		<select rdf:resource="property2"/>
		...
		<from>
			<daml:Class>
				<daml:subClassOf rdf:resource="class"/>
				<daml:subClassOf>
					<daml:Restriction>
						<daml:onProperty rdf:resource="test-property1"/>
						<daml:hasValue>test-value1</daml:hasValue>
					</daml:Restriction>
				</daml:subClassOf>
				<daml:subClassOf>
					<daml:Restriction>
						<daml:onProperty rdf:resource="test-property2"/>
						<daml:hasValue>test-value2</daml:hasValue>
					</daml:Restriction>
				</daml:subClassOf>
				...
			</daml:Class>
		</from>
	</Query>

The result of this query is the set of statements for property1, property2, and so on, about any subject whose type is class and that has test-property1 equal to test-value1, test-property2 equal to test-value2, and so on.
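
One way to read this class expression is as a set intersection: each subClassOf term narrows the set of matching instances. The Python sketch below illustrates that reading only; the data and the function names are hypothetical, and no subclass inference is attempted.

```python
# Hypothetical statements; box1 matches both restrictions but is not an Item.
TRIPLES = [
    ("item1", "rdf:type", "Item"),
    ("item1", "colour", "red"),
    ("item1", "size", "large"),
    ("item1", "price", "10"),
    ("item2", "rdf:type", "Item"),
    ("item2", "colour", "red"),
    ("item2", "size", "small"),
    ("item2", "price", "25"),
    ("box1", "rdf:type", "Box"),
    ("box1", "colour", "red"),
    ("box1", "size", "large"),
    ("box1", "price", "5"),
]


def instances_of(triples, cls):
    """Subjects explicitly typed as cls."""
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


def has_value(triples, on_property, value):
    """Subjects with on_property equal to value."""
    return {s for s, p, o in triples if p == on_property and o == value}


def class_expression(triples, cls, restrictions):
    """Intersect the named class with every (property, value) restriction."""
    members = instances_of(triples, cls)
    for on_property, value in restrictions:
        members &= has_value(triples, on_property, value)
    return members


members = class_expression(TRIPLES, "Item", [("colour", "red"), ("size", "large")])
result = [(s, p, o) for s, p, o in TRIPLES if s in members and p == "price"]
print(result)
```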

Joins

The test-value in each of the foregoing queries is explicitly given. But DAML also allows us to compare a property with a class expression. This produces a query that is analogous to a database join. The daml:hasClass term can be used, as follows:

	<Query>
		<select rdf:resource="property1"/>
		<select rdf:resource="property2"/>
		...
		<from>
			<daml:Restriction>
				<daml:onProperty rdf:resource="join-property"/>
				<daml:hasClass>
					<daml:Restriction>
						<daml:onProperty rdf:resource="test-property"/>
						<daml:hasValue>test-value</daml:hasValue>
					</daml:Restriction>
				</daml:hasClass>
			</daml:Restriction>
		</from>
	</Query>

The result instances for this query have at least one join-property whose value belongs to the set of things where test-property equals test-value. It is probably easier to see the effect with an example.

The following query might apply to a model of a department store. It returns the price of every item in the department whose name is "cosmetics":

	<Query>
		<select rdf:resource="#price"/>
		<from>
			<daml:Restriction>
				<daml:onProperty rdf:resource="#department"/>
				<daml:hasClass>
					<daml:Restriction>
						<daml:onProperty rdf:resource="#name"/>
						<daml:hasValue>cosmetics</daml:hasValue>
					</daml:Restriction>
				</daml:hasClass>
			</daml:Restriction>
		</from>
	</Query>
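
The join in the department store query can be sketched in Python as two steps: first find the departments named "cosmetics", then find the items whose department property points at one of them. The data and the function names has_value and has_class are hypothetical illustrations, not part of the proposal.

```python
# Hypothetical department store statements.
TRIPLES = [
    ("dept1", "name", "cosmetics"),
    ("dept2", "name", "hardware"),
    ("lipstick", "department", "dept1"),
    ("lipstick", "price", "12"),
    ("hammer", "department", "dept2"),
    ("hammer", "price", "30"),
    ("soap", "department", "dept1"),
    ("soap", "price", "3"),
]


def has_value(triples, on_property, value):
    """Subjects with on_property equal to value (inner restriction)."""
    return {s for s, p, o in triples if p == on_property and o == value}


def has_class(triples, on_property, members):
    """Subjects with at least one on_property value that belongs to members."""
    return {s for s, p, o in triples if p == on_property and o in members}


cosmetics_depts = has_value(TRIPLES, "name", "cosmetics")
items = has_class(TRIPLES, "department", cosmetics_depts)
result = [(s, p, o) for s, p, o in TRIPLES if s in items and p == "price"]
print(result)
```

In relational terms this corresponds roughly to joining an item table to a department table on the department property and filtering on the department name.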

Other DAML Expressions

By allowing any DAML class expression in the from clause, the query language can support queries over Boolean combinations of classes. It is also possible to enumerate the instances to be queried. See the DAML documentation for the full details.

The general form of query that employs these features is:

	<Query>
		<select rdf:resource="property1"/>
		<select rdf:resource="property2"/>
		...
		<from>
			<daml:Class>
				<!-- body of a DAML class expression -->
			</daml:Class>
		</from>
	</Query>

Composite Queries

The foregoing queries all produce homogeneous results, that is, the same properties are queried for every instance in the result set.

However, it is often desirable to query different properties from different sets of instances in a combined query. Moreover, it is usually desirable to batch several queries together if the query service is remote and performance is an issue.

The following envelope is defined to combine several queries and shared class expressions:

	<Request>
		<declare>
			<daml:Class rdf:ID="class-expr1">
				<!-- body of a DAML class expression -->
			</daml:Class>
		</declare>
		...

		<evaluate>
			<Query>
				<!-- body of a Query -->
			</Query>
		</evaluate>
		...

	</Request>

A Request contains a series of declare and evaluate terms. Each declare term introduces a class expression but says nothing about it. It is expected that the nested class will specify an rdf:ID so that it can be referenced elsewhere in a class expression or query.

Each evaluate term contains a query. The result of a Request is the union of the results of the individual queries.

Here is an elaboration of the department store example that returns the name of the department manager as well as the item prices:

	<Request>
		<!-- declarations for Departments, name, etc. omitted for brevity -->

		<declare>
			<daml:Class rdf:ID="InterestingDept">
				<daml:subClassOf rdf:resource="#Departments"/>
				<daml:subClassOf>
					<daml:Restriction>
						<daml:onProperty rdf:resource="#name"/>
						<daml:hasValue>cosmetics</daml:hasValue>
					</daml:Restriction>
				</daml:subClassOf>
			</daml:Class>
		</declare>
		<evaluate>
			<Query>
				<select rdf:resource="#manager"/>
				<from rdf:resource="#InterestingDept"/>
			</Query>
		</evaluate>
		<evaluate>
			<Query>
				<select rdf:resource="#price"/>
				<from>
					<daml:Restriction>
						<daml:onProperty rdf:resource="#department"/>
						<daml:hasClass rdf:resource="#InterestingDept"/>
					</daml:Restriction>
				</from>
			</Query>
		</evaluate>
	</Request>
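
The evaluation of this Request can be sketched in Python as follows. The sketch assumes hypothetical data (including the manager values) and a hypothetical helper named select. It shows the declared class being computed once, reused by both queries, and the two results being combined as a set union.

```python
# Hypothetical department store statements.
TRIPLES = [
    ("dept1", "rdf:type", "Departments"),
    ("dept1", "name", "cosmetics"),
    ("dept1", "manager", "Alice"),
    ("dept2", "rdf:type", "Departments"),
    ("dept2", "name", "hardware"),
    ("dept2", "manager", "Bob"),
    ("lipstick", "department", "dept1"),
    ("lipstick", "price", "12"),
    ("hammer", "department", "dept2"),
    ("hammer", "price", "30"),
]


def select(triples, instances, properties):
    """Statements about the given instances for the given properties."""
    return {(s, p, o) for s, p, o in triples if s in instances and p in properties}


# declare: InterestingDept = Departments whose name is "cosmetics".
departments = {s for s, p, o in TRIPLES if p == "rdf:type" and o == "Departments"}
named_cosmetics = {s for s, p, o in TRIPLES if p == "name" and o == "cosmetics"}
interesting_dept = departments & named_cosmetics

# First evaluate: the manager of each interesting department.
managers = select(TRIPLES, interesting_dept, {"manager"})

# Second evaluate: the price of each item whose department is interesting.
items = {s for s, p, o in TRIPLES if p == "department" and o in interesting_dept}
prices = select(TRIPLES, items, {"price"})

# The result of the Request is the union of the individual query results.
result = managers | prices
print(sorted(result))
```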

References

  1. DAML+OIL, March 2001 http://www.daml.org/2001/03/daml+oil-index.html
  2. UMS Data Access Facility Information http://www.langdale.com.au/DAF
  3. UMS Data Access Facility OMG Standard http://www.omg.org/technology/documents/formal/UMS_Data_Access_Facility.htm
  4. Strawman DAML+OIL Query Language Proposal, Richard Fikes http://www.daml.org/listarchive/joint-committee/0572.html
  5. RDQL - RDF Data Query Language, Hewlett-Packard Company http://www.hpl.hp.com/semweb/rdql.html
  6. SquishQL http://swordfish.rdfweb.org/rdfquery/
  7. The RDF Query Language (RQL) http://139.91.183.30:9090/RDF/RQL/
  8. RDF Query Specification, Ashok Malhotra, Neel Sundaresan http://www.w3.org/TandS/QL/QL98/pp/rdfquery.html
  9. Nexus Query Language, Arnold deVos http://www.langdale.com.au/RDF/NexusQueryLanguage.pdf

Appendix: Ontology

An RDF file for the query ontology is available here: http://www.langdale.com.au/RDF/daml-query.rdf

<rdf:RDF 
	xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
	xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" 
	xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
>
<daml:Ontology rdf:about="">
<daml:equivalentTo rdf:resource="http://langdale.com.au/2002/1.0/daml-query"/>
<daml:versionInfo>2002-02-15</daml:versionInfo>
<daml:comment> An ontology for RDF queries that can incorporate DAML class expressions. </daml:comment>
<daml:imports rdf:resource="http://www.daml.org/2001/03/daml+oil"/>
</daml:Ontology>
<daml:Class rdf:ID="Request">
<daml:comment>
A Request contains a series of declare terms and
evaluate commands. Each evaluate command
contains a query. The result of a Request is the
concatenation of the results of the individual queries.
However, duplicate statements may be eliminated
from the result.
</daml:comment>
</daml:Class>
<daml:Property rdf:ID="declare">
<daml:comment>
The declare property introduces a class expression
but says nothing about it. It is expected that the
nested class will specify an rdf:ID so that it can be
referenced elsewhere in the Request.
</daml:comment>
<daml:domain rdf:resource="#Request"/>
<daml:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
</daml:Property>
<daml:Property rdf:ID="evaluate">
<daml:comment>
The evaluate property specifies a Query to be executed.
Repeated evaluate statements allow a single Request
to contain many Queries.
</daml:comment>
<daml:domain rdf:resource="#Request"/>
<daml:range rdf:resource="#Query"/>
</daml:Property>
<daml:Class rdf:ID="Query">
<daml:comment>
A query specifies a single class and any number of properties.
The result contains a URI for each instance of
the class and all statements about those instances
involving the specified properties.
</daml:comment>
</daml:Class>
<daml:Property rdf:ID="select">
<daml:comment>
Specifies a property to be queried. All statements for this property are returned for each result instance.
Any number of select properties can be attached to a query. If no select is
given then at least the URI of each instance in the result is returned.
</daml:comment>
<daml:domain rdf:resource="#Query"/>
<daml:range rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
</daml:Property>
<daml:UniqueProperty rdf:ID="from">
<daml:comment>
The class to be queried. The result will contain each instance of
this class. The class may be explicitly named, or given as a DAML
Class or Restriction expression. If not specified, the empty class
daml:Nothing is queried (which is probably only useful for testing).
</daml:comment>
<daml:domain rdf:resource="#Query"/>
<daml:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
</daml:UniqueProperty>
</rdf:RDF>

Copyright 2002 Arnold deVos, Langdale Consultants.