GraphPath

RDF Resource Description Framework Icon

GraphPath is a little-language for analysing graph-structured data, especially RDF. The syntax of GraphPath is reminiscent of Xpath. It has a python implementation that can be teamed up with your favourite python RDF API (e.g. Redland, rdflib, or your own API).

The implementation provides a query evaluator and a goal-driven inference engine, which work together to perform graph analysis.

Or read on for a preview and rationale.

Introduction

A GraphPath expression is a template that matches paths in a directed labeled graph (DLG). An expression looks like this:

Class(ex.Female)[Property(ex.parent)/Node(ex.nalda)]/Property(ex.firstName)

In the context of a hypothetical family tree this expression might match a set of paths such as ex:julie -- ex:firstName --> "Julie" and ex:lisa -- ex:firstName --> "Lisa".

By default, when a GraphPath expression is evaluated, the set of matching paths is collapsed to a set of terminal nodes: "Julie", "Lisa", ... which is frequently useful. In that mode, the GraphPath expression can be read as a query: females' (whose parent is Nalda) first names.

A set of matching paths can also be collapsed to its initial nodes, in which case the expression is read as a predicate. The expression Property(ex.parent)/Node(ex.nalda) is used as a predicate within the previous larger expression and is read as: anyone whose parent is Nalda. It matches paths such as ex:arnold -- ex:parent --> ex:nalda and ex:lisa -- ex:parent --> ex:nalda, whose initial nodes are sought.

Apart from extracting nodes from graphs, GraphPath expressions can extract mappings and relations. A mapping of initial nodes to terminal nodes is obtained by Map(expr) and can be plugged into numerous, existing directed graph (DG rather than DLG) algorithms. A relation, or set of tuples, is obtained with the projection operator, %, and is particularly useful as a way to couple graph data to (non-GraphPath) code.

Rules and Inference

GraphPath rule definitions augment the graph with derived data. They look like this:

rules[Property(ex.child)] = ~Property(ex.parent) # rule 1
rules[Property(ex.daughter)] = Property(ex.child)[Class(ex.Female)] # rule 2

These two rules define the concept of child in terms of parent and the concept of daughter in terms of female and child. Now the original example can be simplified to:

Node(ex.nalda)/Property(ex.daughter)/Property(ex.firstName)

Which reads as Nalda's daughters' first names.

The inference machinery can be used to great advantage with recursive, mutually recursive and even circular rules. Consider this common case:

rules[Property(ex.descendant)] = Property(ex.child) # rule 3
rules[Property(ex.descendant)] = Property(ex.descendent)/Property(ex.descendent) # rule 4

Here, rule 3 says a child is a descendant. Then rule 4 says that the descendant property is transitive. (Note that transitive properties are common enough that GraphPath also provides a transitive traversal operator, //.) With rules 3 and 4 in place it is possible to write queries such as all Jack's female descendent's:

Node(ex.jack)/Property(ex.decendent)[Class(ex.Female)]

In this way, GraphPath can efficiently generate the entailments of simple ontologies and incorporate them into query results. When contemplating rules that might generate a large number of entailments it is worth noting that the inference machinery is goal-driven. It only evaluates the rules that are relevant to the queries actually presented.

Relationship to RDF API's

The python implementation of GraphPath is intended as an add-on rather than a complete wrapper for existing RDF handling systems. In other words, you use RDF API objects such as resources, values and models with GraphPath. It is therefore reasonable to mix RDF API calls and GraphPath queries.

There is only a narrow dependency between the GraphPath implementation and the underlying graph data representation. This makes it practical to adapt GraphPath to application specific data representations, including data that is non-RDF in form.

The implementation is in terms of python classes with operator overloading, which is useful if you want to manipulate queries as python objects. GraphPath queries are effectively a native part of your python program and it is easy to extend the GraphPath primitives in python.

Alternatives for Querying

There are other XPath-inspired RDF query languages. Versa is probably the most well-developed and could be regarded as a benchmark in this area. Like GraphPath, Versa has a python implementation.

RxPath is another language to look at in this vein. It reuses XPath syntax for RDF via a clever mapping of the underlying data models.

You might consider using GraphPath over the alternatives:

Alternatives for Inference

CWM is the well-known python tool for RDF inference. There are a couple of notable differences between CWM and GraphPath's inference machinery.

CWM uses a data driven, or forward chaining, algorithm to compute inferences while GraphPath is goal driven or backward chaining. This means that, unlike CWM, GraphPath does not need a distinct think phase and does not need to compute all possible inferences to produce useful results. See also Euler, which is a backward chaining RDF reasoner.

Another difference is that GraphPath rules are written as expressions defining new properties and classes. CWM rules are more general formulae. As a rough guide, GraphPath aims to be powerful enough to handle any implication that can be written in OWL DL, but it is not at that point yet.

You might consider using GraphPath over the alternatives: