RDF Schema Support in GINF

This document describes which elements of RDF Schema are supported in the current GINF implementation and points out some implementation details.

Basic model

GINF specifies an abstract Model interface that gives access to "dumb" RDF models. A dumb model knows nothing about the semantics of the resources and statements (triples) stored in it; it simply provides a means of navigating a directed labeled graph. Information stored in a Model is accessed through a general method, find. For example,

Model result = sourceModel.find( null, new Resource("...#mypredicate"), null )

finds all triples with the predicate "...#mypredicate". If the triples were stored in a relational table DB(subject, predicate, object), the above statement would map onto the following SQL statement:

SELECT subject, predicate, object FROM DB WHERE predicate = '...#mypredicate'
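The pattern-matching behavior of find can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the class DumbModelSketch, its nested Triple type, and the use of plain strings and a List result are hypothetical stand-ins, whereas the actual GINF find takes Resource arguments and returns a Model.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a "dumb" model; not the GINF API. */
final class DumbModelSketch {
    /** A statement; plain strings stand in for resources and literals. */
    static final class Triple {
        final String subject, predicate, object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object;
        }
    }

    private final List<Triple> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    /** Returns every stored triple matching the pattern; null matches anything. */
    List<Triple> find(String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if (matches(s, t.subject) && matches(p, t.predicate) && matches(o, t.object)) {
                result.add(t);
            }
        }
        return result;
    }

    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}
```

In this sketch no inference takes place: a triple is returned only if each non-null argument is literally equal to the corresponding component.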

Schema model

Querying

In contrast to a dumb model, an "intelligent" model has a set of rules for deriving statements that are not explicitly present in the model. Such models can be accessed through the same Model interface, but they can also provide specific extensions. SchemaModel is such an "intelligent" model: it extends the basic Model interface and provides an additional method to validate a schema-aware model against an RDF schema.

Imagine that we have a schema model tomModel containing the following statements:

Subject Predicate Object

Tom RDF.type Cat

Tom furColor "Hazel"

and the following corresponding schema information:

Subject Predicate Object

Cat RDFS.subClassOf Animal

furColor RDFS.subPropertyOf animalFeature

Then, the "query"

(1) Model result = tomModel.find( null, RDF.type, Animal )

would deliver the result:

Subject Predicate Object

Tom RDF.type Cat

and the "query"

(2) Model result = tomModel.find( null, animalFeature, null )

would deliver the result:

Subject Predicate Object

Tom furColor "Hazel"

Note that, where possible, we try to deliver triples that are already present in the original model rather than create new ones. However, there are cases where new triples have to be created in the result model. Consider the following query:

(3) Model result = tomModel.find( Tom, RDF.type, null )

The response would be

Subject Predicate Object

Tom RDF.type Cat

Tom RDF.type Animal

We tried to define the semantics of the find operation according to the intended interpretation, i.e. the one that seems most appropriate and natural. However, the SchemaModel that we use is only one possible realization. An alternative implementation could, for example, return all statements that can be derived from a given model and schema.
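The behavior of queries (1) to (3) can be approximated by the following sketch. It is not the SchemaModelImpl code: the class SchemaFindSketch, the method addSchemaLink, and the string-based triples are hypothetical, and a single parents map is assumed to hold both RDFS.subClassOf and RDFS.subPropertyOf links. Existing triples are reused when an object is given, and new RDF.type triples are created only when the object is left open.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a schema-aware find; names are not the GINF API. */
final class SchemaFindSketch {
    static final String TYPE = "RDF.type";

    private final List<String[]> triples = new ArrayList<>();          // {subject, predicate, object}
    private final Map<String, Set<String>> parents = new HashMap<>();  // direct subClassOf/subPropertyOf links

    void add(String s, String p, String o) {
        triples.add(new String[] {s, p, o});
    }

    void addSchemaLink(String child, String parent) {
        parents.computeIfAbsent(child, k -> new LinkedHashSet<>()).add(parent);
    }

    /** The name itself followed by all of its transitive ancestors. */
    private Set<String> selfAndAncestors(String name) {
        Set<String> seen = new LinkedHashSet<>();
        collect(name, seen);
        return seen;
    }

    private void collect(String name, Set<String> seen) {
        if (!seen.add(name)) return;  // already visited; also guards against cycles
        for (String parent : parents.getOrDefault(name, Collections.emptySet())) {
            collect(parent, seen);
        }
    }

    /** Returns matching triples as "subject predicate object" strings; null is a wildcard. */
    List<String> find(String s, String p, String o) {
        Set<String> result = new LinkedHashSet<>();
        for (String[] t : triples) {
            if (s != null && !s.equals(t[0])) continue;
            // A predicate pattern also matches triples that use one of its subproperties.
            if (p != null && !selfAndAncestors(t[1]).contains(p)) continue;
            boolean typeQuery = TYPE.equals(p) && TYPE.equals(t[1]);
            if (o == null) {
                result.add(String.join(" ", t));
                if (typeQuery) {
                    // Open object: derive a type triple for every superclass, as in query (3).
                    for (String cls : selfAndAncestors(t[2])) {
                        result.add(t[0] + " " + TYPE + " " + cls);
                    }
                }
            } else if (typeQuery ? selfAndAncestors(t[2]).contains(o) : o.equals(t[2])) {
                // Given object: reuse the existing triple, as in queries (1) and (2).
                result.add(String.join(" ", t));
            }
        }
        return new ArrayList<>(result);
    }
}
```

With Tom's two statements and the two schema links loaded, find(null, TYPE, "Animal") yields the stored triple about Cat, while find("Tom", TYPE, null) additionally yields the derived triple about Animal.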

Further "intended" behavior includes retrieving all the resources used in the model:

(4) Model result = tomModel.find( null, RDF.type, RDFS.Resource )

delivers:

Subject Predicate Object

Tom RDF.type RDFS.Resource

Cat RDF.type RDFS.Resource

Note that the literal "Hazel" is not a resource and is therefore not included in the result model. For simplicity, we decided not to reuse existing triples in the above example.

Validation

Our default SchemaModel implementation (SchemaModelImpl) performs validation according to the following algorithm:

foreach triple in model {
    if triple.predicate == RDF.type
        /* check whether triple.object is an RDFS.Class */
    if triple.predicate is an RDFS.ContainerMembershipProperty
        /* check whether triple.subject is an RDFS.Container; the [RDFSchema] spec does not seem to require it */
    foreach d in validDomains(triple.predicate)
        /* check whether triple.subject has RDF.type d for at least one such d */
    if validRange(triple.predicate) != null    // there can be only a single range specification
        /* check whether triple.object has RDF.type validRange(triple.predicate) */
}
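As an illustration only, the loop above might be written in Java roughly as follows. The class ValidationSketch and its methods addDomain, setRange and hasType are hypothetical names, not part of SchemaModelImpl. The sketch makes two simplifying assumptions: type membership is checked only against explicitly stated RDF.type triples (no subclass inference), and the container-membership check is omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the validation loop; not SchemaModelImpl itself. */
final class ValidationSketch {
    static final String TYPE = "RDF.type";
    static final String CLASS = "RDFS.Class";

    private final List<String[]> triples = new ArrayList<>();          // {subject, predicate, object}
    private final Map<String, Set<String>> domains = new HashMap<>();  // predicate -> allowed subject classes
    private final Map<String, String> ranges = new HashMap<>();        // predicate -> single allowed object class

    void add(String s, String p, String o) {
        triples.add(new String[] {s, p, o});
    }

    void addDomain(String predicate, String cls) {
        domains.computeIfAbsent(predicate, k -> new HashSet<>()).add(cls);
    }

    void setRange(String predicate, String cls) {
        ranges.put(predicate, cls);
    }

    /** True if the model explicitly states "resource RDF.type cls" (assumption: no inference). */
    boolean hasType(String resource, String cls) {
        for (String[] t : triples) {
            if (t[0].equals(resource) && t[1].equals(TYPE) && t[2].equals(cls)) return true;
        }
        return false;
    }

    /** Returns one message per violated constraint; an empty list means the model is valid. */
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        for (String[] t : triples) {
            String s = t[0], p = t[1], o = t[2];
            if (p.equals(TYPE) && !hasType(o, CLASS)) {
                errors.add(o + " is used as a type but is not an " + CLASS);
            }
            Set<String> validDomains = domains.getOrDefault(p, Collections.emptySet());
            if (!validDomains.isEmpty()) {
                boolean inSomeDomain = false;
                for (String d : validDomains) inSomeDomain |= hasType(s, d);
                if (!inSomeDomain) errors.add(s + " is outside the domain of " + p);
            }
            String range = ranges.get(p);
            if (range != null && !hasType(o, range)) {
                errors.add(o + " is outside the range of " + p);
            }
        }
        return errors;
    }
}
```

Returning a list of messages rather than a boolean is a design choice of this sketch; it makes it easier to see which constraint a given triple violates.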

Our implementation loads both the RDF Model/Syntax and the RDF Schema vocabularies and uses them for schema access and validation. However, some special cases still had to be handled separately, e.g. RDFS.Literal as a domain or range specification.

Currently, unsupported constraints are ignored.

Both RDF Model/Syntax and RDF Schema can be successfully validated with our implementation.

Realization

In order to evaluate RDFS.subClassOf and RDFS.subPropertyOf, it is necessary to compute the transitive closure of both of these predicates. For efficiency reasons, we compute the closure on demand: when a find query is issued, we check whether the previously computed closure is still valid (it is invalidated whenever additional schema information is fetched). If it is not, the closure is recomputed.
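A minimal sketch of this on-demand scheme is given below. The class LazyClosureSketch and its methods are hypothetical and do not reflect the actual SchemaRegistry code. The sketch assumes that a single map of direct links is kept per predicate and that any newly added link invalidates the whole cached closure.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of an on-demand closure cache; not the GINF SchemaRegistry. */
final class LazyClosureSketch {
    private final Map<String, Set<String>> direct = new HashMap<>();  // child -> direct parents
    private Map<String, Set<String>> closure;                         // null means "invalid"
    int recomputations = 0;                                           // exposed only to show the caching

    /** Adding schema information invalidates the cached closure. */
    void addLink(String child, String parent) {
        direct.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
        closure = null;
    }

    /** All transitive ancestors of name; recomputes only if the cache is invalid. */
    Set<String> ancestors(String name) {
        if (closure == null) {
            closure = computeClosure();
            recomputations++;
        }
        return closure.getOrDefault(name, Collections.emptySet());
    }

    private Map<String, Set<String>> computeClosure() {
        Map<String, Set<String>> result = new HashMap<>();
        for (String start : direct.keySet()) {
            Set<String> reached = new HashSet<>();
            collect(start, reached);
            result.put(start, reached);
        }
        return result;
    }

    private void collect(String name, Set<String> reached) {
        for (String parent : direct.getOrDefault(name, Collections.emptySet())) {
            if (reached.add(parent)) collect(parent, reached);  // skip nodes already reached
        }
    }
}
```

Repeated queries between schema updates reuse the cached result, so the cost of computing the closure is paid at most once per batch of schema changes.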

This approach requires SchemaModel to work very closely with the SchemaRegistry, where the schema information is stored. Therefore, our default implementation of SchemaModel is based on the default implementation of SchemaRegistry (i.e. neither can be replaced independently of the other).

We propose that "intelligent models" are constructed as wrappers of "dumb" models. For example, the default implementation wraps a dumb model as follows:

SchemaModel sm = new SchemaModelImpl( registry, dumbModel );

Conclusion

"Intelligent" RDF models implement inference rules which may vary in complexity, efficiency and completeness. For example, we found that complete answer sets are an obstacle rather than an advantage for simple schema access. It would be helpful for developers to have a standard set of inference rules for RDF Schema implementations.
