Semantic Web Research Group
OWL Species Validation in Pellet
By Michael Grove

Species validation in Pellet is divided into two main steps, type processing and triple processing. When we load a document for validation, we first read it into a triplestore, and then find all imported documents, read them in, and merge them with the parent document. This way we get the proper behavior for owl:imports and can come up with the correct species for the document in question.

Next, we collect all the rdf:type triples in the document and iterate over them collecting the different properties and classes used in the document. During this stage we can easily find any errors made while declaring resources in the document. If something is declared as a class, and then later as a property, we will catch that in this stage and mark the document appropriately. Some constructs require us to wait until the main processing loop is done before we proceed because of the inherent out of order processing nature of the list of triples. Two of these such constructs are owl:FunctionalProperty and owl:InverseFunctionalProperty.

Also during this step, we do out first structure sharing check. If we find a b-node declared, we check its declaration for any cycles, see (http://www.w3.org/2002/03owlt/I5.26/consistent006) for an example. The basic algorithm is we walk the graph from the b-node vertex and see if we end up back at the b-node vertex at any point. If the b-node uses itself during its own declaration, we’ll find the cycle and mark the document as OWL Full.

After we process this last set of rdf:type triples, we then construct all the lists used in the document. It is during this set that we do the second of our structure sharing checks. While processing the lists, if there is any b-node in the list, as the object of an rdf:first triple, and the list is the target of an owl:intersectionOf or owl:unionOf construct, we must mark the b-node dirty. If it is already dirty from use in another list, we have found a structure sharing violation and the document is no longer in OWL DL. We use this list of used, or dirty, b-nodes throughout the rest of species validation.

Our last step in the processing of the types is we construct all the restrictions used in the document. During this time we validate the owl:onProperty of the restriction (if it is present) to make sure the target of the property is indeed a property.

The next main step in the parsing process is handling the rest of the triples in the document. Here we iterate through the list of triples not already processed in our previous step. For each triple in this list, the first thing we do is another structure sharing check. If the target of the triple is a b-node and the predicate is one of the set rdf:type, owl:complementOf, owl:someValuesFrom, or owl:allValuesFrom, we must check to see if that b-node is dirty. If so, we have found a case of structure sharing and the document is marked as OWL Full, if not, we mark the b-node as dirty and continue with the processing.

During the actual processing of the triple we do the standard language checks, such as upon seeing an owl:oneOf predicate, we mark the document as OWL DL. It is also during this time we do the rest of the handling for owl:imports. If an OWL document importing another document is not specified as an owl:Ontology itself, it is not in OWL Lite. We also check to see if a construct is used within the OWL namespace but is not actually an OWL term. When we come across this case the document is marked Full.

Our last step in the process is the most difficult, detecting structure sharing in owl:disjointWith and owl:equivalentClass nodes as well as finding any cliques within the file. While processing the triples, if a triple is found with either a owl:disjointWith or owl:equivalentClass predicate, it is added to a graph of the disjoint or equivalent classes respectively. These graphs are constructed during parsing and are what we use when finished to find any structure or clique violations.

We first iterate through the nodes in the owl:equivalentClasses graph. For each b-node in the graph, we check to see if that b-node has been previously marked dirty. If so, we have found another structure sharing violation and the document is marked as OWL Full. We also check to see if the b-node is also in the owl:disjointWith graph. If so, this again is a structure sharing violation as the two graphs are not allowed to share b-nodes.

Our last step is the most difficult, handling the owl:disjointWith graph. The first step in the process is to break up the graph into any subgraphs. The situation can arise where there are two (or more) separate, unconnected subgraphs within the larger owl:disjointWith graph. In these cases, we process each subgraph separately. For each graph we find, we first inspect each named node. Every named node in the graph needs to be connected to all other named nodes via at least one edge. If there are two named nodes that have no connection between them, the subgraph is not fully connected and the document is no longer OWL DL.

We then iterate over the collection of b-nodes in the graph, for each of these b-nodes, we collect all other b-nodes it is connected to without going through a named node. When collecting the nodes, the first named node you come across you stop, whatever nodes you have collected up to that point are the nodes the current b-node is connected to. For each node in this collection (the current b-node included) you must check to see if there is an edge between it and all other nodes. If you find any one node that is not connected to all the other b-nodes in its collection, the document may no longer be in OWL DL.

However, if it does not pass this case, we then construct an undirected graph from the current subgraph. If this graph is fully connected, then the document is still within OWL DL. If it is not, then the document is in OWL Full.

Lastly, we iterate over the set of nodes in the subgraph and see if any have been marked dirty. If so, we have found a structure sharing violation and the document is OWL Full.

I found that adding the extra checks for structure sharing and clique’s, especially the graph analysis, adds very little overhead to the overall processing of a file. I conducted several runs of our test program with and without these checks in and performance did not suffer from having them. There was an average increase of less than one second for a test run of well over 350 separate documents.

The most difficult part of the implementation was trying to understand exactly what the structure sharing and clique cases involved as there was very little documentation surrounding the tests in question. If it had not been for extensive discussions with a friend and Sean Bechhofer’s short writeup on the subject, it would have been near impossible to figure out all the requirements and nuances of each of the tests. It is my hope that by explaining how and why I did what I did to detect structure sharing and the cliques, it will be easier for others to use what I wrote, combined with that Sean wrote and what the Working Group provides, to understand and implement their own validators.

MINDSWAP is a W3C member