Storage And Retrieval

Group Members

Project Summary

The goal of this project is to survey and test the storage and retrieval tools available for RDF triples and RDF/XML.

We are looking at the following:

An important feature in the implementation of the Semantic Web will be the ability for users to search for relevant Semantic Web entities. Once information is marked up in RDF, statements should be stored in a database that will accept user queries and return statements that match these queries. We look here at some of the query languages and storage/retrieval tools that are being introduced to allow searching of the Semantic Web.

Query Languages

RQL
ICS-FORTH RQL is a query language for RDF that allows functional composition of basic queries and iterators, so that RDF Schema information can be queried along with individual resource descriptions. This means that the query language “knows” the structure of RDF and can exploit that knowledge to make inferences about data in resource descriptions. For instance, the description below declares eg:Female to be an rdfs:subClassOf eg:Person, and gives the eg:name property the domain eg:Person.

<daml:Class rdf:about="#Female">
    <rdfs:subClassOf rdf:resource="#Person"/>
</daml:Class>

<rdf:Property rdf:ID="name">
    <rdfs:domain rdf:resource="#Person"/>
</rdf:Property>
Using RQL, one could retrieve the names of all people (and therefore of all females) with a query such as:
        SELECT Y
        FROM eg:Person{X}.eg:name{Y}
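The inference RQL performs here can be illustrated with a small Python sketch. It is neither RQL nor the ICS-FORTH implementation: triples are plain (subject, predicate, object) tuples, the instance data (eg:alice, eg:bob) and the helper names subclasses and names_of_instances are hypothetical, and only rdf:type, rdfs:subClassOf, and eg:name are interpreted.

```python
# Hypothetical sketch of schema-aware lookup in the spirit of RQL (not RQL itself).
# Triples are (subject, predicate, object) tuples; "eg:" names are illustrative.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

triples = {
    ("eg:Female", SUBCLASS, "eg:Person"),   # schema statement from the example above
    ("eg:alice", TYPE, "eg:Female"),        # hypothetical instance data
    ("eg:alice", "eg:name", "Alice"),
    ("eg:bob", TYPE, "eg:Person"),
    ("eg:bob", "eg:name", "Bob"),
}

def subclasses(cls, data):
    """Return cls plus every class that is (transitively) an rdfs:subClassOf it."""
    result, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in data:
            if p == SUBCLASS and o == current and s not in result:
                result.add(s)
                stack.append(s)
    return result

def names_of_instances(cls, data):
    """Names of all resources typed as cls or as any of its subclasses."""
    classes = subclasses(cls, data)
    members = {s for s, p, o in data if p == TYPE and o in classes}
    return sorted(o for s, p, o in data if p == "eg:name" and s in members)

print(names_of_instances("eg:Person", triples))  # ['Alice', 'Bob']
```

Because the class lookup follows rdfs:subClassOf transitively, eg:alice is returned even though she is typed only as eg:Female; a store that matched explicit triples alone would return only Bob.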

SquishQL
SquishQL is another query language that uses SQL-like syntax to query RDF. It models queries on the graph structure of RDF: nodes and arcs (identified by URIs) as well as literals. SquishQL matches only statements stated explicitly in the RDF graph and performs no inference, so transitive relationships such as chains of rdfs:subClassOf are not followed. A query selects the RDF statements whose structure matches the query's graph pattern, and the matches are then filtered according to the values bound to the variables specified in the query.

SELECT ?articleid
FROM http://location.of.data.store
WHERE (?articleid, <art:subject>, "geology")
USING art FOR <http://location.of.our.future.article.ontology#>

A ? in front of a query element marks a variable; any other argument, whether a URI delimited by < > or a literal, must match an explicit value. The query above would return the article ids of all articles in the collection of Nature articles (stored in a database) whose subject, as defined in the indicated ontology, is geology. The USING clause binds the prefix art: to a specific namespace URI. The WHERE clause, which defines the query's filter constraints, is stated as a triple in (subject, predicate, object) form.
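The matching step can be sketched in Python as a naive graph-pattern matcher. This is an illustration under assumptions, not SquishQL's or Inkling's implementation: the article identifiers (nature:a1 and so on) are hypothetical, the art: prefix is expanded by hand using the namespace from the USING clause, and literals are compared as plain strings. As in SquishQL, no inference is done; only explicitly stated triples match.

```python
# Hypothetical sketch of SquishQL-style matching: explicit triples only, no inference.
ART = "http://location.of.our.future.article.ontology#"  # namespace from the USING clause

data = [  # hypothetical article metadata
    ("nature:a1", ART + "subject", "geology"),
    ("nature:a2", ART + "subject", "genetics"),
    ("nature:a3", ART + "subject", "geology"),
]

def is_var(term):
    """A query term beginning with ? is a variable."""
    return term.startswith("?")

def match_pattern(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple, or return None."""
    extended = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None      # explicit value does not match
    return extended

def query(patterns, triples, select):
    """Return the selected variables for every binding that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return [tuple(b[var] for var in select) for b in solutions]

# SELECT ?articleid WHERE (?articleid, <art:subject>, "geology")
print(query([("?articleid", ART + "subject", "geology")], data, ["?articleid"]))
# [('nature:a1',), ('nature:a3',)]
```

Each pattern extends the bindings produced by the previous one, so a query with several triples in its WHERE clause becomes a join on shared variables.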

RDQL
RDQL is based on SquishQL, but was adapted by Hewlett-Packard, the makers of the Jena toolkit, for use with Jena's Java API.

Implementing RDF Storage and Retrieval Tools: The ''Nature'' Files

A storage and retrieval system for the Nature data set would be extremely useful in helping users find articles that match their queries. For example, someone looking for Nature articles about geology could query the subject headings assigned to all of the Nature documents. First, the XML markup of the Nature articles would be converted to RDF, using an ontology that describes publications as they exist in Nature (not in nature!). Next, an RDF instance would be created for each article, describing it in terms of the structure defined in the ontology. These structural elements include properties such as <articleid>, <pubdate>, <title>, <subject>, and others.
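As a sketch of the first two steps, the following Python reads article records with xml.etree.ElementTree and emits one triple per metadata field, serialized as N-Triples. The record layout (an <articles> root containing <article> elements with <articleid>, <pubdate>, <title>, and <subject> children), the base URI http://example.org/nature/, and the sample values are all assumptions; the actual Nature markup may differ, and every value is treated as a plain literal.

```python
import xml.etree.ElementTree as ET

ART = "http://location.of.our.future.article.ontology#"  # namespace from the SquishQL example
FIELDS = ("pubdate", "title", "subject")                  # properties named in the text

# Hypothetical record layout and values; the real Nature markup may differ.
SAMPLE = """<articles>
  <article>
    <articleid>a1</articleid>
    <pubdate>2002-05-16</pubdate>
    <title>Hypothetical title</title>
    <subject>geology</subject>
  </article>
</articles>"""

def article_triples(xml_text, base="http://example.org/nature/"):
    """Convert each <article> element into (subject URI, predicate URI, literal) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for article in root.iter("article"):
        article_id = article.findtext("articleid")
        if article_id is None:
            continue  # skip records without an identifier
        node = base + article_id.strip()
        triples.append((node, ART + "articleid", article_id.strip()))
        for field in FIELDS:
            for element in article.findall(field):
                if element.text:
                    triples.append((node, ART + field, element.text.strip()))
    return triples

def _escape(literal):
    """Escape a string for use inside an N-Triples literal."""
    return (literal.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))

def to_ntriples(triples):
    """Serialize triples as N-Triples, treating every object as a plain literal."""
    return "\n".join(f'<{s}> <{p}> "{_escape(o)}" .' for s, p, o in triples)

print(to_ntriples(article_triples(SAMPLE)))
```

The resulting N-Triples could then be loaded into whichever store is being tested; keeping one property per metadata element makes the subject-heading search a single triple pattern.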

Searches will be run against the metadata of the Nature articles, since this data has already been identified and tagged in XML. Searching the information contained in the text of the articles would require translating the natural language of the article text into RDF. While querying the information within natural language documents might be ideal, current RDF statement and query languages support little beyond metadata search. RDF is, however, well suited to representing metadata: the Dublin Core Metadata Initiative has adopted RDF/XML in one of its recommendations.

Tools Tested

We picked Sesame, Jena, and Inkling and tested storing and querying the NatureDataSet with each.




Test Data

Links

Tools:

Papers:

Notes

Questions/Suggestions Area

We welcome your suggestions.

