This part involves mapping the data structures in the database to DAML ontologies. Several approaches are possible. We currently use a very straightforward one: each table is mapped directly to a DAML class, which gives a close correspondence between the two spaces. Table rows become instances, and each row can be viewed as a resource whose columns are literals or pointers to other resources. The mapping is summarized in the following table.
Relational Database    ==>  Semantic Web
table                  ==>  DAML class
table column           ==>  DAML property
value of table column  ==>  literal or resource
foreign key            ==>  DAML property pointing to other resource
table row              ==>  instance of DAML class
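The mapping in the table above can be sketched in plain Java as follows. This is a minimal illustration and not the project's Jena-based code: the class name TableToDaml, the Table_Column naming of properties, and the XSD namespace constant are assumptions made for the example, and the output is only an RDF/XML fragment without the enclosing rdf:RDF element and namespace declarations.

```java
import java.util.Map;

/** Illustrative only: emits DAML+OIL declarations for one table as an RDF/XML fragment. */
class TableToDaml {

    // Assumed datatype namespace; adjust to whatever the target ontology uses.
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    /**
     * @param table       table name, reused as the DAML class name
     * @param columns     column name -> XML Schema datatype local name, e.g. "string"
     * @param foreignKeys foreign-key column name -> referenced table name
     */
    static String toDaml(String table, Map<String, String> columns, Map<String, String> foreignKeys) {
        StringBuilder out = new StringBuilder();
        // table ==> DAML class
        out.append("<daml:Class rdf:ID=\"").append(table).append("\"/>\n");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            String property = table + "_" + column.getKey(); // hypothetical naming scheme
            String referenced = foreignKeys.get(column.getKey());
            if (referenced != null) {
                // foreign key ==> DAML property pointing to another resource
                out.append("<daml:ObjectProperty rdf:ID=\"").append(property).append("\">\n")
                   .append("  <rdfs:domain rdf:resource=\"#").append(table).append("\"/>\n")
                   .append("  <rdfs:range rdf:resource=\"#").append(referenced).append("\"/>\n")
                   .append("</daml:ObjectProperty>\n");
            } else {
                // ordinary column ==> DAML property whose value is a literal
                out.append("<daml:DatatypeProperty rdf:ID=\"").append(property).append("\">\n")
                   .append("  <rdfs:domain rdf:resource=\"#").append(table).append("\"/>\n")
                   .append("  <rdfs:range rdf:resource=\"").append(XSD)
                   .append(column.getValue()).append("\"/>\n")
                   .append("</daml:DatatypeProperty>\n");
            }
        }
        return out.toString();
    }
}
```

Calling toDaml("Orders", columns, foreignKeys) with a CustomerID column that references Customers would yield one class declaration, one object property for the foreign key, and one datatype property for each remaining column.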
The mapping is stored in text files. The resulting ontologies are stored in a semantic repository, which is again a text file for the time being. We are considering RDF stores as alternatives.
The implementation is done in Java. The sample relational database is Northwind, an MS Access database. We connect to it via JDBC, retrieve the database metadata and create the corresponding DAML ontologies using Jena’s DAML libraries.
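The metadata retrieval step can be sketched with the standard java.sql.DatabaseMetaData interface. This is a hedged outline under stated assumptions, not the project's actual code: the class and method names (SchemaReader, readTables, readForeignKeys) are hypothetical, and the step that builds the ontologies with Jena is deliberately left out.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: collects the table, column and foreign-key metadata used by the mapping. */
class SchemaReader {

    /** Returns table name -> (column name -> SQL type name), in the order the driver reports them. */
    static Map<String, Map<String, String>> readTables(DatabaseMetaData meta) throws SQLException {
        Map<String, Map<String, String>> tables = new LinkedHashMap<>();
        // "%" matches every table name; the type filter skips views and system tables.
        try (ResultSet rs = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                tables.put(rs.getString("TABLE_NAME"), new LinkedHashMap<>());
            }
        }
        for (Map.Entry<String, Map<String, String>> table : tables.entrySet()) {
            // Note: JDBC treats the table name argument as a LIKE pattern.
            try (ResultSet rs = meta.getColumns(null, null, table.getKey(), "%")) {
                while (rs.next()) {
                    table.getValue().put(rs.getString("COLUMN_NAME"), rs.getString("TYPE_NAME"));
                }
            }
        }
        return tables;
    }

    /** Returns foreign-key column -> referenced table for one table. */
    static Map<String, String> readForeignKeys(DatabaseMetaData meta, String table) throws SQLException {
        Map<String, String> foreignKeys = new LinkedHashMap<>();
        try (ResultSet rs = meta.getImportedKeys(null, null, table)) {
            while (rs.next()) {
                foreignKeys.put(rs.getString("FKCOLUMN_NAME"), rs.getString("PKTABLE_NAME"));
            }
        }
        return foreignKeys;
    }
}
```

The DatabaseMetaData object would come from Connection.getMetaData() on an open JDBC connection. Whether a given driver reports foreign keys through getImportedKeys depends on the driver, so that part in particular should be checked against the database in use.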
Given the mapping from the tables in the relational database to DAML ontologies, it becomes feasible to read the data from the tables and create the instances corresponding to each table row.
A Java program reads the mapping, accesses the Northwind database via JDBC, and uses the mapping to write out the instances. The Jena libraries are used for the implementation.
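How one table row becomes one instance can be sketched as follows. This is a simplified stand-in written with plain string building rather than Jena: the class name RowToInstance, the Table_Key pattern for resource identifiers, and the Table_Column property names are assumptions for illustration. It also assumes that key values are simple enough to serve as rdf:ID values and that the enclosing RDF document declares the generated ontology as its default namespace.

```java
import java.util.Map;

/** Illustrative only: serializes one table row as an instance of the table's DAML class. */
class RowToInstance {

    /**
     * @param table       table name, which is also the DAML class name
     * @param keyColumn   primary-key column used to identify the instance
     * @param row         column name -> value for one row (null for SQL NULL)
     * @param foreignKeys foreign-key column name -> referenced table name
     */
    static String toInstance(String table, String keyColumn,
                             Map<String, Object> row, Map<String, String> foreignKeys) {
        StringBuilder out = new StringBuilder();
        // table row ==> instance of the DAML class, identified by its primary key
        out.append('<').append(table).append(" rdf:ID=\"")
           .append(table).append('_').append(row.get(keyColumn)).append("\">\n");
        for (Map.Entry<String, Object> cell : row.entrySet()) {
            if (cell.getValue() == null) {
                continue; // SQL NULL: emit no statement for this column
            }
            String property = table + "_" + cell.getKey();
            String value = escape(cell.getValue().toString());
            String referenced = foreignKeys.get(cell.getKey());
            if (referenced != null) {
                // foreign key ==> pointer to the instance created for the referenced row
                out.append("  <").append(property).append(" rdf:resource=\"#")
                   .append(referenced).append('_').append(value).append("\"/>\n");
            } else {
                // ordinary column ==> literal
                out.append("  <").append(property).append('>').append(value)
                   .append("</").append(property).append(">\n");
            }
        }
        out.append("</").append(table).append(">\n");
        return out.toString();
    }

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

In a full program the row map would be filled from a JDBC ResultSet, one call per row, with the foreign-key map taken from the stored mapping.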
Currently, the created instances are output to text files. We are considering using one of the available RDF stores instead. This may make the whole process more effective by using storage geared toward RDF/DAML instances. Since we also query the instances in the following parts of the project, this appears preferable to plain text files. The candidates include Berkeley DB and the relational databases supported by Jena. For economy of resources, we are currently inclined to use the already available Northwind database, backed by Jena’s relational database support.
Although creating the ontologies directly from the database metadata is quite effective and straightforward, this may not be the most appropriate way to carry out this task. There are two issues relevant to this.
We may want to create a more efficient representation or a more general representation based on the statistics of the database. As a simple example, we may want to omit the columns that are mostly empty. More complex scenarios are also possible.
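The simple case of omitting mostly empty columns might look like the sketch below. It is only one possible reading of "mostly empty": the class name SparseColumnFilter, the maxEmptyFraction threshold parameter, and the decision to count both NULL and the empty string as empty are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: decides which columns are filled often enough to keep in the ontology. */
class SparseColumnFilter {

    /**
     * @param columns          candidate column names
     * @param rows             sampled rows, each mapping column name -> value (absent or null = empty)
     * @param maxEmptyFraction columns whose share of empty values exceeds this are dropped
     */
    static List<String> columnsToKeep(List<String> columns,
                                      List<? extends Map<String, ?>> rows,
                                      double maxEmptyFraction) {
        List<String> keep = new ArrayList<>();
        for (String column : columns) {
            int empty = 0;
            for (Map<String, ?> row : rows) {
                Object value = row.get(column);
                if (value == null || value.toString().isEmpty()) {
                    empty++;
                }
            }
            // With no rows there is no evidence against a column, so it is kept.
            double emptyFraction = rows.isEmpty() ? 0.0 : (double) empty / rows.size();
            if (emptyFraction <= maxEmptyFraction) {
                keep.add(column);
            }
        }
        return keep;
    }
}
```

A threshold of 0.5, for instance, would drop a Fax column that is empty in three of four sampled rows while keeping a fully populated CompanyName column.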
We may want the resulting ontologies to be as compatible as possible with existing ontologies. For example, if a product ontology already exists, why not use or extend it instead of writing another from scratch? Of course, determining the level of compatibility, or deciding when to adopt an existing ontology, is a difficult problem. OntoMerge appears to be a tool that may be of use in this context, but we still need to examine this choice.
Given the ontologies and instances stored in a semantic repository, there are a number of related tasks that may be performed. These are mostly of an exploratory nature; some modifications may also be necessary to improve the semantic structures.
For a semantic agent that is programmed to make some use of the data, it is sufficient that the data is in semantic form. In contrast, a user interface should also be provided for a human user, who may want to manipulate or simply explore the data in this form. Different functions are required for the two main forms of data, i.e. the ontologies and the instances.
Given an ontology obtained from the relational database, the user should be given the ability to visualize it and to modify (edit) it if necessary. The user may find it necessary to match the new ontology with existing ontologies, so as not to reinvent what already exists.
The approaches planned for these functions are detailed below.
This includes visualizing the ontology and editing it. A significant number of good tools already exist for this purpose, and creating a good alternative is both unnecessary and beyond the scope of this task. So the approach is to make use of the existing tools.
For the purpose of visualization and editing, three tools are considered: IsaViz, RIC, and SMORE. All three are good choices as they are user-friendly tools with versatile functionality.
IsaViz is a visual environment for browsing and authoring RDF models, represented as directed graphs. Resources and literals are the nodes of the graph (ellipses and rectangles respectively), with properties represented as the edges linking these nodes.
RDF Instance Creator (RIC) is a general use tool for creating semantic mark-up. One can read in ontologies found online, or on the hard drive, and create mark-up using the terms from the ontologies imported into the program. A form based user interface enables users to quickly and easily create mark-up for their homepage, for work, or for research.
SMORE, likewise, is a tool that allows users to mark up their documents in RDF using web ontologies in association with user-specific terms and elements. The key features that make SMORE a desirable markup tool are the following: a fully featured WYSIWYG HTML editor and web browser, a triple shortcut window, a data classification window, an ontology management window, an ontology creation window, an advanced search feature, and comprehensive help.
Although these tools are more than sufficient in terms of functionality for ontology manipulation purposes, it is still desirable to create an integrated environment, rather than requiring the user to invoke several other programs. This direction will also be explored to see if it is possible to come up with a solution in which the tools are incorporated within the environment of the rest of the project.
Manipulating the ontologies is definitely important, but the ontologies are of no use unless they are used to create instances that encapsulate data in a meaningful way. Hence, it is crucial to devise means of exploring the semantic data, or simply of making the instances available. Note that, within the scope of this project, the point of interest is making read-only semantic data available. Aiming to set up an interface to insert or update data may turn this project into a database project. Instead, what we wish to do is to put the data stored in relational databases into semantic form so that human users or semantic agents can exploit them within the context of the Semantic Web.
We plan to implement three functionalities for human users: listing instances, browsing instances and semantic search. For the semantic agents, it is enough that the data is stored in a semantic repository.
The first function simply lets the user view the instances of a chosen ontology. The user connects to the semantic repository. The program presents the available ontologies to her and the user selects one of them so that the program displays the listing of instances corresponding to that ontology.
{{ SEE THE LISTING SNAPSHOT }}
The browsing function allows the user to traverse the RDF graph represented by the ontologies stored in the repository. The user chooses an initial class to start with. Then she can choose to follow the properties that point to other classes (or resources). She can stop at a class and have all the instances belonging to that class displayed. She can also filter the instances displayed by entering the values of some properties, so that only those that match the criteria are displayed.
{{ SEE THE BROWSING SNAPSHOT }}
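The listing and browsing functions described above can be illustrated with a small in-memory sketch. It is not the project's Jena-based implementation: the class name InstanceBrowser, the representation of the repository as a list of subject/property/object string triples, and the "rdf:type" marker are assumptions, and the sketch follows properties between instances rather than between classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: lists, filters and traverses instances held as in-memory triples. */
class InstanceBrowser {

    static final String TYPE = "rdf:type";

    private final List<String[]> triples; // each entry is {subject, property, object}

    InstanceBrowser(List<String[]> triples) {
        this.triples = triples;
    }

    /** Values reached from a resource by following one property. */
    List<String> follow(String subject, String property) {
        List<String> objects = new ArrayList<>();
        for (String[] triple : triples) {
            if (triple[0].equals(subject) && triple[1].equals(property)) {
                objects.add(triple[2]);
            }
        }
        return objects;
    }

    /**
     * Instances of a class whose property values match every entry of the filter.
     * An empty filter gives the plain listing of all instances of the class.
     */
    List<String> instancesOf(String cls, Map<String, String> filter) {
        List<String> instances = new ArrayList<>();
        for (String[] triple : triples) {
            if (!triple[1].equals(TYPE) || !triple[2].equals(cls)) {
                continue; // not a type statement for the requested class
            }
            boolean matches = true;
            for (Map.Entry<String, String> wanted : filter.entrySet()) {
                if (!follow(triple[0], wanted.getKey()).contains(wanted.getValue())) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                instances.add(triple[0]);
            }
        }
        return instances;
    }
}
```

A browsing session then alternates between the two calls: list the instances of the starting class, pick one, follow a property to a related resource, and list or filter again from there.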
The semantic search function lets the user perform complex searches on the instance repository. The user is able to choose some relations and put some restrictions on these relations so that the outcome of the search is the list of the desired instances. Specifically, the user is given the list of properties, a subset of which can be used as the search relations. The user can provide subject or object values for these properties, or just leave them as wildcards, or use the provided temporary variables to relate them. The user is also allowed to impose constraints on these variables. The result of the search is displayed as a list.
{{ SEE THE SEARCH SNAPSHOT }}
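The idea behind the search, triple patterns with fixed values, wildcards, shared variables and a constraint on the variables, can be illustrated with a naive matcher. This is a stand-in written for explanation and is not RDQL or Jena code: the class name TripleSearch, the "?name" convention for variables, and "*" as the wildcard are all assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative only: conjunctive triple-pattern search over in-memory triples. */
class TripleSearch {

    /**
     * @param triples    repository content, each entry {subject, property, object}
     * @param patterns   patterns of the same shape; "?x" is a variable, "*" a wildcard
     * @param constraint extra condition on the variable bindings of a solution
     * @return one map of variable -> value per solution
     */
    static List<Map<String, String>> search(List<String[]> triples, List<String[]> patterns,
                                            Predicate<Map<String, String>> constraint) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new LinkedHashMap<>()); // start with one empty binding
        for (String[] pattern : patterns) {
            List<Map<String, String>> extendedSolutions = new ArrayList<>();
            for (Map<String, String> binding : solutions) {
                for (String[] triple : triples) {
                    Map<String, String> extended = match(pattern, triple, binding);
                    if (extended != null) {
                        extendedSolutions.add(extended);
                    }
                }
            }
            solutions = extendedSolutions;
        }
        solutions.removeIf(constraint.negate());
        return solutions;
    }

    /** Returns the binding extended by this triple, or null if the triple does not fit. */
    private static Map<String, String> match(String[] pattern, String[] triple,
                                             Map<String, String> binding) {
        Map<String, String> result = new LinkedHashMap<>(binding);
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (term.equals("*")) {
                continue; // wildcard: anything matches, nothing is bound
            }
            if (term.startsWith("?")) {
                // a variable must keep the value it was bound to by an earlier position or pattern
                String previous = result.putIfAbsent(term, triple[i]);
                if (previous != null && !previous.equals(triple[i])) {
                    return null;
                }
            } else if (!term.equals(triple[i])) {
                return null; // fixed value does not match
            }
        }
        return result;
    }
}
```

Sharing a variable between two patterns is what relates them: using "?c" as the object of one pattern and the subject of another joins orders to their customers, and the constraint predicate can then restrict, say, a numeric "?f" value. The nested loops make this quadratic or worse, so it only shows the semantics, not a practical query engine.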
The functionality explained in this section is implemented from scratch in Java, with Jena as the main resource. RDQL, which is integrated in Jena, is used for much of the querying explained above.
We welcome your suggestions.