Over the last four decades of the twentieth century, the use of databases grew in all enterprises. The Internet revolution of the late 1990s sharply increased direct user access to databases: most organizations implemented web interfaces to their databases and made a variety of services and information available online. Today, with the emergence of the Semantic Web, the Internet is on the verge of a new revolution. A key ingredient of this revolution is the ability to make databases available semantically, that is, to find an automated and meaningful way of expressing their structure and semantics. With this in view, we aim at a tool, or a set of tools, that will automatically create the ontologies (and possibly instances) corresponding to the content of a database and make these available to humans and machines. This gives rise to the following two major tasks:
This task involves mapping the data structures in the database to DAML ontologies. Various approaches are possible; we currently use a very straightforward one, mapping the tables directly to DAML classes. This yields a fairly direct correspondence between the two spaces: table rows become instances, and each row can be viewed as a resource whose columns are literals or pointers to other resources. The mapping is summarized in the following table.
| Relational Database | Semantic Web |
|---|---|
| table | DAML class |
| table column | DAML property |
| value of table column | literal or resource |
| foreign key | DAML property pointing to another resource |
| table row | instance of DAML class |
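The mapping above can be sketched in plain Java. This is an illustrative stand-in only: the namespace, table, and column names are hypothetical, triples are emitted as strings rather than through Jena's DAML API, and foreign keys are marked with an ad hoc `FK:` prefix for the sake of a self-contained example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the table-to-DAML mapping; names and the FK: convention are
// hypothetical, and triples are plain strings instead of Jena objects.
public class TableMapping {
    static final String NS = "http://example.org/northwind#";

    // Map a table name and its columns to DAML class/property declarations.
    static List<String> mapTable(String table, Map<String, String> columns) {
        List<String> triples = new ArrayList<>();
        // table -> DAML class
        triples.add(NS + table + " rdf:type daml:Class");
        for (Map.Entry<String, String> col : columns.entrySet()) {
            String prop = NS + table + "." + col.getKey();
            if (col.getValue().startsWith("FK:")) {
                // foreign key -> DAML property pointing to another resource
                triples.add(prop + " rdf:type daml:ObjectProperty");
                triples.add(prop + " daml:range " + NS + col.getValue().substring(3));
            } else {
                // ordinary column -> DAML property whose values are literals
                triples.add(prop + " rdf:type daml:DatatypeProperty");
            }
        }
        return triples;
    }
}
```

In the actual implementation the column and key information would come from the JDBC database metadata rather than from a hand-built map.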
The resulting ontologies are stored in a semantic repository, which is currently just a text file. We have considered RDF storers as alternatives, but the storers currently available to the toolbox we use do not provide support for DAML storage.
The implementation is done in Java. The sample relational database is Northwind, an MS Access database. We connect to it via JDBC, retrieve the database metadata and create the corresponding DAML ontologies using Jena’s DAML libraries.
Given the mapping from the tables in the relational database to DAML ontologies, it becomes feasible to read the data from the tables and create the instances corresponding to each table row.
A Java program reads the mapping, then accesses the Northwind database via JDBC, and uses the mapping to write out the instances. The Jena libraries for RDF manipulation are used for the implementation.
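The row-to-instance step can be sketched as follows. In the real program the rows come from a JDBC ResultSet over Northwind; here a list of maps stands in for it so the example is self-contained, and the namespace and naming scheme are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of instance creation: each row becomes an instance of the class
// derived from its table, and each column value becomes a property
// assertion on that instance. Names are hypothetical; the real code reads
// rows via JDBC and writes instances via Jena.
public class InstanceWriter {
    static final String NS = "http://example.org/northwind#";

    static List<String> writeInstances(String table, String keyCol,
                                       List<Map<String, String>> rows) {
        List<String> triples = new ArrayList<>();
        for (Map<String, String> row : rows) {
            // one resource per row, identified by its key column
            String subject = NS + table + "/" + row.get(keyCol);
            triples.add(subject + " rdf:type " + NS + table);
            for (Map.Entry<String, String> cell : row.entrySet()) {
                // one property assertion per column value
                triples.add(subject + " " + NS + table + "." + cell.getKey()
                        + " \"" + cell.getValue() + "\"");
            }
        }
        return triples;
    }
}
```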
Currently, the created instances are processed in memory. For experimentation purposes, this has turned out to be the most efficient choice, but we have also considered using one of the available RDF storers instead, which may make the whole process more effective by using storage geared toward RDF instances. Since we also query the instances in the following parts of the project, an appropriate method for RDF storage seems called for. For this purpose, we have considered a number of choices, including Berkeley DB and the relational databases supported by Jena. As it makes sense in terms of economy of resources, we are currently inclined to use the already available Northwind database, backed by Jena's relational database support. The change from an RDF model in memory to an RDF model in a relational database is indeed trivial: it requires changing only a few lines of code.
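The point that switching storage backends is a few-line change can be illustrated with a minimal seam. This is a schematic stand-in, not the project's code: in the actual implementation the seam is Jena's Model abstraction, and the interface and class names below are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical seam keeping the storage choice behind one interface, so
// moving from an in-memory model to a database-backed one only changes
// the construction site. In the real code, Jena's Model plays this role.
interface TripleStore {
    void add(String subject, String predicate, String object);
    int size();
}

class InMemoryStore implements TripleStore {
    private final List<String> triples = new ArrayList<>();
    public void add(String s, String p, String o) {
        triples.add(s + " " + p + " " + o);
    }
    public int size() {
        return triples.size();
    }
}

class StoreDemo {
    static TripleStore openStore() {
        // Swap this one line for a database-backed implementation
        // (e.g. one wrapping Jena's relational support) when needed.
        return new InMemoryStore();
    }
}
```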
Although creating the ontologies directly from the database metadata is quite effective and straightforward, this may not be the most appropriate way to carry out this task. There are two issues relevant to this.
We may want to create a more efficient representation or a more general representation based on the statistics of the database. As a simple example, we may want to omit the columns that are mostly empty. More complex scenarios are also possible.
We may want the resulting ontologies to be as compatible as possible with existing ontologies. For example, if a product ontology already exists, why not use or extend it instead of writing another from scratch? Of course, it is a genuinely difficult problem to determine the level of compatibility or to decide when to adopt an existing ontology. OntoMerge appeared to be a tool that might be of use in this context, and we examined this option; within the limited scope of the project, however, it did not appear to be a very promising path to follow. Hence the current option is to allow the user to edit the ontologies manually.
Given the ontologies and instances stored in a semantic repository, there are a number of related tasks that may be performed. These are mostly of an exploratory nature; some modifications may also be necessary to improve the semantic structures.
For a semantic agent that is programmed to make some use of the data, it is sufficient that the data is in semantic form. In contrast, a user interface should also be provided for a human user, who may want to manipulate or simply explore the data in this form. Different functions are required for the two main forms of data, i.e. the ontologies and the instances.
Given an ontology obtained from the relational database, the user should be able to visualize it and to modify (edit) it if necessary. The user may also find it necessary to match the new ontology against existing ontologies, so as not to reinvent what already exists.
Below are details of what approaches are planned to be incorporated for these functionalities.
This includes visualizing the ontology and editing it. There are already a significant number of good tools for this purpose, and trying to create a good alternative exceeds the scope of this task, apart from being unnecessary. So the approach is to make use of the existing tools.
For the purpose of visualization and editing, three tools are considered: IsaViz, RIC, and SMORE. All three are good choices as they are user-friendly tools with versatile functionality.
IsaViz is a visual environment for browsing and authoring RDF models, represented as directed graphs. Resources and literals are the nodes of the graph (ellipses and rectangles respectively), with properties represented as the edges linking these nodes.
RDF Instance Creator (RIC) is a general use tool for creating semantic mark-up. One can read in ontologies found online, or on the hard drive, and create mark-up using the terms from the ontologies imported into the program. A form based user interface enables users to quickly and easily create mark-up for their homepage, for work, or for research.
SMORE, similarly, is a tool that allows users to mark up their documents in RDF using web ontologies in association with user-specific terms and elements. The key features that make SMORE a desirable markup tool are the following: a fully featured WYSIWYG HTML editor and web browser, a triple shortcut window, a data classification window, an ontology management window, an ontology creation window, an advanced search feature, and comprehensive help.
Although these tools are more than sufficient in terms of functionality for ontology manipulation purposes, it is still desirable to create an integrated environment, rather than requiring the user to invoke several other programs. This direction has also been explored, to see whether it is possible to come up with a solution where these tools are incorporated into the environment of the rest of the project. We have not found a reasonable way of incorporating this rich functionality into our program. Still, we think it is good to give the end user the option to edit the ontology, so we provide the interface shown below. The ontologies can be viewed, edited and saved.
Manipulating the ontologies is certainly important, but the ontologies are of no use unless they are used to create instances that encapsulate data in a meaningful way. Hence, it is crucial to devise means of exploring the semantic data, or simply of making the instances available. Note that, within the scope of this project, the point of interest is making read-only semantic data available; aiming to set up an interface to insert or update data would turn this project into a database project. What we wish to do, rather, is to put the data stored in relational databases into semantic form so that human users or semantic agents can exploit it within the context of the Semantic Web.
We have implemented two functionalities for human users: listing instances, and semantic search. For the semantic agents, it is enough that the data is stored in a semantic repository.
The first function simply lets the user view the instances of a chosen ontology. The user connects to the semantic repository.
The program presents the available ontologies to her and the user selects one of them so that the program displays the listing of instances corresponding to that ontology.
The user can have all the instances belonging to that class displayed. She can also filter the displayed instances by entering values for some properties, so that only the instances that match the criteria are displayed.
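The filtering step can be sketched as follows. Instances are represented here as property-to-value maps purely for illustration; in the actual program the filtering is performed over the RDF model via Jena, and the names used are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the instance-listing filter: keep only instances whose
// property values match every user-supplied criterion. Hypothetical
// stand-in for filtering the RDF model with Jena.
public class InstanceFilter {
    static List<Map<String, String>> filter(List<Map<String, String>> instances,
                                            Map<String, String> criteria) {
        return instances.stream()
                .filter(inst -> criteria.entrySet().stream()
                        .allMatch(c -> c.getValue().equals(inst.get(c.getKey()))))
                .collect(Collectors.toList());
    }
}
```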
The semantic search function lets the user perform complex searches on the instance repository. The user can choose some relations and place restrictions on them, so that the outcome of the search is the list of desired instances. Specifically, the user is given the list of properties, a subset of which can be used as the search relations. The user can provide subject or object values for these properties, leave them as wildcards, or use the provided temporary variables to relate them. The user is also allowed to impose constraints on these variables. The result of the search is displayed as a list.
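The matching semantics of such a search can be illustrated with a minimal triple-pattern matcher, where `null` plays the role of a wildcard. This is a stand-alone sketch only; the actual implementation delegates the query evaluation to RDQL via Jena, and the data below is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal illustration of wildcard search over triples: null in a
// position of the pattern matches anything. The real code uses RDQL
// through Jena rather than this hand-rolled matcher.
public class TripleSearch {
    record Triple(String s, String p, String o) {}

    static List<Triple> match(List<Triple> store, String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : store) {
            if ((s == null || s.equals(t.s()))
                    && (p == null || p.equals(t.p()))
                    && (o == null || o.equals(t.o()))) {
                out.add(t);
            }
        }
        return out;
    }
}
```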
The functionality explained in this section is implemented from scratch in Java. Jena is the main resource for this purpose. RDQL integrated in Jena is used for much of the querying explained above.
Below is the code developed for the purposes described above. DBAdapter is the main class, the program that runs the code. DBTable encapsulates the functionality to retrieve the structure and data from each table, i.e. the ontologies and the triples. DBSchema envelops all the information from the database.
We welcome your suggestions.