The semantic search service will enable people to access information beyond simple keyword search. Agent-enabled semantic search is one effective way to implement this. In this scenario, the user gives a query to the search agent, and the agent looks for the answer using the terms defined in the available ontologies. The agent may communicate with other agents and exchange information to broaden the search results.
An example system is the Agent Semantic Communication Service (ASCS) from Teknowledge and Ontology Works. ASCS is a DAML-based search system for the Semantic Web. It allows a user to make precise queries for information encoded in DAML by specifying each query as a triple clause (using the subject-predicate-object model). Its primary components are a Semantic Search Engine (SSE), which uses multiple agents that collaborate to carry out the search, and a Semantic Translation Service (STS), which translates between ontologies. ASCS also supports several kinds of simple inference for query broadening or relaxation, and it can be used directly by web-based agents for semantic search and ontology translation.
The main aim of this project is to extend this idea to searching diverse data sources such as web pages, relational databases and web services. Agents with specialized capabilities query the different data sources, and the results from separate sources are combined as if the data came from one large DAML repository. Other improvements are the use of the RDQL language for better query representation, rule-based ontology translation for cases where the concepts in two ontologies do not map exactly, and message-passing agent communication instead of remote method invocation.
The Semantic Search Service (SSS) is an agent-based system in which each agent is responsible for querying a different data resource. One agent is assigned to the user; it receives the user's queries through a user interface and is primarily responsible for query processing. This agent sends the user queries to the other agents, combines the returned results, and presents the final outcome. The main search operation is performed on a DAML repository by the PrologAgent. The repository is built by crawling semantically marked-up pages. This data is stored in a Prolog knowledge base so that the PrologAgent can use the Prolog DAML reasoner to query it. The ServiceAgent searches the web service descriptions published in a repository. Web services described in DAML-S carry semantic information about the service, which can be used to find answers to the query. The SQLAgent queries the relational database according to the mapping between the database and the ontologies. This builds on the approach of representing the structure of a database with ontologies, described in more detail below. The RDBMS stores the data in its original form, without converting it to an RDF representation. RDQL queries are translated to SQL queries according to the mapping between the database and the ontology used in the query. Therefore, the data in the database can be queried as an ordinary triple store without any explicit data conversion.
Figure 1 The agent architecture of the system
Agents pass RDQL queries to each other to search all the available data sources. However, the ontologies an agent uses may not be adequate to process a query. If the query includes concepts that cannot be represented with the agent's ontology, the TranslatorAgent is contacted to convert the concepts from one ontology to another.
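As a rough illustration of when the TranslatorAgent might be needed, the Java sketch below collects the URIs in a query whose namespace is not among the namespaces an agent understands. The class `OntologyCheck`, the namespace rule (everything up to and including the last `#`), and the assumption that prefixes have already been expanded to full URIs are illustrative choices, not taken from the SSS code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: lists the URIs in a query whose namespace is not among
// the namespaces of the ontologies this agent understands.
class OntologyCheck {
    static List<String> unknownTerms(List<String[]> patterns, Set<String> knownNamespaces) {
        List<String> unknown = new ArrayList<>();
        for (String[] tp : patterns) {
            for (String term : tp) {
                // Only full URIs written as <...> are checked; variables and literals are skipped.
                if (!term.startsWith("<") || !term.endsWith(">")) continue;
                String uri = term.substring(1, term.length() - 1);
                // Assumption: the namespace is everything up to and including the last '#'.
                int cut = uri.lastIndexOf('#');
                String ns = cut >= 0 ? uri.substring(0, cut + 1) : uri;
                if (!knownNamespaces.contains(ns) && !unknown.contains(term)) {
                    unknown.add(term);
                }
            }
        }
        return unknown;
    }
}
```

A non-empty result would be the signal for the agent to forward the query to its TranslatorAgent.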
The system's user interface is shown in Figure 2. The interface allows the user to construct a query. An RDQL query has three basic parts: a 'select' clause that specifies the variables to be found; a 'where' clause that gives a conjunction of triple patterns; and an 'and' clause that places constraints on the variables. A special 'using' clause defines namespace mappings to abbreviate URIs. See the Jena RDQL documentation for details.
Figure 2 The user interface to enter the RDQL queries
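The following Java sketch assembles an RDQL query string from the four clauses described above. `RdqlQueryBuilder` is a hypothetical helper written for illustration; it is not part of Jena or of the SSS user interface, and it only concatenates text without validating the query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that assembles the SELECT, WHERE, AND and USING clauses.
class RdqlQueryBuilder {
    private final List<String> selectVars = new ArrayList<>();
    private final List<String> patterns = new ArrayList<>();
    private final List<String> constraints = new ArrayList<>();
    private final Map<String, String> prefixes = new LinkedHashMap<>();

    // 'select' clause: variables whose bindings are returned
    RdqlQueryBuilder select(String... vars) {
        for (String v : vars) selectVars.add(v);
        return this;
    }

    // 'where' clause: one (subject, predicate, object) triple pattern
    RdqlQueryBuilder where(String s, String p, String o) {
        patterns.add("(" + s + ", " + p + ", " + o + ")");
        return this;
    }

    // 'and' clause: a constraint on bound variables
    RdqlQueryBuilder and(String constraint) {
        constraints.add(constraint);
        return this;
    }

    // 'using' clause: a prefix that abbreviates a namespace URI
    RdqlQueryBuilder using(String prefix, String uri) {
        prefixes.put(prefix, uri);
        return this;
    }

    String build() {
        StringBuilder sb = new StringBuilder();
        sb.append("SELECT ").append(String.join(", ", selectVars)).append('\n');
        sb.append("WHERE ").append(String.join(",\n      ", patterns)).append('\n');
        if (!constraints.isEmpty()) {
            sb.append("AND ").append(String.join(" && ", constraints)).append('\n');
        }
        if (!prefixes.isEmpty()) {
            List<String> decls = new ArrayList<>();
            for (Map.Entry<String, String> e : prefixes.entrySet()) {
                decls.add(e.getKey() + " FOR <" + e.getValue() + ">");
            }
            sb.append("USING ").append(String.join(", ", decls)).append('\n');
        }
        return sb.toString();
    }
}
```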
The agents are developed in the Java Agent Development Framework (JADE). They communicate using FIPA ACL messages, which also support an RDF-based content language. The agents live in containers, which may be distributed across several machines. The agents are described in detail in the following sections.
PrologAgent uses a reasoner built on Prolog to find the answers to a given query. The Prolog knowledge base stores each triple in a quadruple format, rdf(Subject, Predicate, Object, SourceURI). The reasoner defines additional axioms to compute the entailments of DAML. Converting RDQL queries to Prolog is straightforward, since the triple patterns can be used directly to match the facts in the knowledge base. Some preprocessing is required to handle the constraint clause.
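To show why the conversion is straightforward, here is a minimal Java sketch that turns RDQL triple patterns into a conjunction of rdf/4 goals. The term encoding is an assumption: variables become Prolog variables prefixed with `V_`, full URIs become quoted atoms, and string literals are wrapped in `literal/1`. The encoding used by the actual SSS reasoner may differ, and constraint handling is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: RDQL triple patterns -> Prolog goal over rdf(Subject, Predicate, Object, SourceURI).
class RdqlToProlog {

    // Converts one RDQL term into a Prolog term (encoding choices are assumptions).
    static String term(String t) {
        if (t.startsWith("?")) {
            // ?x becomes V_x; an uppercase first letter makes it a Prolog variable
            return "V_" + t.substring(1);
        }
        if (t.startsWith("<") && t.endsWith(">")) {
            // full URI becomes a quoted atom
            return quote(t.substring(1, t.length() - 1));
        }
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            // string literal wrapped in literal/1 (assumed encoding)
            return "literal(" + quote(t.substring(1, t.length() - 1)) + ")";
        }
        throw new IllegalArgumentException("Unsupported term: " + t);
    }

    // Quoted Prolog atom: escape backslashes, double single quotes.
    private static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "''") + "'";
    }

    // Each pattern (s, p, o) becomes rdf(S, P, O, _); the source URI is left anonymous.
    static String goal(List<String[]> patterns) {
        List<String> goals = new ArrayList<>();
        for (String[] tp : patterns) {
            goals.add("rdf(" + term(tp[0]) + ", " + term(tp[1]) + ", " + term(tp[2]) + ", _)");
        }
        return String.join(", ", goals);
    }
}
```

Shared RDQL variables become shared Prolog variables, so joins between patterns come for free from Prolog unification.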
Relational databases are used to store structured data. The data stored in an RDBMS table can be converted to DAML instances if a mapping is defined between the table structure and a DAML class. Another project in this course focuses on this idea and defines the relationship between databases and ontologies. In summary, the idea is to map tables directly to DAML classes. Table rows become instances, and each row can be viewed as a resource whose columns are literals or links to other resources.
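The row-to-instance mapping can be sketched in Java as follows. `TableMapping`, its fields, and the instance-naming scheme (class URI, then `_`, then the row key) are hypothetical; the sketch only shows that a row becomes an instance, ordinary columns become literal-valued properties, and foreign-key columns become links to other instances.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mapping of one table to one DAML class.
class TableMapping {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    final String classUri;   // DAML class the table is mapped to
    final String idColumn;   // primary-key column used to name instances
    final Map<String, String> columnToProperty = new LinkedHashMap<>(); // column -> property URI
    final Map<String, String> foreignKeys = new LinkedHashMap<>();      // column -> class of referenced table

    TableMapping(String classUri, String idColumn) {
        this.classUri = classUri;
        this.idColumn = idColumn;
    }

    // Assumed naming scheme for instances: class URI + "_" + key value.
    static String instanceUri(String classUri, Object id) {
        return classUri + "_" + id;
    }

    // Converts one row (column name -> value) into {subject, predicate, object} triples.
    List<String[]> rowToTriples(Map<String, Object> row) {
        List<String[]> triples = new ArrayList<>();
        String subject = instanceUri(classUri, row.get(idColumn));
        triples.add(new String[]{subject, RDF_TYPE, classUri}); // the row is an instance of the class
        for (Map.Entry<String, String> e : columnToProperty.entrySet()) {
            Object value = row.get(e.getKey());
            if (value == null) continue;                 // NULL columns produce no triple
            String target = foreignKeys.get(e.getKey());
            String object = (target != null)
                    ? instanceUri(target, value)         // link to another resource
                    : "\"" + value + "\"";               // literal value
            triples.add(new String[]{subject, e.getValue(), object});
        }
        return triples;
    }
}
```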
Once this mapping is generated, special agents that use this information can query the database as if the data had been converted to DAML instances. This is beneficial if the data stored in the database changes rapidly and we want to provide it to Semantic Web agents while standard database programs continue to work on the same source.
The following example illustrates the translation from RDQL to SQL. There are three ontologies that define mountains, countries, and continents. Each mountain has a property inCountry that links to a Country instance, and each Country instance is linked to a Continent instance. The equivalent information is stored in three database tables, related as shown in Figure 3.
Figure 3 Relationship between database tables
Consider a question such as "Which mountains in Europe are higher than 'Mont Blanc'?". The following code shows a sample RDQL query that answers this question and an equivalent SQL translation, which can be derived using the mapping information above.
RDQL query:

  SELECT ?MountainName, ?Height
  WHERE (?Mountain, <mountain:name>, ?MountainName),
        (?Mountain, <mountain:height>, ?Height),
        (?Mountain, <mountain:inContinent>, <continent:Europe>),
        (?MontBlanc, <mountain:name>, "Mont Blanc"),
        (?MontBlanc, <mountain:height>, ?MtHeight)
  AND ?Height > ?MtHeight
  USING mountain for <http://www.mindswap.org/~evren/mountains1.daml#>,
        continent for <http://www.mindswap.org/~evren/continent.daml#>

Equivalent SQL query:

  SELECT t0.name, t0.height
  FROM mountains AS t0,
       country AS t1,
       mountains AS t2,
       continent AS t3
  WHERE t0.inCountry = t1.ID
    AND t3.id like 'Europe'
    AND t1.inContinent = t3.ID
    AND t2.name like 'Mont Blanc'
    AND t0.height > t2.height
The algorithm that performs this translation is as follows:
RDQLtoSQL(map)
  for every distinct variable or constant V in RDQL.WHERE referring to a class instance
    tableName = map(V.Class)
    alias(V) = create a unique alias for tableName
    append(SQL.FROM, tableName AS alias(V))
    if V is a constant
      append(SQL.WHERE, alias(V).id = V)
  for every (subject, predicate, object) in RDQL.WHERE
    columnName = map(predicate)
    if object is a literal
      append(SQL.WHERE, alias(subject).columnName = object)
    else if alias(object) is not computed
      alias(object) = alias(subject) + "." + columnName
    else
      append(SQL.WHERE, alias(subject).columnName = alias(object))   // alias(object).id if object is a class instance
  for every (first, op, second) in RDQL.AND
    append(SQL.WHERE, alias(first) op alias(second))                  // a literal operand is used as-is
  for every variable V in RDQL.SELECT
    append(SQL.SELECT, alias(V))
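A minimal Java sketch of this translation follows. The inputs are hypothetical: `tableOf` assigns a table to each RDQL term that denotes a class instance (in the real system this would come from the ontology and the database mapping), and `columnOf` assigns a column to each predicate. Because each predicate maps to exactly one column here, the example spells out the Country link explicitly instead of using `mountain:inContinent` as the query above does, and it matches IDs with `=` rather than `like`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of RDQLtoSQL. Triples and constraints are String[3] arrays.
class RdqlToSql {

    static String translate(List<String> select, List<String[]> where, List<String[]> constraints,
                            Map<String, String> tableOf, Map<String, String> columnOf) {
        Map<String, String> alias = new LinkedHashMap<>(); // class-instance term -> table alias
        Map<String, String> ref = new LinkedHashMap<>();   // value variable -> column reference
        List<String> from = new ArrayList<>();
        List<String> conds = new ArrayList<>();

        // Step 1: one table alias per term that refers to a class instance
        for (String[] t : where) {
            for (String term : new String[]{t[0], t[2]}) {
                if (tableOf.containsKey(term) && !alias.containsKey(term)) {
                    String a = "t" + alias.size();
                    alias.put(term, a);
                    from.add(tableOf.get(term) + " AS " + a);
                    if (!term.startsWith("?")) {                 // constant resource
                        conds.add(a + ".ID = " + sqlString(localName(term)));
                    }
                }
            }
        }

        // Step 2: each triple binds a variable to a column or adds a join/filter
        for (String[] t : where) {
            String column = alias.get(t[0]) + "." + columnOf.get(t[1]);
            String o = t[2];
            if (o.startsWith("\"")) {                            // literal value
                conds.add(column + " = " + sqlString(o.substring(1, o.length() - 1)));
            } else if (alias.containsKey(o)) {                   // link to another row
                conds.add(column + " = " + alias.get(o) + ".ID");
            } else if (ref.containsKey(o)) {                     // variable already bound
                conds.add(column + " = " + ref.get(o));
            } else {                                             // first occurrence of variable
                ref.put(o, column);
            }
        }

        // Step 3: constraints (first op second) from the AND clause
        for (String[] c : constraints) {
            conds.add(operand(c[0], ref) + " " + c[1] + " " + operand(c[2], ref));
        }

        // Step 4: selected variables become selected columns
        List<String> cols = new ArrayList<>();
        for (String v : select) cols.add(ref.get(v));

        String sql = "SELECT " + String.join(", ", cols) + " FROM " + String.join(", ", from);
        if (!conds.isEmpty()) sql += " WHERE " + String.join(" AND ", conds);
        return sql;
    }

    private static String operand(String term, Map<String, String> ref) {
        return term.startsWith("?") ? ref.get(term) : term;     // literals pass through unchanged
    }

    private static String localName(String uri) {
        String u = uri.replaceAll("^<|>$", "");
        int i = Math.max(u.lastIndexOf('#'), u.lastIndexOf(':'));
        return u.substring(i + 1);
    }

    private static String sqlString(String s) {
        return "'" + s.replace("'", "''") + "'";
    }
}
```

For the mountain question, with the triples (?Mountain, <mountain:inCountry>, ?Country) and (?Country, <country:inContinent>, <continent:Europe>) in place of the inContinent shortcut, the sketch produces `SELECT t0.name, t0.height FROM mountains AS t0, country AS t1, continent AS t2, mountains AS t3 WHERE t2.ID = 'Europe' AND t0.inCountry = t1.ID AND t1.inContinent = t2.ID AND t3.name = 'Mont Blanc' AND t0.height > t3.height`, which has the same structure as the hand-written SQL above.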
The answer to the user's query may also be retrieved through web services. A WSDL description of a web service defines the inputs and outputs of the service using XML Schema datatypes. This definition cannot be used directly with an RDQL query, where everything is expressed as triples. However, a DAML-S description of a web service provides a similar description in which the inputs and outputs of the service are defined using DAML instances.
Consider again the mountain question from the previous section. Suppose the following two services are defined in DAML-S: FindCountry, which takes a continent name as input and returns a list of Country instances, and FindMountain, which takes a Country instance as input and returns a list of Mountain instances. These services can be combined to answer the query. A simple algorithm that uses web services to answer a query is given in the following code segment. The idea is to construct a graph that shows how the variables depend on each other, find services that ground the variables, and propagate the bindings through the dependency graph. For each triple (S, P, O) in the query, the algorithm checks which of the subject and object is already known and then looks for a service that takes the known one as input and returns the other. The algorithm fails if no single web service satisfies this condition.
Composition(serviceList)
  create a dependency graph for the triples in RDQL.WHERE
  assign a number to every subject and object denoting the order of execution
  MaxNum = maximum number assigned
  time = 0
  results = { empty binding }
  while time <= MaxNum do
    for every distinct ?subject | (?subject, predicate, ?object) in RDQL.WHERE
        and number(?subject) = time
      service = find a service in serviceList such that its output is a list of ?subject
                and its mandatory inputs are a subset of the set of ?object
                where number(?object) < number(?subject)
      if no such service exists then fail
      newResults = empty
      for every binding B in results
        instantiate the service inputs from B or from the query inputs (constants)
        partialResults = execute service
        for every binding C in partialResults
          newResults.add(B union C)
      results = newResults
    time = time + 1
  for every binding B in results
    if B does not satisfy some (first op second) in RDQL.AND
      remove B from results
    else
      remove from B every variable V that is not in RDQL.SELECT
  return results
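The propagation idea can be sketched in Java as follows. Instead of precomputing execution numbers, this greedy variant repeatedly picks a triple whose object is already known and whose subject is not, which grounds the variables in the same dependency order. `AtomicService` and `ServiceComposer` are hypothetical names: each service is modelled as a plain function from one input instance to a list of output instances, with DAML-S input and output types reduced to class names. Constraint filtering from the AND clause and joins between two already-bound variables are omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical service description: one input class, a list of output instances.
class AtomicService {
    final String inputType, outputType;
    final Function<String, List<String>> call;

    AtomicService(String inputType, String outputType, Function<String, List<String>> call) {
        this.inputType = inputType;
        this.outputType = outputType;
        this.call = call;
    }
}

class ServiceComposer {
    // triples are {subject, predicate, object}; typeOf gives the class of each term
    static List<Map<String, String>> answer(List<String[]> triples, Map<String, String> typeOf,
                                            List<AtomicService> services, List<String> select) {
        List<Map<String, String>> results = new ArrayList<>();
        results.add(new HashMap<>());            // start from a single empty binding
        Set<String> bound = new HashSet<>();     // variables grounded so far
        List<String[]> pending = new ArrayList<>(triples);

        while (!pending.isEmpty()) {
            // Pick a triple whose object is known and whose subject is still unknown.
            String[] next = null;
            for (String[] t : pending) {
                boolean objectKnown = !t[2].startsWith("?") || bound.contains(t[2]);
                if (objectKnown && t[0].startsWith("?") && !bound.contains(t[0])) { next = t; break; }
            }
            if (next == null) throw new IllegalStateException("no triple can be grounded");

            // Find a service whose input type is the object's class and output type the subject's.
            AtomicService service = null;
            for (AtomicService s : services) {
                if (s.inputType.equals(typeOf.get(next[2])) && s.outputType.equals(typeOf.get(next[0]))) {
                    service = s;
                    break;
                }
            }
            if (service == null) throw new IllegalStateException("no service for " + next[1]);

            // Call the service once per existing binding and extend each binding.
            List<Map<String, String>> newResults = new ArrayList<>();
            for (Map<String, String> b : results) {
                String input = next[2].startsWith("?") ? b.get(next[2]) : next[2];
                for (String output : service.call.apply(input)) {
                    Map<String, String> extended = new HashMap<>(b);
                    extended.put(next[0], output);
                    newResults.add(extended);
                }
            }
            results = newResults;
            bound.add(next[0]);
            pending.remove(next);
        }

        // Keep only the variables requested in the SELECT clause.
        List<Map<String, String>> projected = new ArrayList<>();
        for (Map<String, String> b : results) {
            Map<String, String> row = new LinkedHashMap<>();
            for (String v : select) row.put(v, b.get(v));
            projected.add(row);
        }
        return projected;
    }
}
```

With FindCountry modelled as a Continent-to-Country service and FindMountain as a Country-to-Mountain service, the triples (?Mountain, inCountry, ?Country) and (?Country, inContinent, Europe) lead the sketch to call FindCountry with Europe first and then FindMountain once per returned country.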
Similar concepts defined in separate ontologies may be linked through the daml:sameClassAs and daml:samePropertyAs constructs. This enables Semantic Web agents to use these concepts interchangeably. Translating such concepts can be done as a simple find/replace operation. A DAML-capable reasoner would not even require this explicit translation, since the links between the concepts provide the same information.
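Equivalence-based translation amounts to a term-by-term lookup. In this Java sketch, the hypothetical `sameAs` map would be populated from daml:sameClassAs and daml:samePropertyAs statements; because the map is one-directional, both directions would have to be added for symmetric translation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: translation through declared equivalences as a find/replace on each term.
class EquivalenceTranslator {
    private final Map<String, String> sameAs; // source-ontology URI -> equivalent target URI

    EquivalenceTranslator(Map<String, String> sameAs) {
        this.sameAs = sameAs;
    }

    List<String[]> translate(List<String[]> patterns) {
        List<String[]> out = new ArrayList<>();
        for (String[] tp : patterns) {
            String[] copy = new String[3];
            for (int i = 0; i < 3; i++) {
                // Terms without a declared equivalent (variables, literals) are kept unchanged.
                copy[i] = sameAs.getOrDefault(tp[i], tp[i]);
            }
            out.add(copy);
        }
        return out;
    }
}
```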
However, a property in one ontology may correspond to a composition of multiple properties in another ontology. Figure 4 shows such an example. This case requires a different solution than simple find/replace. The mapping between these concepts should be expressed in a way similar to N3 rules. The TranslatorAgent uses a rule-based translation system to help the other agents communicate with each other. When an agent receives a query that uses ontologies it cannot handle, the query is translated by its registered TranslatorAgent.
Figure 4 Uncle property may be expressed with a rule that combines father and brother relations
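The uncle example can be handled by rewriting each query triple that uses the rule's head predicate into the rule's body, giving fresh names to variables that occur only in the body. The Java sketch below assumes a single rule whose head subject and object are variables; `RuleTranslator` and the `?_r` naming scheme for fresh variables are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: rewrite query triples with a rule head -> body, e.g.
// (?x uncle ?y)  ->  (?x father ?f), (?f brother ?y)
class RuleTranslator {
    private final String[] head;        // e.g. {"?x", "<a:uncle>", "?y"}
    private final List<String[]> body;  // triple patterns that define the head
    private int fresh = 0;              // counter for fresh variable names

    RuleTranslator(String[] head, List<String[]> body) {
        this.head = head;
        this.body = body;
    }

    List<String[]> rewrite(List<String[]> query) {
        List<String[]> out = new ArrayList<>();
        for (String[] tp : query) {
            if (!tp[1].equals(head[1])) { out.add(tp); continue; } // rule does not apply
            // Bind the head variables to the query triple's subject and object.
            Map<String, String> subst = new HashMap<>();
            subst.put(head[0], tp[0]);
            subst.put(head[2], tp[2]);
            for (String[] b : body) {
                String[] nt = new String[3];
                for (int i = 0; i < 3; i++) {
                    String term = b[i];
                    if (term.startsWith("?") && !subst.containsKey(term)) {
                        subst.put(term, "?_r" + (fresh++)); // body-only variable gets a fresh name
                    }
                    nt[i] = subst.getOrDefault(term, term);
                }
                out.add(nt);
            }
        }
        return out;
    }
}
```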
UserAgent is the gateway between the user and the agent system; it transports messages between the user and the agents. The user must enter the query in RDQL format or select one of the prepared examples. The UserAgent does not provide any means to go from a free-text query to a structured RDQL query.
Figure 5 The user agent combines the information coming from different agents
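Combining the answers can be as simple as a duplicate-free union of the variable bindings returned by each agent. The following Java sketch assumes each agent's answer arrives as a list of maps from variable name to value; `ResultMerger` is a hypothetical name, and ranking or conflict resolution between sources is not addressed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: UserAgent-side merge of the variable bindings returned by several agents.
class ResultMerger {
    static List<Map<String, String>> merge(List<List<Map<String, String>>> perAgent) {
        // Map equality compares contents, so identical bindings from different agents collapse;
        // LinkedHashSet keeps the order in which bindings were first seen.
        Set<Map<String, String>> seen = new LinkedHashSet<>();
        for (List<Map<String, String>> answers : perAgent) {
            seen.addAll(answers);
        }
        return new ArrayList<>(seen);
    }
}
```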
The semantic search service implemented in this project provides a way to search semantic information retrieved from various sources such as web pages, relational databases and web services. This gives users a single query interface over all of these sources.
However, several issues still need to be addressed. Requiring queries to be written in RDQL makes it quite hard for users to construct them. The algorithm for finding web services performs a blind search that fails unless a service's inputs and outputs closely match the structure the query requires.
The agent code is written in Java with a simple reasoner implemented in Prolog. Full source code with example ontologies, services and database is included in the package.
You need to install SWI-Prolog, the JPL Java-Prolog interface and JADE to use the program. To run the system, unzip the package into the JADE installation directory and run the script "run.bat".
Please contact me if you have any questions regarding the program.