Agents for Semantic Search

Group Members

Project Description

The semantic search service will enable people to access information beyond simple keyword search. Agent-enabled semantic search is a powerful approach to implement this. In this scenario, the user gives a query to the search agent and the agent looks for the answer using the terms in the ontologies available. The agent may communicate with other agents and exchange information to broaden the search results.

An example system is the Agent Semantic Communication Service ([WWW]ASCS) from Teknowledge and Ontology Works. ASCS is a DAML-based search system for the semantic web. It allows a user to make precise queries for information encoded in DAML by specifying each query as a triple clause (using the subject-predicate-object model). Its primary components include a Semantic Search Engine (SSE), which uses multiple collaborating agents to carry out the search, and a Semantic Translation Service (STS), which translates between ontologies. ASCS also supports several kinds of simple inference for query broadening or relaxation, and it can be used directly by web-based agents for semantic search and ontology translation.

The main aim of the project is to extend this idea to enable searching diverse data sources such as web pages, relational databases and web services. Agents with specialized capabilities query the different data sources, and the results from the separate sources are combined as if the data came from one big DAML repository. Other improvements are the use of the RDQL language for better query representation, rule-based ontology translation for cases where the concepts in two ontologies do not have an exact mapping, and message-passing agent communication instead of remote method invocation.

System Architecture

The Semantic Search Service (SSS) is an agent-based system in which each agent is responsible for querying a different data resource. One agent is assigned to the user. It receives the user's queries through a user interface and is primarily responsible for query processing: it sends the queries to the other agents, combines the returned results and presents the final outcome.

The main search operation is done on a DAML repository by the PrologAgent. The repository is constructed by crawling semantically marked-up pages. This data is stored in a Prolog knowledge base so that the PrologAgent can use the Prolog DAML reasoner to query it.

The ServiceAgent searches the web service descriptions published in a repository. Web services described in DAML-S give semantic information about the service, which can be used to find the results of the query.

The SQLAgent queries a relational database according to a mapping between the database and the ontologies. The idea is based on the approach of representing database structure with ontologies, as presented in this [WWW]project. The RDBMS stores the data in its original form, without converting it to an RDF representation. RDQL queries are translated to SQL queries according to the mapping between the database and the ontology used in the query. The data in the database can therefore be queried like an ordinary triple store without any explicit data conversion.

Agents pass the RDQL queries to each other to search all the available data sources. However, the ontologies an agent uses may not be adequate to process the query. If the query includes concepts that cannot be represented with the agent's ontology, the TranslatorAgent is contacted to convert the concepts from one ontology to another.
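The fan-out and merge step performed by the user's agent can be sketched roughly as follows. This is only an illustration in Python, not the JADE code of the project: the function merge_results, the two stand-in agent functions and the sample answers are all assumptions, and real agents would exchange ACL messages rather than make direct calls.

```python
from typing import Callable, Dict, List, Set

def merge_results(query: str,
                  agents: Dict[str, Callable[[str], List[dict]]]) -> List[dict]:
    """Send the query to every agent and return the union of their bindings."""
    seen: Set[frozenset] = set()
    merged: List[dict] = []
    for name in sorted(agents):              # fixed order keeps the output deterministic
        for row in agents[name](query):
            key = frozenset(row.items())     # a binding is identified by its variable/value pairs
            if key not in seen:              # drop answers already returned by another source
                seen.add(key)
                merged.append(row)
    return merged

# Hypothetical stand-ins for the PrologAgent and the SQLAgent (made-up answers).
def prolog_agent(query: str) -> List[dict]:
    return [{"?MountainName": "Mont Blanc"}, {"?MountainName": "Elbrus"}]

def sql_agent(query: str) -> List[dict]:
    return [{"?MountainName": "Elbrus"}, {"?MountainName": "Matterhorn"}]

results = merge_results(
    "SELECT ?MountainName WHERE (?m, <mountain:name>, ?MountainName)",
    {"PrologAgent": prolog_agent, "SQLAgent": sql_agent},
)
```

Because duplicates are removed on the full binding, the same mountain reported by two sources appears only once in the combined result.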

The system's user interface is shown in Figure 2. The interface allows the user to construct a query. There are three basic parts of an RDQL query: a 'select' clause to specify the variables to be found; a 'where' clause to give a conjunction of triple patterns; and an 'and' clause to put constraints on the variables. A special 'using' clause defines namespace mappings to abbreviate the URIs. Please refer to [WWW]Jena RDQL for details.
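As a rough illustration of these four clauses, the query could be held in a small structure and rendered to RDQL text. The class RDQLQuery and its field names are assumptions made for this sketch and are not part of the project code or of Jena; the constraint value 4000 is an arbitrary example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class RDQLQuery:
    select: List[str]                                      # variables to return
    where: List[Tuple[str, str, str]]                      # conjunction of triple patterns
    constraints: List[str] = field(default_factory=list)   # 'and' clause expressions
    using: Dict[str, str] = field(default_factory=dict)    # prefix -> namespace URI

    def render(self) -> str:
        lines = ["SELECT " + ", ".join(self.select)]
        patterns = ["(" + ", ".join(t) + ")" for t in self.where]
        lines.append("WHERE " + ",\n      ".join(patterns))
        if self.constraints:
            lines.append("AND " + ", ".join(self.constraints))
        if self.using:
            prefixes = [f"{p} FOR <{uri}>" for p, uri in self.using.items()]
            lines.append("USING " + ",\n      ".join(prefixes))
        return "\n".join(lines)

query = RDQLQuery(
    select=["?MountainName", "?Height"],
    where=[("?Mountain", "<mountain:name>", "?MountainName"),
           ("?Mountain", "<mountain:height>", "?Height")],
    constraints=["?Height > 4000"],
    using={"mountain": "http://www.mindswap.org/~evren/mountains1.daml#"},
)
text = query.render()
```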

Implementation

The agents are developed in the Java Agent Development Framework ([WWW]JADE). The agents communicate using [WWW]FIPA ACL messages, which also support an RDF-based content language. The agents live in containers, which may be distributed over several machines. Details of the agents are described in the following sections.

Prolog Agent

PrologAgent uses a reasoner built on Prolog to find the answers to a given query. The Prolog knowledge base contains the triples in a quadruple format, rdf(Subject, Predicate, Object, SourceURI). The reasoner defines additional axioms to find the entailments of DAML. Converting RDQL queries to Prolog is quite straightforward, since the triple patterns can be used directly to match the facts in the knowledge base. Some preprocessing is required to handle the constraint clause.
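The matching of triple patterns against rdf/4 facts can be imitated outside Prolog to show the idea. The sketch below is an assumption-laden Python analogue, not the project's reasoner: it does no DAML inference, the function names unify and solve are invented here, and the facts (including the source URI and the heights) are illustrative sample data.

```python
from typing import Iterator, List, Optional, Tuple

def is_var(term) -> bool:
    return isinstance(term, str) and term.startswith("?")

def unify(pattern, triple, binding: dict) -> Optional[dict]:
    """Extend binding so that pattern equals triple, or return None on a clash."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if result.setdefault(term, value) != value:   # variable already bound differently
                return None
        elif term != value:                               # constant does not match the fact
            return None
    return result

def solve(patterns: List[Tuple], facts: List[Tuple], binding: Optional[dict] = None) -> Iterator[dict]:
    """Yield every binding that satisfies all patterns (a conjunction, as in the 'where' clause)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for s, p, o, _source in facts:                        # each fact is rdf(S, P, O, SourceURI)
        extended = unify(patterns[0], (s, p, o), binding)
        if extended is not None:
            yield from solve(patterns[1:], facts, extended)

SRC = "http://example.org/mountains"                      # hypothetical source URI
FACTS = [
    ("m1", "name", "Mont Blanc", SRC), ("m1", "height", 4808, SRC),
    ("m2", "name", "Elbrus", SRC),     ("m2", "height", 5642, SRC),
    ("m3", "name", "Matterhorn", SRC), ("m3", "height", 4478, SRC),
]
patterns = [("?M", "name", "?Name"), ("?M", "height", "?H"),
            ("?MB", "name", "Mont Blanc"), ("?MB", "height", "?MBH")]
# The constraint clause is applied as a filter after matching.
answers = [b["?Name"] for b in solve(patterns, FACTS) if b["?H"] > b["?MBH"]]
```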

Database Agent

Relational databases are used to store structured data. The data stored in an RDBMS table can be converted to DAML instances if a mapping is defined between the table structure and the DAML class. Another [WWW]project in this course focuses on this idea and creates the relationship between databases and ontologies. In summary, the idea is to map the tables directly to DAML classes. Table rows become instances, and each row can be viewed as a resource whose columns are literals or links to other resources.
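A small sketch of this row-to-instance view is given below. The mapping layout, the class names mountain:Mountain and country:Country, and the "class/key" naming of resources are assumptions chosen for illustration; the actual mapping format is defined by the other project and is not reproduced here.

```python
# Assumed mapping: table -> DAML class, key column, column -> property, and
# which columns are foreign keys that link to rows of another table.
MAPPING = {
    "mountains": {
        "class": "mountain:Mountain",
        "key": "id",
        "columns": {"name": "mountain:name",
                    "height": "mountain:height",
                    "inCountry": "mountain:inCountry"},
        "links": {"inCountry": "country"},
    },
    "country": {
        "class": "country:Country",
        "key": "id",
        "columns": {"inContinent": "country:inContinent"},
        "links": {},
    },
}

def resource(table, key_value, mapping):
    """Name the resource that stands for one row (hypothetical naming scheme)."""
    return f"{mapping[table]['class']}/{key_value}"

def row_to_triples(table, row, mapping):
    """View one table row as an instance: a type triple plus one triple per column."""
    info = mapping[table]
    subject = resource(table, row[info["key"]], mapping)
    triples = [(subject, "rdf:type", info["class"])]
    for column, prop in info["columns"].items():
        value = row[column]
        target = info["links"].get(column)
        if target is not None:                 # foreign key: link to another resource
            value = resource(target, value, mapping)
        triples.append((subject, prop, value)) # otherwise the column is a literal
    return triples

triples = row_to_triples(
    "mountains",
    {"id": 1, "name": "Mont Blanc", "height": 4808, "inCountry": "France"},
    MAPPING,
)
```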

Once this mapping is generated, special agents that use this information may query the database as if the data had been converted to DAML instances. This can be quite beneficial if the data stored in the database changes rapidly and we want to provide it to Semantic Web enabled agents while standard database programs keep working on the same source.

An example of the translation from RDQL to SQL is given below. There are three ontologies that define [WWW]mountains, [WWW]countries, and [WWW]continents. Each mountain has a property inCountry that links to a Country instance, and each Country instance is linked to a Continent instance. The equivalent information is stored in three tables in the database, where the relationship is shown as follows.

Let's consider an example query to express a question such as "Which mountains in Europe are higher than 'Mont Blanc'?". The following code shows a sample RDQL query that answers this question and an equivalent SQL translation, which can be produced using the above mapping information.

 RDQL query:

 SELECT ?MountainName, ?Height
 WHERE (?Mountain, <mountain:name>, ?MountainName),
       (?Mountain, <mountain:height>, ?Height),
       (?Mountain, <mountain:inContinent>, <continent:Europe>),
       (?MontBlanc, <mountain:name>, "Mont Blanc"),
       (?MontBlanc, <mountain:height>, ?MtHeight)
 AND    ?Height > ?MtHeight
 USING mountain for <http://www.mindswap.org/~evren/mountains1.daml#>,
       continent for <http://www.mindswap.org/~evren/continent.daml#>

 Equivalent SQL query:

 SELECT t0.name , t0.height
 FROM  mountains AS t0,
       country AS t1,
       mountains AS t2,
       continent AS t3
 WHERE t0.inCountry = t1.ID
 AND   t3.id like 'Europe'
 AND   t1.inContinent = t3.ID
 AND   t2.name like 'Mont Blanc'
 AND   t0.height > t2.height

The algorithm that makes this translation is given here:

RDQLtoSQL(map)

for every distinct variable or constant V in RDQL.WHERE referring to a Class
  tableName = map(V.Class)
  alias(V) = create a unique alias for tableName
  append(SQL.FROM, tableName as alias(V))
  if V is constant
    append(SQL.WHERE, alias(V).id = V)

for every (subject, predicate, object) in RDQL.WHERE
  columnName = map(predicate)
  if alias(object) is not computed
    alias(object) = alias(subject) + "." + columnName
  else
    append(SQL.WHERE, alias(subject).columnName = alias(object))

for every (first, op, second) in RDQL.AND
  // both operands were given an alias while processing RDQL.WHERE
  append(SQL.WHERE, alias(first) op alias(second))

for every variable V in RDQL.SELECT
  append(SQL.SELECT, alias(V))
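To make the steps concrete, here is a minimal Python sketch of such a translation, checked against an in-memory SQLite database. It is not the SQLAgent's code. Several things are assumptions: the SCHEMA dictionary stands in for the real mapping, the query is written with the two-hop inCountry / inContinent path that the SQL above uses, constants are given without namespace prefixes, literal objects are turned into equality conditions, and the sample rows are illustrative.

```python
import sqlite3

# Assumed mapping: predicate -> (subject table, column, object table or None for literals)
SCHEMA = {
    "mountain:name": ("mountains", "name", None),
    "mountain:height": ("mountains", "height", None),
    "mountain:inCountry": ("mountains", "inCountry", "country"),
    "country:inContinent": ("country", "inContinent", "continent"),
}

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def literal(value):
    # Illustration only: real code should pass values as bound parameters.
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def rdql_to_sql(select, where, constraints, schema=SCHEMA):
    alias, tables, conds = {}, [], []

    def table_alias(term, table):
        if term not in alias:
            alias[term] = f"t{len(tables)}"
            tables.append(f"{table} AS {alias[term]}")
            if not is_var(term):                      # constant resource: fix the row id
                conds.append(f"{alias[term]}.id = {literal(term)}")

    for s, p, o in where:                             # pass 1: one table alias per instance
        table, _column, range_table = schema[p]
        table_alias(s, table)
        if range_table:
            table_alias(o, range_table)

    for s, p, o in where:                             # pass 2: columns and join conditions
        _table, column, range_table = schema[p]
        col = f"{alias[s]}.{column}"
        if range_table:
            conds.append(f"{col} = {alias[o]}.id")
        elif not is_var(o):
            conds.append(f"{col} = {literal(o)}")
        elif o in alias:
            conds.append(f"{col} = {alias[o]}")
        else:
            alias[o] = col

    def operand(term):
        return alias[term] if is_var(term) else literal(term)

    for first, op, second in constraints:             # pass 3: the 'and' clause
        conds.append(f"{operand(first)} {op} {operand(second)}")

    sql = f"SELECT {', '.join(alias[v] for v in select)} FROM {', '.join(tables)}"
    return sql + (" WHERE " + " AND ".join(conds) if conds else "")

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE continent (id TEXT);
CREATE TABLE country (id TEXT, inContinent TEXT);
CREATE TABLE mountains (id INTEGER, name TEXT, height INTEGER, inCountry TEXT);
INSERT INTO continent VALUES ('Europe'), ('Asia');
INSERT INTO country VALUES ('France', 'Europe'), ('Russia', 'Europe'), ('Nepal', 'Asia');
INSERT INTO mountains VALUES (1, 'Mont Blanc', 4808, 'France'),
                             (2, 'Elbrus', 5642, 'Russia'),
                             (3, 'Everest', 8849, 'Nepal');
""")
sql = rdql_to_sql(
    select=["?MountainName", "?Height"],
    where=[("?Mountain", "mountain:name", "?MountainName"),
           ("?Mountain", "mountain:height", "?Height"),
           ("?Mountain", "mountain:inCountry", "?Country"),
           ("?Country", "country:inContinent", "Europe"),
           ("?MontBlanc", "mountain:name", "Mont Blanc"),
           ("?MontBlanc", "mountain:height", "?MtHeight")],
    constraints=[("?Height", ">", "?MtHeight")],
)
rows = db.execute(sql).fetchall()
```

The sketch uses = where the SQL above uses like, and it numbers the aliases in the order the terms are first seen, so the alias names differ from the hand-written query even though the joins are the same.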

Service Agent

The answer to the query given by the user may also be retrieved through web services. A WSDL description of a web service defines the inputs and outputs of the service using XML Schema datatypes. This definition cannot be used directly in conjunction with an RDQL query, where everything is expressed in triples. However, a DAML-S description of a web service provides a similar description in which the inputs and outputs of the service are defined using DAML instances.

Let's consider the question in the previous section about the mountains. Suppose we have the following two services defined in DAML-S: [WWW]FindCountry, which takes a continent name as input and returns a list of [WWW]Country instances, and [WWW]FindMountain, which takes a [WWW]Country instance as input and returns a list of [WWW]Mountain instances. These services may be combined to answer the requested query. A simple algorithm that uses the web services to answer a query is given in the following code segment. The idea is to construct a graph that tells how the variables depend on each other, find a service that will ground the variables, and propagate the information through the dependency graph. So, when there is a triple (S, P, O) in the query, the algorithm checks which of the subject and object is known and then finds a service that takes the known one as input and returns the type of the other one. The algorithm will fail if there is no direct web service that satisfies the condition.

Composition(serviceList)

create dependency graph for triples in RDQL.WHERE
assign numbers to every object and subject denoting the order of execution
MaxNum = maximum number assigned
time = 0
results = { empty binding }
while time <= MaxNum do
  for every distinct ?subject | (?subject, predicate, ?object) in RDQL.WHERE
                                 and number(?subject) = time
                                 and number(?subject) > number(?object)
    newResults = empty
    service = find a service such that output is a list of ?subject and
              mandatory service inputs are a subset of the set of ?object
              where number(?subject) > number(?object)
    for every binding B in results
      instantiate the service input from B or query inputs (constants)
      partialResults = execute service
      for every binding C in partialResults
        newResults.add(B union C)
    results = newResults
  time = time + 1

for every binding B in results
  for every (first, op, second) in RDQL.AND
    if B does not satisfy (first op second)
      remove B from results
  for every variable V in B
    if V is not in RDQL.SELECT
      remove V from B
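The chaining idea can be tried out with local functions standing in for the services. The following Python sketch is a simplification under stated assumptions and not the ServiceAgent's code: services are described only by one input type and one output type, the execution order is found by repeatedly picking a triple with exactly one known end instead of numbering the graph in advance, the constraint clause is left out, and find_country, find_mountain and their data are hypothetical stubs for FindCountry and FindMountain.

```python
def compose(select, where, types, services, constants):
    """Ground the query variables by chaining services along the triples."""
    known = set(constants)                 # terms whose value is already available
    results = [dict(constants)]            # start from one binding that holds the constants
    pending = list(where)
    while True:
        # Triples with both ends known need no further call (not re-checked in this sketch).
        pending = [t for t in pending if not (t[0] in known and t[2] in known)]
        if not pending:
            break
        # Pick a triple with exactly one known end; the other end must be computed.
        step = next((t for t in pending if (t[0] in known) != (t[2] in known)), None)
        if step is None:
            raise LookupError("no triple can be grounded from the known terms")
        source, target = (step[0], step[2]) if step[0] in known else (step[2], step[0])
        service = next((call for in_type, out_type, call in services
                        if in_type == types[source] and out_type == types[target]), None)
        if service is None:                # the blind search fails without a direct service
            raise LookupError(f"no service maps {types[source]} to {types[target]}")
        results = [{**b, target: value} for b in results for value in service(b[source])]
        known.add(target)
    return [{v: b[v] for v in select} for b in results]   # keep only the selected variables

# Hypothetical stubs for the two DAML-S services, with made-up data.
def find_country(continent):
    return {"Europe": ["France", "Russia"]}.get(continent, [])

def find_mountain(country):
    return {"France": ["Mont Blanc"], "Russia": ["Elbrus"]}.get(country, [])

answers = compose(
    select=["?Mountain"],
    where=[("?Country", "inContinent", "Europe"), ("?Mountain", "inCountry", "?Country")],
    types={"Europe": "Continent", "?Country": "Country", "?Mountain": "Mountain"},
    services=[("Continent", "Country", find_country), ("Country", "Mountain", find_mountain)],
    constants={"Europe": "Europe"},
)
```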

Translator Agent

Similar concepts defined in separate ontologies may be linked through the daml:sameClassAs and daml:samePropertyAs constructs. This enables semantic web agents to use these concepts interchangeably. Translation of these concepts may be done as a simple find/replace operation. A DAML-capable reasoner would not even require this explicit translation, since the links between the concepts give the same information.

However, a property in one ontology may correspond to a composition of multiple properties in another ontology. Figure 4 shows such an example. This case requires a different solution than the previous brute-force replacement. The mapping between these concepts should be expressed in a way similar to N3 rules. The TranslatorAgent uses a rule-based translation system to help the other agents communicate with each other. When an agent receives a query request with ontologies it cannot handle, the query is translated by its registered TranslatorAgent.
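Both kinds of translation can be illustrated on query triples. The sketch below is an assumption, not the TranslatorAgent's rule engine: the rule table, the equivalence table and the example names (including geo:name and the inContinent chain) are invented for illustration, and a rule is limited to rewriting one property into a chain of properties joined by fresh variables.

```python
from itertools import count

# Assumed rule: a property that corresponds to a composition of two properties.
RULES = {"mountain:inContinent": ["mountain:inCountry", "country:inContinent"]}
# Assumed daml:samePropertyAs link between two ontologies (hypothetical names).
EQUIVALENTS = {"geo:name": "mountain:name"}

def translate(triples, rules, equivalents):
    """Rewrite query triples into the vocabulary of the target ontology."""
    fresh = count()
    translated = []
    for s, p, o in triples:
        chain = rules.get(p)
        if chain is None:
            # Simple case: find/replace of an equivalent property, or keep as is.
            translated.append((s, equivalents.get(p, p), o))
            continue
        # Rule case: link subject and object through new intermediate variables.
        nodes = [s] + [f"?_v{next(fresh)}" for _ in chain[1:]] + [o]
        translated.extend(zip(nodes, chain, nodes[1:]))
    return translated

query = [("?Mountain", "geo:name", "?MountainName"),
         ("?Mountain", "mountain:inContinent", "continent:Europe")]
rewritten = translate(query, RULES, EQUIVALENTS)
```

With such a rule the single inContinent pattern becomes the two-step path through a country, which an agent whose ontology has no direct continent property can still answer.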

User Agent

UserAgent is the gateway between the user and the agent system. It transports messages between the user and the agents. The user must enter the query in RDQL format or select one of the prepared examples. UserAgent does not provide any means to go from a free-text query to a structured RDQL query.

Conclusion

The semantic search service implemented in this project provides a way to search semantic information that can be retrieved from different sources such as web pages, relational databases and web services. Users can query all of these sources with a single RDQL query.

However, there are still many issues that need to be addressed. The requirement to specify queries in RDQL makes it quite hard for users to construct a query. The algorithm that finds the web services makes a blind search that fails unless the service has an input/output structure very similar to the one the query requires.

Download

The agent code is written in Java, with a simple reasoner implemented in Prolog. Full [WWW]source code with example ontologies, services and a database is included in the package.

You need to install [WWW]SWI Prolog, the [WWW]JPL Java-Prolog interface and [WWW]JADE in order to use the program. To run the system, unzip the package into the JADE installation directory and run the script "run.bat".

Please contact [EMAIL]me if you have any questions regarding the program.



