Agents and The Semantic Web Portal
Group Members
Project Description
The Semantic Web is defined as "an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." This definition suggests software agents that can work autonomously and perform tasks on behalf of people. It is reasonable to say that the real power of the Semantic Web will be revealed when agents can accomplish tasks without human guidance.
The Nature dataset is particularly interesting from the Semantic Web perspective because of the range of topics it covers. Because its articles cover a broad range of the academic literature, they provide an opportunity, which may not have existed before, to study the overlaps and interrelationships between fields.
The Semantic Web Portal is an implementation of this idea of providing links based on common semantic content. Such an implementation will require several different types of semantic web agents:
- Crawler - to spider pages and collect markup data
- Ontology Merging/Similarity - to find the similarities in ontologies when explicit links are missing
- Trust - to decide if the information sent by other agents is trustworthy, or to rate the quality of links
Existing Systems and Tools
The following semantic web agents are currently available on the web. Their applicability to the Semantic Web Portal and the Nature dataset is discussed later in this document.
- Trust Bot: In a distributed system such as the Semantic Web, an agent cannot trust all the information it receives from other agents. There should be a way for the agent to assign trust levels to information sources and to process their data according to those levels.
- Semantic Similarity Agent: The Semantic Web will most likely consist of many ontologies that define similar terms in different, and not necessarily consistent, ways. Requiring agents to share the same ontologies in order to agree on a vocabulary may be too restrictive. A tool that finds the similarities between different terms, and so helps agents that use different ontologies reach a partial agreement, would increase efficiency.
- xTALKS: The main purpose of xTALKS is to provide customized access to IT-related events based on a user's or company's interests, current location, and current schedule. xTALKS accomplishes this through interaction among several agents. First, a personal agent works on behalf of each user. Each user also has a personal preference file, written in DAML, that describes the user's interests in talks using the terms defined in a talks ontology. The second important component is the central xTALKS agent, which stores information about the talks. This agent also provides a web-based portal where users can browse talks by keyword, topic, speaker name, and other information. When a new talk is registered with the xTALKS agent, each personal agent receives a notification message describing it. Using the DAML preference file, the personal agent tries to decide whether the user will be interested in the talk. Personal agents may also communicate with one another to learn other people's opinions about the talk. Combining this feedback with the user's preferences, the agent estimates the user's level of interest in the talk. If the talk is of high interest, the personal agent contacts the MapQuest agent to learn the talk's location and asks the calendar agent whether the user's schedule is free for the given time slot. Talks that satisfy the user's requirements on these parameters are added to the schedule by the calendar agent.
- Retsina Semantic Web Calendar Agent: This agent provides interoperability between RDF-based calendar descriptions on the web and Personal Information Manager (PIM) systems such as Microsoft Outlook. It allows users to browse Semantic Web schedules and events and to import selected schedules into their personal schedules. The agent can also share and import schedules autonomously. Calendar agents can negotiate possible meeting times based on the user's schedule and preferences.
- Agent Semantic Communication Service (ASCS): ASCS, developed by Teknowledge and Ontology Works, is a DAML-based search system for the Semantic Web. It lets a user make precise queries for information encoded in DAML by specifying each query as a triples clause (using the subject-predicate-object model). An online demo is available at http://plucky.teknowledge.com/daml/damlquery.jsp. Its primary components are a Semantic Search Engine (SSE), which uses multiple collaborating agents to carry out the search, and a Semantic Translation Service (STS), which translates between ontologies. ASCS also supports several kinds of simple inference for query broadening or relaxation, and web-based agents can use it directly for semantic search and ontology translation. Its main limitations are: (1) it is restricted to textual data, and (2) users must write complex DAML queries rather than state them in natural language, which is tedious and time-consuming.
- Activity Based Search: The TAP Project at Stanford is a search application for the Semantic Web that extends traditional search (e.g., Google) by associating search terms with concepts from an associated knowledge base (the TAP KB). An online demo is available at http://tap.stanford.edu:8000/tap. Currently, the TAP KB contains basic lexical and taxonomic information about a wide range of popular objects such as music, movies, sports, companies, places, toys, and health. When the user provides a search string, TAP tries to link it to concepts in the KB and uses the inferred knowledge to return more relevant results. For example, searching for 'A Few Good Men' returns the results of posting the query to Google and, alongside them, results from TAP. These include links for purchasing the DVD or video of the movie from eBay and similar sites, based on the conceptual link that 'A Few Good Men' is a 'Movie' (defined as a class in an ontology in the TAP KB). Though the idea is innovative and works well in its limited domain (it is useful for advertising as well), its main limitations are: (1) a restricted database: as of May 2002, it covers on average about 15% of the search terms encountered on a search site such as www.dmoz.org; and (2) because the TAP KB contains upper- and mid-level ontologies (which are fairly abstract), low-level instance data is not present. For instance, although TAP determines that 'A Few Good Men' is a 'Movie', additional information such as the year the movie was released, its director, or its awards is not available. TAP is also restricted to one concept per search and supports no Boolean operations.
- OntoMerge: Agent communication will eventually require two or more agents to connect even though they were not explicitly designed to do so. This problem has several facets, but the hardest is translating from the data structures of one agent to those of another. This tool addresses connecting two agents when the connection has been proposed by some other entity, such as a "matchmaker." Its developers assume that each agent's provider has specified the content of its inputs and outputs in DAML, the DARPA Agent Markup Language. The project is developing ways to ensure that the messages these agents exchange are meaningful to each of them, by translating between their published DAML ontologies, as well as planning tools that use DAML service descriptions to compose sequences of messages that accomplish user goals.
- FOAFBot: An IRC bot that builds a database of information from the FOAF ontology. Through its IRC interface, it can provide information such as the name, mail address, homepage, other nicknames, and picture URLs of anyone in its database. A trust component appends an "according to" to each piece of information: verified sources (files that have a valid digital signature) are identified by source name, while unverified sources are listed as "anonymous".
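ASCS lets a user state a query as a triples clause in the subject-predicate-object model. To make that idea concrete, here is a minimal sketch of evaluating a conjunction of triple clauses over an in-memory store. It is not the ASCS engine; the "?"-prefix variable convention, the function names, and the example data are all hypothetical.

```python
from typing import Optional

# A triple is (subject, predicate, object); query terms starting with "?" are
# variables (a hypothetical convention for this sketch, not ASCS syntax).
Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_clause(clause: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to unify one query clause with one stored triple under an existing binding."""
    extended = dict(binding)
    for pattern, value in zip(clause, triple):
        if is_var(pattern):
            if extended.get(pattern, value) != value:
                return None  # variable is already bound to a different value
            extended[pattern] = value
        elif pattern != value:
            return None  # constant term does not match
    return extended


def query(store: set[Triple], clauses: list[Triple]) -> list[Binding]:
    """Answer a conjunction of clauses; each clause filters the bindings found so far."""
    bindings: list[Binding] = [{}]
    for clause in clauses:
        bindings = [
            extended
            for binding in bindings
            for triple in sorted(store)
            if (extended := match_clause(clause, triple, binding)) is not None
        ]
    return bindings


# Hypothetical example data about talks.
store = {
    ("ex:talk1", "ex:topic", "ex:SemanticWeb"),
    ("ex:talk1", "ex:speaker", "ex:alice"),
    ("ex:talk2", "ex:topic", "ex:Databases"),
    ("ex:talk2", "ex:speaker", "ex:bob"),
}
print(query(store, [("?talk", "ex:topic", "ex:SemanticWeb"), ("?talk", "ex:speaker", "?who")]))
```

Shared variables (here ?talk) are what join the clauses; the query above should find only the speaker of the Semantic Web talk.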
Semantic Web Portal
A Semantic Web portal is several steps beyond today's search engines. Instead of requiring users to enter a series of keywords, the SWPortal can generate results for a term or collection of terms specified in an ontology.
That step alone removes much of the ambiguity from search. It can be extended further by incorporating the portal into editors such as SMORE: as users edit their pages, a panel dedicated to the portal can return pages with similar markup, related images and data, or references to other material.
Here is a fictitious screenshot of the Semantic Search Portal as it would be implemented in SMORE:
It represents the basic idea and is by no means complete. There are two main stages. In the first, the user associates keywords in the search string with actual ontological references (classes, properties, or instances; in the worst case, the user leaves them blank, providing incomplete information). In the second, the software performs a combined search using multiple agents (bots), associated with different ontologies, that parse the RDF data based on the query, exchange information with one another, and use inference rules to extend the search domain further.
As can be seen in the example, when the user searches for 'Java Programmer in College Park', the search returns a 'Professor (of Programming Languages) at the University of Maryland' (note: there is no keyword match at all), simply because the user provides associations such as 'programmer is a class...' and 'College Park' is an instance of 'city', and leaves the rest to the search agents, which filter data and make inferences based on the RDF query.
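The inference behind this example can be sketched in a few lines. The knowledge base below is entirely hypothetical: it assumes the facts that a programming-languages professor is a subclass of both 'Professor' and 'Programmer' and that the university is located in College Park. Given the user's associations ('Programmer' as a class, 'CollegePark' as a city instance), the sketch follows subclass links transitively and then checks the employer's location. The 'Java' keyword is deliberately ignored, which mirrors the "no keyword match" point above.

```python
# Hypothetical mini knowledge base of (subject, predicate, object) triples.
KB = {
    ("PLProfessor", "subClassOf", "Professor"),
    ("PLProfessor", "subClassOf", "Programmer"),
    ("drSmith", "type", "PLProfessor"),
    ("drSmith", "worksAt", "UMD"),
    ("UMD", "locatedIn", "CollegePark"),
    ("CollegePark", "type", "City"),
    ("drJones", "type", "Programmer"),
    ("drJones", "worksAt", "AcmeCorp"),
    ("AcmeCorp", "locatedIn", "Baltimore"),
}


def superclasses(kb: set[tuple[str, str, str]], cls: str) -> set[str]:
    """All classes reachable from cls via subClassOf links, including cls itself."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in kb:
            if s == current and p == "subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def find(kb: set[tuple[str, str, str]], wanted_class: str, city: str) -> list[str]:
    """Instances of wanted_class (directly or via a subclass) employed in the given city."""
    results = set()
    for s, p, o in kb:
        if p == "type" and wanted_class in superclasses(kb, o):
            employers = {org for (x, q, org) in kb if x == s and q == "worksAt"}
            if any((org, "locatedIn", city) in kb for org in employers):
                results.add(s)
    return sorted(results)


print(find(KB, "Programmer", "CollegePark"))  # ['drSmith']
```

A real portal would delegate these steps to agents working over RDF from many ontologies; the single in-memory set here only illustrates the subclass-plus-location reasoning.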
Agents play a role in several stages of this portal. First, it is safe to assume that users will have their own specific ontologies and that articles will not all be marked up with the same ontology. Thus, to make the first judgement about whether two concepts are related, the ontologies need to be merged or reconciled. Doing this manually would be far too difficult and tedious, so an agent is needed to perform this resolution. The *OntoMerge* agent takes steps toward achieving this, and the *Semantic Similarity Agent* also begins to address some of these issues.
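As a very rough illustration of the reconciliation step, the sketch below proposes candidate mappings between two ontologies by splitting term names into words and comparing the word sets with Jaccard similarity. This is a simple swapped-in technique, not the method used by OntoMerge or the Semantic Similarity Agent; the term names and the 0.5 threshold are hypothetical.

```python
import re


def tokens(term: str) -> set[str]:
    """Split a term like 'hasAuthorName' or 'author_name' into lowercase words."""
    return {w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", term)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_mappings(onto_a: list[str], onto_b: list[str],
                     threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return (term_a, term_b, score) pairs at or above threshold, best first."""
    candidates = []
    for ta in onto_a:
        for tb in onto_b:
            score = jaccard(tokens(ta), tokens(tb))
            if score >= threshold:
                candidates.append((ta, tb, score))
    return sorted(candidates, key=lambda c: -c[2])


# Hypothetical terms from two differently designed ontologies.
print(propose_mappings(["hasAuthorName", "JournalArticle"],
                       ["author_name", "Paper", "article_title"]))
```

Surface matching like this misses true synonyms (it never pairs 'JournalArticle' with 'Paper') and can pair unrelated terms that share a word, which is why a dedicated merging or similarity agent that uses the ontologies' structure is needed.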
Second, we would like a method for giving credibility to sources. Users should be able to rate how much credit (or discredit) specific colleagues receive. Using algorithms of the same ilk as those described in *TrustBot*, search results can be ranked according to how much the user should trust them. A similar result could be achieved by an agent such as *WebMate*, which stores and builds a notion of user preferences based on the user's previous web experience.
To build this knowledge of articles and their semantic contents, a crawler such as the *DAML Crawler* or *OCRA* is needed to spider files and build a repository of knowledge about pages and their corresponding RDF.
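One simple way to turn personal ratings into a ranking is sketched below, assuming ratings in [0, 1] (0 meaning complete distrust) on a directed "who trusts whom" graph. Trust in an unrated source is inferred along the best path, where a path is only as strong as its weakest rating, computed with a Dijkstra-style search. This "weakest link" rule, the names, and the data are assumptions chosen for illustration, not the algorithm published for TrustBot.

```python
import heapq

# Hypothetical trust network: rater -> {rated person: rating in [0, 1]}.
TrustGraph = dict[str, dict[str, float]]


def inferred_trust(graph: TrustGraph, user: str) -> dict[str, float]:
    """Best-path trust from user to everyone reachable; a path's strength is its weakest edge."""
    best = {user: 1.0}
    heap = [(-1.0, user)]  # max-heap via negated strengths
    while heap:
        neg_strength, person = heapq.heappop(heap)
        strength = -neg_strength
        if strength < best.get(person, 0.0):
            continue  # stale entry; a stronger path was already found
        for friend, rating in graph.get(person, {}).items():
            candidate = min(strength, rating)
            if candidate > best.get(friend, 0.0):
                best[friend] = candidate
                heapq.heappush(heap, (-candidate, friend))
    return best


def rank_results(results: list[tuple[str, str]], graph: TrustGraph, user: str) -> list[tuple[str, str]]:
    """Sort (document, source) pairs by how much the user should trust each source."""
    trust = inferred_trust(graph, user)
    return sorted(results, key=lambda r: trust.get(r[1], 0.0), reverse=True)


graph = {"me": {"alice": 0.9, "bob": 0.4}, "alice": {"carol": 0.8}, "bob": {"dave": 0.9}}
results = [("doc1", "dave"), ("doc2", "carol"), ("doc3", "eve")]
print(rank_results(results, graph, "me"))
```

With this data, carol inherits 0.8 through alice, dave is capped at 0.4 by the user's low rating of bob, and the unknown source eve falls to the bottom.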
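The crawling step can be sketched as a breadth-first spider that records, for each HTML page, the RDF documents it points to. This sketch assumes pages announce their RDF with a `<link type="application/rdf+xml" href="...">` element (one common convention, used here as an assumption); it is not how the DAML Crawler or OCRA work internally. The fetch function is passed in so the example runs against a hypothetical in-memory site instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects hyperlinks and links to RDF metadata from one HTML page."""

    def __init__(self, base: str):
        super().__init__()
        self.base = base
        self.links: list[str] = []
        self.rdf: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        url = urljoin(self.base, href)  # resolve relative links against the page URL
        if tag == "a":
            self.links.append(url)
        elif tag == "link" and attributes.get("type") == "application/rdf+xml":
            self.rdf.append(url)


def crawl(start: str, fetch: Callable[[str], Optional[str]], max_pages: int = 100) -> dict[str, list[str]]:
    """Breadth-first crawl; returns a repository mapping page URL -> RDF URLs it references."""
    repository: dict[str, list[str]] = {}
    queue, seen = deque([start]), {start}
    while queue and len(repository) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page
        parser = LinkCollector(url)
        parser.feed(html)
        parser.close()
        repository[url] = parser.rdf
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return repository


# Hypothetical two-page site standing in for real HTTP fetches.
pages = {
    "http://example.org/index.html":
        '<html><head><link rel="meta" type="application/rdf+xml" href="index.rdf"></head>'
        '<body><a href="paper1.html">Paper 1</a></body></html>',
    "http://example.org/paper1.html": '<a href="index.html">home</a>',
}
print(crawl("http://example.org/index.html", pages.get))
```

A production crawler would also fetch and parse the RDF itself, respect robots.txt, and rate-limit requests; those steps are left out of this sketch.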
On the Nature Dataset
As someone marks up an article (in a portal-enabled editor), they could find related articles in potentially different research areas (assuming access to all of the Nature articles, properly marked up). This could provide additional references, images, or simply a general link to research in related disciplines that may be of use.