Intent of Work-2003

Teknowledge, Palo Alto

Adam Pease, John Li

Technical goal / Accomplishment

We will provide the Agent Semantic Communications Service (ASCS), a DAML-enabled search and translation service capable of coping with the semantic heterogeneity of the web. The system will be highly distributed and will scale to the size of both today’s and tomorrow’s web, with performance comparable to that of today’s text-based search engines. ASCS will be more powerful and more effective than those engines because it searches for content whose meaning answers the query posed to it, not merely content that matches the query’s surface syntax.

The current DAML ASCS system (http://plucky.teknowledge.com/daml/damlquery.jsp) allows users to search all existing DAML content on the web in a reasonable amount of time (usually under 10 seconds). It supports several kinds of simple inference that make search even more powerful. Generalization and specialization of terms and relations are provided: ASCS first returns exact matches and then, if the user has asked for generalization or specialization, generalizes or specializes terms in order to return a broader set of results. ASCS also employs the DAML “inverse” relation to broaden queries. For example, ASCS uses (childOf Bill Mary) to return Bill as a value for the query (motherOf Mary ?X).
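The inference steps above can be sketched as follows. This is a minimal illustration, not the ASCS implementation: the facts, the INVERSE table (whose pairing simply follows the childOf/motherOf example in the text) and the SUPERCLASS hierarchy are hypothetical, triples are written (subject, predicate, object) rather than in the text's (predicate subject object) order, and only the object term is generalized or specialized.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy facts; in ASCS these would come from crawled DAML content.
FACTS: Set[Triple] = {
    ("Bill", "childOf", "Mary"),
    ("Rex", "type", "Dog"),
    ("Tom", "type", "Cat"),
    ("Daisy", "type", "Mammal"),
}

# Assumed inverse declarations (daml:inverseOf); pairing mirrors the text's example.
INVERSE: Dict[str, str] = {"motherOf": "childOf", "childOf": "motherOf"}

# Assumed class hierarchy: class -> direct superclass.
SUPERCLASS: Dict[str, str] = {"Dog": "Mammal", "Cat": "Mammal", "Mammal": "Animal"}


def match(pattern: Triple) -> List[Triple]:
    """Facts matching a pattern; terms starting with '?' are wildcards."""
    return [f for f in sorted(FACTS)
            if all(p.startswith("?") or p == v for p, v in zip(pattern, f))]


def superclasses(cls: str) -> List[str]:
    """All ancestors of a class, nearest first."""
    out = []
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        out.append(cls)
    return out


def subclasses(cls: str) -> List[str]:
    """All descendants of a class, depth first."""
    out = []
    for child, parent in SUPERCLASS.items():
        if parent == cls:
            out.append(child)
            out.extend(subclasses(child))
    return out


def search(query: Triple, generalize: bool = False, specialize: bool = False) -> List[Triple]:
    s, p, o = query
    results: List[Triple] = []

    def add(triples):
        for t in triples:
            if t not in results:  # keep order, drop duplicates
                results.append(t)

    # 1. Exact matches are returned first.
    add(match(query))
    # 2. Inverse: (s p o) holds whenever (o inverse(p) s) holds.
    if p in INVERSE:
        add((f[2], p, f[0]) for f in match((o, INVERSE[p], s)))
    # 3. Optionally broaden the query by moving the object along the class hierarchy.
    if not o.startswith("?"):
        alternatives = (superclasses(o) if generalize else []) + \
                       (subclasses(o) if specialize else [])
        for alt in alternatives:
            add(match((s, p, alt)))
    return results
```

For instance, search(("Mary", "motherOf", "?X")) yields ("Mary", "motherOf", "Bill") through the inverse step alone, and search(("?X", "type", "Mammal"), specialize=True) adds Rex and Tom after the exact match Daisy. Variables are treated as independent wildcards, so a pattern that repeats a variable is not checked for consistency.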

The primary research challenges are algorithmic complexity and interface design. We need to deploy a search system that scales up to handle a volume of DAML content comparable to the size of current HTML content. Using inference makes the search system more powerful than current web search engines, but it adds complexity that must be addressed if the system is to be practical. The user interface is a further source of complexity. A typical web search interface is very simple: just a text area for typing keywords. ASCS, by contrast, needs a way for the user to specify the ontology, subject, predicate, object, types of inference, and so on.

A key enabler for an improved interface is the ability to pose queries in a single rich ontology, or vocabulary, that the user can become familiar with over time. Such a query must then be translated into all other ontologies so that a complete search of the DAML web can be conducted. Most DAML ontologies have no mappings to other ontologies, so we must undertake that mapping task ourselves. The resulting mappings will be posted as DAML files so that other users can take advantage of them. To create the mappings, we will also build a lightweight, practical tool that supports a human user in creating them.

We will take two paths to an improved user interface. One is to integrate with the system SRI has developed to perform a type of natural language understanding in the ARDA AQUAINT program. The other is to integrate our own natural language understanding system, which employs a restricted English grammar. The SRI integration has minimal project cost, but given the risk in both approaches, an appropriate risk mitigation strategy is to pursue both.

The key metric for our system is the time to answer a query. We have performed a preliminary study of different types of queries on different system configurations; it is available from our web site at http://reliant.teknowledge.com/DAML/DAML.ps . We expect to do broader and more stressful performance testing throughout the remainder of the contract.

A second key metric will be the number of complete mappings between DAML ontologies (complete to the extent that there is semantic overlap between each pair). To avoid having to create on the order of N^2 mappings, where N is the number of DAML ontologies, we will map each ontology to a single large ontology that serves as the “hub” and as the source language for the query engine.
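The saving from the hub can be made concrete with a short sketch. Linking every pair of N ontologies directly takes N(N-1)/2 mappings, while linking each one to a separate hub takes N, and any two ontologies can then be bridged by composing their two spoke mappings. The ontology names, terms and the "hub:" prefix below are hypothetical.

```python
from typing import Dict, Optional, Tuple


def pairwise_mappings(n: int) -> int:
    """Mappings needed if every pair of ontologies is linked directly: n(n-1)/2."""
    return n * (n - 1) // 2


def hub_mappings(n: int) -> int:
    """Mappings needed if each ontology is linked only to a separate hub ontology."""
    return n


# Hypothetical spoke mappings: (ontology, local term) -> hub term.
TO_HUB: Dict[Tuple[str, str], str] = {
    ("ontA", "Automobile"): "hub:Car",
    ("ontB", "Car"): "hub:Car",
    ("ontC", "MotorVehicle"): "hub:Vehicle",
}


def translate(term: str, src: str, dst: str) -> Optional[str]:
    """Translate a term from src into dst by composing two spoke mappings."""
    hub_term = TO_HUB.get((src, term))
    if hub_term is None:
        return None  # src term has no hub mapping yet
    for (ontology, local), mapped in TO_HUB.items():
        if ontology == dst and mapped == hub_term:
            return local
    return None  # dst has no term with this meaning
```

With 100 ontologies, for example, that is 4,950 pairwise mappings against 100 spoke mappings. The trade-off is that a translation through the hub can be no more precise than the hub's own vocabulary, which is one plausible reason to give that role to a single large ontology.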

DAML Experiment

As long as the distributed ontology developers use the samePropertyAs relation to show the correspondence between ontologies, we can provide a search service that lets users query all the content of all the ontologies. Where such correspondences are missing, and to the extent that we are given access to the ontologies used in the DAML Experiment, we will create the mappings ourselves. The experiment plan states that “The DAML Experiment will use multiple, distributed ontologies to represent the complex set of knowledge related to the ONA process.” We will provide a translation component that translates DAML statements from one ontology to another by employing the DAML equivalentTo relation. We will also provide our search agent, which uses the translation agent to let users search all the DAML content used in the experiment while stating their search criteria in the ontology of their choice. Finally, we will provide the DAML Experiment with an improved NL interface to our system.
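A minimal sketch of such a translation component follows, assuming the equivalentTo and samePropertyAs assertions have already been collected as pairs of terms. Ontologies are identified by hypothetical namespace prefixes such as "a:" and "b:", all names are illustrative, and simple prefix matching stands in for real URI handling. Because equivalence is symmetric, one assertion serves translation in both directions.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Prefixes assumed to be shared by all ontologies and therefore never translated.
SHARED_PREFIXES = ("rdf:", "rdfs:", "daml:")


def build_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Equivalence is symmetric, so record each asserted pair in both directions."""
    index: Dict[str, Set[str]] = {}
    for x, y in pairs:
        index.setdefault(x, set()).add(y)
        index.setdefault(y, set()).add(x)
    return index


def translate_term(term: str, target: str, index: Dict[str, Set[str]]) -> Optional[str]:
    """Return an equivalent of term in the target ontology, or None if none is known."""
    if term.startswith(target) or term.startswith(SHARED_PREFIXES):
        return term
    for candidate in sorted(index.get(term, set())):
        if candidate.startswith(target):
            return candidate
    return None


def translate_statements(triples: Iterable[Triple], target: str,
                         index: Dict[str, Set[str]]) -> List[Triple]:
    """Rewrite statements into the target ontology.

    Statements whose predicate has no equivalent are dropped; subjects and
    objects with no equivalent (e.g. instances, literals) are kept unchanged."""
    out: List[Triple] = []
    for s, p, o in triples:
        p2 = translate_term(p, target, index)
        if p2 is None:
            continue
        s2 = translate_term(s, target, index) or s
        o2 = translate_term(o, target, index) or o
        out.append((s2, p2, o2))
    return out
```

For example, with equivalences ("a:Vessel", "b:Ship") and ("a:hasPort", "b:homePort"), the statement ("a:Kestrel", "a:hasPort", "Norfolk") becomes ("a:Kestrel", "b:homePort", "Norfolk"), while a statement using an unmapped property is left out of the translation rather than guessed at.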

Next Steps

Over the next three years, we plan to:

Provide a natural language interface to ASCS. This will require the ontology mapping effort described below. The interface will accept restricted natural language and will build on prior work conducted for ARDA, which employs SUMO and our SUMO-to-WordNet mappings to translate NL to logic.
Continue to address performance issues. We can currently handle several simultaneous users with response times of less than ten seconds on most queries, but we need a more thorough study of the performance of specific classes of queries. We also need to address scale-up issues. Crawling current DAML content produces index files of several hundred megabytes. We need to extrapolate what this would mean if there were as much DAML content as there is HTML content today, and determine what additional implementation techniques are required. We will also need to study performance with external databases. We have a simple load-balancing system that farms out queries to multiple servers using a weighted round-robin selection scheme; we need to explore how well that approach works as the number of simultaneous users and the number of servers grow.
Develop ontology mapping content that will “seed” the DAML web so that translations can occur. Many DAML ontologies do not currently use the same…as relations that make translation possible. By seeding the DAML web with ontology equivalences, we can help show the value of linking to other ontologies and motivate more developers to include this information themselves. To make this effort possible, we will continue developing an ontology mapping tool that employs:

linguistic mappings based on our WordNet-to-SUMO mappings
ontology mappings based on the class structure of SUMO
structural mappings based on our work on similarity-based logical situation matching in the ARDA NIMD program

Participate in transition activities. These may include transfer of technology to programs such as NIMD, as well as distribution of open source software to interested parties via our web site. IET and Oakland University are already actively using ASCS. We will also devote effort to making the software bundle easy to understand, deploy, and maintain.
Continue implementation of the distributed ASCS architecture. Currently, we are concentrating on one search agent and one translation agent. By expanding to multiple agents, each concerned only with DAML content in a particular ontology, we will gain considerable benefits in speed and storage efficiency. Crawling the entire DAML web produces an index file that will soon become unmanageable on a single machine as the volume of DAML content continues to grow. By distributing crawling and index generation across multiple agents on multiple machines, we can ensure the scalability of the system. This requires restructuring the system: instead of searching the entire contents of the DAML web in one large index file, each search agent will hold an index for one particular ontology and communicate with other agents through translation agents.
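The weighted round-robin selection mentioned under performance issues can be sketched as below. This is an illustration under assumptions rather than the ASCS load balancer: the server names and integer weights are hypothetical, and each server simply appears in the rotation as many times as its weight.

```python
from itertools import cycle
from typing import Dict, Iterator


class WeightedRoundRobin:
    """Hands out servers in proportion to positive integer weights."""

    def __init__(self, weights: Dict[str, int]) -> None:
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("need at least one server, each with a positive weight")
        # Expand each server by its weight, e.g. {"s1": 2, "s2": 1} -> s1, s1, s2.
        rotation = [server for server, w in weights.items() for _ in range(w)]
        self._next: Iterator[str] = cycle(rotation)

    def pick(self) -> str:
        """Return the server that should receive the next query."""
        return next(self._next)
```

With weights {"s1": 2, "s2": 1}, six consecutive picks give s1, s1, s2, s1, s1, s2. This plain expansion sends each server its whole share in a burst; a smoothed variant that interleaves picks would spread load more evenly at the cost of slightly more bookkeeping.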
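The first technique listed for the mapping tool, linguistic mapping through WordNet and SUMO, might look roughly like the sketch below. It is hypothetical throughout: the WORD_TO_SUMO table is a tiny stand-in for the WordNet-to-SUMO mappings, the concept names are illustrative, and a real lookup would work on word senses rather than bare words. Matches are returned only as candidates, in keeping with a tool that supports a human user who makes the final call.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the WordNet-to-SUMO mappings: English word -> concept.
WORD_TO_SUMO: Dict[str, str] = {
    "ship": "Ship",
    "vessel": "Ship",
    "boat": "Boat",
    "car": "Automobile",
    "automobile": "Automobile",
}


def sumo_concept(label: str) -> Optional[str]:
    """Look up the concept for a term label (case-insensitive, single words only)."""
    return WORD_TO_SUMO.get(label.lower())


def candidate_mappings(src_terms: List[str],
                       dst_terms: List[str]) -> List[Tuple[str, str, str]]:
    """Propose (source term, target term, shared concept) triples for human review."""
    by_concept: Dict[str, List[str]] = {}
    for term in dst_terms:
        concept = sumo_concept(term)
        if concept is not None:
            by_concept.setdefault(concept, []).append(term)
    candidates: List[Tuple[str, str, str]] = []
    for term in src_terms:
        concept = sumo_concept(term)
        if concept is None:
            continue  # no linguistic evidence for this term
        for match in by_concept.get(concept, []):
            candidates.append((term, match, concept))
    return candidates
```

Here Vessel and Ship would be proposed as a match because both words lead to the same concept, while a term with no entry yields nothing. The class-structure and situation-matching techniques listed alongside this one would add or filter candidates; they are not sketched here.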
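The distributed design described in the last item can be pictured with the following sketch: each search agent indexes a single ontology, and a query stated in the user's ontology is translated before being sent to each agent. The class names, the translation table and the prefix conventions are assumptions for illustration; in ASCS the agents would run on separate machines and communicate over the network rather than through in-process calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class SearchAgent:
    """Holds the index for exactly one ontology, identified by its prefix (e.g. "a:")."""
    ontology: str
    index: List[Triple] = field(default_factory=list)

    def search(self, pattern: Triple) -> List[Triple]:
        # Terms starting with '?' are wildcards.
        return [t for t in self.index
                if all(p.startswith("?") or p == v for p, v in zip(pattern, t))]


@dataclass
class TranslationAgent:
    """Maps (term, target ontology) to the equivalent term; the table is hypothetical."""
    table: Dict[Tuple[str, str], str]

    def translate(self, term: str, target: str) -> Optional[str]:
        if term.startswith(("?", "rdf:", target)):
            return term  # variable, shared vocabulary, or already in the target
        return self.table.get((term, target))


def federated_search(pattern: Triple, agents: List[SearchAgent],
                     translator: TranslationAgent) -> List[Triple]:
    """Send the query to every agent, translated into that agent's ontology."""
    results: List[Triple] = []
    for agent in agents:
        translated = [translator.translate(term, agent.ontology) for term in pattern]
        if None in translated:
            continue  # some term has no equivalent in this ontology; skip the agent
        results.extend(agent.search(tuple(translated)))
    return results
```

Because each agent sees only its own ontology's index, adding content in a new ontology means adding an agent and its mappings rather than growing one central index, which is the scalability argument made above.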