Teknowledge, Palo Alto
Adam Pease, John Li
We will provide the Agent Semantic Communications Service (ASCS), a DAML-enabled search and translation service capable of coping with the semantic heterogeneity of the web. The system will be highly distributed and will scale to the size of the web, both today's and tomorrow's. It will have performance equivalent to today’s text-based search engines. ASCS will be more powerful and effective than today’s search engines because it will search for content that has the meaning needed to answer the query posed to it, not just matching surface syntax.
The current DAML ASCS system (http://plucky.teknowledge.com/daml/damlquery.jsp) allows users to search all the existing DAML content on the web in a reasonable amount of time (usually under 10 seconds). It supports several kinds of simple inference that make search even more powerful. Generalization and specialization are provided for both terms and relations. ASCS first returns exact matches and then, if the user has asked for generalization or specialization, generalizes or specializes terms in order to return a broader set of results. ASCS also employs the DAML “inverse” relation to broaden queries. For example, ASCS can use the statement (childOf Bill Mary) to return values for the query (motherOf Mary ?X).
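The query broadening described above can be illustrated with a minimal sketch. This is not the ASCS implementation: the fact list, the `INVERSE_OF` and `SUBRELATION_OF` tables, the `sonOf` relation, and the function names are all hypothetical, chosen only to show exact matching followed by inverse-relation and specialization steps.

```python
# Toy knowledge base of (predicate, subject, object) statements (hypothetical data).
FACTS = [
    ("childOf", "Bill", "Mary"),
    ("sonOf", "Tom", "Mary"),
]

# Hypothetical inverse declarations, following the example in the text.
INVERSE_OF = {"motherOf": "childOf", "childOf": "motherOf"}

# Hypothetical specialization table: sub-relation -> more general relation.
SUBRELATION_OF = {"sonOf": "childOf"}


def match(pattern, facts):
    """Return the facts that match a pattern; terms starting with '?' are variables."""
    return [
        fact for fact in facts
        if all(p.startswith("?") or p == t for p, t in zip(pattern, fact))
    ]


def search(pattern, facts, specialize=False):
    """Try the exact pattern first, then the inverse form, then specializations."""
    pred, subj, obj = pattern
    patterns = [pattern]
    if pred in INVERSE_OF:
        # The inverse relation swaps subject and object.
        patterns.append((INVERSE_OF[pred], obj, subj))
    if specialize:
        for p, s, o in list(patterns):
            patterns += [
                (sub, s, o)
                for sub, parent in SUBRELATION_OF.items()
                if parent == p
            ]
    results = []
    for pat in patterns:
        for fact in match(pat, facts):
            if fact not in results:
                results.append(fact)
    return results


plain = search(("motherOf", "Mary", "?X"), FACTS)
broad = search(("motherOf", "Mary", "?X"), FACTS, specialize=True)
```

In this sketch the query (motherOf Mary ?X) has no exact match, so the result comes from the inverse form (childOf ?X Mary); asking for specialization additionally picks up statements that use the assumed sub-relation.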
The primary research challenges are algorithmic complexity and interface design. We need to deploy a search system that will scale to a volume of DAML content comparable to the size of current HTML content. Using inference makes the search system more powerful than current web search engines, but adds complexity that must be addressed if the system is to be practical. An additional complexity lies in the user interface. A typical web search interface is very simple: just a text area for typing keywords. ASCS must instead provide a way to specify ontology, subject, predicate, object, types of inference, etc.
A key enabler for an improved interface is being able to make queries in a single rich ontology or vocabulary that the user can gain familiarity with over time. However, such a query must then be translated to all other ontologies, so that a complete search of the DAML web can be conducted. Most DAML ontologies do not have mappings to other ontologies so we have to undertake that mapping task ourselves. Results of the mappings will be posted as DAML files so that other users can take advantage of them. In order to create the mappings we will also create a lightweight and practical tool to support a human user who is creating the mappings.
We will take two paths to creating an improved user interface. One path is to integrate with the system SRI has developed to perform a type of natural language understanding in the ARDA AQUAINT program. A second path is to integrate our own natural language understanding system, which employs a restricted English grammar. The SRI integration has minimal project cost, but since both approaches carry risk, an appropriate risk mitigation strategy is to pursue both.
The key metric for our system is time to answer a query. We have performed a preliminary study on different types of queries on different system configurations. This is available from our web site at http://reliant.teknowledge.com/DAML/DAML.ps . We expect to do broader and more stressful performance testing throughout the remainder of the contract.
A second key metric will be the number of complete mappings between DAML ontologies, with the caveat that a mapping can only be complete to the extent that there is semantic overlap between each pair. In order to avoid the problem of having to create N^2 mappings, where N is the number of DAML ontologies, we will map each ontology to a single large ontology that will serve as the “hub” and source language for the query engine.
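The saving from the hub approach can be made concrete with a short calculation. This is only a counting sketch: the function names are hypothetical, and it assumes every pair of ontologies would otherwise need its own mapping (one per direction when `directed` is true), which grows on the order of N^2.

```python
def pairwise_mappings(n, directed=True):
    """Mappings needed if every ontology is mapped to every other one."""
    count = n * (n - 1)          # one mapping per ordered pair
    return count if directed else count // 2


def hub_mappings(n):
    """Mappings needed if each ontology is mapped only to a single hub ontology."""
    return n


for n in (5, 10, 50):
    print(n, pairwise_mappings(n), hub_mappings(n))
```

With these assumptions, ten ontologies would need 90 directed pairwise mappings but only 10 mappings to the hub.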
As long as the distributed ontology developers use the samePropertyAs relation to show the correspondence between ontologies, we can provide a search service that lets users query all the content of all the ontologies. To the extent that we are provided access to the ontologies used in the DAML experiment, we will create the mappings ourselves. The experiment plan states that “The DAML Experiment will use multiple, distributed ontologies to represent the complex set of knowledge related to the ONA process.” We will provide a translation component that allows DAML statements to be translated from one ontology to another by employing the DAML equivalentTo relation. We will also provide our search agent, which uses the translation agent to let users search all the DAML content used in the experiment and state search criteria in the ontology of their choice. Finally, we will provide an improved NL interface for our system to the DAML Experiment.
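A translation step of this kind can be sketched as a lookup through the hub ontology. This is an illustration under stated assumptions, not the translation component itself: the ontology names, term names, and the `TO_HUB` table are hypothetical stand-ins for equivalence mappings of the sort that equivalentTo or samePropertyAs statements would supply.

```python
# Hypothetical equivalence mappings: local term -> hub term, per ontology.
TO_HUB = {
    "ontA": {"a:parent": "hub:parent", "a:Person": "hub:Human"},
    "ontB": {"b:hasParent": "hub:parent", "b:Individual": "hub:Human"},
}


def from_hub(ontology):
    """Invert an ontology's mapping so hub terms can be rewritten as local terms."""
    return {hub: local for local, hub in TO_HUB[ontology].items()}


def translate(statement, source, target):
    """Rewrite each mapped term of a statement from the source to the target ontology.

    Terms with no mapping (for example instance names) are left unchanged.
    """
    to_hub = TO_HUB[source]
    back = from_hub(target)
    translated = []
    for term in statement:
        hub_term = to_hub.get(term)
        if hub_term is None:
            translated.append(term)
        else:
            translated.append(back.get(hub_term, term))
    return tuple(translated)


result = translate(("a:parent", "Bill", "Mary"), "ontA", "ontB")
print(result)
```

Because every ontology is mapped only to the hub, translating between any two ontologies takes two lookups and needs no direct mapping between the pair.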
In the coming 3 years, we plan to