Intent of Work

"Automated Tools for Mapping Between Ontologies"

Principal Investigators:

Drew McDermott, Yale University, drew.mcdermott@yale.edu

Mark Burstein, BBN, burstein@bbn.com

Doug Smith, Kestrel Institute, smith@kestrel.edu

Here is our view of the role of ontologies in DAML:

An ontology is the framework for a dataset. Both an ontology and a dataset consist of the following parts:

A set of datatypes
A set of declared symbols, together with their types
A set of axioms, subclass relationships, and the like which link and constrain the meanings of the symbols.
A set of facts about the declared objects

The difference between an ontology and a dataset is a matter of degree. An ontology has more general rules and a dataset has more particular facts. Either can inherit from a superontology, so that there can be a graded hierarchy from general framework to facts about a particular situation. We will use the word theory to cover ontologies, datasets, and intermediate objects in this hierarchy. We have developed a notation for theories that is based on the PDDL notation for planning domains, but allows a much more flexible syntax, and uses a deeper type theory.
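As a minimal sketch of the four parts listed above and of inheritance from a superontology, a theory could be modeled as follows. This is Python rather than the PDDL-based notation described here, and the class name `Theory`, its field names, and the example symbols and axioms are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Theory:
    """A theory: an ontology, a dataset, or something in between (hypothetical model)."""
    name: str
    parent: Optional["Theory"] = None             # inherited superontology, if any
    datatypes: set = field(default_factory=set)   # e.g. {"Location"}
    symbols: dict = field(default_factory=dict)   # declared symbol -> its type
    axioms: list = field(default_factory=list)    # general rules, kept here as plain strings
    facts: list = field(default_factory=list)     # particular facts about declared objects

    def lookup_type(self, symbol):
        """Find a symbol's type in this theory or in any theory it inherits from."""
        theory = self
        while theory is not None:
            if symbol in theory.symbols:
                return theory.symbols[symbol]
            theory = theory.parent
        return None

    def all_facts(self):
        """Facts visible from this theory, most general theory first."""
        inherited = self.parent.all_facts() if self.parent else []
        return inherited + self.facts


# A general framework (mostly rules) and a dataset inheriting from it (mostly facts).
travel = Theory(
    name="travel",
    datatypes={"Location"},
    symbols={"goto": "(Location, Location) -> Action"},
    axioms=["(forall (x y) (implies (connected x y) (connected y x)))"],
)
trip = Theory(
    name="trip-data",
    parent=travel,
    symbols={"depot": "Location", "dock": "Location"},
    facts=["(connected depot dock)"],
)
```

The same class serves for both ends of the hierarchy, which reflects the point that the difference between an ontology and a dataset is one of degree.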

Theories can be instantiated at multiple syntactic levels. Abstractly they can be characterized as particular mathematical structures. In computers they are represented as hierarchically organized data structures. They have a concrete syntax as Lisp S-expressions, XML documents, or some other hierarchical text structure. At the lowest level they can be represented as character strings in ASCII or Unicode.

In our proposal we talked about translating between ontologies and within a single ontology. It is now clear how to describe the tool we will be building for translating between ontologies. Given two ontologies, O1 and O2, a mapping between them should take any subtheory D1 that is meaningful with respect to O1, and transform it into an equivalent D2 that is meaningful with respect to O2. By "equivalent" we mean that any term or formula in D1 should be mapped into a term or formula in D2 with the same denotation. There are some subtleties. Some expressions in D1 may exist only "internally," in the sense that they should not be visible outside D1; this will probably be indicated by explicit "export" statements that make it clear which conclusions or terms of D1 are intended to be visible to users of D1 and O1. The mapping is often partial, in that for some D1’s there will be no unique corresponding D2, or the user of the mapping may be asked to fill in parameters from the context.

The ontology translation tools cannot on their own guarantee that an expression in one ontology is equivalent to an expression in another, because in general they will not have access to any independent way to gauge the reliability of a translation. However, the tools will check the internal coherence of a translation. For example, if the translation allows a term to be translated into different terms via different paths, then the two target terms must be provably equal in the target ontology. For propositions, if there is a notion of consistency in O2, then any consistent dataset D1 in O1 should be mapped only to a consistent dataset D2 in O2.
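The path-coherence check can be sketched as follows, under strong simplifying assumptions: each mapping is a plain symbol-to-symbol table, and "provably equal in the target ontology" is stood in for by a pluggable `equal` predicate that defaults to syntactic identity. The function names and the example tables are hypothetical.

```python
def translate_along(path, term, mappings):
    """Translate `term` through a chain of ontologies such as ["O1", "O2", "O3"].
    Returns None if some step has no rule for the current term."""
    for source, target in zip(path, path[1:]):
        term = mappings.get((source, target), {}).get(term)
        if term is None:
            return None
    return term


def incoherent_terms(terms, path_a, path_b, mappings, equal=lambda a, b: a == b):
    """Return (term, result_a, result_b) for every term the two paths translate differently."""
    conflicts = []
    for term in terms:
        a = translate_along(path_a, term, mappings)
        b = translate_along(path_b, term, mappings)
        if a is not None and b is not None and not equal(a, b):
            conflicts.append((term, a, b))
    return conflicts


# Hypothetical tables: a direct O1 -> O3 mapping and an indirect one through O2.
mappings = {
    ("O1", "O3"): {"goto": "travel", "place": "location"},
    ("O1", "O2"): {"goto": "move", "place": "site"},
    ("O2", "O3"): {"move": "travel", "site": "region"},
}
conflicts = incoherent_terms(
    ["goto", "place"], ["O1", "O3"], ["O1", "O2", "O3"], mappings
)
```

Here `goto` reaches the same target term along both paths, while `place` does not, so only `place` is reported. A real tool would replace the default `equal` with a call to a prover for the target ontology.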

What we have accomplished so far is to define a type checker for a polymorphic, Lisp-based type system. Our ontology system, called DEDUC, exists in a preliminary implementation being used at BBN, although we have not yet combined the type system with DEDUC. We plan to do that in the immediate future, and anticipate no problems. The type checker will allow for a fast coherency check for mappings, because if a source dataset is type-correct, then the target dataset must be type-correct as well.

Here are the other projects that we are engaged in and plan to complete over the next year:

Dejing Dou, a graduate student at Yale, is building an ontology translator for planning domains. PDDL, the Planning Domain Definition Language, is the gold standard for this area, which means that there are lots of "legacy" planning systems, such as Prodigy, whose notations can serve as sources or targets for translation to and from PDDL. In addition, there are different variant domains (e.g., different versions of the logistics domain) that provide interesting test cases for ontology translation. (4/01 – 8/01)
In collaboration with other DAML researchers from CMU, KSL, and SRI, we are defining DAML-S, a DAML ontology for describing services. This work draws on our previous work in defining a general-purpose process notation as part of PDDL, and on BBN’s experience in system integration. Whether or not DAML-S succeeds in becoming a Web standard, there will be plenty of places where it impacts on domain-specific vocabularies. For example, if a web agent promises to deliver medical information, it will inevitably use terms drawn from medicine as well as general-purpose terms about service offerings and protocols. The ontology translator will be used to smooth the seam between them when necessary. (4/01 – 12/01)
We are also looking at existing vocabularies for describing medical domains. These tend to be huge and fairly shallow. They often compete to cover the same domains, leading to tools such as the National Library of Medicine’s "metathesaurus." We believe that it will be fairly straightforward to adapt our tools to handle this area. (6/01 – 10/01)
We have created a preliminary version of a glue-code generator that takes descriptions of two agents’ data structures within a single ontology and automatically produces code for translating between them. The current version is built on top of the Kestrel Specware system. The current system assumes that we have a description of a one-shot interaction between agents. We are studying how to extend that to a series of interactions. (4/01 – 2/02)

To answer the enumerated questions:

1. What is the technical goal/accomplishment you are hoping to achieve?

To develop an ontology-translation tool, which facilitates the writing of translators between specific ontologies. The actual translators will be written by human domain experts interactively, in conjunction with the tool. Only a human can judge the intent of a formalization well enough to debug an evolving translation. The tool will aid them by:

Keeping track of which symbols have no mapping rules, or are covered by multiple inconsistent rules.

Recording potential flaws in the rules. For instance, ontology O1 might specify an action such as (goto l1 l2). O2 might specify an action like (move v l1 l2). O1 involves path planning for a single agent, whereas O2 can talk about multiple vehicles moving through an area. The mapping (goto ?x ?y) <-> (move ?v ?x ?y) is "defective" in that the variable ?v shows up on only one side. The ontology mapper can accept it, but it must take note that in going left-to-right it has to make up a vehicle name or fill it in from information in the target dataset, and in going right-to-left it has to bind all the vehicle arguments to the same (implicit) entity.

Finding compositions of ontologies when necessary. If a request is made to translate between two ontologies O1 and O3 for which there is no mapping, the ontology manager will search for an O2 such that O1 <-> O2 <-> O3, and compose the two existing mappings, including the "flaw managers" discussed above.
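The three kinds of assistance above can each be sketched in a few lines. The code below is an illustration only: it assumes rules are pairs of flat S-expression strings, treats "inconsistent" as "rules that send one source symbol to different target symbols", and searches for a composition by breadth-first search over the existing mappings. All function names are hypothetical.

```python
import re
from collections import deque


def head(pattern):
    """First symbol of a flat S-expression, e.g. "goto" for "(goto ?x ?y)"."""
    return pattern.strip("() ").split()[0]


def unmapped_and_conflicting(source_symbols, rules):
    """Symbols with no rule, and symbols whose rules disagree about the target symbol."""
    targets = {}
    for source, target in rules:
        targets.setdefault(head(source), set()).add(head(target))
    unmapped = sorted(s for s in source_symbols if s not in targets)
    conflicting = sorted(s for s, t in targets.items() if len(t) > 1)
    return unmapped, conflicting


def one_sided_variables(source, target):
    """Variables (written ?name) that occur on only one side of a rule."""
    left = set(re.findall(r"\?[\w-]+", source))
    right = set(re.findall(r"\?[\w-]+", target))
    return sorted(left - right), sorted(right - left)


def find_composition(start, goal, mapped_pairs):
    """Shortest chain of ontologies linking start to goal through existing mappings."""
    neighbours = {}
    for a, b in mapped_pairs:  # each mapping is treated as bidirectional (<->)
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


rules = [
    ("(goto ?x ?y)", "(move ?v ?x ?y)"),
    ("(at ?x)", "(position ?x)"),
    ("(at ?x)", "(inside ?x)"),
]
unmapped, conflicting = unmapped_and_conflicting({"goto", "at", "fuel"}, rules)
only_left, only_right = one_sided_variables("(goto ?x ?y)", "(move ?v ?x ?y)")
chain = find_composition("O1", "O3", [("O1", "O2"), ("O2", "O3")])
```

In this example `fuel` has no rule, `at` has two rules with different targets, `?v` appears only on the right-hand side of the goto/move rule, and the chain found is O1, O2, O3. Composing the flaw records along such a chain is not shown.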

2. Who are you grouping with in the DAML program and/or who is the intended military user?

We are grouping with the "DAML for Services Coalition": Katia Sycara at CMU, Sheila McIlraith at Stanford KSL, David Martin, Jerry Hobbs and Srini Narayanan at SRI.

We would be very interested in engaging with a military partner, if you have any suggestions.

3. Who is the intended user of what you are developing, and why would he/she use it (i.e. the "lifecycle" thing)?

There are two entirely separate communities of users: The "end user" is a human or automatic agent that is attempting to compose two other automated agents in order to carry out a task. The two agents use different ontologies to communicate. The end user will ask the ontology-translation service to find a mapping between them. The mapper will engage in a scripted dialogue with the end user about the potential flaws in the mapping, explaining in domain-specific terms what the problems are, and prompting for parameters (e.g., vehicle ids) to fill in the blanks.
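The step of prompting for parameters can be illustrated with a small sketch that applies a single mapping rule to a fact and reports which target variables are still unbound, so that a dialogue layer would know what to ask for. It assumes flat S-expressions and variables written with a leading `?`; the function names and the vehicle id `truck7` are hypothetical.

```python
def parse(expr):
    """Split a flat S-expression such as "(goto l1 l2)" into its tokens."""
    return expr.strip("()").split()


def apply_rule(source_pattern, target_pattern, fact, supplied=None):
    """Rewrite `fact` using one mapping rule.
    Returns (translated_fact, missing). `missing` lists target variables that neither
    the fact nor `supplied` could bind; the caller can prompt the user for them."""
    pattern, tokens = parse(source_pattern), parse(fact)
    if len(pattern) != len(tokens):
        return None, []
    bindings = dict(supplied or {})
    for p, t in zip(pattern, tokens):
        if p.startswith("?"):
            if bindings.setdefault(p, t) != t:
                return None, []  # variable already bound to a different value
        elif p != t:
            return None, []      # the rule does not match this fact
    target = parse(target_pattern)
    missing = [p for p in target if p.startswith("?") and p not in bindings]
    if missing:
        return None, missing
    return "(" + " ".join(bindings.get(p, p) for p in target) + ")", []


# First attempt: the mapper discovers that a vehicle id is needed.
result, missing = apply_rule("(goto ?x ?y)", "(move ?v ?x ?y)", "(goto l1 l2)")

# Second attempt: the end user has supplied the vehicle id.
filled, still_missing = apply_rule(
    "(goto ?x ?y)", "(move ?v ?x ?y)", "(goto l1 l2)", supplied={"?v": "truck7"}
)
```

The first call yields no translation and reports `?v` as missing; the second produces `(move truck7 l1 l2)`.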

The other user community consists of the experts who build the ontology mappings. They will propose rules for translating terms and formulas in one ontology into the other, be informed of the potential flaws, and supply the explanations of the significance of those flaws that the end users will see. If a flaw is so serious as to render the mapping useless, the tool will refuse to file the mapping until the mapping is corrected, or the expert gives up trying to fix it.

The motivations of the end users and mapping builders are different. We have to assume that the end user can see two agents that ought to be able to cooperate, and that he or she is sophisticated enough to try to put them together. At that point, he or she will be motivated to use the ontology mapper because the mapper will be an integral part of the agent composer, and the composition won’t work unless the mapper does. There will, we assume, be competing composition agents, and we can’t guarantee that ours will win the competition. (We said above that the end user might be an automated agent. In principle, this raises no new problems, but in practice an automated agent will not be able to engage in a dialogue, canned or otherwise, with the ontology mapper about how to fix flaws, except in the simplest cases.)

The motivation of the mapping builder depends on whether he or she knows there is a demand for a mapping. One plausible scenario is that the designer of the mapping between O1 and O2 is one of the developers of O1, who has been informed by the ontology matchmaker that it has had several requests for interaction between agents that use O1 and agents that use O2. At that point the O1 developer can see a potential new market for agents using O1 if O2 could be made compatible with it. A tool that makes this compatibility easier to arrange would be very helpful.

From the point of view of ontology developers, one of the major lifecycle expenses is keeping ontology mappings up to date. This process will be especially costly if there are complex dependencies among ontologies. O1 may "import" O9, so that a change in O9 might change a mapping between O1 and O2. At the least our mapping tools must keep track of version information, so that users can be sure the mapping they are using is compatible with the versions they have access to. There are other possible roles for our tools to play. If the mapping from O1 to O2 needs to be revised, the tools can start from the original mapping, and show new flaws that arise (if any) due to the changes in O1 or its components.
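A minimal sketch of this bookkeeping, assuming each mapping records the ontology versions it was checked against and that "imports" relations are available as a simple table. The function names, version strings, and the extra ontology O5 are hypothetical.

```python
def dependents(changed, imports):
    """All ontologies that import `changed`, directly or transitively.
    `imports` maps each ontology to the list of ontologies it imports."""
    affected, frontier = set(), [changed]
    while frontier:
        current = frontier.pop()
        for ontology, imported in imports.items():
            if current in imported and ontology not in affected:
                affected.add(ontology)
                frontier.append(ontology)
    return affected


def mappings_to_recheck(changed, imports, mappings):
    """Mappings with at least one endpoint that is, or depends on, the changed ontology."""
    touched = dependents(changed, imports) | {changed}
    return sorted(pair for pair in mappings if touched & set(pair))


def is_compatible(recorded_versions, available_versions):
    """True if every version the mapping was checked against is the one now available."""
    return all(available_versions.get(o) == v for o, v in recorded_versions.items())


imports = {"O1": ["O9"], "O2": [], "O5": ["O1"]}
mappings = {
    ("O1", "O2"): {"O1": "1.0", "O2": "2.1", "O9": "0.3"},
    ("O2", "O3"): {"O2": "2.1", "O3": "1.4"},
}
recheck = mappings_to_recheck("O9", imports, mappings)
still_valid = is_compatible(
    mappings[("O1", "O2")], {"O1": "1.0", "O2": "2.1", "O9": "0.4"}
)
```

After a revision of O9, the O1/O2 mapping is flagged for rechecking because O1 imports O9, and its recorded versions no longer match what is available; the O2/O3 mapping is unaffected.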

In some cases the "imports" relation between O1 and one of its components may itself be mediated by a translation. (O1 may wish to use its own name for an O9 predicate, or to suppress some degree of freedom present in O9.) The difference between this and the "external mapping" case is that an ontology <-> component mapping must be "flawless." Hence the mapping tool can be rerun on the changed version of O9, and will either verify that the old mapping still works, or reveal flaws that the O1 ontology manager must eliminate.

4. What do you think the next logical step will be in the 1-2 years after?

The next logical step would be to put the ontology mapper on the Web and look for real vocabularies to work with, from the burgeoning XML world or other sources. If ontologies ever become indispensable, ontology translation is going to be a big industry.

There are many possible technical directions to go in, but it’s hard to foresee which are going to be the most important. We are still uncertain about the role of category theory in ontologies and maps between them. There are unanswered (or unposed) questions about the importance of consistency on the Web. There are substantial HCI questions about how to present formalized semantic content to users, or whether to collect and use it without their knowing it exists. Which of these directions will be the most crucial to address is not clear.