Our primary technical objective for 2003 is to directly link our MS Word and MS PowerPoint software for DAML markup production and consumption with the HORUS ontologies and (unclassified) knowledge base. This activity was postponed in 2002 due to HORUS funding limitations.
Our primary research objectives are to satisfactorily deal with two problems of scale that arise in this application context: large ontologies and large knowledge bases. The scaling problems are not related to software performance, but to controlling the volume and organization of information displayed interactively to a user.
In 2002 we developed a reference architecture for our MS Word-resident markup tool. That architecture calls for markup to be visualizable, but often hidden, as ActiveX controls embedded within the document itself. It calls for interactively created markup to be produced through GUI extensions to MS Word itself, with three external, GUI-free servers providing input to Word. An ontology server (OS) allows Word to enumerate terms from the ontology being used, to ask about subclass/subproperty/disjointness relationships among terms, and to determine domain and range restrictions on properties. A knowledge base server (KBS) provides an external collection of instances that might need to be referenced from the markup. Finally, a natural language analyzer (NLA) suggests plausible markup for English text fragments submitted to it by Word. A demonstration was provided with limited-capability placeholders for each of the three external servers.
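To make the ontology server's role concrete, the following is a minimal sketch of the kinds of questions described above (term enumeration, subclass tests, and domain restrictions). It is an in-memory toy, not the interface of any actual server; the class name `ToyOntologyServer`, its methods, and the sample terms are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOntologyServer:
    """Hypothetical in-memory stand-in for the ontology server (OS)."""
    # class name -> set of direct superclasses
    parents: dict[str, set[str]] = field(default_factory=dict)
    # property name -> (domain class, range class)
    signatures: dict[str, tuple[str, str]] = field(default_factory=dict)

    def terms(self) -> list[str]:
        """Enumerate every class and property name known to the server."""
        names = set(self.parents) | set(self.signatures)
        for supers in self.parents.values():
            names |= supers
        for domain, range_ in self.signatures.values():
            names |= {domain, range_}
        return sorted(names)

    def is_subclass(self, sub: str, sup: str) -> bool:
        """Reflexive, transitive subclass test over the declared hierarchy."""
        seen, stack = set(), [sub]
        while stack:
            current = stack.pop()
            if current == sup:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return False

    def properties_for(self, cls: str) -> list[str]:
        """Properties whose domain restriction admits instances of `cls`."""
        return sorted(
            prop for prop, (domain, _) in self.signatures.items()
            if self.is_subclass(cls, domain)
        )


# Invented sample ontology, for illustration only.
ontology = ToyOntologyServer(
    parents={"Officer": {"Person"}, "Person": {"Agent"}},
    signatures={"memberOf": ("Agent", "Organization"), "rank": ("Officer", "Rank")},
)
officer_properties = ontology.properties_for("Officer")  # both properties apply
person_properties = ontology.properties_for("Person")    # only memberOf applies
```

Even in this toy, the set returned by `properties_for` grows with the ontology, which is the interactive scaling concern this plan addresses.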
In 2003 one of our objectives is to instantiate the architecture with realistic implementations of each of the three servers. In each case, the target server is existing software produced elsewhere in the DAML or HORUS community. Each of the three will be addressed below.
The HORUS tool developers’ current tools for producing and consuming markup exhibit certain scaling problems that are equally apparent in our existing tools. These relate to the size of ontologies and to the volume of data in a knowledge base.
Ontology size is a problem when a user must manually select a term (class or property) to use in markup. With ontologies as large as HORUS’s, the available context (e.g., knowing the identity of the domain object when choosing a property to assert about it), together with the logical constraints on consistency imposed by the ontology itself, is generally insufficient to narrow the candidates to a small number of possibly valid choices. As a result, manual selection from a (sorted) list of legal possibilities, the easiest mechanism to implement, is not a realistic approach.
An analogous problem arises when a large knowledge base is the source of potential instances for a user to include in markup. The constraints imposed by local context (e.g., knowing the identity of a property when choosing the domain object for it), together with the logical constraints on consistency imposed by the ontology itself, are generally insufficient to narrow the candidates to a small number of possibly valid choices.
We will introduce several mechanisms into our MS Word markup tool that deal with these issues. Most significant is the use of markup templates. A markup template is a prepackaged combination of incomplete text and incomplete markup. The user’s job is to fill in the missing components of the markup. As the user does so, the tool will fill in the missing text based on those selections. The text thus supplied may still be manually edited to provide acceptable quality. Templates take advantage of knowing more about a user’s document composition task than simply “write text and mark it up using terms from this ontology”. Templates avoid some selections entirely because ontology terms and instance references are already present. Missing slots of templates have associated constraints that restrict the possible fillers. Since these constraints will generally be more restrictive than those imposed by the ontology alone, the set of allowable fillers, whether ontology terms or instance references, will be smaller.
Initially we will support local constraints in our templates. These are restrictions on what filler may be chosen for a single occurrence of an ontology term or instance reference. The initial version will also support equality constraints – requiring that the same term or reference be used to fill multiple slots in the template.
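The two kinds of constraint just described can be sketched as follows. This is an illustrative model only, not the tool's design: the `Slot` and `Template` classes, the predicate-based representation of local constraints, and the sample text and fillers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Slot:
    name: str
    allowed: Callable[[str], bool]  # local constraint on this slot's filler


@dataclass
class Template:
    text: str                       # incomplete text with {slot} placeholders
    slots: list[Slot]
    # equality constraints: each pair of slots must receive the same filler
    equalities: list[tuple[str, str]] = field(default_factory=list)

    def candidates(self, slot_name: str, pool: list[str]) -> list[str]:
        """Reduce a pool of possible fillers using the slot's local constraint."""
        slot = next(s for s in self.slots if s.name == slot_name)
        return [c for c in pool if slot.allowed(c)]

    def fill(self, fillers: dict[str, str]) -> str:
        """Check every constraint, then generate the text for the chosen fillers."""
        for slot in self.slots:
            if not slot.allowed(fillers[slot.name]):
                raise ValueError(f"{fillers[slot.name]!r} is not allowed in {slot.name}")
        for first, second in self.equalities:
            if fillers[first] != fillers[second]:
                raise ValueError(f"{first} and {second} must share one filler")
        return self.text.format(**fillers)


# Invented instance names, for illustration only.
PEOPLE = {"alice", "bob"}
CITIES = {"oslo", "paris"}

template = Template(
    text="{visitor} met {host} in {city}. The report was filed by {reporter}.",
    slots=[
        Slot("visitor", lambda x: x in PEOPLE),
        Slot("host", lambda x: x in PEOPLE),
        Slot("city", lambda x: x in CITIES),
        Slot("reporter", lambda x: x in PEOPLE),
    ],
    equalities=[("visitor", "reporter")],
)

city_menu = template.candidates("city", ["alice", "oslo", "paris"])
summary = template.fill(
    {"visitor": "alice", "host": "bob", "city": "oslo", "reporter": "alice"}
)
```

Here the local constraint trims the menu offered for the `city` slot, and the equality constraint rejects any filling in which `visitor` and `reporter` differ.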
Formally, a constraint on an instance reference is simply a (DQL) query with a single variable. Constraints on the selection of ontology terms can be expressed as anonymous classes or properties in DAML (OWL), implying that only a subclass/subproperty of the constraining class/property can be chosen as the filler.
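As a rough illustration of an instance constraint expressed as a query with a single variable, the sketch below evaluates a conjunction of triple patterns over a small in-memory knowledge base. It does not use DQL syntax or any DQL implementation; the triple representation, the `answers` function, the `?x` variable convention, and the sample data are assumptions made for this example.

```python
Triple = tuple[str, str, str]  # (subject, property, object)


def answers(kb: set[Triple], patterns: list[Triple], var: str = "?x") -> set[str]:
    """Return every binding of `var` for which all patterns hold in `kb`."""
    # Any name appearing as a subject or object is a candidate binding.
    candidates = {name for s, _, o in kb for name in (s, o)}

    def holds(binding: str) -> bool:
        return all(
            tuple(binding if part == var else part for part in pattern) in kb
            for pattern in patterns
        )

    return {c for c in candidates if holds(c)}


# Invented knowledge base contents, for illustration only.
kb = {
    ("alice", "type", "Person"),
    ("bob", "type", "Person"),
    ("acme", "type", "Organization"),
    ("alice", "memberOf", "acme"),
}

# Slot constraint: the filler must be a Person who is a member of acme.
constraint = [("?x", "type", "Person"), ("?x", "memberOf", "acme")]
allowed_fillers = answers(kb, constraint)
```

The set of bindings returned for the variable is exactly the set of instance references the user would be allowed to choose for that slot.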
We anticipate that further leverage will accrue from introducing cross-slot constraints in the future. Once it is formally defined and supported by tools, the DAML Rules language should enrich the space of constraints that can be expressed.
The target application for templates is production of markup for the summaries that must be included in HORUS documents. Management determines requirements for the content of these summaries. These requirements determine the content of the templates. We will provide a GUI, based on our existing interface for markup in MS Word, for template authoring. Although a template could be saved as pure DAML, it makes more sense to save it as an MS Word “.dot” document. This will make it straightforward for the template author to support authors by providing (initial) formatting information in the textual portion of the template. This approach is analogous to our use of PowerPoint to produce the graphic templates used by the Briefing Associate.
The DQL specification, though still lacking sufficient detail to allow independent implementation of clients and servers, is now sufficiently fleshed out that we feel comfortable in trying to use it as the basis for our query clients. Querying is central to several activities in the production of marked-up documents, such as importing data into a briefing and providing menus of relevant instances.
In order to provide a realistic test platform for the HORUS community, we must permit the HORUS knowledge base query server to fill the role of KBS in our architecture. This server does not support the DQL specification and is not likely to be converted to support it any time soon. We will write a simple “wrapper” for the HORUS server that provides the necessary DQL client interface, converting queries to the form required by HORUS and converting answers from HORUS to the form required by DQL. This will not provide the performance advantages of the DQL protocol, but will ensure that our core software can interoperate with any DQL-compliant server, rather than with HORUS alone.
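The wrapper idea is a conventional adapter, which can be sketched as below. Neither the HORUS query format nor the DQL message format is reproduced here: `FakeNativeServer`, its `lookup` method, the row layout, and the pattern-and-bindings interface of `QueryWrapper` are all invented stand-ins that show only the two translation steps.

```python
class FakeNativeServer:
    """Hypothetical stand-in for a knowledge base server with its own query form."""

    def __init__(self, rows: dict[str, list[dict[str, str]]]):
        self.rows = rows  # relation name -> list of {"subject": ..., "object": ...}

    def lookup(self, relation: str, **fixed: str) -> list[dict[str, str]]:
        """Return the rows of `relation` that match the fixed field values."""
        return [
            row for row in self.rows.get(relation, [])
            if all(row.get(key) == value for key, value in fixed.items())
        ]


class QueryWrapper:
    """Adapter exposing a pattern-based query interface over the native server."""

    def __init__(self, native: FakeNativeServer):
        self.native = native

    def ask(self, pattern: tuple[str, str, str]) -> list[dict[str, str]]:
        subject, relation, obj = pattern
        # Step 1: convert the query to the form the native server requires.
        fixed = {}
        if not subject.startswith("?"):
            fixed["subject"] = subject
        if not obj.startswith("?"):
            fixed["object"] = obj
        rows = self.native.lookup(relation, **fixed)
        # Step 2: convert native answers into variable bindings for the client.
        bindings = []
        for row in rows:
            binding = {}
            if subject.startswith("?"):
                binding[subject] = row["subject"]
            if obj.startswith("?"):
                binding[obj] = row["object"]
            bindings.append(binding)
        return bindings


# Invented data, for illustration only.
native = FakeNativeServer(
    {"memberOf": [
        {"subject": "alice", "object": "acme"},
        {"subject": "bob", "object": "globex"},
    ]}
)
wrapper = QueryWrapper(native)
members_of_acme = wrapper.ask(("?who", "memberOf", "acme"))
```

Because client code sees only the `ask` interface, the same client could be pointed at a different server by swapping the object behind the wrapper, which is the interoperability benefit described above.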
Several tools are available that parse and internalize DAML ontologies. Although none is specifically designed to answer questions about ontologies, several provide correct, if sometimes incomplete, answers to useful questions about ontologies. Our demo software used Ontomat as an ontology server. The 2003 software will likely replace that with BBN’s Jena.Net. If performance proves to be problematic with the HORUS ontology, we will enhance the Jena.Net implementation with better indexing for answering these queries.
We have been using, and will continue to use, Lockheed’s AeroDaml (in its client-server, not web-based, configuration) as our NLA. Unlike ontology servers and knowledge bases, there are unlikely to be many implementations of NLAs, so we will continue to use AeroDaml’s idiosyncratic client-server interface. To date, we have used a “generic” AeroDaml natural language ruleset, so the markup produced is tied to the HORUS ontology (with the help of some simple term mapping) only at some very general concepts: person, place, organization. In 2003, we will ask for permission to use the ruleset (developed by Lockheed) that tailors AeroDaml’s output directly to HORUS. This, together with the HORUS KB connection, will enable us to explore the GUI and tools that are likely to be needed for refining NLA-produced markup in realistic scenarios.
We have experimented with BBN’s “Jena.Net” library as a replacement for our reliance on the Java-based RDF API. This provides us with substantial technical simplification, since our own software is .Net-based rather than Java-based. We hope that BBN will add an OWL vocabulary layer to this tool, which will leave very little for us to do to make our software OWL-compliant rather than DAML-compliant. If BBN chooses not to make this extension in the near term, we will make sufficient extensions ourselves to enable the production of OWL output.
Work in 2002 brought the visual ontology editing capabilities of the Briefing Associate into much closer compliance with DAML. (The ontology editing capabilities are required to visually annotate an ontology, a prerequisite for using the ontology to produce marked-up briefings.) The definition of the OWL-Lite language subset removes most of the remaining conceptual mismatches. We intend to achieve and maintain a direct correspondence between OWL-Lite and the Briefing Associate’s ontology editor.