Pedro Szekely, Bob Neches, Martin Frank
Information Sciences Institute, University of
Southern California
4676 Admiralty Way, Marina del Rey, CA 90245
(310) 822-1511, szekely@isi.edu,
neches@isi.edu, frank@isi.edu
Our work to date has produced a system that excels at letting users integrate independently-developed distributed DAML data into a single report. In doing so, the users do not just accomplish this immediate task, but also implicitly align the concepts and attributes of DAML data from multiple ontologies - information that can then be exploited for more automated ontology translation in the future.
The current
system benefits both consumers and producers of information. It benefits
consumers because it makes it easy to produce automatically updated reports
pulling information from multiple DAML sources defined using multiple ontologies.
It also benefits producers because a by-product of building a report is an
articulation ontology that captures equivalences among classes and properties
from all ontologies from which data was pulled.
Our plans for the coming year are to leverage these results with a refocus that retains these benefits but also adds strong support for (1) low-overhead generation of semantic markup on legacy materials that lack such markup, and (2) rapid convergence on lightweight ontologies that support the needs of a rapid response team. By rapid response we mean activities in which a small group of potentially distributed users is tasked to quickly put together a knowledge base about a focused topic. For example, in the DAML Experiment Operational Net Assessment (ONA) scenario, a small group of people may be asked to put together a knowledge base about bridges in Afghanistan in order to support evaluation of courses of action for an operation.
Rapid response knowledge production efforts would not attempt to build a comprehensive knowledge base about a topic, but rather one that is tailored to the task at hand. For example, in the Afghanistan bridge scenario it might be important to relate bridges to caves and to routes that link caves, an issue that would obviously not have been considered before. However, a side effect of such a focused effort should be reusable knowledge bases that can help other tiger teams jump-start their efforts.
We will apply, demonstrate, and evaluate this work by focusing on supporting tiger teams for rapid response to emergent knowledge requirements within the DAML ONA scenario (as well as other technology transition opportunities as time permits).
Our original conception of WebScripter was to make it easy to build reports from existing DAML sources. We quickly realized that a WebScripter report is not just a report. When users put together a WebScripter report pulling information from multiple DAML sources, they implicitly define relationships that can be used to align different ontologies. WebScripter makes these relationships explicit as an articulation ontology. In this phase of the work we propose to take WebScripter reports one step further and use them as a knowledge entry device.
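The way a report implies alignments between ontologies can be illustrated with a minimal sketch. It assumes, purely for illustration, that a report is a mapping from column names to the source properties pulled into each column; the `ont1:`, `ont2:`, and `ont3:` property names and the `articulation_pairs` helper are hypothetical and are not WebScripter's actual data model.

```python
from itertools import combinations

# Hypothetical report definition: each column lists the source properties
# (written as "ontology:property") that the user pulled into that column.
report_columns = {
    "Name": ["ont1:bridgeName", "ont2:title"],
    "Location": ["ont1:location", "ont2:place", "ont3:site"],
    "Status": ["ont1:status"],
}

def articulation_pairs(columns):
    """Return the property pairs implied to be equivalent by sharing a column."""
    pairs = []
    for props in columns.values():
        # Every two properties in the same column are treated as aligned.
        pairs.extend(combinations(sorted(props), 2))
    return pairs

equivalences = articulation_pairs(report_columns)
for left, right in equivalences:
    print(f"{left} is treated as equivalent to {right}")
```

A column fed by a single property, such as "Status" here, contributes no alignment; only columns that merge properties from several ontologies do.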
When users build a WebScripter report, they build a structure that makes explicit the relationships between information elements that are important and relevant to a task. The columns in the report are the important relationships, and the ones missing are not. Pulling information to fill in the report makes explicit relationships among data elements that perhaps didn't exist in the original sources (this is what the articulation ontologies do). Some, or many, cells in the report will be left empty. These empty cells are placeholders for missing information.
We propose to make WebScripter reports editable so that users can fill in the empty cells, and thus record new knowledge. We propose to do so in two ways.
First, with straight knowledge entry, the user types in a value or a reference to another DAML entity. Although this is similar to traditional instance editors, it differs in one significant way. Traditional instance editors load an instance object from a knowledge base and edit all its properties. Our tool enables the user to (1) enter a value for just one attribute of an object that they otherwise do not own, and (2) define a new property about an instance (the column name is the new property) and enter a value for it.
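The two entry operations can be sketched as follows. This is a toy illustration under stated assumptions, not the proposed implementation: the `ReportStore` class, its method names, the `report:` prefix, and the example identifiers are all hypothetical.

```python
class ReportStore:
    """Toy statement store; an assumption for illustration only."""

    def __init__(self):
        self.statements = []     # (subject, property, value, author) tuples
        self.properties = set()  # properties known to the report

    def enter_value(self, subject, prop, value, author):
        # Record one statement about a subject the author does not own,
        # without loading or rewriting the rest of that instance.
        self.properties.add(prop)
        self.statements.append((subject, prop, value, author))

    def add_column(self, column_name):
        # A new column name becomes a new property.
        prop = "report:" + column_name.strip().replace(" ", "_")
        self.properties.add(prop)
        return prop

    def values(self, subject, prop):
        return [v for s, p, v, _ in self.statements if s == subject and p == prop]

store = ReportStore()
# (1) Fill in a single attribute of an existing instance.
store.enter_value("ont1:Bridge42", "ont1:status", "no-vehicle-traffic-possible", "alice")
# (2) Define a new property from a column name and enter a value for it.
load_limit = store.add_column("load limit")
store.enter_value("ont1:Bridge42", load_limit, "20 tons", "alice")
```

Keeping the author with each statement is one possible way to let several people contribute values about the same instance without any of them owning it.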
Second, with semi-structured knowledge entry the user inserts into a cell a reference to a segment of a Web page. For example, suppose the user is entering information about the status of a bridge. Instead of entering a value (e.g., no-vehicle-traffic-possible), the user could enter a reference to a fragment of a Web page (maintained by someone else) that has information about the status of targets. When viewed by end users, the report cell would show the latest version of the fragment of HTML being referenced. Thus, the user sees the latest information. In addition, WebScripter would create an external DAML markup for the source page to record that it contains information about the status of a particular bridge.
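A minimal sketch of what a semi-structured cell and its external markup might look like is given below. It assumes the fragment can be located by an element id within the page; the `FragmentRef` type, the `markup:` property names, and the example URL are hypothetical, and the statements are shown as plain tuples rather than actual DAML syntax.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FragmentRef:
    url: str          # page maintained by someone else
    fragment_id: str  # assumed locator: an element id within that page

def external_markup(subject, prop, ref):
    """Build statements, stored outside the source page, about what it contains."""
    source = f"{ref.url}#{ref.fragment_id}"
    return [
        (source, "markup:describes", subject),
        (source, "markup:aboutProperty", prop),
    ]

ref = FragmentRef("http://example.org/targets.html", "bridge42-status")
# The report cell holds the reference instead of a typed-in value.
cell = {"subject": "ont1:Bridge42", "property": "ont1:status", "value": ref}
markup = external_markup(cell["subject"], cell["property"], cell["value"])
for statement in markup:
    print(statement)
```

Because the cell stores only the reference, a viewer would fetch the fragment at display time, which is what keeps the displayed content current.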
The semi-structured knowledge does not support completely automated reasoning by agents. However, the recorded DAML tells the agent that the information can be made available, and if needed for a decision, the agent can prompt a user, showing them the report cell that contains the semi-structured value and asking them to provide a formal specification of the value that can be used for reasoning.
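The fallback from automated reasoning to a human prompt could look roughly like this. The sketch assumes a semi-structured cell is represented as a dictionary with a `fragment` key and that the agent is handed a callback for asking a person; both are illustrative assumptions.

```python
def resolve_for_reasoning(cell_value, ask_user):
    """Return a formal value, asking a person only for semi-structured cells."""
    if isinstance(cell_value, dict) and "fragment" in cell_value:
        # The agent knows where the information is but cannot interpret it,
        # so it shows the fragment reference to a user and waits for a value.
        return ask_user(cell_value["fragment"])
    # Already formal: usable for reasoning as-is.
    return cell_value

def canned_user(fragment):
    # Stand-in for a real prompt; a person would read the fragment and answer.
    return "no-vehicle-traffic-possible"

formal = resolve_for_reasoning(
    {"fragment": "http://example.org/targets.html#bridge42-status"}, canned_user
)
direct = resolve_for_reasoning("open", canned_user)
```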
The proposed WebScripter work supports tiger teams for rapid response to emergent knowledge requirements because it enables a team of people to work as follows:
The reports that tiger teams would be able to produce using WebScripter are both for human and machine consumption. Humans can view the reports using the current report web pages. The reports are also for automated agent consumption because of the knowledge by-products that WebScripter generates when users build a report:
The proposed WebScripter work plays a part in the integration experiments by supporting creation of knowledge base instances AND ontologies at a very low cost. By using the semi-structured knowledge, it enables quick DAMLization of Web sources. Should agents need the information for automated reasoning, humans would be alerted to finish the formalization of the knowledge. This enables users to focus the knowledge formalization efforts on the aspects of the knowledge that agents will use. Finally, the technology for constructing the semi-structured knowledge can be used to put together knowledge portals that combine information from multiple reports about a common topic. These portals would be of interest to the ONA work (e.g., Afghanistan bridge and cave portals), and are of particular interest to the INTELINK community.
Since WebScripter
is intended to be a general-purpose tool, it is useful for a wide range of
applications. To the extent that time
permits, we intend to pursue demonstrations of utility for additional
applications, e.g., the following.
Intelligence
reports and much other data on the government Intelink will likely never be
fully machine-readable, yet we see great value in DAML even when used purely as
meta-data as outlined above. We especially believe that this classification of
existing HTML-based information streams will be helpful in building and
maintaining information portals for specific locations (e.g. the U.S. embassy in
Argentina) and for specific tasks (e.g. assessing the viability and
availability of regional seaports for re-supplying ships). That is, it enables
technical user support organizations to provide many more custom portals
than they could if they had to build them manually.
We are currently
building SNAP, a system for Marine Corps Harrier squadron flight schedule
management, which is in experimental use at Marine Air Group 13 in Yuma,
Arizona. Flight scheduling is currently done by each squadron in isolation
based on external assignments of resources such as simulators and ranges that
are published well in advance, typically monthly. Consequently, many opportunities
for better utilization of the shared resources are missed, such as
trading of range slots between squadrons or with the several range management
authorities involved (some of that happens by telephone at the moment, but that
inherently limits the search space exploration).
With the low-cost
setup and integration of dynamic DAML data sources outlined above, it would be
possible for each of the range management authorities to set up a dynamic DAML
source to which interested squadrons subscribe, and that notifies them when,
e.g., a squadron releases a range slot, or which range slots a squadron is not
currently planning to use and is willing to trade. These notifications can
initially be interpreted strictly by humans, and can then be increasingly
automated.
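The subscribe-and-notify arrangement can be sketched as a simple in-process publisher. This is only an illustration of the interaction pattern: the `RangeSource` class, the event names, and the squadron and range identifiers are hypothetical, and a real source would publish DAML over the network rather than call Python functions.

```python
from collections import defaultdict

class RangeSource:
    """Toy stand-in for a range management authority's dynamic source."""

    def __init__(self, authority):
        self.authority = authority
        self.subscribers = defaultdict(list)  # range name -> callbacks

    def subscribe(self, range_name, callback):
        self.subscribers[range_name].append(callback)

    def publish(self, range_name, slot, event, squadron):
        # event is assumed to be "released" or "available-for-trade".
        notice = {
            "authority": self.authority,
            "range": range_name,
            "slot": slot,
            "event": event,
            "squadron": squadron,
        }
        for callback in self.subscribers[range_name]:
            callback(notice)
        return notice

inbox = []  # what a subscribed squadron's scheduler would see
source = RangeSource("Range Authority A")
source.subscribe("Range 1", inbox.append)
source.publish("Range 1", "Tue 0900-1100", "released", "Squadron X")
source.publish("Range 2", "Wed 1300-1500", "available-for-trade", "Squadron Y")
```

Only the first notice reaches `inbox`, since nobody subscribed to "Range 2". A person could read such notices at first; the same records could later feed an automated scheduler.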
The complete WebScripter package will consist of our current tools for authoring reports that retrieve and present information from DAMLized sources plus a DAML creation tool with unique capabilities:
We propose to evaluate the new WebScripter technology by taking measurements on distributed teams of users generating required knowledge bases. We argue that our tools are unique in the level of support for distributed creation of knowledge, so just getting distributed teams of users to produce knowledge bases will provide an initial measure of success. We expect that several teams will use our tools, and we propose to measure the size of the teams, the start-up time to agree on an initial structure for the knowledge and a division of labor, the size of the knowledge-base created, the time to complete a satisfactory version, and the number of re-formalizations (changes to the ontology) that occur while instances are being created.
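One way the proposed measurements could be recorded and summarized is sketched below. The `TeamRun` record and `summarize` helper are hypothetical, and the numbers are made-up placeholders used only to exercise the code; they are not results.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class TeamRun:
    team_size: int           # number of people on the team
    startup_hours: float     # time to agree on structure and division of labor
    kb_instances: int        # size of the knowledge base created
    completion_hours: float  # time to a satisfactory version
    reformalizations: int    # ontology changes made while entering instances

def summarize(runs):
    """Average each measurement across teams."""
    return {
        "mean_team_size": mean(r.team_size for r in runs),
        "mean_startup_hours": mean(r.startup_hours for r in runs),
        "mean_reformalizations": mean(r.reformalizations for r in runs),
        "mean_instances_per_hour": mean(
            r.kb_instances / r.completion_hours for r in runs
        ),
    }

# Placeholder values for illustration only, not measured data.
runs = [
    TeamRun(team_size=4, startup_hours=2.0, kb_instances=120,
            completion_hours=10.0, reformalizations=3),
    TeamRun(team_size=8, startup_hours=5.0, kb_instances=300,
            completion_hours=20.0, reformalizations=7),
]
summary = summarize(runs)
```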
We expect WebScripter to be effective for small teams of fewer than a dozen people creating knowledge-bases containing a few concepts, several dozen properties and hundreds of instances. As we understand it, the DAML ONA experiment will provide many examples of such knowledge-bases (almost all leaf classes in the proposed ontology are a good fit).