WebScripter 2002 Intent Of Work (IOW)

Pedro Szekely, Bob Neches, Martin Frank

Information Sciences Institute, University of Southern California

4676 Admiralty Way, Marina del Rey, CA 90245

(310) 822-1511

Intro/Overview

Our work to date has produced a system that excels at letting users integrate independently-developed distributed DAML data into a single report. In doing so, the users do not just accomplish this immediate task, but also implicitly align the concepts and attributes of DAML data from multiple ontologies - information that can then be exploited for more automated ontology translation in the future.

The current system benefits both consumers and producers of information. It benefits consumers because it makes it easy to produce automatically updated reports pulling information from multiple DAML sources defined using multiple ontologies. It also benefits producers because a by-product of building a report is an articulation ontology that captures equivalences among classes and properties from all the ontologies from which data was pulled.

Our plans for the coming year are to leverage these results with a refocus that retains these benefits but also adds strong support for (1) low-overhead generation of semantic markup on legacy materials that lack such markup; and (2) rapid convergence on lightweight ontologies that support the needs of a rapid response team. By rapid response we mean activities in which a small group of potentially distributed users is tasked to quickly put together a knowledge base about a focused topic. For example, in the DAML Experiment Operational Net Assessment (ONA) scenario, a small group of people may be asked to put together a knowledge base about bridges in Afghanistan in order to support evaluation of courses of action for an operation.

Rapid response knowledge production efforts would not attempt to build a comprehensive knowledge base about a topic, but rather one that is tailored to the task at hand. For example, in the Afghanistan bridge scenario it might be important to relate bridges to caves and to routes that link caves, an issue that would obviously not have been considered before. However, a side effect of such a focused effort should be reusable knowledge bases that can help other tiger teams jump-start their efforts.

We will apply, demonstrate, and evaluate this work by focusing on supporting tiger teams for rapid response to emergent knowledge requirements within the DAML ONA scenario (as well as other technology transition opportunities as time permits).

Approach

Our original conception of WebScripter was to make it easy to build reports from existing DAML sources. We quickly realized that a WebScripter report is not just a report. When users put together a WebScripter report pulling information from multiple DAML sources, they implicitly define relationships that can be used to align different ontologies. WebScripter makes these relationships explicit as an articulation ontology. In this phase of the work we propose to take WebScripter reports one step further and use them as a knowledge entry device.

When users build a WebScripter report, they build a structure that makes explicit the relationships between information elements that are important and relevant to a task. The columns in the report are the important relationships, and the ones missing are not. Pulling information to fill in the report makes explicit relationships among data elements that perhaps did not exist in the original sources (this is what the articulation ontologies do). Some, or many, cells in the report will be left empty. These empty cells are placeholders for missing information.
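
As a rough illustration of the report structure described above, the following Python sketch models a report as rows (instances) and columns (properties) and lists the empty cells that act as placeholders. The class name, field names, and sample bridge data are hypothetical and chosen only for this example; they are not WebScripter's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Report:
    """Hypothetical in-memory report: rows are instances, columns are properties."""
    columns: List[str]
    # Maps an instance identifier to the property values known so far.
    rows: Dict[str, Dict[str, Optional[str]]] = field(default_factory=dict)

    def empty_cells(self) -> List[Tuple[str, str]]:
        """Return the (instance, column) pairs that still lack a value."""
        return [
            (instance, column)
            for instance, values in self.rows.items()
            for column in self.columns
            if values.get(column) is None
        ]


# Illustrative data only: two bridges, one column nobody has filled in yet.
report = Report(
    columns=["name", "status", "nearest_cave"],
    rows={
        "bridge-1": {"name": "Bridge A", "status": "open"},
        "bridge-2": {"name": "Bridge B"},
    },
)
missing = report.empty_cells()
```

Here `missing` holds one entry per empty cell, which is the information an editable report would need in order to invite users to supply the missing knowledge.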

We propose to make WebScripter reports editable so that users can fill in the empty cells, and thus record new knowledge. We propose to do so in two ways.

First, with straight knowledge entry the user types in the value or a reference to another DAML entity. Even though this is similar to traditional instance editors, it differs in one significant way. Traditional instance editors load an instance object from a knowledge base and edit all its properties. Our tool enables the user to (1) enter a value for just one attribute of an object that they otherwise do not own, and (2) define a new property about an instance (the column name is the new property) and enter a value for it.
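
One way to picture straight knowledge entry is as a store of individual statements kept separately from the instances they describe, so that a user can add one value about an object owned by someone else. The sketch below assumes this design; `Assertion`, `AssertionStore`, and the example URIs are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Assertion:
    """One cell value, recorded independently of the instance's original source."""
    subject: str  # URI of an instance, possibly published by someone else
    prop: str     # property name; a new column name introduces a new property
    value: str
    author: str


class AssertionStore:
    def __init__(self, known_properties: Iterable[str]) -> None:
        self.known_properties = set(known_properties)
        self.assertions: List[Assertion] = []

    def enter(self, subject: str, prop: str, value: str, author: str) -> bool:
        """Record a single cell value; return True if `prop` was newly defined."""
        is_new = prop not in self.known_properties
        self.known_properties.add(prop)
        self.assertions.append(Assertion(subject, prop, value, author))
        return is_new


store = AssertionStore(known_properties=["name", "status"])
bridge = "http://example.org/bridges#bridge-1"  # placeholder URI
# Case 1: a value for one existing attribute of an object the user does not own.
first = store.enter(bridge, "status", "no-vehicle-traffic-possible", "analyst-a")
# Case 2: a new column, which defines a new property, plus a value for it.
second = store.enter(bridge, "nearest_cave", "http://example.org/caves#cave-7", "analyst-b")
```

Unlike a traditional instance editor, nothing here loads or rewrites the whole instance; each entry stands alone and carries its own author.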

Second, with semi-structured knowledge entry the user inserts into a cell a reference to a segment of a Web page. For example, suppose the user is entering information about the status of a bridge. Instead of entering a value (e.g., no-vehicle-traffic-possible), the user could enter a reference to a fragment of a Web page (maintained by someone else) that has information about the status of targets. When viewed by end users, the report cell would show the latest version of the fragment of HTML being referenced. Thus, the user sees the latest information. In addition, WebScripter would create an external DAML markup for the source page to record that it contains information about the status of a particular bridge.
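
The semi-structured entry described above can be sketched as a cell value that is either a formal value or a reference to a page fragment, resolved each time the report is viewed. This is a minimal sketch under stated assumptions: `FragmentRef`, `render_cell`, `external_markup`, the record layout, and the example URL are all hypothetical, and the fetch function is injected so that the example does not touch the network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class FragmentRef:
    """Hypothetical reference to a fragment of a Web page maintained by someone else."""
    page_url: str
    fragment_id: str

    def uri(self) -> str:
        return f"{self.page_url}#{self.fragment_id}"


CellValue = Union[str, FragmentRef]


def render_cell(value: CellValue, fetch_fragment: Callable[[FragmentRef], str]) -> str:
    """Formal values are shown as-is; references are resolved at view time."""
    if isinstance(value, FragmentRef):
        return fetch_fragment(value)
    return value


def external_markup(subject: str, prop: str, ref: FragmentRef) -> Dict[str, str]:
    """Record that the fragment holds information about `prop` of `subject`."""
    return {"source": ref.uri(), "about": subject, "property": prop}


def fake_fetch(ref: FragmentRef) -> str:
    # Stand-in for an HTTP fetch plus fragment extraction.
    return f"<p>latest content of {ref.uri()}</p>"


ref = FragmentRef("http://example.org/target-status.html", "bridge-1")
shown = render_cell(ref, fake_fetch)
plain = render_cell("no-vehicle-traffic-possible", fake_fetch)
markup = external_markup("bridge-1", "status", ref)
```

Because the reference is resolved on every view, the cell always reflects the current state of the source page, while the `markup` record is what an agent would consult to learn that the information exists.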

The semi-structured knowledge does not support completely automated reasoning by agents. However, the recorded DAML tells the agent that the information can be made available, and if it is needed for a decision, the agent can prompt a user, showing the report cell that contains the semi-structured value and asking the user to provide a formal specification of the value that can be used for reasoning.

Support For Tiger Teams For Rapid Response To Emergent Knowledge Requirements

The proposed WebScripter work supports tiger teams for rapid response to emergent knowledge requirements because it enables a team of people to work as follows:

The team can quickly agree on initial report structures to enter information because they know that the structures can be changed if needed and quickly reconciled.
If existing knowledge is available about the topic it can be pulled in using the report filling capabilities of WebScripter.
Users can divide up the work flexibly. They can divide the work by columns, i.e., each person specializes in particular relationships, even though they can enter knowledge anywhere. They can divide the work by rows, i.e., specialize by instance (e.g., divide the work of collecting knowledge about bridges by geographical region).
Users stay synchronized because each time reports are refreshed users see the knowledge that others have entered.
Users can add new columns to the report when the need arises, e.g., to record a feature of a bridge that is relevant to the task, but was not envisioned when the initial structure was set up.
When users review each other’s work, they can collapse newly added columns with the same meaning, thus aligning the knowledge entered in them.
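
The last step in the list above, collapsing newly added columns that share a meaning, can be sketched as follows. This is an illustrative sketch only: the function name, the row representation, and the policy of flagging conflicting values for review rather than overwriting them are assumptions, not a description of WebScripter's behavior.

```python
from typing import Dict, List, Tuple

Rows = Dict[str, Dict[str, str]]


def collapse_columns(rows: Rows, keep: str, drop: str) -> Tuple[Rows, List[str]]:
    """Merge column `drop` into `keep`.

    Returns the merged rows and the instances whose two values disagree.
    """
    merged: Rows = {}
    conflicts: List[str] = []
    for instance, values in rows.items():
        new_values = {k: v for k, v in values.items() if k != drop}
        if drop in values:
            if keep in values and values[keep] != values[drop]:
                # Both columns were filled in differently: keep the value
                # already under `keep` and flag the instance for review.
                conflicts.append(instance)
            else:
                new_values[keep] = values[drop]
        merged[instance] = new_values
    return merged, conflicts


# Two users independently added columns for the same bridge feature.
rows = {
    "bridge-1": {"max_load": "20t"},
    "bridge-2": {"load_limit": "35t"},
    "bridge-3": {"max_load": "10t", "load_limit": "12t"},
}
merged, conflicts = collapse_columns(rows, keep="max_load", drop="load_limit")
```

The pair (`keep`, `drop`) is itself a small piece of alignment knowledge: it states that the two column names denote the same property.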

The reports that tiger teams would be able to produce using WebScripter are both for human and machine consumption. Humans can view the reports using the current report web pages. The reports are also for automated agent consumption because of the knowledge by-products that WebScripter generates when users build a report:

Ontology classes and properties: each report defines a class and each column defines a property.
Articulation ontologies: if the report pulls in information from other DAML sources it generates an articulation ontology that links the new ontology to other ontologies.
Instance data: the value of each cell in a report is part of an instance.
DAML markup for previously non-DAMLized web sources used in the report (semi-structured knowledge).
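
To make the first three by-products listed above concrete, the sketch below derives subject-property-object statements from a small report: one class for the report, one property per column, one instance per row, and optional equivalences to properties in other ontologies. The function and its arguments are hypothetical; the prefixed terms (`rdf:type`, `rdfs:domain`, `daml:Class`, `daml:Property`, `daml:samePropertyAs`) are written as plain strings and assumed to follow the RDF and DAML+OIL vocabularies, not taken from WebScripter's actual output.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def report_to_triples(
    report_class: str,
    columns: List[str],
    rows: Dict[str, Dict[str, str]],
    equivalences: Optional[Dict[str, str]] = None,
) -> List[Triple]:
    """Derive ontology, articulation, and instance statements from a report."""
    triples: List[Triple] = [(report_class, "rdf:type", "daml:Class")]
    for column in columns:
        # Each column defines a property whose domain is the report's class.
        triples.append((column, "rdf:type", "daml:Property"))
        triples.append((column, "rdfs:domain", report_class))
    for column, external in (equivalences or {}).items():
        # Articulation: link a report property to one in another ontology.
        triples.append((column, "daml:samePropertyAs", external))
    for instance, values in rows.items():
        triples.append((instance, "rdf:type", report_class))
        for column in columns:
            if column in values:  # empty cells produce no statement
                triples.append((instance, column, values[column]))
    return triples


triples = report_to_triples(
    "Bridge",
    ["status"],
    {"bridge-1": {"status": "open"}, "bridge-2": {}},
    equivalences={"status": "http://example.org/other-ontology#condition"},
)
```

Empty cells contribute nothing, so the derived instance data grows as users fill in the report.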

The proposed WebScripter work plays a part in the integration experiments by supporting creation of knowledge base instances AND ontologies at a very low cost. By using the semi-structured knowledge, it enables quick DAMLization of Web sources. Should agents need the information for automated reasoning, humans would be alerted to finish the formalization of the knowledge. This enables users to focus the knowledge formalization efforts on the aspects of the knowledge that agents will use. Finally, the technology for constructing the semi-structured knowledge can be used to put together knowledge portals that combine information for multiple reports about a common topic. These portals would be of interest to the ONA work (e.g., Afghanistan bridge and cave portals), and are of particular interest to the INTELINK community.

Additional Target Applications

Since WebScripter is intended to be a general-purpose tool, it is useful for a wide range of applications. To the extent that time permits, we intend to pursue demonstrations of utility for additional applications, e.g., the following.

Intelink Portal Automation

Intelligence reports and much other data on the government Intelink will likely never be fully machine-readable, yet we see great value in DAML even when used purely as meta-data as outlined above. We especially believe that this classification of existing HTML-based information streams will be helpful in building and maintaining information portals for specific locations (e.g. the U.S. embassy in Argentina) and for specific tasks (e.g. assessing the viability and availability of regional seaports for re-supplying ships). That is, it enables technical user support organizations to provide many, many more custom portals than they could provide if they built them manually.

Distributed Marine Corps Flight Schedule Management

We are currently building SNAP, a system for Marine Corps Harrier squadron flight schedule management which is in experimental use at Marine Air Group 13 in Yuma, Arizona. Flight scheduling is currently done by each squadron in isolation based on external assignments of resources such as simulators and ranges that are published well in advance, typically monthly. Consequently, many opportunities for a more optimal utilization of the shared resources are missed, such as trading of range slots between squadrons or with the several range management authorities involved (some of that happens by telephone at the moment, but that inherently limits the search space exploration).

With the low-cost setup and integration of dynamic DAML data sources outlined above, it would be possible for each of the range management authorities to set up a dynamic DAML source to which interested squadrons subscribe, and that notifies them, e.g., when a squadron releases a range slot, or which range slots a squadron is not currently planning to use and is willing to trade. These notifications can initially be interpreted strictly by humans, and can then be increasingly automated.

Planned Results and Products

The complete WebScripter package will consist of our current tools for authoring reports that retrieve and present information from DAMLized sources plus a DAML creation tool with unique capabilities:

Easy to use because knowledge entry is done in the context of the final products that users work on. Users won’t have to learn specialized knowledge editing tools.
Support for distributed users. Each user can work on particular aspects of the knowledge, and also see the contributions from others as reports get populated.
Support for creating DAML markup for external sources of information.
Support for formalizing knowledge in stages.
Ability to quickly start knowledge formation tasks by reusing available knowledge and aligning and extending it for the particular task at hand.

We propose to evaluate the new WebScripter technology by taking measurements on distributed teams of users generating required knowledge bases. We argue that our tools are unique in the level of support for distributed creation of knowledge, so just getting distributed teams of users to produce knowledge bases will provide an initial measure of success. We expect that several teams will use our tools, and we propose to measure the size of the teams, the start-up time to agree on an initial structure for the knowledge and a division of labor, the size of the knowledge-base created, the time to complete a satisfactory version, and the number of re-formalizations (changes to the ontology) that occur while instances are being created.

We expect WebScripter to be effective for small teams of fewer than a dozen people creating knowledge-bases containing a few concepts, several dozen properties, and hundreds of instances. As we understand it, the DAML ONA experiment will provide many examples of such knowledge-bases (almost all leaf classes in the proposed ontology are a good fit).