DARPA DAML IOW

Franklin W. Olin College of Engineering

Lynn Andrea Stein, PI

February 1, 2002

1. What is the technical goal/accomplishment you are hoping to achieve?

2. How are you planning to fit into the context of the DAML Experiment?

3. What do you think the next logical step will be in the 1-2 years after?

This project group has two distinct but interrelated contribution areas.

1. DAML infrastructure involves the adaptation of a substantial body of work in knowledge representation to the dynamic needs of a scalable, distributed infrastructure as represented by both the World Wide Web and DOD’s own HUMINT networks. As an internationally known expert in knowledge representation, the PI of this grant has worked closely in recent years with the leadership of the World Wide Web Consortium to bridge the gap between these two fields. She co-authored (with Dan Connolly of the Web Consortium) the original DAML-O specification, on which the current DAML+OIL is built and on which the future DAML Experiment will build. Future efforts will continue this focus on sound fundamental underpinnings for the DAML infrastructure.

2. DOD operations are critically dependent on HUMINT. This grant group’s efforts are focused on document management ontologies and tools to support information management and decision-making. We will deliver one such toolset as described below.

Contributions to Fundamentals

DAML must be built on sound foundations; this is something that the KR community has long understood. But DAML must also be pragmatic and effective; this is something that is integral to the success of the World Wide Web, software development generally, and the DOD above all. The PI brings an unusual background in knowledge representation, software development, and web/information management that uniquely positions her to be a bridge between the several communities on which DAML development efforts build. Her role as co-PI of the MIT/W3C DAML grant acknowledges this, although she now receives funding primarily through this separate Rome AFB grant to Olin College. To date, she has contributed DAML-O (the original DAML Ontology Language) and an agent-based critique of DAML-S (the DAML Services Infrastructure), and she plays a critical role in DAML-JC as well as in Web Consortium working sessions on these issues.

In the next year, these efforts will continue, including the development of models of reasoners consistent with DOD needs and the explication and dissemination of these efforts to both theoretical and pragmatic audiences. The foundational piece of this IOW is less directly tied to the DAML Experiment, though the fundamental issues the PI continues to address are essential to the foundation on which that and all other DAML efforts will build.

Contributions to Tools and Applications

Suppose that intelligence analysts are producing intelligence reports about Afghanistan. There are several types of reports with varying review cycles and timeliness requirements. All the reports are archived using a DAML-based system such as the one that we are developing. Using DAML tools enhanced with document life-cycle information, such as those we are building, consumers of intelligence could get answers to questions like:

· What is the most recent report on road traffic in Afghanistan?

· How often have we revised our estimates on fuel supplies in the Northeast region?

· How many reports are based on the raw data given in document ID xxx?

· Which documents have been entered into the Operational Net Assessment database?
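As a rough illustration of how questions like those above might be answered once life-cycle data is available, the following minimal sketch uses plain Python records as a stand-in for a DAML-backed store. The `Report` fields, the document IDs, and the function names are all hypothetical; the actual ontology and query interface have not yet been selected.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Report:
    """One version of one document (hypothetical fields, not a DAML schema)."""
    doc_id: str
    topic: str
    region: str
    version: int
    issued: date
    derived_from: tuple = ()  # doc_ids of the documents this one is based on


def most_recent(reports, topic, region):
    """Return the latest-issued report on a topic and region, or None."""
    matches = [r for r in reports if r.topic == topic and r.region == region]
    return max(matches, key=lambda r: r.issued, default=None)


def revision_count(reports, topic, region):
    """Count revisions made after the first version of matching reports."""
    versions = {(r.doc_id, r.version) for r in reports
                if r.topic == topic and r.region == region}
    docs = {doc_id for doc_id, _ in versions}
    return len(versions) - len(docs)


def reports_based_on(reports, source_id):
    """Return doc_ids that depend on source_id, directly or transitively."""
    found = set()
    frontier = {source_id}
    while frontier:
        # Documents citing anything in the frontier that we have not seen yet.
        frontier = {r.doc_id for r in reports
                    if frontier & set(r.derived_from) and r.doc_id not in found}
        found |= frontier
    return found


# Illustrative data only; IDs, topics, and dates are invented.
REPORTS = [
    Report("raw-001", "sensor data", "Northeast", 1, date(2002, 1, 5)),
    Report("rpt-010", "fuel supplies", "Northeast", 1, date(2002, 1, 10), ("raw-001",)),
    Report("rpt-010", "fuel supplies", "Northeast", 2, date(2002, 1, 20), ("raw-001",)),
    Report("rpt-011", "road traffic", "Northeast", 1, date(2002, 1, 22), ("rpt-010",)),
]

if __name__ == "__main__":
    print(most_recent(REPORTS, "road traffic", "Northeast"))
    print(revision_count(REPORTS, "fuel supplies", "Northeast"))
    print(sorted(reports_based_on(REPORTS, "raw-001")))
```

The transitive walk in `reports_based_on` is the part that benefits most from explicit source links between documents, which is the kind of relationship a document life-cycle ontology would be expected to capture.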

In the next year, we will develop and deploy a set of DAML-enhanced document management tools to provide functionality such as that described in the above scenario. These tools will be based on standard COTS systems modified for DAML compatibility. In addition to their suitability for HUMINT information management applications, these tools will be appropriate for software development and software process management applications as similar source and life-cycle issues arise in that domain.

In particular, we will:

1. Select an ontology describing the document life cycle (based on existing DAML ontologies or military intelligence practices).

2. Map versioning (CVS repository) data to this ontology.

3. Develop a tool to extract data from a CVS repository into a DAML data structure.

4. [Optional] Scan the full text of CVS documents to categorize them further (for example, by links to other documents, names of people, or other heuristics).

5. Develop a viewer for the DAML data (perhaps in collaboration with other DAML groups).

6. Develop applications for the DAML data:

A. Human Intelligence Evaluation

· Version histories of documents

· Chains of document references

B. Software Engineering Metric Generation

· Who has the most frequent check-ins?

· Who has the most new lines of code?

· Which files are checked in most often?

C. Collaborative Workspace (wiki) Monitoring

· Who is discussing a topic?

· Which pages generate the most links and follow-ups?
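Steps 2 and 3 above (mapping CVS versioning data to an ontology and extracting it into a DAML data structure) could be sketched roughly as follows. This is a minimal sketch under several assumptions: the sample text imitates the usual layout of `cvs log` output, the `http://example.org/doclife#` namespace and the `versionOf`, `author`, and `date` properties are hypothetical placeholders for whatever ontology is selected in step 1, and the result is a list of subject/predicate/object tuples rather than a real DAML serialization.

```python
import re

# Sample text imitating `cvs log` output (file names and authors are invented).
SAMPLE_LOG = """\
RCS file: /cvsroot/reports/fuel.txt,v
Working file: fuel.txt
head: 1.2
total revisions: 2;\tselected revisions: 2
description:
----------------------------
revision 1.2
date: 2002/01/20 14:02:11;  author: alice;  state: Exp;  lines: +10 -2
Revised estimate
----------------------------
revision 1.1
date: 2002/01/10 09:30:00;  author: bob;  state: Exp;
Initial report
=============================================================================
"""

REV_RE = re.compile(r"^revision (\d+(?:\.\d+)+)$")
META_RE = re.compile(r"^date: (?P<date>[^;]+);\s+author: (?P<author>[^;]+);")

NS = "http://example.org/doclife#"  # hypothetical namespace, not a real ontology


def parse_cvs_log(text):
    """Extract one record per revision: file, revision, date, author."""
    records = []
    working_file = None
    current = None
    for line in text.splitlines():
        rev = REV_RE.match(line)
        if line.startswith("Working file: "):
            working_file = line[len("Working file: "):]
        elif rev:
            # Note: a log message line that looks like "revision N.M" would
            # also match here; a production tool would need stricter parsing.
            current = {"file": working_file, "revision": rev.group(1)}
        elif current is not None and "date" not in current:
            meta = META_RE.match(line)
            if meta:
                current["date"] = meta.group("date").strip()
                current["author"] = meta.group("author").strip()
                records.append(current)
    return records


def to_triples(records):
    """Map revision records to (subject, predicate, object) tuples."""
    triples = []
    for r in records:
        subject = f"{NS}{r['file']}/{r['revision']}"
        triples.append((subject, NS + "versionOf", NS + r["file"]))
        triples.append((subject, NS + "author", r["author"]))
        triples.append((subject, NS + "date", r["date"]))
    return triples


if __name__ == "__main__":
    for triple in to_triples(parse_cvs_log(SAMPLE_LOG)):
        print(triple)
```

Keeping the parser separate from the triple mapping means the ontology chosen in step 1 can change without touching the CVS-specific code.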

Code will be developed primarily in Python. Some components will hook into the CVS infrastructure, some will be standalone, and some will hook into the collaborative workspace (wiki CGI).
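As an example of a standalone component, the software engineering metrics listed under application B could be computed from check-in records along the following lines. This is only a sketch: the record layout (`file`, `author`, `lines_added`) and the sample data are assumptions made for illustration, and a real tool would read these values from the extracted DAML data rather than from a hard-coded list.

```python
from collections import Counter

# Hypothetical check-in records; names and numbers are invented.
CHECKINS = [
    {"file": "parser.py", "author": "alice", "lines_added": 40},
    {"file": "parser.py", "author": "bob", "lines_added": 120},
    {"file": "viewer.py", "author": "alice", "lines_added": 15},
    {"file": "parser.py", "author": "alice", "lines_added": 5},
]


def checkins_per_author(checkins):
    """Count how many check-ins each author has made."""
    return Counter(c["author"] for c in checkins)


def lines_added_per_author(checkins):
    """Sum the new lines of code contributed by each author."""
    totals = Counter()
    for c in checkins:
        totals[c["author"]] += c["lines_added"]
    return totals


def checkins_per_file(checkins):
    """Count how many times each file has been checked in."""
    return Counter(c["file"] for c in checkins)


if __name__ == "__main__":
    print(checkins_per_author(CHECKINS).most_common(1))
    print(lines_added_per_author(CHECKINS).most_common(1))
    print(checkins_per_file(CHECKINS).most_common(1))
```

The same counting approach would carry over to the wiki monitoring questions, with pages and contributors in place of files and authors.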

In addition to the above deliverables, we intend to continue our collaboration with the MIT/W3C group and to integrate our tools and theirs to the extent feasible. One tool likely to come out of that group and particularly amenable to this integration is the Haystack Personal Information Environment. Like our own efforts at Olin, Haystack addresses issues of information management. However, as a standalone system, Haystack includes additional content extraction features that would provide a superb complement to the document lifecycle and collaborative aspects of the Olin work.