closed available at https://www.daml.org/homework/1/lessons/bbn.html 2000-10-31 [email protected]

BBN: Homework 1 Lessons Learned

This page describes how we produced our homework ontology and content and our experiences in doing so.

Data Model

We essentially modelled the problem domain using an Entity/Relationship/Attribute model commonly found in the database world:

Entities: attributes
- Person: givenName, surname
- Organization
  - Agency: name
  - Company: name
  - School: name
- Location: name, in
  - City
  - State
  - Country
- WorkLocation: street, city, zip
- Program: agency

Relationships (Between): attributes
- Employee (Person and Organization): title, work_address, work_email, work_phone
- Degree (Person and School): degree, field, year
- Contract (Company and Program): role
- Role (Contract and Employee): role

We made Employee a relation since a Person may simultaneously work for multiple organizations, e.g. Jim Hendler at DARPA and UMaryland.
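
A minimal sketch of this design choice, in Python purely for illustration (the actual schema was coded in Java). The class layout and the `employers` helper are assumptions made for the example, not the original schema code; only a subset of the attributes is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    given_name: str
    surname: str


@dataclass(frozen=True)
class Organization:
    name: str


@dataclass(frozen=True)
class Employee:
    """Relation object linking one Person to one Organization."""
    person: Person
    organization: Organization
    title: str = ""  # relation attributes live on the link, not on the Person


def employers(person, employments):
    """Return the names of every Organization the person works for."""
    return [e.organization.name for e in employments if e.person == person]


hendler = Person("Jim", "Hendler")
employments = [
    Employee(hendler, Organization("DARPA")),
    Employee(hendler, Organization("UMaryland")),
]
print(employers(hendler, employments))
```

Had the employer been a single attribute of Person, the second employment could not be recorded without duplicating the Person.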

We designed Location as a partonomy, principally to make it easy to ask queries like "works in Cambridge" or "works in Massachusetts" or "works in US". WorkLocation probably should have been a subclass of Location. Zip codes don't always align with cities. It would have been nice to draw on an existing Location ontology.
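
The kind of query the partonomy is meant to support can be sketched as a walk up the "in" links. This is an illustrative Python sketch only: the `PARENT` and `WORKS_IN` tables and the people in them are hypothetical, and the walk assumes the "in" links contain no cycles.

```python
# Hypothetical partonomy: each Location names its containing Location via "in".
PARENT = {
    "Cambridge": "Massachusetts",
    "Massachusetts": "US",
}

# Hypothetical people and the city each one works in.
WORKS_IN = {"alice": "Cambridge", "bob": "Arlington"}


def is_within(location, region):
    """True if location is the region itself or lies inside it via "in" links."""
    current = location
    while current is not None:
        if current == region:
            return True
        current = PARENT.get(current)  # None once the top of the chain is reached
    return False


def works_in(region):
    """Return, in sorted order, everyone whose work city lies within region."""
    return sorted(p for p, city in WORKS_IN.items() if is_within(city, region))


print(works_in("Cambridge"))
print(works_in("Massachusetts"))
print(works_in("US"))
```

The same question works at any level of the hierarchy without storing state and country on every work address.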

name should have been defined at the Organization level rather than its subclasses.

We thought about using "enumerations" for project roles (pi, pm, developer, etc.) but were concerned that these would be too constraining. Such enumerations might have provided a nice opportunity for mapping between ontologies, e.g. our pi is equivalent to someone else's PrincipalInvestigator.
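
The trade-off can be sketched as follows, in Python for illustration only. The `Role` enumeration and the `EQUIVALENT` table are hypothetical; the only mapping taken from the text is pi to PrincipalInvestigator.

```python
from enum import Enum


class Role(Enum):
    PI = "pi"
    PM = "pm"
    DEVELOPER = "developer"


# Hypothetical equivalences between our role values and another ontology's terms.
EQUIVALENT = {Role.PI: "PrincipalInvestigator"}


def translate(role_value):
    """Map one of our role values to the other ontology's term, if one is known."""
    role = Role(role_value)  # raises ValueError for a role outside the enumeration
    return EQUIVALENT.get(role, role.value)


print(translate("pi"))
try:
    translate("architect")
except ValueError as exc:
    print("rejected:", exc)
```

A closed enumeration makes the mapping table possible, but it also rejects any role that was not anticipated, which is the constraint described above.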

The overloading of "role" is unfortunate.

Given our modelling approach, the ability to define this DAML ontology in UML would have probably been nice!

Content Generation

We took paramedit, a schema-based graph editor that we had previously developed for the DARPA JFACC program, and modified it to generate RDF and RDF Schema using RDF API (119 lines of Java code to generate RDF and 159 lines to generate RDFS). RDF API worked quite well, although we didn't discover how to make it generate ID rather than about attributes for Classes, etc.

The schema was coded in Java (available here).

Data was persistently stored in XML. There were some complications here, primarily related to the use of XML ID/IDREF attributes to implement the graph structure:

- the inability to put spaces in ID attributes
- the need to generate identifiers for relation objects (Employee, Degree, etc.)

Maintaining the objects directly in DAML would have avoided these problems.
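
The two workarounds can be sketched as below. This is a Python illustration under stated assumptions, not the paramedit code: `to_xml_id` and `relation_id` are hypothetical helpers, and the sanitizer only handles an ASCII subset of the characters XML allows in names.

```python
import re
from itertools import count


def to_xml_id(label):
    """Turn a free-text label into a string usable as an XML ID (ASCII subset)."""
    # Spaces and other disallowed characters become underscores.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", label.strip())
    # An XML name may not start with a digit, "." or "-".
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


_counter = count(1)


def relation_id(kind):
    """Generate an identifier for a relation object that has no natural name."""
    return f"{kind}_{next(_counter)}"


print(to_xml_id("Jim Hendler"))
print(to_xml_id("2000 Contract"))
```

Note that distinct labels can sanitize to the same ID (for example "Jim Hendler" and "Jim_Hendler"), so a real implementation would also need a uniqueness check.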

Running paramedit produced one .rdfs file and one .rdf file. We manually distributed the statement blocks from the .rdf file among our .html pages. The .rdfs file was renamed to have a .daml extension.

Issues

Outstanding issues that we identified include:

mixing content
1. When mixing HTML and RDF, what <!DOCTYPE> should be used? We couldn't find a way to apply different doctypes to separate sections of the document.
2. RDF API makes nice use of XML entities to make its output more readable. When mixing HTML and RDF, we couldn't find a way to define the entities in such a way that the resulting content was correctly processed by both RDF API and Netscape. We ended up manually expanding the entities in the HTML content.
3. We're beginning to conclude that mixing HTML and RDF in a single document is undesirable. Most clients are going to be interested in HTML (browsers for people) or DAML (agents/programs), but not both. This also inhibits content negotiation.
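
The manual entity expansion described in item 2 could be automated along the following lines. This is a Python sketch under assumptions: the `ENTITIES` table is hypothetical (it holds only the standard RDF syntax namespace), and `expand_entities` is not part of RDF API.

```python
import re

# Hypothetical entity table, as it might be declared in a DOCTYPE internal subset.
ENTITIES = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}


def expand_entities(text, entities):
    """Replace &name; with its value; leave unknown (e.g. HTML) entities alone."""
    def repl(match):
        name = match.group(1)
        return entities.get(name, match.group(0))
    return re.sub(r"&([A-Za-z_][\w.-]*);", repl, text)


snippet = '<rdf:Description rdf:about="&rdf;type">A &amp; B</rdf:Description>'
print(expand_entities(snippet, ENTITIES))
```

Entities that are not in the table, such as `&amp;`, pass through unchanged so that the HTML portions of the page are not disturbed.
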
naming
1. Even when you own a chunk of Internet namespace (e.g. http://www.daml.org/projects/integration/...), coming up with a pleasing and effective scheme for generating URIs is still harder than one might initially expect. We hope that some "Best Practices" for naming will emerge from (other people's) homework assignments.
2. What URIs should we use for well-known external entities such as Verizon or Stanford University?
3. An artifact of our automated URI generation scheme caused multiple Degrees for a single Person to get combined into one resource. Even though we did some visualization on the RDF graph using AT&T GraphViz, we didn't notice this until we started distributing the statements among the .html pages. A similar effect caused the unintended merging of project roles, although that behavior might be viewed as reasonable/correct.
4. We like the idea, borrowed from W3C (e.g. PR-rdf-schema-19990303), of using "date space" to distinguish among ontology versions (e.g. projects-20000911.daml).
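
The merging described in item 3 can be illustrated with a small collision check. The text does not say exactly what the artifact was, so this Python sketch assumes one plausible cause: a URI for a relation object built from the Person alone. The `BASE` namespace, both URI functions, and the sample records are hypothetical.

```python
from collections import defaultdict

BASE = "http://example.org/people/"  # hypothetical namespace


def degree_uri_person_only(person_id, degree):
    # Assumed faulty scheme: the key ignores the Degree, so all of a
    # Person's Degrees receive the same URI and merge into one resource.
    return f"{BASE}{person_id}#Degree"


def degree_uri_with_key(person_id, degree):
    # Including distinguishing attributes of the relation keeps the URIs apart.
    return f"{BASE}{person_id}#Degree-{degree['degree']}-{degree['year']}"


def collisions(records, make_uri):
    """Return the URIs that more than one record would be assigned."""
    by_uri = defaultdict(list)
    for person_id, degree in records:
        by_uri[make_uri(person_id, degree)].append(degree)
    return {uri: items for uri, items in by_uri.items() if len(items) > 1}


records = [
    ("jdoe", {"degree": "BS", "field": "CS", "year": 1990}),
    ("jdoe", {"degree": "MS", "field": "CS", "year": 1992}),
]
print(collisions(records, degree_uri_person_only))
print(collisions(records, degree_uri_with_key))
```

Running a check like this before emitting RDF would flag the merge without relying on visual inspection of the graph.
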
linking
1. RDF's use of URIs rather than URLs for resources doesn't really provide any definitive information about where one could look to find additional statements about a resource. It might be useful to have some means within the DAML language (e.g. daml:about vs. daml:aboutURL) differentiating between the use of abstract URIs (names only) and URLs (where an object is defined as an ID on the specified page, along with other statements about that object).
2. It wasn't clear where to put some of the statements (e.g. City, shared WorkLocations, etc.). We moved the statements that pertained to a specific individual (typically around 22 statements) to their home page, and left all of the others on our project page.
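
The placement rule in item 2 can be sketched as a grouping step. The distribution was done by hand, so this Python sketch is only an illustration; the page names, the sample statements, and the rule of assigning a statement by its subject alone are assumptions (relation objects such as Employee would need an extra lookup to find the individual they pertain to).

```python
from collections import defaultdict


def distribute(statements, home_pages, default_page):
    """Group (subject, predicate, object) statements by the page that hosts them."""
    pages = defaultdict(list)
    for statement in statements:
        subject = statement[0]
        # Statements about a known individual go to that person's home page;
        # everything else stays on the shared project page.
        pages[home_pages.get(subject, default_page)].append(statement)
    return dict(pages)


# Hypothetical statements and page names.
statements = [
    ("jdoe", "surname", "Doe"),
    ("Cambridge", "in", "Massachusetts"),
]
home_pages = {"jdoe": "jdoe.html"}
print(distribute(statements, home_pages, "project.html"))
```
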
typing
1. We'd like to have more specific typing on Property ranges than Literal, e.g. to identify strings, numbers, URIs, URLs, etc. Building subclasses of Literal based on XML Schema would probably be a good start.
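
What finer-grained literal typing might look like can be sketched as follows. This is a Python illustration, not a DAML or RDF API feature: the datatype labels borrow XML Schema names, the `RANGES` assignments for three properties of the data model are hypothetical, and the URI check is deliberately stricter than XML Schema's anyURI because it requires a scheme.

```python
from urllib.parse import urlparse


def is_integer(value):
    """True if the literal parses as a whole number."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def is_absolute_uri(value):
    """True if the literal has a URI scheme such as http: or mailto:."""
    return bool(urlparse(value).scheme)


# One checker per datatype label.
CHECKS = {
    "xsd:string": lambda value: True,
    "xsd:integer": is_integer,
    "xsd:anyURI": is_absolute_uri,
}

# Hypothetical range assignments for three properties of the data model.
RANGES = {"surname": "xsd:string", "year": "xsd:integer", "work_email": "xsd:anyURI"}


def literal_ok(prop, value):
    """Check a literal value against the datatype assigned to its property."""
    return CHECKS[RANGES[prop]](value)


print(literal_ok("year", "1995"))
print(literal_ok("year", "last year"))
```
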
validation
1. It would be desirable to have a DAML validation layer above RDF API that included such features as:
  - checking that all namespaces are accessible (warning)
  - ensuring that all Classes and Properties used are defined (error)
  - checking domain and range constraints on Properties (error)
  - identifying resources used as objects that are not also subjects (warning)
  - checking any cardinality constraints (error)
  - ...
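
Several of the checks listed above can be sketched over plain triples. This Python sketch is an assumption-laden illustration rather than a layer over RDF API: the schema representation, the `validate` function, and the sample data are hypothetical, subclass relationships are ignored, and the namespace accessibility check is left out because it needs network access.

```python
RDF_TYPE = "rdf:type"


def validate(triples, classes, properties):
    """Check (subject, predicate, object) triples against a tiny schema.

    classes: set of defined class names
    properties: {name: {"domain": cls, "range": cls or "Literal", "max": int or None}}
    Returns (errors, warnings) as lists of strings.
    """
    errors, warnings = [], []
    types = {s: o for s, p, o in triples if p == RDF_TYPE}
    subjects = {s for s, _, _ in triples}
    counts = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            if o not in classes:
                errors.append(f"undefined class {o}")
            continue
        spec = properties.get(p)
        if spec is None:
            errors.append(f"undefined property {p}")
            continue
        if types.get(s) != spec["domain"]:
            errors.append(f"{p}: domain violation on {s}")
        if spec["range"] != "Literal":
            if types.get(o) != spec["range"]:
                errors.append(f"{p}: range violation on {o}")
            if o not in subjects:
                warnings.append(f"{o} is used as an object but never as a subject")
        counts[(s, p)] = counts.get((s, p), 0) + 1
    # Cardinality: compare the number of values per (subject, property) to "max".
    for (s, p), n in counts.items():
        limit = properties[p]["max"]
        if limit is not None and n > limit:
            errors.append(f"{p}: {n} values on {s}, at most {limit} allowed")
    return errors, warnings


# Hypothetical schema fragment and instance data.
classes = {"Person", "WorkLocation", "City"}
properties = {
    "surname": {"domain": "Person", "range": "Literal", "max": 1},
    "city": {"domain": "WorkLocation", "range": "City", "max": 1},
}
triples = [
    ("p1", RDF_TYPE, "Person"),
    ("p1", "surname", "Doe"),
    ("p1", "surname", "Roe"),
    ("w1", RDF_TYPE, "WorkLocation"),
    ("w1", "city", "c1"),
    ("w1", "phone", "555"),
]
errors, warnings = validate(triples, classes, properties)
for message in errors:
    print("error:", message)
for message in warnings:
    print("warning:", message)
```

Separating errors from warnings mirrors the severity labels in the list above.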

Possible Future Work

There are several things we would have liked to do, but didn't get around to:

- Use Person from daml-ex rather than defining our own.
- Our HTML and DAML content were essentially developed independently. Some facts that exist in the HTML (e.g. spouses and children, former projects, etc.) aren't currently in our ontology.
- Update to use daml-ont rather than RDF Schema.
- We met the spirit of the assignment, but not the letter. We currently have 107 "ontology statements" describing 15 Classes and 26 Properties and 271 "instance statements" (as reported by the ListSchema.java and CountStatements.java utilities we developed based on RDF API), but have only actually marked up 5 HTML pages.

$Id: lessons1.html,v 1.11 2000/11/01 17:02:07 mdean Exp $