BBN: Homework 1 Lessons Learned
This page describes how we produced our homework ontology and content,
and our experiences in doing so.
Data Model
We essentially modelled the problem domain using an
Entity/Relationship/Attribute model commonly found
in the database world:
- Entities: attributes
- Person: givenName, surname
- Organization
- Agency: name
- Company: name
- School: name
- Location: name, in
- WorkLocation: street, city, zip
- Program: agency
- Relationships (Between): attributes
- Employee (Person and Organization): title, work_address, work_email, work_phone
- Degree (Person and School): degree, field, year
- Contract (Company and Program): role
- Role (Contract and Employee): role
We made Employee a relation
since a Person may simultaneously work for multiple organizations,
e.g. Jim Hendler at DARPA and UMaryland.
We designed Location as a partonomy,
principally to make it easy to ask queries like
"works in Cambridge" or "works in Massachusetts" or "works in US".
WorkLocation probably should have been a subclass of Location.
Zip codes don't always align with cities.
It would have been nice to draw on an existing Location ontology.
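As an illustration of why the partonomy helps, here is a minimal Python sketch (not our actual Java schema) of answering "works in X" by following the chain of in links upward. The names within, is_within, works_in, and work_city, as well as the sample person, are hypothetical and exist only for this example; within stands in for the in attribute because in is a reserved word in Python.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    """A named place; `within` plays the role of the `in` attribute."""
    name: str
    within: Optional["Location"] = None


def is_within(place: Location, target: str) -> bool:
    """Walk up the partonomy from `place`; True if `target` is reached."""
    current: Optional[Location] = place
    while current is not None:
        if current.name == target:
            return True
        current = current.within
    return False


# Hypothetical sample data, used only for illustration.
us = Location("US")
massachusetts = Location("Massachusetts", within=us)
cambridge = Location("Cambridge", within=massachusetts)

# Maps a person to the city of one work location (illustrative only).
work_city = {"Alice Example": cambridge}


def works_in(person: str, target: str) -> bool:
    """Answer "works in <target>" by following the chain of `in` links."""
    city = work_city.get(person)
    return city is not None and is_within(city, target)
```

The same lookup serves the city, state, and country forms of the query, since each is just a different stopping point on the walk upward.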
name should have been defined at the Organization level rather than its
subclasses.
We thought about using "enumerations" for project roles
(pi, pm, developer, etc.)
but were concerned that these would be too constraining.
Such enumerations might have provided a nice opportunity for mapping
between ontologies, e.g. our pi is equivalent to someone else's
PrincipalInvestigator.
The overloading of "role" is unfortunate.
Given our modelling approach,
the ability to define this DAML ontology in UML would have probably been nice!
Content Generation
We used a schema-based graph editor, paramedit,
that we had previously developed for the DARPA JFACC program
and modified it to generate RDF and RDF Schema using RDF API
(119 lines of Java code to generate RDF and 159 lines to generate RDFS).
RDF API worked quite well,
although we didn't discover how to make it generate ID rather than about
attributes for Classes, etc.
The schema was coded in Java
(available here).
Data was persistently stored in XML.
There were some complications here,
primarily related to the use of XML ID/IDREF attributes to implement the
graph structure:
- the inability to put spaces in ID attributes
- the need to generate identifiers for relation objects (Employee, Degree, etc.)
Maintaining the objects directly in DAML would have avoided these problems.
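The two complications above can be sketched in a few lines of Python. This is not the code used in paramedit; make_xml_id and make_relation_id are hypothetical helpers, the example names are invented, and the character set is restricted to ASCII for simplicity.

```python
import re
from collections import Counter

# Counts how many times each base ID has been handed out.
_counts = Counter()


def make_xml_id(label: str) -> str:
    """Turn a free-text label into a string usable as an XML ID.

    Spaces and other disallowed characters become underscores, and a
    leading underscore is added if the result would not start with a
    letter or underscore.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", label.strip())
    if not cleaned or not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


def make_relation_id(kind: str, *endpoints: str) -> str:
    """Build an ID for a relation object from its kind and endpoints.

    A running counter is appended so that two relations of the same
    kind between the same endpoints still get distinct IDs.
    """
    base = make_xml_id("-".join((kind,) + endpoints))
    _counts[base] += 1
    return f"{base}-{_counts[base]}"
```

For example, make_xml_id("Pat Example") yields "Pat_Example", and two calls to make_relation_id("Degree", "Pat Example", "Example University") yield IDs ending in -1 and -2 rather than a single shared ID.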
Running paramedit produced one .rdfs file and one .rdf file.
We manually distributed the statement blocks from the .rdf file
among our .html pages.
The .rdfs file was renamed to have a .daml extension.
Issues
Outstanding issues that we identified include:
- mixing content
- When mixing HTML and RDF,
what <!DOCTYPE> should be used?
We couldn't find a way to apply different doctypes to separate sections
of the document.
- RDF API makes nice use of XML entities to make its output more readable.
When mixing HTML and RDF,
we couldn't find a way to define the entities in such a way that the resulting
content was correctly processed by both RDF API and Netscape.
We ended up manually expanding the entities in the HTML content.
- We're beginning to conclude that mixing HTML and RDF in a single
document is undesirable.
Most clients are going to be interested in HTML (browsers for people)
or DAML (agents/programs),
but not both.
This also inhibits content negotiation.
- naming
- Even when you own a chunk of Internet namespace
(e.g. http://www.daml.org/projects/integration/...)
coming up with a pleasing and effective scheme for generating URIs
is still harder than one might initially expect.
We hope that some "Best Practices" for naming will emerge from (other people's)
homework assignments.
- What URIs should we use for well-known external entities such as Verizon
or Stanford University?
- An artifact of our automated URI generation scheme
caused multiple Degrees for a single Person to get combined into one
resource.
Even though we did some visualization on the RDF graph using AT&T GraphViz,
we didn't notice this until we started distributing the statements
among the .html pages.
A similar effect caused the unintended merging of project roles,
although that behavior might be viewed as reasonable/correct.
- We like the idea, borrowed from W3C (e.g. PR-rdf-schema-19990303),
of using "date space" to distinguish among ontology
versions (e.g. projects-20000911.daml).
- linking
- RDF's use of URIs rather than URLs for resources
doesn't really provide any
definitive information about where one could look to find additional
statements about a resource.
It might be useful to have some means within the DAML language
(e.g. daml:about vs. daml:aboutURL)
differentiating between the use of abstract URIs (names only)
and URLs (where an object is defined as an ID on the specified page,
along with other statements about that object).
- It wasn't clear where to put some of the statements
(e.g. City, shared WorkLocations, etc.).
We moved the statements that pertained to a specific individual
(typically around 22 statements)
to their home page, and left all of the others on our project page.
- typing
- We'd like to have more specific typing on Property ranges than Literal,
e.g. to identify strings, numbers, URIs, URLs, etc.
Building subclasses of
Literal based on XML Schema would probably be a good start.
- validation
- It would be desirable to have a DAML validation layer above RDF API that included such features as:
- checking that all namespaces are accessible (warning)
- ensuring that all Classes and Properties used are defined (error)
- checking domain and range constraints on Properties (error)
- identifying resources used as objects that are not also subjects (warning)
- checking any cardinality constraints (error)
- ...
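To make a few of the checks listed under validation concrete, here is a rough Python sketch that works on plain tuples rather than on RDF API objects. Everything in it is an assumption made for illustration: the Triple type, the validate function, the shape of the properties table, and the choice to ignore subclass reasoning, namespaces, and cardinality.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

RDF_TYPE = "rdf:type"
LITERAL = "Literal"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def validate(triples: List[Triple],
             classes: Set[str],
             properties: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (severity, message) pairs for a list of statements.

    `properties` maps each property name to its (domain, range) classes.
    A resource must be typed with exactly the declared class; subclass
    relationships are not considered in this sketch.
    """
    findings = []
    types: Dict[str, Set[str]] = {}  # resource -> declared classes
    for t in triples:
        if t.predicate == RDF_TYPE:
            types.setdefault(t.subject, set()).add(t.obj)
            if t.obj not in classes:
                findings.append(("error", f"undefined class {t.obj}"))

    subjects = {t.subject for t in triples}
    for t in triples:
        if t.predicate == RDF_TYPE:
            continue
        if t.predicate not in properties:
            findings.append(("error", f"undefined property {t.predicate}"))
            continue
        domain, range_ = properties[t.predicate]
        if domain not in types.get(t.subject, set()):
            findings.append(
                ("error", f"{t.subject} is outside the domain of {t.predicate}"))
        if range_ == LITERAL:
            if not t.obj_is_literal:
                findings.append(("error", f"{t.predicate} expects a literal"))
        elif t.obj_is_literal or range_ not in types.get(t.obj, set()):
            findings.append(
                ("error", f"{t.obj} is outside the range of {t.predicate}"))
        if not t.obj_is_literal and t.obj not in subjects:
            findings.append(
                ("warning", f"{t.obj} is used as an object but never as a subject"))
    return findings
```

Returning findings as data, rather than raising on the first problem, lets a caller report every warning and error for a page in one pass.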
Possible Future Work
There are several things we would have liked to do,
but didn't get around to:
- Use Person from daml-ex rather than defining our own.
- Our HTML and DAML content were essentially developed independently.
Some facts that exist in the HTML (e.g. spouses and children, former projects,
etc.)
aren't currently in our ontology.
- Update to use daml-ont rather than RDF Schema.
- We met the spirit of the assignment, but not the letter.
We currently have 107 "ontology statements"
describing 15 Classes and 26 Properties
and 271 "instance statements"
(as reported by the ListSchema.java and CountStatements.java utilities
we developed based on RDF API),
but have only actually marked up 5 HTML pages.
$Id: lessons1.html,v 1.11 2000/11/01 17:02:07 mdean Exp $