This directory contains a DAML+OIL (March 2001) representation of the 2001 CIA World Fact Book.
Example
Here is information about the United States in HTML, DAML, and HyperDAML formats.
Approach
We processed the downloadable HTML pages using:
Notable features include:
- properties are added to existing FIPS country instances rather than creating new objects
- information for each country is placed on a separate page, to minimize retrieval time and to parallel the HTML representation
- subclasses of factbook:AdministrativeDivision (e.g., State, Province, and Territory) are created and populated for each country
- parenthesized comments about statements are captured using reification. Structured comments such as (1997 est.) are reified using specific properties.
- unit-of-measure information is associated with numeric properties
- each country is linked to its bordering countries and dependent areas
- countries are linked to shared objects representing commodities, international organizations, religions, etc.
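The reification of parenthesized comments described in the list above can be illustrated with a small sketch. This is not the code in gendaml.java and not the vocabulary of factbook-ont.daml: the namespace http://example.org/factbook#, the estimateYear property, the ReifyComment class, and the N-Triples-style output are all assumptions made for illustration. It splits a field value such as "12,345 (1997 est.)" (an illustrative value) into a bare value plus a trailing comment, asserts the bare statement, reifies it with rdf:Statement, and then attaches either a structured estimate year or a free-text rdfs:comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: reify a Fact Book statement so that a parenthesized
 * comment can be attached to it. Namespace and property names are assumptions.
 */
public class ReifyComment {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RDFS = "http://www.w3.org/2000/01/rdf-schema#";
    // Assumed namespace, for illustration only.
    static final String FB = "http://example.org/factbook#";

    // A value followed by one trailing "(...)" comment, e.g. "12,345 (1997 est.)".
    static final Pattern TRAILING_COMMENT = Pattern.compile("^(.*?)\\s*\\(([^()]*)\\)\\s*$");
    // A structured comment: optional month word, then a four-digit year and "est."
    static final Pattern ESTIMATE = Pattern.compile("(?:\\w+\\s+)?(\\d{4}) est\\.");

    static String triple(String s, String p, String o) {
        return "<" + s + "> <" + p + "> " + o + " .";
    }

    static String literal(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Returns the statement plus, when a comment is present, its reification. */
    public static List<String> reify(String country, String property, String value, String stmtId) {
        List<String> out = new ArrayList<>();
        String subj = FB + country;
        String pred = FB + property;
        Matcher m = TRAILING_COMMENT.matcher(value);
        if (!m.matches()) {
            // No parenthesized comment: emit the plain statement only.
            out.add(triple(subj, pred, literal(value.trim())));
            return out;
        }
        String obj = literal(m.group(1));
        String comment = m.group(2).trim();
        out.add(triple(subj, pred, obj));
        // Reify the statement so the comment can refer to it.
        String stmt = FB + stmtId;
        out.add(triple(stmt, RDF + "type", "<" + RDF + "Statement>"));
        out.add(triple(stmt, RDF + "subject", "<" + subj + ">"));
        out.add(triple(stmt, RDF + "predicate", "<" + pred + ">"));
        out.add(triple(stmt, RDF + "object", obj));
        Matcher est = ESTIMATE.matcher(comment);
        if (est.matches()) {
            // Structured comment: use a specific (hypothetical) property.
            out.add(triple(stmt, FB + "estimateYear", literal(est.group(1))));
        } else {
            // Unstructured comment: fall back to rdfs:comment.
            out.add(triple(stmt, RDFS + "comment", literal(comment)));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : reify("us", "population", "12,345 (1997 est.)", "stmt1")) {
            System.out.println(t);
        }
    }
}
```

Reifying the statement rather than folding the comment into the literal keeps the bare value machine-readable while still recording the qualification; a real converter would need many more comment patterns than the single "est." form recognized here.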
A list of the resulting pages is available here. These pages currently total about 134,000 DAML statements. About 75% of the World Fact Book fields are currently being processed.
Possible Future Directions
- extend gendaml.java and factbook-ont.daml to systematically capture information that cannot currently be parsed
- extend the unit-of-measure representation to support reasoning about measures
- try to get access to the original database from which the HTML pages were presumably generated
- update to the 2002 edition of the CIA World Factbook
Related Work
Stanford KSL processed the SGML from the 1995 CIA World Fact Book CD-ROM for the DARPA HPKB program, and subsequently made that information available in DAML-ONT and DAML+OIL (March 2001) formats. Each version contains about 37,000 DAML statements.
Author
Mike Dean
$Id: index.xml,v 1.11 2003/03/08 23:29:27 mdean Exp $