Solving the Nitty-Gritty
Ontology references can have typos
DAML is mixed with XML and HTML, and any of it may be malformed
Sites change and can be unavailable for short, long, or permanent periods
Massive amounts of content – space and time efficiency issues
53,000 pages, 3.7 million DAML statements, 800 MB crawled database
Currently using SiRPAC; exploring Jena
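For the point about ontology references containing typos, one possible mitigation is to match an unresolved reference against ontology URIs the crawler already knows and suggest the closest one. The sketch below uses Python's standard `difflib.get_close_matches`; the `KNOWN_ONTOLOGIES` list, the `resolve_reference` helper, and the 0.9 similarity cutoff are hypothetical choices for illustration, not the crawler's actual method.

```python
import difflib

# Hypothetical set of ontology URIs the crawler has already fetched successfully.
KNOWN_ONTOLOGIES = (
    "http://www.daml.org/2001/03/daml+oil",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns",
    "http://www.w3.org/2000/01/rdf-schema",
)


def resolve_reference(ref, known=KNOWN_ONTOLOGIES, cutoff=0.9):
    """Return (uri, exact) for an ontology reference.

    Exact matches are returned as-is; otherwise the closest known URI whose
    similarity is at least `cutoff` is suggested. Returns (None, False) if
    no known URI is close enough.
    """
    # Drop a trailing '#' so "ns#" and "ns" compare equal.
    base = ref.rstrip("#")
    if base in known:
        return base, True
    matches = difflib.get_close_matches(base, known, n=1, cutoff=cutoff)
    if matches:
        return matches[0], False
    return None, False
```

A suggested (non-exact) match would probably be logged for review rather than silently substituted, since a high cutoff reduces but does not eliminate false corrections.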
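For the point about DAML mixed with possibly malformed XML and HTML, a crawler cannot simply hand a whole page to an XML parser, because ordinary HTML (for example an unclosed `<br>`) is not well-formed XML. A minimal sketch, assuming embedded DAML always appears inside `<rdf:RDF>...</rdf:RDF>` blocks that use the literal `rdf` prefix and declare their own namespaces, is to cut out each such block and parse it separately, keeping malformed blocks aside instead of failing the whole page. The `extract_rdf_blocks` helper is hypothetical; it uses only the standard `re` and `xml.etree.ElementTree` modules.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: embedded DAML sits inside <rdf:RDF>...</rdf:RDF> blocks that use
# the literal "rdf" prefix and declare their namespaces on that element.
RDF_BLOCK = re.compile(r"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)


def extract_rdf_blocks(page):
    """Split a mixed HTML/XML page into parsed RDF blocks and malformed ones."""
    parsed, malformed = [], []
    for match in RDF_BLOCK.finditer(page):
        fragment = match.group(0)
        try:
            parsed.append(ET.fromstring(fragment))
        except ET.ParseError:
            # Keep the raw text so it can be logged or repaired later.
            malformed.append(fragment)
    return parsed, malformed
```

A regex cut is fragile (for example, with nested or commented-out blocks), so a real crawler might prefer a lenient HTML parser for the outer page; the point here is only that one bad fragment need not discard the rest.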
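For the point about sites being unavailable for short, long, or permanent periods, one common policy is to retry with exponential backoff, keep the last good copy of the content while a site is down, and treat a site as permanently gone only after several consecutive failures. The sketch below is illustrative: `SiteState`, `record_fetch`, the one-hour base delay, and the five-failure threshold are all hypothetical parameters, not values from this crawler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteState:
    failures: int = 0                # consecutive failed fetches
    next_attempt: float = 0.0        # earliest time (seconds) to fetch again
    gone: bool = False               # treated as permanently unavailable
    last_good: Optional[str] = None  # last successfully fetched content


def record_fetch(state, now, content=None, base_delay=3600.0, max_failures=5):
    """Update a site's state after a fetch attempt (content=None means failure)."""
    if content is not None:
        state.failures = 0
        state.gone = False
        state.last_good = content
        # Schedule a routine re-crawl after the base interval.
        state.next_attempt = now + base_delay
    else:
        state.failures += 1
        # Exponential backoff: 1x, 2x, 4x, ... the base delay between retries.
        state.next_attempt = now + base_delay * 2 ** (state.failures - 1)
        # Short outages keep last_good; only repeated failures mark the site gone.
        state.gone = state.failures >= max_failures
    return state
```

Keeping `last_good` means statements from a temporarily unreachable site stay in the database instead of disappearing and reappearing on every outage.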
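For the space and time efficiency point, one widely used technique (not necessarily what this crawler does) is dictionary encoding: each distinct URI or literal is stored once and triples are kept as tuples of small integer IDs, which also makes duplicate statements from repeated crawls cheap to detect. The `TripleStore` class below is a hypothetical in-memory sketch of that idea.

```python
class TripleStore:
    """Stores (subject, predicate, object) triples as integer-ID tuples."""

    def __init__(self):
        self.ids = {}        # term string -> integer ID
        self.terms = []      # integer ID -> term string
        self.triples = set() # set of (s_id, p_id, o_id); duplicates collapse

    def _intern(self, term):
        # Each distinct term string is stored only once.
        term_id = self.ids.get(term)
        if term_id is None:
            term_id = len(self.terms)
            self.ids[term] = term_id
            self.terms.append(term)
        return term_id

    def add(self, s, p, o):
        self.triples.add((self._intern(s), self._intern(p), self._intern(o)))

    def __iter__(self):
        # Decode IDs back to strings when reading.
        for s, p, o in self.triples:
            yield self.terms[s], self.terms[p], self.terms[o]
```

Because predicates such as `rdf:type` recur in many statements, storing them once and referring to them by ID typically saves far more space than it costs in lookups.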
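The crawl figures above imply some rough averages, assuming 1 MB = 10^6 bytes. Because the 800 MB database size may also include indexes or cached page text, the per-statement figure is an upper bound on storage per statement rather than a measured cost.

```latex
\frac{800\times10^{6}\ \text{B}}{3.7\times10^{6}\ \text{statements}} \approx 216\ \text{B per statement},\qquad
\frac{3.7\times10^{6}\ \text{statements}}{53{,}000\ \text{pages}} \approx 70\ \text{statements per page},\qquad
\frac{800\times10^{6}\ \text{B}}{53{,}000\ \text{pages}} \approx 15\ \text{KB per page}
```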