Re: New DQL Specification

From: Richard Fikes ([email protected])
Date: 05/28/02

    > > A binding is defined to be a
    > > "minimal identifying description" (MID) of the object denoted by a node
    > > in the RDF graph.  The MID is the smallest connected subgraph of the RDF
    > > graph of the KB that contains the node being described for which all
    > > "tip" nodes (i.e., nodes not in a loop in the graph) are either literals
    > > or have an associated URI.  In the case where the node is a literal or
    > > has an associated URI, the binding is simply the literal or the URI.  In
    > > the case of an anonymous node, the binding is a description (in the
    > > Description Logic sense) consisting of the arcs coming into and going
    > > out from the node in the graph.  Such a description might say, for
    > > example, "a parent of Joe that has Paris as a hometown and two male
    > > siblings".  The MID of a node in effect consists of the conjunction of
    > > the RDF statements defined by the arcs into and out of the node, where
    > > each node in the description is specified either by a literal, by an
    > > associated URI, or by its MID (i.e., if an anonymous node is related to
    > > another anonymous node, then the MID of either of those nodes will
    > > include the description of the other.  For example, a MID might be "a
    > > parent of a sister of Bill", where neither the parent nor the sister has
    > > a name.).  
    > 
    > This definition is deeply flawed.
    
    > I view the MID as completely useless.  It uses ill-defined terms.  It is
    > ill-defined itself.  It does not identify nodes in an RDF graph.   It would
    > have been much better to just return internal identifiers.
    
    Well, other than that, what did you think of it?  :-)
    
    Yes, I now see that the definition is incorrect.  However, I think the
    idea is well-founded and useful, and that the formal definition can be
    debugged.  Let's try another round.
    
    The intuition for the MID of a node is that it consists of all the RDF
    triples in which the node participates.  So, to use one of your
    examples, suppose the KB contains:
    
       _:x rdf:type Person .
       _:x age "35" .
       _:x name "Peter" .
    
    And suppose that _:x does not participate as a subject or as an object
    in any other sentences in the KB.  Then the MID for _:x is simply those
    three sentences with a variable in place of _:x; i.e., "(rdf:type ?y
    Person)", "(age ?y "35")", and "(name ?y "Peter")".  'Person', "35",
    and "Peter" are the tip nodes of the MID.
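    For illustration only, this simple case can be sketched in Python.
    Assumptions: the KB is a list of (subject, predicate, object) string
    triples written in subject-first order (rather than the predicate-first
    "(rdf:type ?y Person)" order used above), blank nodes carry a "_:"
    prefix, and literals keep their quotes.  The helper names simple_mid
    and tip_nodes are hypothetical, and the sketch covers only a node that
    is not linked to any other anonymous node.

```python
# Hypothetical representation: (subject, predicate, object) string triples.
KB = [
    ("_:x", "rdf:type", "Person"),
    ("_:x", "age", '"35"'),
    ("_:x", "name", '"Peter"'),
]

def simple_mid(kb, node, var="?y"):
    """Triples in which `node` is subject or object, with `node` -> `var`.

    Only the simple case: assumes `node` is linked to no other blank node.
    """
    return [
        tuple(var if term == node else term for term in (s, p, o))
        for s, p, o in kb
        if node in (s, o)
    ]

def tip_nodes(triples, var="?y"):
    """Subjects and objects of the MID other than the variable itself."""
    return {t for s, _, o in triples for t in (s, o) if t != var}

mid = simple_mid(KB, "_:x")
print(mid)
# [('?y', 'rdf:type', 'Person'), ('?y', 'age', '"35"'), ('?y', 'name', '"Peter"')]
print(sorted(tip_nodes(mid)))
# ['"35"', '"Peter"', 'Person']
```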
    
    The definition of a node's MID must take into account the case in which
    the node is related to another anonymous node (or to itself, as in your
    "_:x loves _:x" example).  That case is accommodated by essentially
    adding the MIDs of any such nodes to the MID being defined.  So, if
    the MID of a non-anonymous node (i.e., a node that has an associated URI
    or is a literal) is the node itself, then we can define the MID of a node
    to include the node itself, all arcs into and out of the node, and the
    MIDs of all nodes at the ends of those arcs.  (I think that is the
    debugged definition.)
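    One way to read this recursive definition, sketched in Python under
    stated assumptions: the KB is a list of (subject, predicate, object)
    string triples, blank nodes carry a "_:" prefix, and the recursion is
    computed as a closure with a visited set, so loops such as
    "_:x loves _:x" or "_:s hasParent _:p" terminate instead of recursing
    forever.  The function names are hypothetical, not part of DQL.

```python
from collections import deque

def is_anonymous(term):
    # Assumption: blank nodes are written with a "_:" prefix.
    return term.startswith("_:")

def mid(kb, node):
    """Return the MID of anonymous `node` as a frozenset of triples."""
    if not is_anonymous(node):
        raise ValueError("a URI or literal is its own MID")
    result, seen, queue = set(), {node}, deque([node])
    while queue:
        current = queue.popleft()
        for s, p, o in kb:
            if current not in (s, o):
                continue
            result.add((s, p, o))  # an arc into or out of `current`
            for end in (s, o):
                # URIs and literals are their own MID, so traversal stops
                # there; anonymous nodes contribute their own MIDs.
                if is_anonymous(end) and end not in seen:
                    seen.add(end)
                    queue.append(end)
    return frozenset(result)

kb = [
    ("Bill", "hasSister", "_:s"),
    ("_:s", "hasParent", "_:p"),
    ("_:x", "loves", "_:x"),
    ("_:z", "rdf:type", "Person"),
]
print(sorted(mid(kb, "_:p")))
# [('Bill', 'hasSister', '_:s'), ('_:s', 'hasParent', '_:p')]
print(mid(kb, "_:s") == mid(kb, "_:p"))  # True
print(sorted(mid(kb, "_:x")))  # [('_:x', 'loves', '_:x')]
print(sorted(mid(kb, "_:z")))  # [('_:z', 'rdf:type', 'Person')]
```

    Under this reading, the "sister of Bill" example and the single
    rdf:type case below come out as described in this message.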
    
    So, consider again my example "a parent of a sister of Bill".  The
    example supposes the KB contains:
    
      Bill hasSister _:s .
      _:s hasParent _:p .
    
    and has no other sentences that include _:s or _:p.  To construct the
    MID for _:p, we note that there is only one arc connected to _:p, namely
    the arc for the relationship "_:s hasParent _:p".  So, the MID for _:p
    will include the sentence "_:s hasParent _:p" and the MID for _:s. 
    Since _:s is an anonymous node, its MID will be all the sentences in
    which it participates.  Those sentences are "Bill hasSister _:s" and
    "_:s hasParent _:p".  So those two sentences are the MID for both _:s
    and _:p.  Informally, the MID for _:s is "a sister of Bill that has
    a parent", and the MID for _:p is "a parent of a sister of Bill".
    
    I hope that clarifies what we had in mind.
    
    > This definition is deeply flawed.
    > 
    > 0/ For something to be a MID, all nodes not in loops have to have URIs or
    >    be literals.  This is silly.  Instead, a much better notion for a tip
    >    node would be a node that is connected to only one other node (but even
    >    this has problems), see below.
    
    The revised definition is that the MID of a node is the graph consisting
    of the node itself, all arcs into and out of the node, and the MIDs of
    all nodes at the ends of those arcs.  The MID of a non-anonymous node is
    the node itself.
    
    > 1/ Suppose the KB incorporates something like
    > 
    >    _:x loves _:x .
    > 
    >    The definition of MID includes this RDF graph as an MID.
    
    Yes, that is correct.  Informally, the MID for _:x is "an entity that
    loves itself".
    
    > 3/ Suppose the only information known about some resource is that it
    >    belongs to one class, i.e., the only triple that mentions _:x is
    > 
    >    _:x rdf:type Person .
    > 
    >    Then there is no connected subgraph containing a node for _:x that
    >    satisfies the tip property, either as given or as restated.
    
    That one statement is the MID for _:x; i.e., "a person".
    
    > 4/ Suppose that the KB includes 
    > 
    >    _:x rdf:type Person .
    >    _:x age "35" .
    >    _:y rdf:type Person .
    >    _:y age "35" .
    > 
    >    and the query asks for instances of Person.
    > 
    >    How many times is the MID
    > 
    >    ?l rdf:type Person .
    >    ?l age "35" .
    > 
    >    returned?
    
    Once.
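    A possible reading of that answer, sketched in Python: if answers are
    collected as a set of MIDs in which the queried node is replaced by the
    variable ?l, the MIDs for _:x and _:y are identical and collapse into
    one.  This assumes, for simplicity, that neither MID involves any other
    anonymous node; comparing MIDs that do would need a graph-isomorphism
    check, which is not shown.  The names variablized_mid and answers are
    hypothetical.

```python
kb = [
    ("_:x", "rdf:type", "Person"),
    ("_:x", "age", '"35"'),
    ("_:y", "rdf:type", "Person"),
    ("_:y", "age", '"35"'),
]

def variablized_mid(kb, node, var="?l"):
    # Simplification: only triples that mention `node` directly, which is
    # the whole MID when no other anonymous node is involved.
    return frozenset(
        tuple(var if t == node else t for t in (s, p, o))
        for s, p, o in kb
        if node in (s, o)
    )

# Instances of Person, then one answer per distinct MID (set semantics).
instances = [s for s, p, o in kb if p == "rdf:type" and o == "Person"]
answers = {variablized_mid(kb, n) for n in instances}
print(len(answers))  # 1
```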
    
    Thanks for prompting us to find the bug in the definition.  I hope
    this makes more sense and seems more useful now.
    
    Richard
    