Re: deSkolemizing to get Horn-expressivity with RDF rules

From: Sandro Hawke (
Date: 11/12/02

  • Next message: Harold Boley: "Re: deSkolemizing to get Horn-expressivity with RDF rules"
    Benjamin writes:
    > A key fact about FOL, including
    > Horn FOL, to keep in mind is that in any implication,
    > existentially quantifying a variable within the scope of (only) the 
    > antecedent is
    > equivalent to pulling that variable out and universally quantifying
    > it over the scope of the whole implication.  E.g.,
    >     forall x,y,z.  (grandparent(x,z) <-  parent(x,y) and parent(y,z) )
    > is equivalent to
    >     forall x,z.  (grandparent(x,z) <- (exists y.  (parent(x,y) and parent(y,z
    > ))
    > Cleary in your messages you would like to be able to represent the converse 
    > of the above,
    > i.e.,
    >      forall x,z.  (grandparent(x,z) -> (exists y.  (parent(x,y) and 
    > parent(y,z))
    > Formally, however, introducing an existential in the head is not allowed
    > in Horn, so this goes beyond Horn.  Your "style 2" does this.
    > By contrast, your "style 1" is simply Horn (Datalog-ness issue aside), no 
    > more no less.  Since style 2
    > can express style 1 as a special case, style 2 is strictly more powerful 
    > than Horn.
    I agree the form is not Horn clausal, but I think it is equivalent.
    The existential I'm allowing in the head would come right back out,
    leaving you with only Horn clauses, if you clausified the formula.  So
    the expressiveness is the same.
    > Quite non-standard in your way of describing a "rule" (in both "styles") is 
    > to treat the
    > *names* of quantified variables as significant *outside* of the scope of
    > their quantification.   By contrast, in FOL (and thus in its expressive 
    > fragments, including Horn and
    > Description Logic), within the scope of a quantifier one can equivalently 
    > rewrite the variable names arbitrarily.
    > Your intuition of correspondence between variables in an antecedent 
    > expression and variables in a consequent
    > expression is, I believe, simply better viewed as universal 
    > quantification.   Viewed separately, the two
    > expressions are each what in FOL is simply called "open", i.e., they have 
    > variables in them.
    I totally agree with you.  Back when I wrote that, I thought perhaps
    bNodes were more acceptable to the community than universally
    quantified variables, so that maybe we could sneak in (er, gracefully
    introduce) rules using only bNodes as variable.  I now think that's a
    bad idea.
    > I don't really understand what you mean by saying that you're "deSkolemizing"
    > other than at a rather hazy intuitive level.  The existentially quantified 
    > variables
    > in the consequent emerge only after carving out the universally quantified 
    > variables
    > that (typically) also appear in the body.
    > Maybe you can clarify this further?
    Maybe.  :-)
    The typical procedure for converting an FOL sentence into clausal
    form begins: 
       1.  Rewrite => using ~ and |
       2.  Move ~ inwards 
       3.  Rename variables to be distinct through the sentence
       4.  Move quantifiers left
       5.  Skolemize: remove existential quantifiers by replacing the
           variables with new functions, parameterized by all 
           universals to their left.
    So there's a kind of duality between existentially quantified
    variables and functions.  Skolemizing is converting all existential
    variables to being function terms; deSkolemizing is converting all
    function terms to be existentially quantified variables.
    If you think of f(x,y) as denoting the thing which is mapped to by f
    from the pair <x,y>, it makes perfect sense to involve that "thing"
    in a predicate.  We can imaging such a predicate, pf, defined
       pf(z,x,y)  <=>   z=f(x,y) 
    Maybe we can call pf the "Skolem Predicate" of f (or maybe it has some
    existing name in the literature).  To deSkolemize a formula, getting
    rid of all functions, we convert it as 1-4 above, then rewrite each
    literal (atomic sentence or negated atomic sentence) containing a
    function term as an existentially quantified formula containing the
    conjunction of the Skolem predicate formula(s) and the literal with
    the function term(s) replaced by the newly introduced variable(s).
    For example...
         ... & ~p(a, b, f(a), g(c, d)) & ...
    turns into
         ... & exists e1 e2 (
                 ~p(a, b, e1, e2) &
    	     pf(e1, a) &
    	     pg(e2, c, d)
               ) & ...
    I think I've got that right.  In any case, it should be enough to show
    what I mean.
    As you pointed out on the call, pf and pg need to be defined with some
    kind of uniqueness rule, like
          all x y z ( p(x,y) & p(z, y) => x=z )
    This actually makes a nice example of when you want to use the
    equivalence you meantioned at the start of the message.  If we want to
    make it a biconditional, we need to move y's quantification inside:
         all x z (
            exists y ( p(x,y) & p(z, y) )
    ...although I guess that's only true if f were a total function; to
    handle that more carefully, we should do something with the domain of
         all x z (
            exists y ( p(x,y) & p(z, y) )
            inDomainOf_f(x) & x=z 
    The details are going to require more thought than I'm capable of
    right now.
    To remind ourselves of the point here: RDF graphs are more expressive
    than FOL atomic sentences.  The logic-function terms and n-ary
    predicates in FOL are essentially just syntactic sugar; they can be
    converted into RDF by using additional predicates and triples (modulo
    the function's many-to-one mapping quality, which requires additional
    expressivity to encode).
    The presence of logic functions is usually seen as the distinguishing
    feature between datalog and Horn logic, but that's only a true
    division when we're looking at clausal form sentences.  In
    characterizing RDF rules as UR-Centered-Bin-Datalog, it looks like
    you've incorrectly let RDF's lack of logic functions make you think
    RDF rules have to be no more expressive than datalog.  Puting it under
    datalog makes RDF rules fit nicely in your hierarchy, but it gives us
    a less powerful system than we (cwm/n3 users at least) want.
    (Ah the silly contortions of working with RDF....  :-)
         -- sandro

    This archive was generated by hypermail 2.1.4 : 11/12/02 EST