% draft of 7/23/2004
% SWSL Rules language design document
% by Benjamin Grosof and Michael Kifer
The purpose of SWSL Rules is to provide a language that is suitable for
describing all major aspects of the semantics of a Web service except for
the process modeling aspects. This includes inputs, outputs, preconditions
and postconditions; other aspects (the so-called "nonfunctional
properties") useful for discovery; contracts; and policies.
Version History:
o v1 was by Michael based on previous notes file by Benj
o v2 by Benj
- add document header info
- made various edits
- add some overall notes
- add some todo's/Q's
o v3 by Benj and Michael on long telecon
- various edits
o v4 by Michael
- reification
- nesting
- path expressions
- various edits of 1-10.
o v5 by Benj based in part on discussion with Said
- added some to-discuss notes at top
o v6 by Benj based on discussion with Michael
o v7 by Michael
- fleshed out a bit more on reification and transaction/situated stuff
o v8 by Michael: very minor polish
o v9 by Michael:
Major reification cleanup
Added process description terms
Clarified HiLog predicates; explained them with examples
General First-order Formulas
Clarified rules
Many additions and issues throughout.
o v10 by Michael:
Incorporated David's comments
o v11 by Michael:
Elaborated on constants, URIs
%%%%
Should aim at a layered approach to the language. Possible layers:
Core - just the regular datalog with function symbols and negation as
failure
Layers (they are orthogonal):
HiLog
F-logic (F-logic itself can have several layers - being specified
by FORUM, http://projects.semwebcentral.org/projects/forum/)
Lloyd-Topor
Courteous
...
%%%%
SWSL Rules Language: An Outline
===============================
Gets more sketchy towards the end. Summary of the topics (some have
many subtopics, some are short):
List of topics:
1. Terms
2. Positive literals
3. General First-order Formulas
4. Complex F-logic molecules
5. F-Logic Path Notation Syntax [SKETCHY]
6. Rules
7. Courteous extensions
8. Namespace declarations
9. Comments
10. Equality
11. Sensors [SKETCHY]
12. XML Types [SKETCHY]
13. Slots, lists, unordered collections, and rest variables [SKETCHY]
-------------------------------------------------------------------------------
1. Terms
Terms are expressions of one of the forms below. Each of these types of
terms is formally defined later.
1. HiLog terms (H-term). These are extensions of regular
first-order terms.
A HiLog term is either a constant, a variable, or a complex term,
as described below.
2. Path expressions. These are generalizations of the common
object-oriented idioms as used in languages like Java
3. Reifications (R-terms). These are reficiations of formulas.
4. Process terms (P-terms). These are process descriptions
(a.k.a. "process macros").
Rationale: A process description is neither an FOL formula nor an
LP formula. It is a stylized Golog formula and more. Whatever it
is, the Rules language must be able to refer to them directly (eg
to find out that indeed the credit card will be charged after the
service is consumed). Since this is what the service provider
will specify, we should support this directly in our Rules language.
The easiest way to do it is to encode them as terms. (They are
not reifications, since they don't describe a formula that is
understood directly by SWSL Rules.)
a. Constants
A constant is either a literal or a URI.
Literal:
- a number (may distinguish between florats, ints, etc.)
- a string. Strings are encloded in quotes. To include a quote in
a string one should double it. For instance, 'a''b' is
a representation for the literal a'b and 'a''''b' is a
representation for a''b.
String literals that contain only the alphanumeric
things (+ the underscore) don't need to be quoted --
the usual convention in LP.
URI:
- a complete URI (protocol://location...)
- a Qname (prefix:...)
Both must conform to the W3 specs.
A constant is a special case of an H-term.
The prefix may be empty. In this case, the overall URI
is interpreted using the default namespace.
ISSUE: In the default namespace, does it mean that it looks like
:foo (ie the ":" is preserved)?
MK: I think so.
Note: MK: Idea: later perhaps extend to permit the namespace
to be a more general H-term in which variables may appear.
b. Variables
A variable is a special case of an H-term.
Variables look like ? .
c. Complex Terms
If t, t1,...,tn are H-terms then t(t1,...,tn) is an H-term.
Note1: Variables can range over function symbols and predicates
Things like p(X)(a,Y) are terms. Has many uses (see the HiLog
paper)
d. Reifications (R-Terms): Defined later
e. Process terms. To be defined later when the Process subgroup is
done with its job.
2. Positive literals: Either HiLog predicates or F-logic molecules.
Later (below) will add complex literals, which are just
syntactic sugar for common object-oriented idioms.
Note: We might add builtins, which would have a different syntax,
as we discussed. For instance, lessThan{...}
Probably a good idea to plagiarize SWRL builtins.
(Rip them off completely or preserve their namespace?)
a. HiLog predicates
Any HiLog term is also a HiLog predicate.
Explanation. This aspect of HiLog formalizes the so called Prolog
"call" meta-predicate, but doesn't require such a predicate to
perform the same function.
For instance, in HiLog one can have the following program.
p(a).
q(p(a)).
Then the query
?- q(X), X.
will succeed with the binding X=p(a).
Note that if we didn't have p(a) in our database above, then this
query would fail, since
q(X) will succeed with the binding X=p(a)
but
the call X (where X is now p(a)) will fail,
since p(a) is not true.
In fact, the following rule is completely legal:
call(X) :- X.
it defines the predicate "call" with exactly the same semantics as
in Prolog, except that here it is not a builtin meta predicate with
non-logical semantics but rather a normal predicate, which is
defined directly in our language.
b. F-logic molecules
Preamble: The F-Logic features are useful to
represent frames and object-oriented conceptualization.
They are orthogonal to HiLog extensions of predicate
logic.
Note: Every ground H-term denotes an object.
Classes are objects, attributes are
objects, methods are objects, even integers and strings
are objects. (The latter can be separated out, but there
is no good reason to do so.)
Also, function symbols are objects, predicate names are
objects, etc.
The atomic formulas in F-logic are called "molecules". We first
describe atomic molecules and then introduce "complex molecules".
Complex molecules typically play the role similar to atomic
formulas in predicate logic, but they are not "indivisible". This
is why they are called molecules and not atoms.
- Value molecules
If t, m, v are terms then t[m -> v] is a value-molecule
Meaning:
Here t denotes an object, m a method,
and v denotes the value returned by this invocation in
the scope of object t.
If m = s(t1,...,tn), i.e. has arguments, then
t[s(t1,...,tn) -> v] is interpreted as an invocation of
method s on args t1,...,tn in the context of the object
t, which returns a set of values that contains v.
The syntax t[m->{v1,...,vk}] is also supported. Means: if m is
invoked in the context of the object t then it returns a set
that contains v1,...,vk.
Note1: this semantics makes the above term to be equivalent to a
conjunction of t[m->v1], ..., t[m->vk], so the syntax
t[m->{v1,...,vk}] is just a syntactic sugar.
Note2: Methods are conceptualized as set-valued
functions. Cardinality constraints (described later) can be
imposed on them.
- Boolean value-molecules
Are of the form t[m], where t,m are terms. Useful to specify
things like mary[female]. Could be alternatively written as
mary[female->true], but is less natural.
- Class membership molecules
If t, s are terms then t in s is a membership molecule.
Meaning: t is an object of class s.
(Others have used "isa" in place of "in", but we think "in" is
better for the novice.)
s has t is equivalent to t in s .
Note: in the original F-logic and in all its implementations
(both commercial and academic) membership is denoted as
t:s. However, we better swim with the tide and use ":"
for namespaces.
- Subclass molecules
If t, s are terms then t sub s is a subclass molecule.
Meaning: t is a subclass of s.
s sup t is equivalent to t sub s
Note: The original F-logic (and all current systems) use :: for
subclassing.
- Signature molecules
Explanation: Signatures are assertions about the expected types of
the method arguments and results. They don't have direct
effect on the inference (unless the signatures themselves
appear in the rules that have value/membership/subclass
molecules). The signature information is optional.
The semantics is defined as follows. First, the intended
model of the KB is computed. Then, if typing needs to be
checked, we must verify that this intended model is
well-typed. A well-typed model is one where the
value-molecules conform to their signatures. For the precise
definition of well-typed models see the F-logic paper.
(There can be several different notions of well-typed
models. For instance, one for semi-structured data and
another for completely structured data.)
A type-checker can be written in F-logic using just a few
rules. Such a type checker is a query, which returns "No",
if the model is well-typed and a counterexample otherwise.
In particular, type-checking has the same complexity as
querying.
Type information is important for conceptual design and
debugging.
If t, m, v are terms then t[m => v] is a (signature) molecule
Meaning: class t (or collection of classes if t is not ground)
has method invocation m that returns values of type v
(ie these are objects that belong to class v). If m itself
has arguments (m = s(t1,...,tn)) then they are also
interpreted as types. For instance
t[s(t1,...,tn) => v]
means that given the arguments of types t1,...,tn the
method s returns values of class v.
- Boolean signature molecules
The original F-logic didn't have a signature syntax
corresponding to value-molecules, but the Forum WG is currently
discussing
t[=>m]
as a signature for t[m].
Note: in the Forum effort we have decided to add cardinality
constraints as follows:
t[s(t1,...,tn) {2:5} => v]
meaning that given the args o1,...,on of types t1,...,tn
the method can return between 2 and 5 values of type
v. That is, 2 <= |s(o1,...,on)| <= 5
The original F-logic had => to denote single-valued
(functional) methods and =>> for set-valued methods.
It didn't have cardinality constraints. In Forum we
decided to simplify and extend using the cardinality
constraints.
Note:
Type checking relies heavily on negation-as-failure to infer
that types are weakly violated. For instance, if a certain
argument of a method is expected to be an object of type Person
and we get an object abc, then we invoke the query
?- abc in Person.
If we can't prove that abc is a person, then we conclude that
it is not and thus the type constraint is violated.
However, with the Courteous extension, negative information can
be specified explicitly. Therefore, a more "credulous" kind of
type checking is possible.
c. Inheritance in F-logic.
Explanation:
This is an optional extension feature. To avoid using it,
avoid employing the syntactic constructs below. To some extent,
this feature is still in theoretical development -- see comments
below about relationship to courteous.
Method/attributes can be inheritable and non-inheritable.
Non-inheritable correspond to class methods in Java, while
inheritable correspond to instance methods.
In F-logic, the type of the method is determined by the arrow.
Inheritable molecules look like t[m *-> v]. They denote info
about class t, which is inherited by subclasses and instances.
Non-inheritable methods are denoted t[m -> v] (the syntax that we
have already seen). The type info of inheritable methods is given
by signature molecules of the form t[m *=> v].
Rules for inheritance:
t sub s and s[m *=> v] |= t[m *=> v]
t in s and s[m *=> v] |= t[m => v]
Explanation: The type declaration of an inheritable method
is inherited to subclasses in an inheritable form (i.e.,
it can be further inherited).
However, to the members of a class such declarations is
inherited in a non-inheritable form. That is,
inheritance is propagated through subclasses, but stops
once it is propagated to class members.
t sub s and s[m *-> v] |= t[m *-> v], unless overriden
t in s and s[m *-> v] |= t[m -> v], unless overriden
Explanation:
The values returned by inheritable methods are propagated to
subclasses in the inheritable form (i.e., it can be
propagated further) unless inheritance is overriden.
To the members of a class, the values returned for a
method are propagated in a non-inheritable form (and
even that only if not overriden).
Semantics of overriding is based in part on the Well-Founded
Semantics. We omit details here.
Note: no overriding for signature inheritance - signature
inheritance is monotonic, while value inheritance is not.
Remark:
We should look at how to combine/merge the courteous
extensions with the Yang & Kifer inheritance
approach. This is future work.
3. General "First-order" Formulas
ISSUE/Question: What is the first-order subset for?
Answer (MK): They are for the Loyd-Topor extensions -- NOT for
the FLOWS stuff.
This is because, according to the f2f discussions, process
descriptions are going to be written in a stylized Golog-like
senstences. They are very different from first-order and we
represent them using P-terms, NOT as formulas.
PSL ontology will be written in some first-order language.
But we don't care about ITS syntax, because the user will never
see or this ontology except maybe in the papers that define PSL.
The ordinary user will deal with the PSL ontology ONLY through
the concepts defined by these ontology, which will appear to her
as predicates, classes, or whatever SWSL decides.
Note: First-order is in quotes because these are only syntactically
first-order. We actually use them in the rule bodies.
Moreover, we use both "not" (negation as failure) and
later "neg" (explicit weak form of classical negation -- used for
courteous LP).
Definitions:
Connectives:
"and" stands for conjunction.
"or" stands for disjunction.
"neg" stands for (weak) classical negation
"not" stands for negation as failure
Quantifiers:
exists [, ]*
forall [, ]*
Formula:
| and
| or
| not
| not neg // courteous
| neg // courteous
| () |
Benjamin: Any restrictions on the use of neg? For instance, is
neg not neg permitted?
Also, don't remember if Lloyd-Topor imposes any
restrictions on the use of "not". For instance,
not (exists X (p(...) \/ not q(...)))?
**** We should move the neg stuff to the courteous subsection.
4. Complex positive literals - syntactic sugar for common O-O idioms
Molecules can be combined in two ways:
- by grouping and
- by nesting
Grouping. Grouping applies to molecules that describe the same object:
t[m1 -> v1] and t[m2 => v2] and t[m3 {6:9} => v3] and t[m4->v4]
is -- by definition -- equivalent to
t[m1 -> v1 and m2 => v2 and m3 {6:9} => v3 and m4->v4]
Can also combine and with or according to the normal precedence rules:
t[m1 -> v1] and t[m2 => v2] or t[m3 {6:9} => v3] and t[m4->v4]
is a sugar for
t[m1 -> v1 and m2 => v2 or m3 {6:9} => v3 and m4->v4]
Note:
FLORA-2 also allows things like a[b->c] and not a[e->f] to be written
as a[b->c and not e->f].
Used relatively rarely. Worth adding? If so, we won't be able to
call these literals "complex positive" litarals.
Nesting. Nesting applies to molecules in the following "chaining"
situation, which is a common technique in object-oriented databases:
t[m -> v] and v[q -> r]
is by definition equivalent to
t[m -> v[q -> r]]
Nesting can also be used to combine membership and subclass molecules
with value-molecules in the following situations:
t sup s and s[m->v]
is equivalent to
t sup s[m->v]
Likewise,
t has s and s[m->v]
is equivalent to
t has s[m->v]
Molecules can also be nested inside predicates:
p(a[b->c])
is
p(a) and a[b->c].
Deep nesting is allowed:
p(f(q,a[b->c]),foo)
is
p(f(q,a),foo) and a[b->c]
and
a[b-> foo(e[f->g])]
is
a[b-> foo(e)] and e[f->g]
Note that this syntax is completely compositional. That is, molecules
are allowed anywhere where a term is allowed. For instance,
a[foo(b[c->d]) -> e]
is
a[foo(b)->e] and b[c->d]
and
a[foo[b->c] -> e]
means
a[foo -> e] and foo[b->c]
5. Rules (also see Courteous extensions later)
a. A rule is of the form
:- ;
ISSUE: some rule systems use "." as ending delimiter instead of ";".
What do we do?
NOTE: Rules are *not* classical material implication
b. Rule head
A conjunction of positive literals.
Parentheses are used to enclose such conjunctive expressions,
in the usual manner.
Note: a common case of such conjunction is when there is an F-Logic
complex molecule in the head.
c. Rule body
A rule body may be empty (in which case the rule is a fact)
It may be a single (positive or negative) literal.
It may be a conjunction of literals.
Positive Lloyd-Topor extensions:
More generally, the body may be built up from literals
using the conjunction ("and")
and disjunction ("or") connectives. Conjunction and disjunction
may be nested, e.g., two and'ed subexpressions within a top-level
or. Parentheses are used to enclose such conjunctive or
disjunctive expressions, in the usual manner.
General Lloyd-Topor extensions:
In the body:
these include (in addition to the above):
- negation-as-failure applied to expressions
- universal quantifiers
- existential quantifiers
- if ... then ...
- if ... then ... else ...
All these essentially rely on rewriting using negation-as-failure.
In the head:
these include (in addition to the positive L-T extensions):
- if ... then ...
*** "else" is not permitted - doesn't make sense
Note: Also in the head it's easy but not really useful to add
universal quantifiers
Note: variables that are not explicitly quantified are assumed to
be implicitly universally quantified outside of the rule.
6. The Path Notation Syntax
Preamble: Path expressions are useful shorthands that are widely used
in object-oriented and Web languages. In a logic-based
language, a path expression allows one to write formulas
more concisely by eliminating the need to introduce explicit
variables.
A value path expression has the form
t.t1.t2. ... .tn
where t, t1, t2, ... are terms or molecules.
In the BODY OF A RULE, such an expression is considered a shorthand for
the variable ?Var_n, which is defined by the following formula:
t[t1->?Var1] and ?Var1[t2->?Var2] and ... and ?Var_{n-1}[tn->?Var_n]
In other words, writing mary.father.mother = sally stands for
mary[father->?F] and ?F[mother->?M] and ?M = sally
and thus mary.father.mother denotes mary's paternal grandmother.
Note that molecules can appear wherever a term can appear (nesting).
Therefore, a.b[c->d].e.f[g->h].k is allowed. It denotes the ?V defined
by the following formula:
a[b-> ?X] and ?X[e->?Y] and ?Y[f->?Z] and ?Z[k->?V]
and ?X[c->d] and ?Z[g->h]
This is analogous to XPath's intermediate predicates.
For instance, X = a.b[c->d].e.f[g->h].k is equivalent to
X = a.b.e.f.k and a.b[c->d] and a.b.e.f[g->h]
If we spell it out completely, then we get:
a[b-> ?X] and ?X[e->?Y] and ?Y[f->?Z] and ?Z[k->?V] and ?X[c->d]
and ?Z[g->h] and ?X[c->d] and ?Z[g->h]
A signature path expression is defined by analogy. Here we use ".." to
define such path expressions rather than ".". It has the form
t..t1..t2.. ... ..tn
where t, t1, t2, ... are terms.
ISSUE: It was proposed at the F2F to use / to descend down the ->
and .. to go up. The motivation is that it will allow
to go in both directions.
However: What should we use for navigation in signatures?
(Maybe we shouldn't use path expressions for signatures
at all?)
The other point is that it is already easy to go up,
and going up is very rare anyway. Here is an example
for going up. Something similar to a/../b/c starting
with the object "d" in XPath:
?Y[a -> d].b.c
This is actually more coherent than the XPath stuff.
Note: The * in XPath (eg b/*/c beginning with object "a" is represented as
a.b.?X.c
i.e., you use a variable.
ISSUE: One may ask: what is the substitute for things like //?
Before we introduce such a syntax, we need to see a usecase.
One should be aware that navigation in object structures, which
can be cyclic graphs is not the same as navigation in XML trees.
So, // is not that well-defined (although it can be with some
effort).
Here is an example of a cyclic structure:
a[b->c].
c[b->a].
SEMANTICS DETAILS:
NOTE: A path expression means different things depending on whether
it is uded in the rule body or the rule head.
For the process descriptions that come from the FLOWS part,
we need to see how they are used and decide what their
semantics is supposed to be. In fact, it is not clear what is
the use of this semantics for the Rules language, since
process descriptions are just P-terms.
However, in the actual translation from these process
expressions into the
internal-KIF-style-language-that-only-theoreticians-will-see
such a semantics is important. Probably a job for the FLOWS
group.
In the BODY of a rule, such an expression is considered a shorthand for
the variable ?Var_n, which is defined by the following formula:
t[t1=>?Var1] and ?Var1[t2=>?Var2] and ... and ?Var_{n-1}[tn=>?Var_n]
Value and signature path expressions can be mixed (but this is rarely
useful): t.t1..t2.t3.
A path expression can appear wherever an object can appear. For
example,
a[b->c.d] is legal and means
c[d->?V] and a[b->?V]
(because c.d is a shorthand for the variable ?V in the above).
Similarly, a.b.c[d->e] is legal and means
a[b->?B] and ?B[c->?D] and ?D[d->e]
(because a.b.c is a shorthand for ?D in the above).
The expressions of the form
a.b.c[d->e].f.g[h->k].m
are also allowed. They stand for such a value v that
a.b.c[d->e] and a.b.c.f.g[h->k] and a.b.c.f.g.m = v
In the HEAD of a rule, a path expression is defined with the help of
Skolem functions. For instance,
a.b[foo->X] <- body(X).
means
a[b->skolem(a,b)] <- body(X).
skolem(a,b)[foo->X] <- body(X).
where skolem is some new function symbol.
Similarly for type signature constraints
a..b[foo=>X] <- body(X).
means
a[b=>sigskolem(a,b)] <- body(X).
sigskolem(a,b)[foo=>X] <- body(X).
where sigskolem is yet another new function symbol.
7. Reification.
If f is a formula that has the syntactic form of a rule head, rule
body, or of a rule (defined later) then
f
is also considered to be a term. Therefore, it can be used
whenever an object can occur.
NOTE: As defined, reification introduces syntactic ambiguity. See below
how this is dealt with it.
A reified formula represents an objectification of the
corresponding formulas. Such terms are useful for specifying
ontologies where objects specify theories that can be true in some
worlds, but are not true in the present world (so those theories
cannot be asserted in the present world). Examples include the
effects of actions.
Reification doesn't cause paradoxes as long as:
- rule heads don't contain classical negation
- rule head can't be a variable, i.e., X :- body (which is
allowed in HiLog)
(See G. Yang and M. Kifer, "Reasoning about Anonymous
Resources and Meta Statements on the Semantic Web," Journal
of Data Semantics, Springer Verlag, 2003.)
The fact that Courteous SWSL-Rules can have negation in the rule heads
doesn't mean that paradoxes are possible. The SWSL-Rules negation
is not classical (but rather defeasible) and such sets of rules
can be rewritten without negation in the head.
SYNTACTIC DETAILS.
Reification introduces ambiguity into the syntax. For instance,
consider
a[b->t]
where t is a term, which we want to consider as a
reification of the formula c[d->e]. Since we said that a formula
is a term then we should be able to just write
a[b->c[d->e]]
But this is ambiguous, since earlier we defined the above as a
commonly used object-oriented idiom, a syntactic sugar for
a[b->c] and c[d->e]
Similarly, if we want to write something like
t[b->c]
where t is a reification of f[g->h] then we can't write
f[g->h][b->c]
because this is syntactic sugar for f[g->h] and f[b->c].
To resolve this ambiguity, we introduce the reification
operator whose only role is to indicate to the parser that
a particular occurrence of a nested molecule is to be treated as
a term that represents a reified formula rather than
as a syntactic sugar for an object-oriented idiom.
Note that the reification operator is not needed for HiLog
predicates because there is no ambiguity:
a[b->p(X)]
doesn't mean a[b->p(X)] and p(X) because the sugar is only for
the object-oriented idioms, i.e., for nested molecules.
However, in
a[b->p(X[foo->bar])]
explicit reification is needed because the above is a syntactic
sugar for
a[b->p(X)] and X[foo->bar]
So, we should write this instead:
a[b-> ${p(X[foo->bar])}]
Why is the object-oriented idiom the default rather than
reification? Because reification is rare and the use of the
idiom is common.
Examples:
Reification in SWSL-Rules is very powerful and yet it doesn't add
to the complexity of the language. Here is an example that
models an agent who believes in modus ponens:
john[believes-> ${p(a)}].
john[believes-> ${q(?X) <- p(?X)}].
// modus ponens
john[believes-> ?A] <-
john[believes -> ${?A <- ?B}] and john[believes -> ${?B}].
One can infer that john believes in q(a).
Note: explicit reification in ${p(a)} is not needed, because
no ambiguity exists here.
8. Courteous extensions
- Rule Labels:
Rules have optional labels:
'<'