% draft of 7/23/2004 % SWSL Rules language design document % by Benjamin Grosof and Michael Kifer The purpose of SWSL Rules is to provide a language that is suitable for describing all major aspects of the semantics of a Web service except for the process modeling aspects. This includes inputs, outputs, preconditions and postconditions; other aspects (the so-called "nonfunctional properties") useful for discovery; contracts; and policies. Version History: o v1 was by Michael based on previous notes file by Benj o v2 by Benj - add document header info - made various edits - add some overall notes - add some todo's/Q's o v3 by Benj and Michael on long telecon - various edits o v4 by Michael - reification - nesting - path expressions - various edits of 1-10. o v5 by Benj based in part on discussion with Said - added some to-discuss notes at top o v6 by Benj based on discussion with Michael o v7 by Michael - fleshed out a bit more on reification and transaction/situated stuff o v8 by Michael: very minor polish o v9 by Michael: Major reification cleanup Added process description terms Clarified HiLog predicates; explained them with examples General First-order Formulas Clarified rules Many additions and issues throughout. o v10 by Michael: Incorporated David's comments o v11 by Michael: Elaborated on constants, URIs %%%% Should aim at a layered approach to the language. Possible layers: Core - just the regular datalog with function symbols and negation as failure Layers (they are orthogonal): HiLog F-logic (F-logic itself can have several layers - being specified by FORUM, http://projects.semwebcentral.org/projects/forum/) Lloyd-Topor Courteous ... %%%% SWSL Rules Language: An Outline =============================== Gets more sketchy towards the end. Summary of the topics (some have many subtopics, some are short): List of topics: 1. Terms 2. Positive literals 3. General First-order Formulas 4. Complex F-logic molecules 5. F-Logic Path Notation Syntax [SKETCHY] 6. Rules 7. Courteous extensions 8. Namespace declarations 9. Comments 10. Equality 11. Sensors [SKETCHY] 12. XML Types [SKETCHY] 13. Slots, lists, unordered collections, and rest variables [SKETCHY] ------------------------------------------------------------------------------- 1. Terms Terms are expressions of one of the forms below. Each of these types of terms is formally defined later. 1. HiLog terms (H-term). These are extensions of regular first-order terms. A HiLog term is either a constant, a variable, or a complex term, as described below. 2. Path expressions. These are generalizations of the common object-oriented idioms as used in languages like Java 3. Reifications (R-terms). These are reficiations of formulas. 4. Process terms (P-terms). These are process descriptions (a.k.a. "process macros"). Rationale: A process description is neither an FOL formula nor an LP formula. It is a stylized Golog formula and more. Whatever it is, the Rules language must be able to refer to them directly (eg to find out that indeed the credit card will be charged after the service is consumed). Since this is what the service provider will specify, we should support this directly in our Rules language. The easiest way to do it is to encode them as terms. (They are not reifications, since they don't describe a formula that is understood directly by SWSL Rules.) a. Constants A constant is either a literal or a URI. Literal: - a number (may distinguish between florats, ints, etc.) - a string. Strings are encloded in quotes. To include a quote in a string one should double it. For instance, 'a''b' is a representation for the literal a'b and 'a''''b' is a representation for a''b. String literals that contain only the alphanumeric things (+ the underscore) don't need to be quoted -- the usual convention in LP. URI: - a complete URI (protocol://location...) - a Qname (prefix:...) Both must conform to the W3 specs. A constant is a special case of an H-term. The prefix may be empty. In this case, the overall URI is interpreted using the default namespace. ISSUE: In the default namespace, does it mean that it looks like :foo (ie the ":" is preserved)? MK: I think so. Note: MK: Idea: later perhaps extend to permit the namespace to be a more general H-term in which variables may appear. b. Variables A variable is a special case of an H-term. Variables look like ? . c. Complex Terms If t, t1,...,tn are H-terms then t(t1,...,tn) is an H-term. Note1: Variables can range over function symbols and predicates Things like p(X)(a,Y) are terms. Has many uses (see the HiLog paper) d. Reifications (R-Terms): Defined later e. Process terms. To be defined later when the Process subgroup is done with its job. 2. Positive literals: Either HiLog predicates or F-logic molecules. Later (below) will add complex literals, which are just syntactic sugar for common object-oriented idioms. Note: We might add builtins, which would have a different syntax, as we discussed. For instance, lessThan{...} Probably a good idea to plagiarize SWRL builtins. (Rip them off completely or preserve their namespace?) a. HiLog predicates Any HiLog term is also a HiLog predicate. Explanation. This aspect of HiLog formalizes the so called Prolog "call" meta-predicate, but doesn't require such a predicate to perform the same function. For instance, in HiLog one can have the following program. p(a). q(p(a)). Then the query ?- q(X), X. will succeed with the binding X=p(a). Note that if we didn't have p(a) in our database above, then this query would fail, since q(X) will succeed with the binding X=p(a) but the call X (where X is now p(a)) will fail, since p(a) is not true. In fact, the following rule is completely legal: call(X) :- X. it defines the predicate "call" with exactly the same semantics as in Prolog, except that here it is not a builtin meta predicate with non-logical semantics but rather a normal predicate, which is defined directly in our language. b. F-logic molecules Preamble: The F-Logic features are useful to represent frames and object-oriented conceptualization. They are orthogonal to HiLog extensions of predicate logic. Note: Every ground H-term denotes an object. Classes are objects, attributes are objects, methods are objects, even integers and strings are objects. (The latter can be separated out, but there is no good reason to do so.) Also, function symbols are objects, predicate names are objects, etc. The atomic formulas in F-logic are called "molecules". We first describe atomic molecules and then introduce "complex molecules". Complex molecules typically play the role similar to atomic formulas in predicate logic, but they are not "indivisible". This is why they are called molecules and not atoms. - Value molecules If t, m, v are terms then t[m -> v] is a value-molecule Meaning: Here t denotes an object, m a method, and v denotes the value returned by this invocation in the scope of object t. If m = s(t1,...,tn), i.e. has arguments, then t[s(t1,...,tn) -> v] is interpreted as an invocation of method s on args t1,...,tn in the context of the object t, which returns a set of values that contains v. The syntax t[m->{v1,...,vk}] is also supported. Means: if m is invoked in the context of the object t then it returns a set that contains v1,...,vk. Note1: this semantics makes the above term to be equivalent to a conjunction of t[m->v1], ..., t[m->vk], so the syntax t[m->{v1,...,vk}] is just a syntactic sugar. Note2: Methods are conceptualized as set-valued functions. Cardinality constraints (described later) can be imposed on them. - Boolean value-molecules Are of the form t[m], where t,m are terms. Useful to specify things like mary[female]. Could be alternatively written as mary[female->true], but is less natural. - Class membership molecules If t, s are terms then t in s is a membership molecule. Meaning: t is an object of class s. (Others have used "isa" in place of "in", but we think "in" is better for the novice.) s has t is equivalent to t in s . Note: in the original F-logic and in all its implementations (both commercial and academic) membership is denoted as t:s. However, we better swim with the tide and use ":" for namespaces. - Subclass molecules If t, s are terms then t sub s is a subclass molecule. Meaning: t is a subclass of s. s sup t is equivalent to t sub s Note: The original F-logic (and all current systems) use :: for subclassing. - Signature molecules Explanation: Signatures are assertions about the expected types of the method arguments and results. They don't have direct effect on the inference (unless the signatures themselves appear in the rules that have value/membership/subclass molecules). The signature information is optional. The semantics is defined as follows. First, the intended model of the KB is computed. Then, if typing needs to be checked, we must verify that this intended model is well-typed. A well-typed model is one where the value-molecules conform to their signatures. For the precise definition of well-typed models see the F-logic paper. (There can be several different notions of well-typed models. For instance, one for semi-structured data and another for completely structured data.) A type-checker can be written in F-logic using just a few rules. Such a type checker is a query, which returns "No", if the model is well-typed and a counterexample otherwise. In particular, type-checking has the same complexity as querying. Type information is important for conceptual design and debugging. If t, m, v are terms then t[m => v] is a (signature) molecule Meaning: class t (or collection of classes if t is not ground) has method invocation m that returns values of type v (ie these are objects that belong to class v). If m itself has arguments (m = s(t1,...,tn)) then they are also interpreted as types. For instance t[s(t1,...,tn) => v] means that given the arguments of types t1,...,tn the method s returns values of class v. - Boolean signature molecules The original F-logic didn't have a signature syntax corresponding to value-molecules, but the Forum WG is currently discussing t[=>m] as a signature for t[m]. Note: in the Forum effort we have decided to add cardinality constraints as follows: t[s(t1,...,tn) {2:5} => v] meaning that given the args o1,...,on of types t1,...,tn the method can return between 2 and 5 values of type v. That is, 2 <= |s(o1,...,on)| <= 5 The original F-logic had => to denote single-valued (functional) methods and =>> for set-valued methods. It didn't have cardinality constraints. In Forum we decided to simplify and extend using the cardinality constraints. Note: Type checking relies heavily on negation-as-failure to infer that types are weakly violated. For instance, if a certain argument of a method is expected to be an object of type Person and we get an object abc, then we invoke the query ?- abc in Person. If we can't prove that abc is a person, then we conclude that it is not and thus the type constraint is violated. However, with the Courteous extension, negative information can be specified explicitly. Therefore, a more "credulous" kind of type checking is possible. c. Inheritance in F-logic. Explanation: This is an optional extension feature. To avoid using it, avoid employing the syntactic constructs below. To some extent, this feature is still in theoretical development -- see comments below about relationship to courteous. Method/attributes can be inheritable and non-inheritable. Non-inheritable correspond to class methods in Java, while inheritable correspond to instance methods. In F-logic, the type of the method is determined by the arrow. Inheritable molecules look like t[m *-> v]. They denote info about class t, which is inherited by subclasses and instances. Non-inheritable methods are denoted t[m -> v] (the syntax that we have already seen). The type info of inheritable methods is given by signature molecules of the form t[m *=> v]. Rules for inheritance: t sub s and s[m *=> v] |= t[m *=> v] t in s and s[m *=> v] |= t[m => v] Explanation: The type declaration of an inheritable method is inherited to subclasses in an inheritable form (i.e., it can be further inherited). However, to the members of a class such declarations is inherited in a non-inheritable form. That is, inheritance is propagated through subclasses, but stops once it is propagated to class members. t sub s and s[m *-> v] |= t[m *-> v], unless overriden t in s and s[m *-> v] |= t[m -> v], unless overriden Explanation: The values returned by inheritable methods are propagated to subclasses in the inheritable form (i.e., it can be propagated further) unless inheritance is overriden. To the members of a class, the values returned for a method are propagated in a non-inheritable form (and even that only if not overriden). Semantics of overriding is based in part on the Well-Founded Semantics. We omit details here. Note: no overriding for signature inheritance - signature inheritance is monotonic, while value inheritance is not. Remark: We should look at how to combine/merge the courteous extensions with the Yang & Kifer inheritance approach. This is future work. 3. General "First-order" Formulas ISSUE/Question: What is the first-order subset for? Answer (MK): They are for the Loyd-Topor extensions -- NOT for the FLOWS stuff. This is because, according to the f2f discussions, process descriptions are going to be written in a stylized Golog-like senstences. They are very different from first-order and we represent them using P-terms, NOT as formulas. PSL ontology will be written in some first-order language. But we don't care about ITS syntax, because the user will never see or this ontology except maybe in the papers that define PSL. The ordinary user will deal with the PSL ontology ONLY through the concepts defined by these ontology, which will appear to her as predicates, classes, or whatever SWSL decides. Note: First-order is in quotes because these are only syntactically first-order. We actually use them in the rule bodies. Moreover, we use both "not" (negation as failure) and later "neg" (explicit weak form of classical negation -- used for courteous LP). Definitions: Connectives: "and" stands for conjunction. "or" stands for disjunction. "neg" stands for (weak) classical negation "not" stands for negation as failure Quantifiers: exists [, ]* forall [, ]* Formula: | and | or | not | not neg // courteous | neg // courteous | () | Benjamin: Any restrictions on the use of neg? For instance, is neg not neg permitted? Also, don't remember if Lloyd-Topor imposes any restrictions on the use of "not". For instance, not (exists X (p(...) \/ not q(...)))? **** We should move the neg stuff to the courteous subsection. 4. Complex positive literals - syntactic sugar for common O-O idioms Molecules can be combined in two ways: - by grouping and - by nesting Grouping. Grouping applies to molecules that describe the same object: t[m1 -> v1] and t[m2 => v2] and t[m3 {6:9} => v3] and t[m4->v4] is -- by definition -- equivalent to t[m1 -> v1 and m2 => v2 and m3 {6:9} => v3 and m4->v4] Can also combine and with or according to the normal precedence rules: t[m1 -> v1] and t[m2 => v2] or t[m3 {6:9} => v3] and t[m4->v4] is a sugar for t[m1 -> v1 and m2 => v2 or m3 {6:9} => v3 and m4->v4] Note: FLORA-2 also allows things like a[b->c] and not a[e->f] to be written as a[b->c and not e->f]. Used relatively rarely. Worth adding? If so, we won't be able to call these literals "complex positive" litarals. Nesting. Nesting applies to molecules in the following "chaining" situation, which is a common technique in object-oriented databases: t[m -> v] and v[q -> r] is by definition equivalent to t[m -> v[q -> r]] Nesting can also be used to combine membership and subclass molecules with value-molecules in the following situations: t sup s and s[m->v] is equivalent to t sup s[m->v] Likewise, t has s and s[m->v] is equivalent to t has s[m->v] Molecules can also be nested inside predicates: p(a[b->c]) is p(a) and a[b->c]. Deep nesting is allowed: p(f(q,a[b->c]),foo) is p(f(q,a),foo) and a[b->c] and a[b-> foo(e[f->g])] is a[b-> foo(e)] and e[f->g] Note that this syntax is completely compositional. That is, molecules are allowed anywhere where a term is allowed. For instance, a[foo(b[c->d]) -> e] is a[foo(b)->e] and b[c->d] and a[foo[b->c] -> e] means a[foo -> e] and foo[b->c] 5. Rules (also see Courteous extensions later) a. A rule is of the form :- ; ISSUE: some rule systems use "." as ending delimiter instead of ";". What do we do? NOTE: Rules are *not* classical material implication b. Rule head A conjunction of positive literals. Parentheses are used to enclose such conjunctive expressions, in the usual manner. Note: a common case of such conjunction is when there is an F-Logic complex molecule in the head. c. Rule body A rule body may be empty (in which case the rule is a fact) It may be a single (positive or negative) literal. It may be a conjunction of literals. Positive Lloyd-Topor extensions: More generally, the body may be built up from literals using the conjunction ("and") and disjunction ("or") connectives. Conjunction and disjunction may be nested, e.g., two and'ed subexpressions within a top-level or. Parentheses are used to enclose such conjunctive or disjunctive expressions, in the usual manner. General Lloyd-Topor extensions: In the body: these include (in addition to the above): - negation-as-failure applied to expressions - universal quantifiers - existential quantifiers - if ... then ... - if ... then ... else ... All these essentially rely on rewriting using negation-as-failure. In the head: these include (in addition to the positive L-T extensions): - if ... then ... *** "else" is not permitted - doesn't make sense Note: Also in the head it's easy but not really useful to add universal quantifiers Note: variables that are not explicitly quantified are assumed to be implicitly universally quantified outside of the rule. 6. The Path Notation Syntax Preamble: Path expressions are useful shorthands that are widely used in object-oriented and Web languages. In a logic-based language, a path expression allows one to write formulas more concisely by eliminating the need to introduce explicit variables. A value path expression has the form t.t1.t2. ... .tn where t, t1, t2, ... are terms or molecules. In the BODY OF A RULE, such an expression is considered a shorthand for the variable ?Var_n, which is defined by the following formula: t[t1->?Var1] and ?Var1[t2->?Var2] and ... and ?Var_{n-1}[tn->?Var_n] In other words, writing mary.father.mother = sally stands for mary[father->?F] and ?F[mother->?M] and ?M = sally and thus mary.father.mother denotes mary's paternal grandmother. Note that molecules can appear wherever a term can appear (nesting). Therefore, a.b[c->d].e.f[g->h].k is allowed. It denotes the ?V defined by the following formula: a[b-> ?X] and ?X[e->?Y] and ?Y[f->?Z] and ?Z[k->?V] and ?X[c->d] and ?Z[g->h] This is analogous to XPath's intermediate predicates. For instance, X = a.b[c->d].e.f[g->h].k is equivalent to X = a.b.e.f.k and a.b[c->d] and a.b.e.f[g->h] If we spell it out completely, then we get: a[b-> ?X] and ?X[e->?Y] and ?Y[f->?Z] and ?Z[k->?V] and ?X[c->d] and ?Z[g->h] and ?X[c->d] and ?Z[g->h] A signature path expression is defined by analogy. Here we use ".." to define such path expressions rather than ".". It has the form t..t1..t2.. ... ..tn where t, t1, t2, ... are terms. ISSUE: It was proposed at the F2F to use / to descend down the -> and .. to go up. The motivation is that it will allow to go in both directions. However: What should we use for navigation in signatures? (Maybe we shouldn't use path expressions for signatures at all?) The other point is that it is already easy to go up, and going up is very rare anyway. Here is an example for going up. Something similar to a/../b/c starting with the object "d" in XPath: ?Y[a -> d].b.c This is actually more coherent than the XPath stuff. Note: The * in XPath (eg b/*/c beginning with object "a" is represented as a.b.?X.c i.e., you use a variable. ISSUE: One may ask: what is the substitute for things like //? Before we introduce such a syntax, we need to see a usecase. One should be aware that navigation in object structures, which can be cyclic graphs is not the same as navigation in XML trees. So, // is not that well-defined (although it can be with some effort). Here is an example of a cyclic structure: a[b->c]. c[b->a]. SEMANTICS DETAILS: NOTE: A path expression means different things depending on whether it is uded in the rule body or the rule head. For the process descriptions that come from the FLOWS part, we need to see how they are used and decide what their semantics is supposed to be. In fact, it is not clear what is the use of this semantics for the Rules language, since process descriptions are just P-terms. However, in the actual translation from these process expressions into the internal-KIF-style-language-that-only-theoreticians-will-see such a semantics is important. Probably a job for the FLOWS group. In the BODY of a rule, such an expression is considered a shorthand for the variable ?Var_n, which is defined by the following formula: t[t1=>?Var1] and ?Var1[t2=>?Var2] and ... and ?Var_{n-1}[tn=>?Var_n] Value and signature path expressions can be mixed (but this is rarely useful): t.t1..t2.t3. A path expression can appear wherever an object can appear. For example, a[b->c.d] is legal and means c[d->?V] and a[b->?V] (because c.d is a shorthand for the variable ?V in the above). Similarly, a.b.c[d->e] is legal and means a[b->?B] and ?B[c->?D] and ?D[d->e] (because a.b.c is a shorthand for ?D in the above). The expressions of the form a.b.c[d->e].f.g[h->k].m are also allowed. They stand for such a value v that a.b.c[d->e] and a.b.c.f.g[h->k] and a.b.c.f.g.m = v In the HEAD of a rule, a path expression is defined with the help of Skolem functions. For instance, a.b[foo->X] <- body(X). means a[b->skolem(a,b)] <- body(X). skolem(a,b)[foo->X] <- body(X). where skolem is some new function symbol. Similarly for type signature constraints a..b[foo=>X] <- body(X). means a[b=>sigskolem(a,b)] <- body(X). sigskolem(a,b)[foo=>X] <- body(X). where sigskolem is yet another new function symbol. 7. Reification. If f is a formula that has the syntactic form of a rule head, rule body, or of a rule (defined later) then f is also considered to be a term. Therefore, it can be used whenever an object can occur. NOTE: As defined, reification introduces syntactic ambiguity. See below how this is dealt with it. A reified formula represents an objectification of the corresponding formulas. Such terms are useful for specifying ontologies where objects specify theories that can be true in some worlds, but are not true in the present world (so those theories cannot be asserted in the present world). Examples include the effects of actions. Reification doesn't cause paradoxes as long as: - rule heads don't contain classical negation - rule head can't be a variable, i.e., X :- body (which is allowed in HiLog) (See G. Yang and M. Kifer, "Reasoning about Anonymous Resources and Meta Statements on the Semantic Web," Journal of Data Semantics, Springer Verlag, 2003.) The fact that Courteous SWSL-Rules can have negation in the rule heads doesn't mean that paradoxes are possible. The SWSL-Rules negation is not classical (but rather defeasible) and such sets of rules can be rewritten without negation in the head. SYNTACTIC DETAILS. Reification introduces ambiguity into the syntax. For instance, consider a[b->t] where t is a term, which we want to consider as a reification of the formula c[d->e]. Since we said that a formula is a term then we should be able to just write a[b->c[d->e]] But this is ambiguous, since earlier we defined the above as a commonly used object-oriented idiom, a syntactic sugar for a[b->c] and c[d->e] Similarly, if we want to write something like t[b->c] where t is a reification of f[g->h] then we can't write f[g->h][b->c] because this is syntactic sugar for f[g->h] and f[b->c]. To resolve this ambiguity, we introduce the reification operator whose only role is to indicate to the parser that a particular occurrence of a nested molecule is to be treated as a term that represents a reified formula rather than as a syntactic sugar for an object-oriented idiom. Note that the reification operator is not needed for HiLog predicates because there is no ambiguity: a[b->p(X)] doesn't mean a[b->p(X)] and p(X) because the sugar is only for the object-oriented idioms, i.e., for nested molecules. However, in a[b->p(X[foo->bar])] explicit reification is needed because the above is a syntactic sugar for a[b->p(X)] and X[foo->bar] So, we should write this instead: a[b-> ${p(X[foo->bar])}] Why is the object-oriented idiom the default rather than reification? Because reification is rare and the use of the idiom is common. Examples: Reification in SWSL-Rules is very powerful and yet it doesn't add to the complexity of the language. Here is an example that models an agent who believes in modus ponens: john[believes-> ${p(a)}]. john[believes-> ${q(?X) <- p(?X)}]. // modus ponens john[believes-> ?A] <- john[believes -> ${?A <- ?B}] and john[believes -> ${?B}]. One can infer that john believes in q(a). Note: explicit reification in ${p(a)} is not needed, because no ambiguity exists here. 8. Courteous extensions - Rule Labels: Rules have optional labels: '<'