From: Richard Fikes ([email protected])
Date: 11/06/01
I said last week that I would send out a message articulating the issues we have been discussing regarding query-answering. This is that message. My apologies for not sending the message out sooner. I am back at home (and my father is recuperating well from his surgery), but I did not have time to put this together sooner. As per my earlier messages, my student Yulin Li has made significant contributions to the material in this message.

Richard

-----------------------

This message is to articulate issues involved in specifying and formalizing the information content of a "query-answering discourse" between two agents in which one agent seeks information from a second agent by sending a "query" to the second agent. I will refer to the agent sending the query as the "client" and the agent receiving the query as the "server". I will refer to the response sent by the server to the client as a "query result" and will assume that a query result may contain one or more "query answers".

Knowledge Base

I assume a query is posed with respect to a knowledge base that is a DAML+OIL representation of a logical theory. Thus, a query needs to include a reference to a DAML+OIL knowledge base. I will refer to that knowledge base as the "query KB".

Query Premise

We have discussed enabling a query to include a premise that is to be added to the query KB so that the query is being asked of the query KB unioned with the premise. A premise essentially facilitates queries of the form if-then while still remaining within the expressiveness of DAML+OIL.

ISSUE: Do we want to enable the inclusion of a premise in a query, and if so what is the form of a premise?

I recommend that we do allow premises and that a premise be an arbitrary DAML+OIL knowledge base.

Query Pattern

I assume a query contains a "query pattern" that specifies relationships among unknown sets of objects in a domain of discourse. Each unknown object is represented in the query pattern by a "query variable".
Answering a query with query variables x1,...,xn involves identifying tuples of object constants such that for any such tuple <c1,...,cn>, if ci is substituted for xi in the query pattern for i=1,...,n, then the resulting "query pattern instance" specifies a sentence that is entailed by the query KB. For each such ci, the pair [xi, ci] is called a "query variable binding" for the corresponding query variable xi. Each set of query variable bindings specifies a candidate answer to the query.

ISSUE: What is the expressive power of the query pattern language?

For example, we might decide that a query pattern instance can specify a sentence that is a disjunction of a conjunction of RDF statements or is a negation of a conjunction of RDF sentences, or ...

My recommendation is that we define the query pattern language so that a query pattern instance specifies a conjunction of sentences each of which is representable in DAML+OIL (i.e., a conjunction of RDF statements).

Answer Mapping Function

A query needs to specify a mapping of query variable bindings into query answers. In particular, a query answer may include or make use of bindings to only a subset of the query variables.

ISSUE: What is the nature of the answer mapping function language?

For example, a query answer might be allowed to be any s-expression whose atomic elements are bindings to specified variables. (E.g., map the bindings [x1, c1], [x2, c2], [x3, c3], and [x4, c4] to the s-expression "(c1 (c2 c3) c4)".)

I think all that matters for our formalization and core design work is that a query answer may make use of the bindings to only a subset of the query variables. So, my recommendation is that for now we consider a query answer to consist of a set of bindings for a subset of the query variables, and that the query specifies which query variables are in that subset. I will make that assumption in the remainder of this document.

ISSUE: What constants can be in query answer bindings?
In particular, can a query variable be bound in a query answer to an anonymous node in the RDF graph? If so, what is the form and semantics of that binding? Also, can a query variable be bound in a query answer to an object that is entailed in the knowledge base (e.g., by a cardinality constraint) but whose identity is not known by the server? If so, what is the form and semantics of that binding?

Uniqueness of Answers

In general, a query result may contain multiple query answers.

ISSUE: What guarantees do we want to make about the distinctiveness of multiple query answers in a query result?

We may want to guarantee that no two answers consist of identical sets of bindings. That would say that if A1 and A2 are query answers in a query result, then there exists a query variable xi for which the binding for xi in A1 is not the same constant as the binding for xi in A2.

A more difficult guarantee would be that no two answers consist of equal sets of bindings. That would say that if A1 and A2 are query answers in a query result, then there exists a query variable xi for which the binding for xi in A1 is not equal to (i.e., does not denote the same object as) the binding for xi in A2.

My recommendation is that we guarantee there are no identical sets of bindings and that we enable a query to include an indicator as to whether equal sets of bindings are acceptable.

Number of Answers

I assume our query language needs to enable a query to specify what is being asked for and our query result language needs to enable a query result to contain the information requested in a query. That specification in a query would include how many query answers are being requested and what information is being requested about how many query answers there are. Also, the query result language needs to enable including in a query result the information requested about how many query answers there are.

ISSUE: Whether and how to include in a query the number of query answers being requested.
For example, we might enable a query to include a "# answers requested" that could be either a non-negative integer, the constant "All", or the constant "Enumerator". If the number of answers requested is an integer n, the query is a request for as many query answers as the server can deduce up to n. If the number of answers requested is "All", the request is for as many query answers as the server can deduce. If the number of answers requested is "Enumerator", the request is for a process handle (continuation) that can be sent back to the server in a subsequent message to request that the server produce a "batch" of k query answers.

Some convention is needed for the case where the number of answers requested is "All" and the server can deduce an infinite number of answers. Perhaps the convention would be that the server returns a process handle in that case.

ISSUE: Whether and how to include in a query what information is being requested about how many query answers there are and in a query result the information requested.

For example, we might enable a query to include a request for information as to how many query answers there are ("# answers?") and for a query result to include a specification of how many query answers are entailed ("# answers entailed") in the form of an ordered pair denoting a closed interval. For example, a result containing the ordered pair [4,0-0] as the "# answers entailed" means that the server has determined that there are at least 4 answers to the query. (I am using "0-0" here to stand for the "infinity" symbol.) Note that a value of [0,0-0] for "# answers entailed" is always true.

When a query includes a "# answers?" request, the server is being asked to deduce whatever it can about how many answers there are. Note that a query can specify that "# answers requested" is zero so that the only information being requested is the number of answers.
Also, note that a query result could contain a value for "# answers entailed" even when that information was not requested in the query.

Justification For Answers

Under the assumption that our query language needs to enable a query to specify what is being asked for and our query result language needs to enable a query result to contain the information requested in a query, the query language needs to enable a query to include a request for a justification for each query answer, and the query result language needs to enable a query result to include such justifications.

ISSUE: Whether and how to include provisions for justifications to be requested in a query and be included in a query result. What kind of justification language(s) do we include in our query results language?

Knowledge Base Structure Queries

There seems to be a clear need to ask queries about the knowledge base itself as an artifact. The most compelling example involves determining the "direct" subclasses of a class or subproperties of a property.

ISSUE: Whether and how to include such queries in our query language and query results language.

The primary issue seems to be to what extent we allow intermingling of "structural queries" with "entailment queries". That is, we probably don't want to allow a request for direct subclasses to appear anywhere in a query pattern that an RDF statement (with variables) could appear.
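
To make the terminology above concrete, here is a minimal Python sketch of one way the pieces could fit together: a query KB, an optional premise, a conjunctive query pattern, a subset of answer variables, removal of identical answers, and a "# answers requested" limit. It is an illustration only, not a proposed design. Everything in it is an assumption of the sketch: the names (`answer_query`, `match`, `unify`), the encoding of RDF statements as 3-tuples of strings with variables written as "?x", and above all the treatment of "entailed" as "explicitly present in the KB unioned with the premise", which is far weaker than DAML+OIL entailment.

```python
import math
from itertools import islice

def is_var(term):
    # Assumption of this sketch: query variables are strings starting with "?".
    return isinstance(term, str) and term.startswith("?")

def unify(triple_pattern, fact, binding):
    """Extend `binding` so that triple_pattern equals fact, or return None."""
    new = dict(binding)
    for p, f in zip(triple_pattern, fact):
        if is_var(p):
            if new.setdefault(p, f) != f:   # variable already bound to another constant
                return None
        elif p != f:                        # constants must match exactly
            return None
    return new

def match(pattern, theory, binding=None):
    """Yield each set of bindings under which every pattern triple is in theory.

    Simplification: membership in `theory` stands in for entailment.
    """
    binding = {} if binding is None else binding
    if not pattern:
        yield binding
        return
    for fact in sorted(theory):
        extended = unify(pattern[0], fact, binding)
        if extended is not None:
            yield from match(pattern[1:], theory, extended)

def answer_query(kb, pattern, answer_vars, premise=frozenset(), requested="All"):
    """Return a query result: answers plus a '# answers entailed' interval.

    `answer_vars` must be a subset of the variables used in `pattern`.
    `requested` is a non-negative integer or "All" ("Enumerator" is not modeled).
    """
    theory = set(kb) | set(premise)         # query KB unioned with the premise
    seen = set()

    def distinct_answers():
        for b in match(pattern, theory):
            answer = tuple((v, b[v]) for v in answer_vars)  # keep only the subset
            if answer not in seen:          # no two identical sets of bindings
                seen.add(answer)
                yield answer

    limit = None if requested == "All" else requested
    answers = list(islice(distinct_answers(), limit))
    # Only a lower bound is claimed, based on the answers actually produced.
    return {"answers": answers, "answers_entailed": (len(answers), math.inf)}

# Hypothetical example data.
kb = {("Joe", "owns", "Car1"), ("Joe", "owns", "Car2"), ("Ann", "owns", "Car3"),
      ("Car1", "type", "Car"), ("Car2", "type", "Car")}
pattern = [("?p", "owns", "?c"), ("?c", "type", "Car")]
result = answer_query(kb, pattern, ["?p"], premise={("Car3", "type", "Car")})
```

Note that the sketch removes only identical answers. Two answers whose constants differ but denote the same object would both be returned, so it does not address the stronger "equal sets of bindings" guarantee discussed under Uniqueness of Answers.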