From: Richard Fikes ([email protected])
Date: 11/12/01
I have been doing other things for a few days and am now getting back to the query language discussion. In this message, I am attaching at the end an updated version of the "Query Language Issues" message I sent out last Tuesday and am presenting a proposal for how to handle query answers that include bindings to constants created by the server (i.e., the query answering agent). The updated issues message primarily separates the statements of issues from the proposals I made to deal with some of the issues. It also contains some minor updates in response to some of Pat's concerns about the original message. What follows is a proposal for discussion for how to handle query answers that include bindings to constants created by the server. Such constants would either be the labels of anonymous RDF nodes or Skolem constants that denote objects whose existence is entailed (e.g., by cardinality restrictions). In an e-mail discussion involving Pat and Ian, an example query ("What person owns which red car?") was being considered whose query pattern is, using their notation, ?[x,y](Px & RCy & Oxy). I will assume here that the request is for all answers to the query. There was discussion of using a binding to "<blank>" when the binding would be to a constant created by the server. So, for example, an answer might be [x/Ian, y/<blank>], meaning that Ian owns at least one red car but that no names are available for the cars. The following examples were also given: > [[x/blank y/blank]] -Yes, someone owns a red car > [[x/Ian y/blank]] - Ian owns (at least one) red car > [[x/Ian y/Car4][x/Ian y/Car5]] - Ian owns at least two red cars > [[x/blank y/Car4]] - Somebody owns this red car > [[x/blank y/blank][x/blank y/blank]] - at least 2 different people own > red cars (warning, this may lead to infinite answers) > [[x/Ian y/Car4][x/Ian y/Car5][x/blank y/blank]] - Ian owns at least > two red cars, and someone else who is not Ian also owns a red car The notion of a binding to <blank> seems like a useful one. However, it also seems to need refinement. Problems that I see involve the semantic relationships among multiple occurrences of <blank> in a query answer (e.g., in [[x/blank y/blank]], do the bindings to x and y necessarily denote different objects?), semantic relationships between <blank> in one query answer and bindings to both constants and to <blank> in other answers to the same query (e.g., does the set of answers [[x/blank y/blank][x/blank y/blank]] mean what the comment above partially suggests that at least 2 different people own 2 different red cars"? and does the set of answers [[x/Ian y/Car4][x/Ian y/Car5][x/blank y/blank]] mean as the comment above suggests that ian owns at least two read cars and someone else who is not Ian own yet a third red car?). And how would the server communicate a conclusion that one unnamed person owns at least two unnamed red cars? It seems to me that the use of a single <blank> binding in query answers is inadequate. What I would suggest is needed is a vocabulary of blank bindings, say <blank-1>, <blank-2>, ..., that can be used as bindings in query answers and that have the semantics of existentially quantified variables. Such a vocabulary would be used in our example as follows: A query answer of [x/Ian, y/blank-1] would mean that the following sentence is entailed by the query KB: (exists b1) (P Ian & RC b1 & O Ian b1) A query result of [[x/blank-1 y/blank-2]] would mean that the following sentence is entailed by the query KB: (exists b1 b2) (P b1 & RC b2 & O b1 b2) A query result of [[x/blank-1 y/blank-1]] would mean that the following sentence is entailed by the query KB: (exists b1) (P b1 & RC b1 & O b1 b1) That answer doesn't make sense in the intended interpretation of our example query, but I assume it is clear that one could generate an example query where such an answer would make sense. What about a query result of [[x/blank-1 y/blank-2][x/blank-3 y/blank-4]]? Such a result seems to make sense only when the query asks for answers each of which is known to denote a distinct set of objects. (See my proposal in the attachment regarding "Uniqueness of Answers".) If the only requirement in the query is that each set of bindings is unique (rather than each set of objects denoted by the bindings), then the server could generate any number of answers by simply using different blank-i bindings. If the query does indeed ask for answers each of which is known to denote a distinct set of objects, then the query result means that the following sentence is (sentences are?) entailed by the query KB: (exists b1 b2 b3 b4) (P b1 & RC b2 & O b1 b2) (P b3 & RC b4 & O b3 b4) and that not both "(equivalentTo b1 b3)" and "(equivalentTo b2 b4)" are entailed by the query KB. We might decide that results such as [[x/blank-1 y/blank-2][x/blank-3 y/blank-4]] are beyond the scope of our query language. However, there seem to be cases that are more compelling, such as [[x/Ian y/blank-1][x/Ian y/blank-2], meaning Ian owns two red cars, or [[x/blank-1 y/blank-2][x/blank-1 y/blank-3], meaning someone owns two red cars. What do you think of this proposal? Richard ----------------------------------- The following is an articulation of open issues and proposals for specifying and formalizing the information content of a "query-answering discourse" between two agents in which one agent seeks information from a second agent by sending a "query" to the second agent. Refer to the agent sending the query as the "client" and the agent receiving the query as the "server". Refer to the response sent by the server to the client as a "query result" and assume that a query result may contain one or more "query answers". Knowledge Base A query is posed with respect to a knowledge base that is a DAML+OIL representation of a logical theory. Thus, a query needs to include a reference to a DAML+OIL knowledge base. Refer to that knowledge base as the "query KB". Query Premise We have discussed enabling a query to include a premise that is to be added to the query KB so that the query is being asked of the query KB unioned with the premise. A premise essentially facilitates queries of the form if-then while still remaining within the expressiveness of DAML+OIL. ISSUE: Do we want to enable the inclusion of a premise in a query, and if so what is the form of a premise? PROPOSAL: A query may include a premise that is an arbitrary DAML+OIL knowledge base. Query Pattern Assume a query contains a "query pattern" that specifies relationships among unknown sets of objects in a domain of discourse. Each unknown object is represented in the query pattern by a variable. Answering a query with variables x1,�,xn involves identifying tuples of object constants such that for any such tuple <c1,�,cn>, if ci is substituted for xi in the query pattern for i=1,�,n, then the resulting "query pattern instance" specifies a sentence that is entailed by the query KB. For each such ci, the pair [xi, ci] is a binding for the corresponding variable xi. Each set of such variable bindings specifies a candidate answer to the query. ISSUE: What is the expressive power of the query pattern language? For example, we might decide that a query pattern instance can specify a sentence that is a disjunction of a conjunction of RDF statements or is a negation of a conjunction of RDF sentences, or �. PROPOSAL: Define the query pattern language so that a query pattern specifies a conjunction of RDF statements containing variables. A query pattern instance therefore specifies a collection of RDF statements that are entailed by the query KB. Answer Mapping Function A query may specify a mapping of variable bindings into query answers. In particular, a query answer may include or make use of bindings to only a subset of the variables in the query pattern. ISSUE: What is the nature of the answer mapping function language? PROPOSAL: I think all that matters for our formalization and core design work is that a query answer may make use of the bindings to only a subset of the variables in the query pattern. So, my recommendation is that for now we consider a query answer to consist of a set of bindings for a subset of the variables, call them "query variables", and that the query specifies which variables are in that subset. I will make that assumption in the remainder of this document. ISSUE: What constants can be in query answer bindings? In particular, can a query variable be bound in a query answer to an anonymous node in the RDF graph? If so, what is the form and semantics of that binding? Also, can a query variable be bound in a query answer to an object that is entailed in the knowledge base (e.g., by a cardinality constraint) but whose identity is not known by the server? If so, what is the form and semantics of that binding? Uniqueness of Answers In general, a query result may contain multiple query answers. ISSUE: What guarantees do we want to make about the distinctiveness of multiple query answers in a query result? We may want to guarantee that no two answers consist of identical sets of bindings. That would say that if A1 and A2 are query answers in a query result, then there exists a query variable xi for which the binding for xi in A1 is not the same constant as the binding for xi in A2. A more difficult guarantee would be that no two answers consist of equal sets of bindings. That would say that if A1 and A2 are query answers in a query result, then there exists a query variable xi for which the binding for xi in A1 is not equal to (i.e., does not denote the same object as) the binding for xi in A2. PROPOSAL: We guarantee there are no identical sets of bindings and that we enable a query to include an indicator as to whether equal sets of bindings are acceptable. Number of Answers I assume our query language needs to enable a query to specify what is being asked for and our query result language needs to enable a query result to contain the information requested in a query. That specification in a query would include how many query answers are being requested and what information is being requested about how many query answers there are (i.e., how many answers are entailed by the KB). Also, the query result language needs to enable including in a query result the information requested about how many query answers there are. ISSUE: Whether and how to include in a query the number of query answers being requested. PROPOSAL: Enable a query to include a "# answers requested" that could be either a non-negative integer, the constant "All", or the constant "Enumerator". If the number of answers requested is an integer n, the query is a request for as many query answers as the server can deduce up to n. If the number of answers requested is "All", the request is for as many query answers as the server can deduce. If the number of answers requested is "Enumerator", the request is for a process handle (continuation) that can be sent back to the server in a subsequent message to request that the server produce a "batch" of k query answers. Note that if the query indicates that equal sets of bindings are not acceptable, then "# answers requested" refers to the number of sets of bindings no two of which are equal. If the query indicates that equal sets of bindings are acceptable, then "# answers requested" refers to the number of distinct sets of bindings. Some convention is needed for the case where the number of answers requested is "All" and the server can deduce an infinite number of answers. Perhaps the convention would be that the server returns a process handle in that case. ISSUE: Whether and how to include in a query what information is being requested about how many query answers there are and in a query result the information requested. PROPOSAL: Enable a query to include a request for information as to how many query answers there are ("# answers?") and for a query result to include a specification of how many query answers are entailed ("# answers entailed") in the form of an ordered paired denoting a closed interval. For example, a result containing the ordered pair [4,0-0] as the "# answers entailed" means that the server has determined that there are at least 4 answers to the query. (I am using "0-0" here to stand for the "infinity" symbol.) Note that a value of [0,0-0] for "# answers entailed" is always true. When a query includes a request of "# answers", the server is being asked to deduce whatever it can about how many answers there are. Note that a query can specify that "# answers requested" is zero so that the only information being requested is the number of answers. Also, note that a query result could contain a value for "# answers" even when that information was not requested in the query. Justification For Answers Under the assumption that our query language needs to enable a query to specify what is being asked for and our query result language needs to enable a query result to contain the information requested in a query, the query language needs to enable inclusion in a query a request for a justification for each query answer and the query result language needs to enable inclusion of such justifications in a query result. ISSUE: Whether and how to include provisions for justifications to be requested in a query and be included in a query result. What kind of justification language(s) do we include in our query results language? Knowledge Base Structure Queries There seems to be a clear need to ask queries about the knowledge base itself as an artifact. The most compelling example involves determining the "direct" subclasses of a class or subproperties of a property. ISSUE: Whether and how to include such queries in our query language and query results language. The primary issue seems to be to what extent we allow intermingling of "structural queries" with "entailment queries". That is, we probably don't want to allow a request for direct subclasses to appear anywhere in a query pattern that an RDF statement (with variables) could appear.
This archive was generated by hypermail 2.1.4 : 04/02/02 EST