queries often need arithmetic operators and comparison, and functions on other scalar types (such as string matching and concatenation). Such functions can be viewed as predefined predicates (with large, possibly infinite, extents). For example, addition can be modeled as a three-place predicate sum(M,N,P)
that contains all facts where M
+ N
= P
, such as sum(2,3,5)
and sum(5,0,5)
. The greater-than comparison can be modeled as a predefined two-place predicate gt(M,N)
containing all pairs of numbers that stand in the correct relationship, such as gt(5,1)
and gt(6,3)
. Obviously, the extents of such predicates cannot be stored explicitly (unless the domain is small, such as for Boolean operators). Thus, such predicates are actually supported by evaluation in the underlying processor. Because of this computational realization the use of evaluable predicates is restricted in which positions must be bound before evaluation. For example, sum
might require that the first two or all three arguments be bound; gt
requires both arguments to be bound. The conditions for safe Datalog rules can be extended to incorporate these binding requirements. Maier and Warren [1981] give a general method of checking safety in a query by modeling binding patterns with functional dependencies.
Most logic languages provide alternative syntax for evaluable predicates. For example, the rule
can be written in Prolog as
1.4.3 Aggregation and Sets
Aggregates (such as sum, count, and average) are important operations for database query, and so should be supported in a data language such as Datalog. But just as the inclusion of recursion complicates the semantics of negation, it also complicates the semantics of aggregation. With any reasonably powerful aggregation capability, we can define negation in terms of aggregation. For example, the COUNT of solutions to a goal is 0 if and only if the goal has no solutions.
So, just as definitions that involve recursion through negation can be problematic, so can definitions that involve recursion through aggregation. There have been a number of approaches to providing a semantics of aggregation operators in Datalog.
Prolog has basically one aggregation operator, findall/3
(or variants setof/3
and bagof/3
),9 which collects the answers to a query into a list. With this operation, the programmer can then explicitly program many other aggregation operators by iterating over that list of collected results. But that approach requires complex data structures (the list) which are excluded from basic Datalog. LDL [Tsur and Zaniolo 1986] added a set data structure to Datalog, in which a program variable could take on the value of an extensional set. Aggregates were then defined over these extensional sets in a way not unlike in Prolog. In Prolog the evaluation of some definitions would go into infinite loops; in LDL some specifications were “undefined” or “unsafe”. Fully supporting extensional sets in a logic language, including general set unification, is quite complex, and was abandoned in the move to LDL++ [Zaniolo et al. 1993]. Another approach to supporting sets, an intensional approach in which sets have names, is discussed below in Section 1.4.7 on higherorder extensions.
XSB [Swift and Warren 2011] supports aggregation by allowing the user to provide a lattice operation or a partial order relation defined in Prolog, and then applying the operator incrementally as values are added to goal tables to retain the set of minimal covering values. This capability has found interesting application in implementing conditional preference theories [Cornelio et al. 2015]. But XSB provides no formal definition of when aggregations are well defined, and the user is responsible for avoiding problematic definitions.
A first approach to ensuring the well-definedness of recursive rules that include aggregation was to require that definitions involving aggregation be stratified. In this case, the computation can proceed bottom-up from lower to higher strata in such a way that any values needed for an aggregate operator are defined at a lower stratum from the aggregate, and can therefore be completely evaluated before they are needed. Predicate stratification of a program is easily decided at compile time, but it turns out that many reasonable uses of aggregation are not predicate stratified. But local stratification is more complex and essentially requires full computation to determine whether a Datalog program is locally stratified. So efforts have been made to define efficiently recognizable classes of locally stratified programs that cover all (or most of) the meaningful aggregate definitions that involve recursion, for example LDL++
[Zaniolo et al. 1993].
Another approach is to define the meaning of aggregates for all programs using the Stable-Model Semantics or the Well-Founded Semantics [Kemp and Stuckey 1991, Pelov et al. 2007]. These approaches define powerful languages for aggregates and give them semantics under both frameworks, but some program restrictions seem necessary to ensure efficient computation of correct answers.
An interesting example of a meaningful definition using non-predicate-stratified aggregation is for shortest path. In XSB, it would be written as:
The table
declaration indicates the use of the minimum
lattice operation to aggregate the third argument of the sp/3
predicate with grouping on the first two arguments. When a query :- sp(X, Y, D).
is invoked, computation proceeds as usual, except when a new answer, say sp(a, b, 3)
, is computed for the predicate sp/3
, the table is checked to see if the new answer for the aggregate argument, here 3, is smaller than an existing answer, say 1 for sp(a, b, 1)
. If it is smaller (or if there is no previous answer for X
and Y
) the new answer is added (and any old one is deleted). If it is not smaller than the existing answer, the new answer is not added and computation fails.
We can understand the semantics of this program by assuming that the aggregate table declaration asserts, in this case, the axiom
and so sp(X, Y, D)
is true if there is some path from X
to Y
of length D
or less. Operationally we do not add a new answer to the table if it is implied by an existing answer. And when we add a new answer, we delete any old one implied by the new one.
We also mention here approaches to aggregation that involve introducing new structures or language constructs, such as lattices [Conway et al. 2012, Ross and Sagiv 1992], keys on results [Zaniolo 2002], and non-deterministic choice [Zaniolo and Wang 1999].
1.4.4 Existential Variables in Rule Heads: Datalog ±
Datalog± [Calì et al. 2011, Gottlob et al. 2014] extends the basic Datalog by allowing existential variables in the rule heads. That is, rules can now have the form
Here, the head of the rule has existentially quantified variables (Z
), while the variables X
(common to the head and the body) and Y
(exclusive to the body) are universally quantified outside of the rule. These latter quantifiers are omitted, following the usual convention in logic programming. Another novelty is that the head can have the form X1 = X2
(possibly quantified with ∃
), meaning that the inferred instantiations for X1
and X2
must be the same constant. Here is an example:
Rules 2 and 3 here have existentials in the head and rule 8 has equality in the head. In the database theory (from which Datalog± came), the rules with head-existentials as well as the regular Datalog rules (such as rules 4–7) are called tuple-generating dependencies (TGDs), while the