and Vardi 1992, Naughton 1986],
We should add that the area of Description Logic [Baader et al. 2003] is concerned with the query containment and equivalence questions for various subsets of first-order logic, which have a limited intersection with Datalog. However, we are not aware of a result from that area that yields new insights when applied to that intersection.
1.6 Early Datalog and Deductive Database Systems
While much of the initial work on Datalog looked at evaluation options, semantics and extensions, a number of fairly complete implementations of deductive database systems were built in the late 1980s and early 1990s. These systems differed in several ways.
One was in how they provided for ordered execution. Although Datalog has pure logical semantics and (at least, in theory) the programmer does not need to worry about rules and goal ordering, most applications need operations with side effects, such as updates and I/O, where order is important. Thus, these systems tended either to embed declarative queries in a procedural language, or have certain rules where evaluation order was enforced.
A second source of variation is in what kinds of extensions of basic Datalog were supported, such as structured terms, collections, negation and aggregation. A third aspect was the optimizations and evaluation strategies employed, which are dictated to some extent by the extensions supported. The final area we call out is how base relations were managed. Some systems had disk-based storage for the EDB, either custom built or using an existing DBMS or storage system. Others worked with a main-memory database (possibly backed to secondary storage between sessions).
The coverage that follows is by no means exhaustive, but serves to illustrate the differences we mentioned above. We have also leaned toward systems that were publicly distributed. We recommend the survey by Ramakrishnan and Ullman [1995] for a more in-depth comparison of systems from that era. Also, while we refer to these systems in the present tense, to our knowledge only XSB is still being supported and widely used, though some of the concepts and techniques of LDL++
are incorporated in DeALS [Shkapsky et al. 2013].
1.6.1 LDL and LDL++
The LDL language and implementation were developed at the Microelectronics and Computer Technology Corporation (MCC) during the mid-1980s as an alternative to lose coupling between logic languages and databases [Tsur and Zaniolo 1986, Zaniolo 1988]. LDL was intended to be complete enough for full application development, although it can be called from procedural languages and can call computed predicates defined in C++
. While starting from pure Horn clauses with declarative semantics, it does provide some control constructs to enforce order on updates and screen output.
LDL preserves complex terms from Prolog, such as date(1981, 'March')
, and augments them with sets. Set values can be listed explicitly, or created in rules by grouping results. For example, a rule
will return result facts such as
LDL provides a partition
predicate that can non-deterministically split a set, to support parallel processing of large datasets. LDL also supports non-determinism through a choice
predicate, which selects a specific assignment of values from a range of options. For example, in
the choice((AID), (PDI))
goal ensures that each advisor id AID
will have only one person id PID
associated with it in oneAdvisee
. (In relational database terms, the view oneAdvisee
will satisfy the functional dependency AID → PID.) LDL supports stratified negation, which is handled via an anti-join strategy. For example, the rule
is evaluated by anti-join of the person
table with the advised
table on their first arguments. (Anti-join removes any rows from its first argument that connects with at least one row in the second argument on the join condition. Here it will knock out any person
-fact that has a matching advised
-fact.)
LDL has two implementations. The first compiles LDL into FAD, an algebraic language designed at MCC for execution on a parallel database machine. A later, single-user prototype for Unix called SALAD [Chimenti et al. 1990] was created to make LDL available more broadly. SALAD was among the first systems to integrate many of the evaluation techniques covered in Section 1.5. SALAD applies both Magic Sets and Generalized Counting [Saccà and Zaniolo 1986a] rewrites. (The latter is suitable for graph-like queries where path length matters but not path membership.) SALAD evaluation is bottom-up semi-naïve, but has several variants for processing joins associated with individual rules, such as various combinations of eager and lazy evaluation, sideways information passing, materialization, and memoization. Some methods support recursive predicates, and some can be used with intelligent backtracking. Intelligent backtracking detects cases where a join should not continue iterating from an earlier subgoal than it normally would. Consider the rule for the csInfo
predicate from the initial example,
which is essentially a join of the thesis
, person
and area
relations. Suppose we are currently considering the two rows
and fail to find any area
row matching the goal
There is no reason to return to consider further person
-rows, since the hw
value comes from the thesis
-row. The SALAD optimizer chooses among the joinprocessing options based on program requirements, as well as estimated costs and index availability. The LDL compiler can process a predicate differently depending on the binding pattern of a call. So, for example, for csInfo(b, b, f, f)
(a call with the first two arguments bound and the last two free), it might choose to put person
first in the join, whereas for csInfo(f, f, b, b)
, it might put person
last in the join.
LDL++
[Arni et al. 2003] was an extension of LDL begun at MCC and completed at UCLA. LDL++
adds support for user-defined aggregates (UDAs), offering both forms that provide final results as well as versions that can return “online” results. For example, an aggregate for average might return the intermediate value after each group of 100 inputs has been examined. LDL++
also introduced a testable form of local stratification in which iterations of a predicate are explicitly indexed, and rules with negation must only use results available at a previous index.
LDL++
added access to external databases to SALAD’s internal main-memory database. LDL++
can create SQL calls to external sources that contain multiple goals, thus benefitting from the DBMS’s query optimizer.
1.6.2 CORAL
Ramakrishnan et al. [1994] developed the CORAL deductive system at the University of Wisconsin. To support imperative aspects, CORAL is “mutually embedding” with C++
code—that is, CORAL modules can be called from C++
and evaluable predicates for CORAL can be written in C++
. In terms of Datalog extensions, CORAL supports structured terms in predicates and explicit multi-set values, with grouping capabilities similar to LDL sets. CORAL also provides functions, such as union and subset, to operate over multi-set values, and aggregation functions, such as sum and count, that can reduce a multi-set to a scalar. CORAL supports a generalized form of stratified negation where a predicate can depend negatively on itself, as long as no individual ground goal does. So, for example, if a predicate p
can have a rule of the form