aggregation capability, integrated with tables, that allows programmers to define their own aggregation operators to be applied to all answers of a query. These operators can be used in recursive rules to solve such problems as finding the length of the shortest path between a pair of nodes in a directed graph.
XSB does minimal automatic query optimization. The general philosophy is to allow the programmer to control how the query is evaluated by determining how it is specified. This approach was inherited from XSB’s foundation in the Prolog programming language. XSB does include a few simple program transformations to improve indexing and to factor clauses to potentially reduce the polynomial complexity of some queries. But these optimizations must be explicitly invoked by the user. XSB also supports subsumptive tabling, which allows a table to be re-used if a proper instance of the original goal is encountered. This capability supports inserting a bottom-up component into the normal goal-directed computation of XSB. For example, for transitive closure, we could declare the predicate as subsumptively tabled, and then make the fully open call (i.e., where each argument is a distinct variable). In this case, there will be only one call that uses the clauses that match this predicate; every subsequent call is subsumed by that first call, and so it will suspend on the table associated with the initial open call, waiting for answers to show up that it can return. So the first answers must come from the base case, since only these answer do not require a recursive call. The next answers will come from a single use of the recursive call, and so on, simulating bottom-up computation.
XSB does not directly provide much support for persistent data. Using Prolog, programmers can read and write files to define internal predicates. So the programmer can batch-load data files, or can read them incrementally. Some work explored evaluation-control strategies that would provide more efficient disk access patterns when accessing external data [Freire et al. 1997]. XSB also has a powerful ODBC interface to any relational database system,20 so an XSB user has access to data persistence facilities.
1.6.6 Other Systems
Aditi was a deductive database project started in 1988 at the University of Melbourne [Vaghani et al. 1994]. Aditi-Prolog is a declarative variant of Prolog, which can be used to define predicates in the NU-Prolog system [Ramamohanarao et al. 1988]. Aditi can evaluate predicates bottom up (with optional parallelism) and in tuple-at-a-time top-down mode. Aditi provides different strategies to order literals in a clause during compilation, such as choosing the next subgoal to maximize bound variables or to minimize free variables. Aditi programs are translated to relational algebra programs, which are further optimized using programming-language and database techniques. Joins can be evaluated by several methods, which can take advantage of different tuple representations that Aditi supports. Parallelism is available both among different subgoals of a rule and among multiple rules for a predicate.
DECLARE is the language for the Smart Data System (SDS) [Kießling et al. 1994] whose foundational work was done at the Technical University of Munich (TUM) and then continued at a research and development center during the period of 1986–1990. One of the main targets of SDS was decision making over heterogeneous data, hence it emphasizes connectivity to multiple data sources, both local and remote. DECLARE is a typed language using SQL types plus lists and supporting constraints. SDS provides distributed query processing, where evaluation plans can contain complex SQL queries executed by external systems. Also, the SDS developers were interested in search strategies beyond top-down and bottom-up techniques, in particular, using domain-specific knowledge. For example, in a rule for connecting flights, the intermediate city will generally be in the same geographic region as the source and destination cities.
LOLA is another deductive database system from TUM [Freitag et al. 1992], which can access stored relations in external databases and predicates defined in Common Lisp. It supports type groups that collect terms into classes, such as airports and geographic regions. LOLA is translated to an intermediate language, from which it builds operator graphs that are optimized for subexpressions and indexing opportunities. These graphs are evaluated in R-Lisp, which implements an in-memory algebra for relations with complex objects. Some of the preprocessing routines are in fact written in LOLA itself.
RDL is a production-rule system from INRIA [Kiernan et al. 1990]. It uses “if … then” rules with the consequent being a sequence of inserts, deletes, and updates of stored or virtual tables. We mention it here because unlike most production-rule systems, which fire a rule separately for each instantiation, RDL takes a set-oriented approach similar to most Datalog systems. It computes all bindings of a rule body under the current database state, then processes all modifications.
The Starburst project at IBM sought to make relational DBMSs more extensible in several dimensions, including query processing [Haas et al. 1989]. It introduced table functions. A table function produces a table as a result, and can call itself in its own definition, thus expressing a recursive query. An interesting outcome of the project was a method for applying the Magic Sets technique to full SQL [Mumick and Pirahesh 1994].
FLORID [Frohn et al. 1998] is an object-oriented Datalog system developed in the 1990s at Freiburg University. It differs from all the aforementioned early systems in that it was based on the object-oriented model provided by F-logic (see Section 1.4.6) and not on the usual predicate syntax. It was thus one of the first systems that provided theoretically clean support for semi-structured data.21 It was also one of the first to support the well-founded semantics for negation (Section 1.4.1). In other aspects, FLORID’s evaluation strategy is bottom-up and it has a very flexible support for aggregation, which allows aggregations to be nested arbitrarily.
1.7 The Decline and Resurgence of Datalog
After its swift rise in the 1980s, interest in Datalog entered into a period of decline in 1990s and then started to rise again in 2000s. This section reviews some of the reasons for the decline and the subsequent resurgence.
1.7.1 The Decline
With the notable exception of XSB, the systems in the previous section are no longer supported. A limited form of linear recursion made it from Starburst into IBM’s DB2 product and into the SQL:1999 standard. Datalog and deductive databases did not emerge as a major commercial category. Research on Datalog ebbed in the 1990s. The reasons for the waning of interest are not completely clear, but we consider here some contributing factors. We were helped by comments from colleagues in the field (see Acknowledgments).
No “Killer Application”. A new technology is often propelled by a “killer application” that it enables for which existing technologies are ill-suited. For example, the World Wide Web led to major growth in the Internet (and network connectivity in the home). No such application emerged for Datalog—nobody seemed that interested in computing his or her ancestors. While there were problems that could not be handled directly by conventional relational DBMSs, there were often specific products in the market to solve those problems. For example, IBM’s COPICS product had a module that handled the Bill-of-Materials problem (traversing down the part-subpart hierarchy while aggregating numbers of parts, weight, total cost, and so forth), thus supporting a particular type of recursive query.
Theoretical and Practical Limitations. There were limitations in both evaluation technology and computer hardware that hampered the efficiency and applicability of early Datalog systems. While Magic Sets and similar approaches helped the performance of bottom-up evaluation, in the worst case they could cause an exponential expansion in program size, if the target predicates had many binding patterns. For top-down evaluation, while tabling techniques had been investigated, there were still issues with handling negation, incrementality, and the best way to schedule subgoals. On the hardware side, computers were limited in processor power and memory (4–64 MB of main memory was typical for desktop machines in the 1990s), which meant that Datalog programs on what today would be considered small amounts of data could easily be disk bound.
Loss of Mindshare by Logic Programming in General. Logic programming was being overshadowed by object-oriented languages, which were much better at