Michael Kifer

Declarative Logic Programming


Скачать книгу

graphics at a point where powerful desktop workstations were coming on the market. Prolog had modest UI capabilities, and most would not transfer into Datalog, so you could not write a complete application in it, in contrast to object-oriented languages such as Smalltalk and C++. Programmer productivity issues seemed focused on graphics and interactivity vs. sophisticated query and analysis, though there was still interest in expert systems and knowledge bases. Coverage of Datalog (and logic programming) in academic settings declined, meaning there were fewer programmers versed in logic languages as compared to object-oriented languages.

      Antipathy in the Database Systems Community. There was open antagonism by some in the commercial and systems research communities toward deductive database work. Some of that ill will seemed rooted in a vested stake in SQL orthodoxy (which also manifested in criticism of object-oriented databases). That sentiment might also have come from feelings that deductive database work was not addressing the most important issues in data management, such as extensibility and spatial access methods. The nature of some of the research contributions might have added to these perceptions. For example, there were many papers on proposed evaluation techniques, but a lot of them were not actually implemented, much less benchmarked for performance.

      Barriers to Entry in the Database Market. While Datalog systems could support more expressive querying than SQL-based DBMSs, the latter had many features that were often primitive or lacking in Datalog implementations, such as concurrency control, recovery, authorizations, cost-based optimization, indexing (and other physical design options), replication and back-up services, integrity constraints, and programming interfaces from multiple languages. While some Datalog implementations had some of these features, most database applications need all of them. Faced with a choice between recursive queries and the other features, developers chose the other features. Recursive queries could be implemented in a generalpurpose programming language making calls to the DBMS, whereas simulating the missing features in Datalog systems that did not have them was in most cases infeasible. Some of the features, such as concurrency and recovery, need quite sophisticated techniques to perform efficiently, and the investment costs to provide them to a Datalog system were likely beyond the means of most development projects. An alternative approach is to have a regular DBMS as a submodule of a Datalog system (and some systems followed this approach). Such a hybrid could provide most of the features mentioned, however, query optimization and evaluation was split across two components, limiting the performance of the resulting system. Datalog systems could not meet the minimum requirements to compete in the data-management market.

      Inaccessibility of the Literature. Some whom we consulted said they found the deductive database literature overly dry and complicated. Papers were not easy to understand, and often lacked compelling examples or applications. This aspect may have limited uptake of ideas into existing DBMSs, especially since papers did not say how to fit these approaches into current implementation technology. Also, while Datalog bore close resemblance to the domain relational calculus, leading relational languages, such as SQL and QUEL, were tuple-calculus based. While the difference may seem trivial, and although in many cases Datalog allowed more concise expression of queries, positional notation can be challenging and errorprone for predicates with dozens of attributes, compared to named notation.

      Contributions of Datalog. Even during its decline stage and before it started to surge again, Datalog had lasting effects on the contributing areas of logic programming, knowledge representation, and databases. In logic programming, it provided a “declarative reset,” and got people thinking again about how far one could go in logic languages without procedural or extra-logical features. For example, Datalog provided an arena to examine the recursion and negation, to find approaches that were more declarative than negation-as-failure. One approach was the Stable Model semantics, which led to Answer Set programming—the focus of much of current logic programming activity. In knowledge representation, it has influenced languages to deal with graph-based knowledge structures. For example, the core of the SPARQL language for RDF builds from Datalog-like goals over triples of the form (Subject, Predicate, Object). (SELECT queries in SPARQL have been shown equivalent to non-recursive safe Datalog with negation [Angles and Gutierrez 2008] and some implementation strategies for SPARQL are based on translation to Datalog [Polleres 2007].) In the database area, many SQL products now have support for certain forms of recursion, and some evaluation techniques for Datalog have been applied to non-recursive aspects of relational query languages.

      Another large area of contribution for Datalog was as a vehicle for understanding the complexity and expressiveness of database querying. Complexity refers to the amount of time or space to evaluate a query over a database, as a function of input size. The relevant input can be the data (for data complexity), the query (for query or expression complexity), or both (for combined complexity). The complexity is generally characterized by broad classes, such as logarithmic space (LOGSPACE) or polynomial time (PTIME). A classical result is that domain relational calculus (essentially non-recursive Datalog with negation) has LOGSPACE data complexity and PSPACE query complexity. (Domain relational calculus is PSPACE-complete, meaning—informally—that domain relational calculus evaluation is as hard as any problem in PSPACE.) Abiteboul et al. [1995] provide an introduction to such complexity concepts and results. We note that Datalog itself is PTIME-complete for data complexity and EXPTIME-complete for query and combined complexity [Immerman 1982, Vardi 1982].

      Datalog has helped the study of database query complexity (and decidability), particularly in terms of the complexity “consequences” of different language features and semantics. Compared to a syntactically complicated language such as SQL, language features can be more cleanly removed from or added to Datalog, such as recursion, negation, second-order variables, inequality literals, ordered domains, fixpoint operators, iteration, disjunction (in rule heads) and complex objects. It also readily admits alternative semantics, such as Well-Founded vs. Stable-Model semantics for negation. A survey by Dantsin et al. [2001] covers complexity of these variants and more. There is also work investigating restrictions of Datlog, such as linear recursion (at most one IDB goal per rule body) [Ullman and Van Gelder 1988] and limitation to a single rule [Gottlob and Papadimitriou 1999]. We also note that Datalog has also proved a useful tool in studying the complexity of non-database problems, such as constraint satisfaction [Feder and Vardi 1999].

      Work on expressiveness considers questions such as which queries cannot be expressed in a particular language, how two query languages are related in terms of the queries they can express, and how query languages relate to complexity classes. For example, it is not possible to express a transitive-closure query in non-recursive Datalog [Aho and Ullman 1979], linear Datalog is strictly less expressive than regular Datalog [Afrati and Cosmadakis 1989], Datalog with equality and inequality predicates is equivalent to fixpoints over positive existential queries [Chandra and Harel 1985], and Datalog with ordered structures (that is, with a built-in ordering predicate on the domain of values) exactly captures polynomial time [Papadimitriou 1985]. Such expressiveness results are not purely of theoretical interest—they can show that a language with better “programmability” can be substituted for another without diminishing the queries that can be expressed. For example, both Monadic Datalog (all head predicates have a single variable) and a natural subset of Elog (a language for web wrappers) are as expressive as Monadic Second-Order logic (logic with variables ranging over sets) on trees [Gottlob and Koch 2004]. Thus, for tasks such as information extraction from the Web, the former languages provide an easier-to-use alternative to the latter. The survey by Dantsin et al. [2001] also covers a wide range of expressiveness results.

       1.7.2 The Resurgence

      Work on Datalog always kept going at some level, even after the initial flurry of activity subsided. Beginning around 1998 there was renewed interest in Datalog, driven by its use to solve real problems rather than by intrinsic properties or theoretical developments (although there have been new extensions and evaluation techniques motivated by these applications). Some of this interest was due to the new (at the time) field of the Semantic Web and its focus on inference [Berners-Lee et al. 2001, Boley and Kifer 2013a, Decker et al. 1998, Guha et al. 1998]. In other cases, Datalog was being applied in settings where database technology was not commonly used, such as program analysis and networking. Thus, its adoption was