dependencies (EGDs) [Beeri and Vardi 1981].10
What is the meaning of such rules? As with regular Datalog, we can start by deriving more facts bottom-up, as explained in Section 1.1. The first bottom-up application of the rules above would derive these facts:
The little twist here with respect to the ordinary bottom-up strategy is that rules 2 and 3 have existential variables in the head and they must be bound to completely new constants that do not appear elsewhere in the database or in the rules. In the above, we denoted these constants with o1–o4
. In databases, such new constants are usually called “nulls” and in logic “Skolem constants.” We assume that each head-existential gives us a unique new constant, since there is no reason to think that any of these constants might refer to the same entity. (There are also good theoretical reasons behind this choice, of course—see, for example, the work of Calì et al. [2013] and Deutsch et al. [2008].) We will further derive that each of these new entities is a human and therefore has an unknown parent and a mother:
and so on. The resulting set of facts can (and, in this example, will) be infinite because this process will keep inventing new nulls. What happens if we use mary
and bob
in rules 2 and 3 again? In case of rule 2, we will derive a pair of new entities, o1’
and o2’
, as the second parents for mary
and bob
(and then the third, the fourth, and so on). There is no reason to assume that these new constants refer to the same entities as o1
and o2
. Things are different in case of rule 3, however. Here we derive entities o3’
and o4’
, but then the EGD rule 8 says that o3’
should be the same as o3
(since both are mothers of mary
, and EGD rule 8 says that each human has only one mother) and o4’
must be the same as o4
(for a similar reason: bob
can have only one mother). As a result, o3’
and o4’
are renamed into o3
and o4
, respectively, which turns the newly derived facts mother(mary, o3’)
and mother(bob, o4’)
into the “old” facts mother(mary, o3)
and mother(bob, o4)
, making the new facts duplicate and redundant.
The set of facts thus computed satisfies all the rules and is known as a universal model of these rules. It has a homomorphism into any other model of these rules and all universal models are isomorphic to each other [Deutsch et al. 2008, Maier et al. 1979]. (They are the same up to a renaming of the null constants.) Query answering in a database defined by these rules and facts is taken to mean query answering in the universal model. Usually, only the answers that do not contain null values are considered meaningful.
The bottom-up process just described is known as the chase [Deutsch et al. 2008, Maier et al. 1979] because it “chases” after a database instance that satisfies the given rules. It is also sometimes called the oblivious chase, because it keeps rederiving new existentials, as in our case of the parents of mary
and bob
, and does not try to remember which existentials were used previously. The chase was originally proposed as a proof procedure for deriving implications of data dependencies [Maier et al. 1979], but later found other applications, including query containment and optimization [Johnson and Klug 1984] and data exchange [Deutsch et al. 2008].
There is also a restricted chase, which does not rederive existential constants such as o3’
and o4’
in the example above. The restricted chase is not guaranteed to compute the entire universal model. However, sometimes it is enough to compute a subset of that model, via the restricted chase, in order to have a decision procedure for query answering. These decidable cases are obtained via various syntactic restrictions on the appearance of variables in the rules [Calautti et al. 2015, Calì et al. 2013, Gottlob et al. 2013]. The key idea (which dates back to Johnson and Klug [1984]) is that, although the chase may be infinite, various restrictions may reduce the problem of query answering and containment over the infinite chase to a similar problem over a bounded finite prefix of that chase.
Note that although the chase, TGDs, and EGDs have been around since the late 1970s, they were used mostly for query processing and not as a language for knowledge representation. Datalog± as a language was introduced only in 2011 [Calì et al. 2011]. Decidable cases of Datalog± are implemented in a number of prototypes, such as Alaska [da Silva et al. 2012], DLV∃ [Alviano et al. 2012], and IRIS±.11 A partial implementation of Datalog± is provided in the Ergo knowledge-representation and reasoning system [Coherent Knowledge LLC 2017]. It does not rely on the chase to ensure termination and does not take advantage of the aforementioned decidable cases, however. Instead, it uses a form of bounded rationality called radial restraint [Grosof and Swift 2013]. Ergo’s semantics for head-existentials is also slightly different and corresponds to the aforementioned restricted chase. In fact, the restricted chase is equivalent to replacing head-existentials with Skolem functions and, in that sense, any Datalog system that permits function symbols in the rule heads, such as XSB [Swift and Warren 2011] or LogicBlox [Aref et al. 2015], can be said to partially implement Datalog±.
1.4.5 Typing and Constraints
Most computer languages have some notion of typing or constraints (or both). In databases, these aspects both manifest in the database schema, which supports the declaration of domains (datatypes) for attributes, primary keys (uniqueness constraints) on single tables, and foreign keys (referential integrity) between pairs of tables. For example, for the tables
the database schema might declare that Univ is a character string, Year is a 4-digit number, ID is a key for person
, and every PID value in the thesis
table should have a matching ID value in the table person
, among other things. Most programming languages will associate types with program variables—implicitly or explicitly—and use those types to determine that expressions are well formed. For instance, given that the type of variable Title
is a character string, then a compiler can determine that the expression
is well formed, whereas
is not. One might say that a database schema protects data from “wrong code,” while programming-language types protect code from “wrong data.” Constraints and typing information can also improve execution times, for example, by simplifying queries or limiting the bindings that need to be considered for a given variable.
Datalog as a language has neither constraints nor types. However, we can consider integrity constraints as an alternative kind of axiom in a deductive database. So far, axioms have been interpreted as deductive rules: the EDB should be augmented so that resulting EDB plus IDB satisfies the axioms. But an axiom can also serve as an integrity constraint—it prohibits models where the axiom is not satisfied [Nicolas and Yazdanian 1977]. For example, if we want to require that the first argument of the area
predicate is a unique key, we could use the axiom
This constraint applies to an EDB table, but there can be constraints on IDB predicates as well. For example, if we wanted to prohibit cycles in the advised
predicate, we can define the ancestor
predicate as
and then say an advisee cannot be an ancestor of his or her advisor.