It assumes that the world consists of individuals, among which various relations hold or do not hold. Logic provides a semantics, a specification of meaning, which agent designers can then choose to implement in various ways.
Both probability and predicate calculus can be seen as extending propositional logic, one by adding measures over the worlds, and the other by extending the propositional calculus to include individuals and relations. Since, statistical relational AI (StarAI) aims to seamlessly combine these two strands of research, we will first introduce each of these separately before moving on to presenting their possible integrations.
2.1 PROBABILISTIC GRAPHICAL MODELS
The semantics of probability [Halpern, 1990, 2003] can be defined in terms of possible worlds and random variables (although they are neither random nor variable). We can either define random variables in terms of possible worlds (a random variable is a function on possible worlds) or define possible worlds in terms of random variables (a possible world is an assignment of a value to every random variable). Either way, a random variable has a value in every world. A random variable having a particular value is a proposition. Probability is defined in terms of a non-negative measure over sets of possible worlds that follow some intuitive axioms (see, e.g., [Halpern, 2003]). This means that a probabilistic graphical model defines a probability distribution over its possible worlds, or equivalently a joint probability distribution over the propositions, or the assignments of values to the random variables.
In Bayesian probability, we make explicit assumptions and the conclusions are consequences of the specified knowledge and assumptions. The assumptions are about the space of possible hypotheses and the prior probabilities. One particular explicit assumption is the assumption of conditional independence. graphical models [Pearl, 1988, Lauritzen, 1996, Koller and Friedman, 2009] provide representations of such conditional independence of random variables in terms of graphs. A Bayesian network [Pearl, 1988, Darwiche, 2009] is an acyclic directed graphical model of probabilistic dependence where the nodes are random variables. A Bayesian network encapsulates a directed form of independence between the random variables in the graph: a variable is conditionally independent of its non-descendants given its parents. This has turned out to be a very useful assumption in practice. A Bayesian network requires a representation of the conditional probability of each variable given its parents.
Undirected graphical models, or Markov networks, [Pearl, 1988] are defined in terms of factors (or potentials) among random variables. A factor represents a function of a set of random variables; for example, a factor on variables {X, Y, Z} is a function that given a value for each variable, returns a non-negative number. The nodes in a Markov network correspond to random variables, and the variables in a factor are neighbors of each other in the graph. These models encapsulate the assumption that a variable is independent of other variables given all of its neighbors. The next two sections give more details.
2.1.1 BAYESIAN NETWORKS
Formally, a Bayesian network (or belief network) is an augmented, acyclic directed graph (DAG), where each node corresponds to a random variable Xi and each edge indicates a direct influence among the random variables. The influence for each variable Xi is quantified with a conditional probability distribution (CPD) P(Xi | Pa(Xi)), where Pa(Xi) are the parents of Xi in the graph. The Bayesian network represents a set of independence assumptions: a variable is independent of the other variables that are not its descendents given its parents.
Example 2.1 As an example of a Bayesian network, consider Judea Pearl’s famous burglary alarm network. The Bayesian network in Fig. 2.1 contains the random variables burglary, earthquake, mary_calls, john_calls, and alarm. The network specifies that burglary and earthquake are independent, and that john_calls is independent of the other variables given alarm. Assume that the random variables are all Boolean, i.e., they can have the range {true, false}, then the Bayesian network defines a probability distribution over truth-assignments to {alarm, earthquake, mary_calls, john_calls, burglary}. To do so, it makes use of CPDs associated to each of the nodes are specified in Table 2.1. They include the CPDs P(alarm | burglary, earthquake), P(earthquake), etc.
Figure 2.1: A Bayesian network for burglary alarms (adapted from Judea Pearl). Nodes denote random variables and edges denote direct influences among the random variables.
Table 2.1: Conditional probability distributions associated with the nodes in the burglary alarm network, cf. Fig. 2.1
The Bayesian network thus has two components: a qualitative one—the directed acyclic graph—and a quantitative one—the conditional probability distributions. Together they specify the joint probability distribution.
Because of the conditional independence assumption, we can write down the joint probability density as
The independence assumption, which states that each variable is independent of its non-descendents given its parents, implies other independencies. These are axiomatized by d-separation [Pearl, 1988], which allows to infer all independencies that are implied from the independence assumptions of Bayesian networks.
2.1.2 MARKOV NETWORKS AND FACTOR GRAPHS
Conceptually, a Markov random field [Pearl, 1988] is an undirected analogue of a Bayesian network. A Markov network is defined in terms of a set of discrete-valued random variables, X = {X1, X2, …, Xn} and a set of factors {f1, …, fm}, where a factor is a non-negative function of a subset of the variables.
A Markov network represents the joint distribution over X as proportional to the product of the factors, i.e.,
where fk(Xk) is a factor on Xk ⊆ X, and xk is x projected onto Xk. Z is a normalization constant known as the partition function.
As long as the factors are all non-zero (they always return a positive number), the distribution can be represented as a log-linear model:
where the factors gk(·) are functions of (a subset of) the configuration x, and wi are real-valued weights.
A Markov network is a graphical representation of a Markov random field where the nodes are the random variables and there is an arc between any two variables that are in a factor together.
As for Bayesian networks, the structure of a Markov network encodes conditional independencies. Basically, two nodes X and Y in a network are conditionally independent given variables Z1, ldots, Zn (i.e., P(X | Y, Z1, …, Zn) = P(X | Z1, …, Zn)) if and only if all paths from X to Y contain one of the variables Zi. The Markov network for the alarm example is illustrated in