because probability seemed to rely on too many numbers and did not deal with the complexities of a world of individuals and things. But the use of probabilistic graphical models, exploiting probabilistic independencies, has revolutionized AI. The independencies specified in such models are natural, provide structure that enables efficient reasoning and learning, and allow one to model complex domains. Many AI problems arising in a wide variety of fields such as machine learning, diagnosis, network communication, computer vision, and robotics have been elegantly encoded and solved using probabilistic graphical models.
Meanwhile, there have also been considerable advances in logical AI, where agents reason about the structure of complex worlds. One aspect of this is in the semantic web and the use of ontologies to represent meaning in diverse fields from medicine to geology to the products in a catalogue. Generally, there is an explosive growth in the amount of heterogeneous data that is being collected in the business and scientific world. Example domains include biology and chemistry, transportation systems, communication networks, social networks, and robotics. Like people, intelligent agents should be able to deal with many different types of knowledge, requiring structured representations that give a more informative view of the AI task at hand.
Moreover, reasoning about individuals and relations is all about reasoning with regularities and symmetries. We lump individuals into categories or classes (such as “person” or “course”) because the individuals in a category share common properties; for example, there are statements that are true of all living people, such as that they breathe, have skin, and have two biological parents. Similarly for relations, there is something in common between Sam being advised by Professor Smith and Chris being advised by Professor Johnson; there are statements about publishing papers, working on a thesis, and working on projects that are common among the “advised by” relationships. We would like to make predictions about two people even when all we know about them is their advisory relationships. It is these commonalities and regularities that enable language to describe the world. Reasoning about regularities and symmetries is the foundation of logics built on the predicate calculus, which allows statements about all individuals.
Figure 1.1: Statistical Relational Artificial Intelligence (StarAI) combines probability, logic, and learning and covers major parts of the AI spectrum.
Thus, to deal with the real world we need to exploit uncertainty, independencies, and symmetries and tackle a long-standing goal of AI, namely unifying first-order logic (capturing regularities and symmetries) and probability (capturing uncertainty and independence). Predicate logic and probability theory are not in conflict with each other; they are synergistic. Both extend propositional logic, one by adding relations, individuals, and quantified variables, the other by allowing for measures over possible worlds and conditional queries. This may explain why there has been a considerable body of research on combining the two over the last 25 years, evolving into what has come to be called Statistical Relational Artificial Intelligence (StarAI); see also Fig. 1.1:
the study and design of intelligent agents that act in worlds composed of individuals (objects, things), where there can be complex relations among the individuals, where the agents can be uncertain about what properties individuals have, what relations are true, what individuals exist, whether different terms denote the same individual, and the dynamics of the world.
The basic building blocks of StarAI are relational probabilistic models; we use this term in the broad sense, meaning any models that combine relations and probabilities. They can be seen as combinations of probability and predicate calculus that allow for individuals and relations as well as probabilities. In building on top of relational models, StarAI goes far beyond reasoning, optimization, learning, and acting optimally in terms of a fixed number of features or variables, as is typically studied in machine learning, constraint satisfaction, probabilistic reasoning, and other areas of AI. In doing so, StarAI has the potential to make AI more robust and efficient.
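To make the idea of combining relations with a measure over possible worlds more concrete, the following is a minimal Python sketch, not a model taken from this book. It assumes a single hypothetical relation, advised_by, over a handful of named individuals, and it assumes (purely for illustration) that every ground atom is independently true with one shared probability. The function names ground_atoms, possible_worlds, and probability are likewise invented for this sketch.

```python
from itertools import product


def ground_atoms(students, professors):
    """One Boolean random variable per (student, professor) pair."""
    return [("advised_by", s, p) for s in students for p in professors]


def possible_worlds(atoms, p_true):
    """Yield (world, probability) pairs.

    Illustrative assumption: each ground atom is independently true
    with the same shared probability p_true.
    """
    for values in product([False, True], repeat=len(atoms)):
        prob = 1.0
        for value in values:
            prob *= p_true if value else 1.0 - p_true
        yield dict(zip(atoms, values)), prob


def probability(query, evidence, atoms, p_true):
    """P(query | evidence), summing the measure over possible worlds."""
    numerator = denominator = 0.0
    for world, prob in possible_worlds(atoms, p_true):
        if evidence(world):
            denominator += prob
            if query(world):
                numerator += prob
    return numerator / denominator


# Hypothetical individuals; the names are placeholders.
students = ["sam", "chris"]
professors = ["smith", "johnson"]
atoms = ground_atoms(students, professors)


def has_advisor(world, student):
    """True if the student is advised by at least one professor."""
    return any(world[("advised_by", student, p)] for p in professors)


# Conditional query: is Sam advised by Smith, given that Sam has an advisor?
answer = probability(
    query=lambda w: w[("advised_by", "sam", "smith")],
    evidence=lambda w: has_advisor(w, "sam"),
    atoms=atoms,
    p_true=0.3,
)
print(len(atoms), answer)
```

Note that the number of random variables is not fixed in advance: adding one more student to the population adds one ground atom per professor, while the single shared parameter stays the same. Enumerating all worlds is exponential in the number of ground atoms, so this is only workable for toy populations.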
This book aims to provide an introduction that can help newcomers to the field get started, understand the state-of-the-art and the current challenges, and be ready for future advances. It reviews the foundations of StarAI, motivates the issues, justifies some choices that have been made, and provides some open problems. Laying bare the foundations will hopefully inspire others to join us in exploring the frontiers and the yet unexplored areas.
The target audience for this book consists of advanced undergraduate and graduate students as well as researchers and practitioners who want to get an overview of the basics and the state-of-the-art in StarAI. To this end, Part I starts by providing the necessary background in probability and logic. We then discuss the representations of relational probability models and the underlying issues. Afterward, we focus first on inference, in Part II, and then on learning, in Part III. Finally, we touch upon relational tasks that go beyond the basic probabilistic inference and learning tasks as well as some open issues.
Researchers who are already working on StarAI—we apologize to anyone whose work we are accidentally not citing—may enjoy reading about parts of StarAI with which they are less familiar.
1.2 CHALLENGES OF UNDERSTANDING STARAI
Since StarAI draws upon ideas developed within many different fields, it can be quite challenging for newcomers to get started.
One of the challenges of building on top of multiple traditions is that they often use the same vocabulary to mean different things. Common terms such as “variable,” “domain,” “object,” “relation,” and “parameter” have come to have accepted meanings in mathematics, computing, statistics, logic, and probability, but their meanings in each of these areas are different enough to cause confusion. We will be clear about the meaning of these when using them. For instance, we follow the logic tradition and use the term “individuals” for things. They are also called “objects,” but that terminology is often confusing to people who have been brought up with object-oriented programming, where objects are data structures and associated methods. For example, a person individual is a real person and not a data structure that encapsulates information about a person. A computer is not uncertain about its own data structures, but can be uncertain about what exists and what is true in the world.
Another confusion stems from the term “relational.” Most existing datasets are, or can be, stored in relational databases. Existing machine learning techniques typically learn from datasets stored in relational databases where the values are Boolean, discrete, ordinal, or continuous. However, in many datasets the values are the names of individuals, as in the following example.
Figure 1.2: An example dataset that is not amenable to traditional classification.
Example 1.1 Consider learning from the dataset in Fig. 1.2. The values of the Student and the Course attributes are the names of the students (s1, s2, s3 and s4) and the courses (c1, c2, c3 and c4). The value of the grade here is an ordinal (a is better than b which is better than c). Assume that the task is to predict the grade of students on courses, for example predicting the grade of students s3 and s4 on course c4. There is no information about course c4, and students s3 and s4 have the same average (they both have one “b”); however, it is still possible to predict that one will do better than the other in course c4. This can be done by learning how difficult each