mention of an entity may correspond to multiple candidate entities known from some other database such as Freebase. To determine which entity is correct, one may use the heuristic that “if the string of a mention is identical to the canonical name of an entity, then this mention is likely to refer to this entity.” In the relational probabilistic model, this may read as the quantified conditional probability statement:
Given such conditional probabilities, the marginal probability of every tuple in the database as well as all derived tuples such as entityMentioned can be inferred.
We can also model the relationship between learned features using rich linguistic features for trigger and argument detection and type labeling using weighted logical formulae (defined in Chapter 3) such as:
For each sentence, this assigns probability to the join of a word at some position in a sentence and the dependency type label that connects the word token to the argument token in the dependency parse tree with the trigger types and argument types of the two tokens. Here, triggerType denotes the prediction from a pre-trained support vector machine for triggering some information extracted from the text. This way, high-dimensional, sophisticated features become available to the relational probabilistic model. Moreover, as with Google’s Knowledge Vault [Dong et al., 2014], the parameters of the relational probabilistic model or even (parts of) its structure can be learned from data.
Example 1.4 Language modeling The aim of language modeling is to estimate a distribution over words that best represents the text of a corpus. This is central to speech recognition, machine translation, and text generation, among others, and the parameters of language models are commonly used as features or as initialization for other natural language processing approaches. Examples include the word distributions learned by probabilistic topic models, or the word embeddings learned through neural language models. In practice, however, the size of the vocabulary traditionally limited the distributions applicable for this task: specifically, one has to either resort to local optimization methods, such as those used in neural language models, or work with heavily constrained distributions. Jernite et al. [2015] overcame these limitations. They model the entire corpus as an undirected graphical model, whose structure is illustrated in Fig. 1.5. Because of parameter sharing in the model, each of the random variables is indistinguishable and by exploiting this symmetry, Jernite et al. derived an efficient approximation of the partition function using lifted variational inference with complexity independent of the length of the corpus.
Figure 1.5: Instance of the cyclic language model [Jernite et al., 2015] with added start and end <S> tokens based on two sentences. Each word of a sentence is a random variable and probabilistically depends on the words being at most two steps away. The model exhibits symmetries that are exploitable during parameter estimation using lifted probabilistic inference.
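To give a feel for why parameter sharing makes the corpus length irrelevant, the following sketch collects the pairs of words that are at most two steps apart, as in the caption above. It is only an illustration under simplifying assumptions, not the method of Jernite et al.: the function name `pairwise_counts` is hypothetical, each sentence is padded with its own `<S>` tokens, and the cyclic connection between sentences is ignored.

```python
from collections import Counter
from typing import Dict, List, Tuple

def pairwise_counts(sentences: List[List[str]], max_dist: int = 2) -> Dict[Tuple[int, str, str], int]:
    """Count (distance, left word, right word) triples for word pairs
    that are at most max_dist positions apart within a padded sentence."""
    counts: Counter = Counter()
    for words in sentences:
        tokens = ["<S>"] + words + ["<S>"]  # added start and end tokens
        for i, left in enumerate(tokens):
            for dist in range(1, max_dist + 1):
                if i + dist < len(tokens):
                    counts[(dist, left, tokens[i + dist])] += 1
    return counts

# Repeating a sentence changes the counts, but not the number of distinct entries.
short_corpus = pairwise_counts([["a", "b"]])
long_corpus = pairwise_counts([["a", "b"]] * 100)
print(len(short_corpus), len(long_corpus))
```

The number of distinct entries is bounded by the vocabulary size and the maximum distance, not by the number of tokens, which is the kind of symmetry that lifted inference can exploit.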
Figure 1.6: A Blocks World planning problem and a plan to solve the goal to have c clear.
Example 1.5 Planning under uncertainty The Blocks World, see Fig. 1.6, consists of a number of blocks stacked into towers on a table large enough to hold them all. The positioning of the towers on the table is irrelevant. The planning problem is to turn an initial state of the blocks into a goal state, by moving one block at a time from the top of a tower onto another tower or to the table. However, the effects of moving a block may not be perfectly predictable. For example, the gripper of a robot may be slippery so moving a block may not succeed. This uncertainty compounds the already complex problem of planning a course of action to achieve the goal.
The classical representation and algorithms for planning under uncertainty require the enumeration of the state space. Thus, although this is arguably a simple planning problem, we already get a rather large representation. For instance, if there are just 10 blocks, there are more than 50 million states. More importantly, we lose the structure implicit in the problem description. Decision-theoretic planning [Boutilier et al., 1999] generalized state-based planning under uncertainty to include features. But features are not enough to exploit all of the structure of the world. Each Blocks World planning instance is composed of the same basic individuals and relations, but the instances differ in the arrangements of the blocks. We would like the agent to make use of this structure by computing plans that apply across multiple scenarios. This can be achieved using relational models. For instance, the following parameterized plan says that we do nothing if block c is clear and that, if c is not clear, we move a clear block B that is above c to the floor. This plan achieves the goal of having block c clear
no matter how many blocks there are.
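The verbal description of this plan can be sketched in code. The following is a minimal illustration, not the book's notation: the state encoding (a dictionary mapping each block to what it rests on), the constant `TABLE`, and all function names are assumptions made for this example, and moves are taken to succeed deterministically, so the slippery gripper is ignored.

```python
from typing import Dict, List, Optional, Tuple

TABLE = "table"          # hypothetical name for the floor
State = Dict[str, str]   # block -> the block (or TABLE) it rests on

def is_clear(state: State, block: str) -> bool:
    """A block is clear if no other block rests on it."""
    return all(support != block for support in state.values())

def is_above(state: State, block: str, target: str) -> bool:
    """Follow the supports downward from block and look for target."""
    support = state[block]
    while support != TABLE:
        if support == target:
            return True
        support = state[support]
    return False

def clear_policy(state: State, target: str = "c") -> Optional[Tuple[str, str]]:
    """Do nothing if target is clear; otherwise move a clear block above it to the floor."""
    if is_clear(state, target):
        return None
    for block in state:
        if is_clear(state, block) and is_above(state, block, target):
            return (block, TABLE)
    return None  # not reached for well-formed states

def execute(state: State, target: str = "c") -> List[Tuple[str, str]]:
    """Apply the policy until the target block is clear and return the moves made."""
    state = dict(state)
    plan = []
    action = clear_policy(state, target)
    while action is not None:
        block, destination = action
        state[block] = destination
        plan.append(action)
        action = clear_policy(state, target)
    return plan

# Tower a-b-c plus a separate block d: two moves uncover c.
print(execute({"c": TABLE, "b": "c", "a": "b", "d": TABLE}))
```

The same policy applies unchanged to a tower of any height, because it refers to blocks only through the relations "clear" and "above" rather than through an enumerated state space.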
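The state count quoted above for 10 blocks can be checked with a short computation. The sketch below assumes, as in the problem description, that blocks are distinguishable and that the positions of the towers on the table are irrelevant; the function name `count_blocks_world_states` is our own.

```python
from math import comb, factorial

def count_blocks_world_states(n: int) -> int:
    """Count the arrangements of n distinct blocks into towers,
    ignoring where the towers stand on the table."""
    if n == 0:
        return 1
    # With exactly k towers: order all n blocks (n!), cut the sequence
    # into k non-empty towers (C(n-1, k-1) ways), then forget the
    # left-to-right order of the towers (divide by k!).
    return sum(comb(n - 1, k - 1) * factorial(n) // factorial(k)
               for k in range(1, n + 1))

print(count_blocks_world_states(3))   # 13
print(count_blocks_world_states(10))  # 58941091
```

With 10 blocks this gives 58,941,091 states, which is consistent with the figure of more than 50 million.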
Example 1.6 Robotics As robots start to perform everyday manipulation tasks, such as cleaning up, setting a table or preparing simple meals, they must become much more knowledgeable than they are today. Typically, everyday tasks are specified vaguely and the robot must therefore infer what are the appropriate actions to take and which are the appropriate individuals involved in order to accomplish these tasks. These inferences can only be done if the robot has access to a compact and general knowledge base [Beetz et al., 2010, Tenorth et al., 2011]. StarAI provides the means for action-centered representation, for the automated acquisition of concepts through observation and experience, for reasoning about and managing uncertainty, and for fast inference.
These are only a few examples of the many StarAI applications. It is clear that they all fit the definition of StarAI given above; that is, there are individuals that are connected to one another through relations and there is uncertainty that needs to be taken into account.
There are many more applications of StarAI. For instance, Getoor et al. [2001c] used statistical relational models to estimate the result size of complex database queries. Segal et al. [2001] employed probabilistic relational models for clustering gene expression data and to discover cellular processes from gene expression data [Segal et al., 2003]. Getoor et al. [2004] used probabilistic relational models to understand tuberculosis epidemiology. McGovern et al. [2003] estimated probabilistic relational trees to discover publication patterns in high-energy physics. Neville et al. [2005] used probabilistic relational trees to learn to rank brokers with respect to the probability that they would commit a serious violation of securities regulations in the near future. Anguelov et al. [2005] used relational Markov networks for segmentation of 3D scan data. Relational Markov networks have also been used to compactly represent object maps and to estimate trajectories of people by Limketkai et al. [2005]. Kersting et al. [2006] employed relational hidden Markov models for protein fold recognition. Singla and Domingos [2006] proposed a Markov logic model for entity resolution (ER), the problem of finding records corresponding to the same real-world entity. Markov logic has also been used for joint inference for event extraction [Poon and Domingos, 2007, Riedel and McCallum, 2011]. Poon and Domingos [2008] showed how to use Markov logic to perform joint unsupervised coreference resolution. Nonparametric relational models have been used for analyzing social networks [Xu et al., 2009a]. Kersting and Xu [2009] and Xu et al. [2010] used relational Gaussian processes for learning to rank search results. Poon and Domingos [2009] showed how to perform unsupervised semantic parsing using Markov logic networks, and Davis and Domingos [2009] used MLNs to successfully transfer learned knowledge among molecular biology, social network and Web domains. Yoshikawa et al. [2009] used Markov logic for identifying temporal relations in text, Meza-Ruíz and Riedel [2009] for semantic role labeling, and Kiddon and Domingos [2011] for biomolecular