
The entropy of Q_l is given by

H(Q_l) = -\sum_{i} p_i \log_2 (p_i)

      where p_i is the number of samples of class i in Q_l divided by the total number of samples in Q_l, i.e., the fraction of class i at that node. The feature f and the associated threshold \tau which maximize the information gain are selected as the splitting test for that node:

\{f^*, \tau^*\} = \arg\max_{f,\tau} \Big[ H(Q_l) - \sum_{j \in \{L,R\}} \frac{|Q_l^j|}{|Q_l|} H(Q_l^j) \Big]

      where Q_l^L and Q_l^R denote the subsets of Q_l sent to the left and right children by the candidate split.

      Entropy and Information Gain: Entropy and information gain are two important concepts in the RDF training process. These concepts are usually covered in information theory or probability courses and are briefly reviewed below.

      Information entropy is defined as a measure of the randomness in the information being processed. More precisely, the higher the entropy, the more random (i.e., the less predictable) the variable, and the harder it is to draw conclusions from it. Mathematically, given a discrete random variable X with possible values {x_1, …, x_n} and a probability mass function P(X), the entropy H (also called Shannon entropy) can be written as follows:

H(X) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i)     (2.26)

      For example, an action such as flipping a fair coin, which has no affinity for “head” or “tail,” produces an outcome that is completely random (X with possible values {“head,” “tail”}). Therefore, Eq. (2.26) can be written as follows:

H(X) = -P(X = \text{“head”}) \log_2 P(X = \text{“head”}) - P(X = \text{“tail”}) \log_2 P(X = \text{“tail”})     (2.27)

      As shown in Fig. 2.9, this binary entropy function (Eq. (2.27)) reaches its maximum value (uncertainty is at its maximum) when the probability is 1/2, meaning that P(X = “head”) = 1/2 or, equivalently, P(X = “tail”) = 1/2. The entropy function reaches its minimum value (i.e., zero) when the probability is 1 or 0, i.e., with complete certainty: P(X = “head”) = 1 or P(X = “head”) = 0, respectively.

      Figure 2.9: Entropy vs. probability for a two-class variable.

      Information gain is defined as the change in information entropy H from a prior state to a new state that takes some additional information t into account, and can be written as follows:

IG(X, t) = H(X) - H(X \mid t)
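      The following is a minimal NumPy sketch of these two quantities computed over a node's training labels; the function names entropy and information_gain, and the toy data, are illustrative and not taken from the book.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H = -sum_i p_i * log2(p_i) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, left, right):
    """Entropy of the parent node minus the size-weighted entropy of its children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# A fair coin has maximum binary entropy (1 bit), as in Fig. 2.9.
print(entropy(np.array(["head", "tail"])))                 # 1.0
# Splitting a mixed node into two pure children gives the largest possible gain.
parent = np.array([0, 0, 1, 1])
print(information_gain(parent, parent[:2], parent[2:]))    # 1.0
```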

      If a partition consists of only a single class, it is considered a leaf node. Partitions containing multiple classes are further partitioned until either they contain single classes or the tree reaches its maximum height. If the maximum height of the tree is reached and some of its leaf nodes still contain labels from multiple classes, the empirical distribution over the classes of the training samples v that have reached that leaf is used as its label. Thus, the probabilistic leaf predictor model for the t-th tree is p_t(c|v), where c ∈ {c_k} denotes the class.
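      A compact recursive sketch of this growth procedure is given below, assuming axis-aligned split functions of the form x[f] < τ and reusing the entropy and information_gain helpers from the previous snippet; the dictionary-based tree representation and the exhaustive search over candidate thresholds are simplifications for illustration.

```python
import numpy as np

def class_distribution(y, classes):
    """Empirical distribution p_t(c|v) over the samples v that reached a leaf."""
    return np.array([(y == c).mean() for c in classes])

def grow_tree(X, y, classes, depth=0, max_depth=5):
    """Split recursively until a node is pure or the maximum height is reached."""
    if len(np.unique(y)) == 1 or depth == max_depth:
        return {"leaf": True, "posterior": class_distribution(y, classes)}

    best = None
    for f in range(X.shape[1]):                  # candidate feature
        for tau in np.unique(X[:, f]):           # candidate threshold
            left = X[:, f] < tau
            if left.all() or (~left).all():      # degenerate split, skip it
                continue
            gain = information_gain(y, y[left], y[~left])
            if best is None or gain > best["gain"]:
                best = {"gain": gain, "f": f, "tau": tau, "mask": left}

    if best is None:                             # no valid split was found
        return {"leaf": True, "posterior": class_distribution(y, classes)}

    m = best["mask"]
    return {"leaf": False, "f": best["f"], "tau": best["tau"],
            "left": grow_tree(X[m], y[m], classes, depth + 1, max_depth),
            "right": grow_tree(X[~m], y[~m], classes, depth + 1, max_depth)}
```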

       Classification

      Once a set of decision trees has been trained, given a previously unseen sample x_j, each decision tree hierarchically applies a number of predefined tests. Starting at the root, each split node applies its associated split function to x_j. Depending on the result of the binary test, the data is sent to the right or the left child. This process is repeated until the data point reaches a leaf node. Usually, the leaf nodes contain a predictor (e.g., a classifier) which associates an output (e.g., a class label) with the input x_j. In the case of forests, many tree predictors are combined together to form a single forest prediction:

p(c|x_j) = \frac{1}{T} \sum_{t=1}^{T} p_t(c|x_j)

      where T denotes the number of decision trees in the forest.
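      A minimal sketch of this test-time procedure (cf. Fig. 2.10) is shown below, assuming the trees are the nested dictionaries produced by the grow_tree sketch above; the toy data set and the number of trees are placeholders.

```python
import numpy as np

def tree_posterior(node, x):
    """Route x down one tree by applying the binary test stored at each split node."""
    while not node["leaf"]:
        node = node["left"] if x[node["f"]] < node["tau"] else node["right"]
    return node["posterior"]                     # the stored p_t(c|x)

def forest_posterior(trees, x):
    """p(c|x) = (1/T) * sum_t p_t(c|x): the average of all decision tree posteriors."""
    return np.mean([tree_posterior(t, x) for t in trees], axis=0)

# Toy example: a small forest trained on bootstrap samples of random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
classes = np.array([0, 1])
trees = []
for _ in range(5):
    idx = rng.choice(len(X), size=len(X), replace=True)    # bootstrap resampling
    trees.append(grow_tree(X[idx], y[idx], classes))
print(forest_posterior(trees, X[0]), "true class:", y[0])
```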

      Traditional computer vision systems consist of two steps: feature design and learning algorithm design, which are largely independent of each other. Thus, computer vision problems have traditionally been approached by designing hand-engineered features, such as HOG [Triggs and Dalal, 2005], SIFT [Lowe, 2004], and SURF [Bay et al., 2008], which generalize poorly to other domains and whose design is time consuming, expensive, and requires expert knowledge of the problem domain. The feature engineering step is usually followed by a learning algorithm such as SVM [Cortes, 1995] or RDF [Breiman, 2001, Quinlan, 1986]. However, deep learning algorithms address these issues by training a deep neural network for feature extraction and classification in an end-to-end learning framework. More precisely, unlike traditional approaches, deep neural networks learn to extract features and classify data samples simultaneously. Chapter 3 will discuss deep neural networks in detail.
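      The two-step pipeline can be illustrated with a short scikit-image/scikit-learn sketch: hand-designed HOG features are extracted first and an SVM is then trained on them. The images, labels, and parameter values below are placeholders used only to show the structure of the pipeline.

```python
import numpy as np
from skimage.feature import hog        # hand-engineered feature extractor
from sklearn.svm import SVC            # learning algorithm trained on fixed features

# Step 1: feature design (fixed by hand, independent of the classifier).
def extract_hog(images):
    return np.array([hog(img, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images])

# Step 2: learning algorithm design (an SVM trained on the extracted descriptors).
rng = np.random.default_rng(0)
train_images = rng.random((20, 64, 64))        # placeholder grayscale images
train_labels = rng.integers(0, 2, size=20)     # placeholder binary labels
clf = SVC(kernel="rbf").fit(extract_hog(train_images), train_labels)

# Changing the features means re-engineering step 1 by hand; a deep network
# instead learns both steps jointly in an end-to-end fashion (Chapter 3).
print(clf.predict(extract_hog(rng.random((5, 64, 64)))))
```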


      Figure 2.10: RDF classification for a test sample x_j. During testing, the same test sample is passed through each decision tree. At each internal node a test is applied and the test sample is sent to the appropriate child. This process is repeated until a leaf is reached. At the leaf, the stored posterior p_t(c|x_j) is read. The forest class posterior p(c|x_j) is the average of all decision tree posteriors.

      CHAPTER 3

       Neural Networks Basics

       3.1 INTRODUCTION

      Before going into the details of CNNs, this chapter provides an introduction to artificial neural networks, their computational mechanism, and their historical background. Neural networks are inspired by the working of the cerebral cortex in mammals. It is important to note, however, that these models do not closely resemble the working, scale, and complexity of the human brain. Artificial neural network models can be understood as a set of basic processing units, which are tightly interconnected and operate on the given inputs to process the information and generate desired outputs. Neural networks can be grouped into two generic categories based on the way information is propagated through the network.

      • Feed-forward networks

      The information flow in a feed-forward network happens only in one direction. If the network is considered as a graph with neurons as its nodes, the connections between the nodes are such that there are no loops or cycles in the graph. These network architectures can therefore be referred to as Directed Acyclic Graphs (DAGs). Examples include the MLP and CNNs, which we will discuss in detail in the upcoming sections (a minimal example is sketched after this list).

      • Feed-back networks

      As the name implies, feed-back networks have connections which form directed cycles (or loops). This architecture allows them to operate on, and generate, sequences of arbitrary length. Feed-back networks exhibit a memorization ability: they can store information and sequence relationships in their internal memory. Examples of such architectures include the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) networks; both kinds of information flow are sketched below.
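      As a minimal NumPy sketch of the one-directional information flow in a feed-forward network, consider a single forward pass through an MLP with one hidden layer; the layer sizes and the sigmoid/softmax choices below are illustrative assumptions, not a prescription from the book.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """Information flows strictly forward: input -> hidden -> output, with no cycles."""
    h = sigmoid(W1 @ x + b1)             # hidden layer activations
    o = W2 @ h + b2                      # output layer pre-activations
    return np.exp(o) / np.exp(o).sum()   # softmax over the output classes

rng = np.random.default_rng(0)
x = rng.normal(size=4)                            # a 4-dimensional input
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)     # 8 hidden units
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)     # 3 output classes
print(mlp_forward(x, W1, b1, W2, b2))             # class probabilities, sum to 1
```

      By contrast, a feed-back network carries a hidden state across time steps. The sketch below shows a single vanilla recurrent step applied along a sequence; the weight shapes are illustrative, and an LSTM would add gating on top of this basic recurrence.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: the new state depends on the input and the previous state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(1)
W_xh, W_hh, b_h = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), np.zeros(5)
h = np.zeros(5)                          # the network's internal memory
sequence = rng.normal(size=(7, 3))       # a sequence of arbitrary length
for x_t in sequence:                     # this loop is the directed cycle unrolled in time
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h)                                 # the final state summarizes the whole sequence
```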

      We provide an example architecture for both feed-forward and feed-back networks in Sections