Robert Blanchard

Deep Learning for Computer Vision with SAS



to the rest of the body. Neurons have three parts: a cell body, dendrites, and axons. Inputs arrive in the dendrites (short branched structures) and are transmitted to the next neuron in the chain via the axons (long, thin fibers). Neurons do not actually touch each other but communicate across the gap (called a synaptic gap) using neurotransmitters. These chemicals either excite the receiving neuron, making it more likely to “fire,” or they inhibit the neuron, making it less likely to become active. The amount of neurotransmitter released across the gap determines the relative strength of each dendrite’s connection to the receiving neuron. In essence, each synapse “weights” the relative strength of its arriving input. The synaptically weighted inputs are summed. If the sum exceeds an adaptable threshold (or bias) value, the neuron sends a pulse down its axon to the other neurons in the network to which it connects.

      A key discovery of modern neurophysiology is that synaptic connections are adaptable; they change with experience. The more active the synapse, the stronger the connection becomes. Conversely, synapses with little or no activity fade and, eventually, die off (atrophy). This is thought to be the basis of learning. For example, a study from the University of Wisconsin in 2015 showed that people could begin to “see” with their tongue. A camera fastened to the subject’s forehead was connected to an electrode grid placed on the subject’s tongue. The subject was blindfolded. However, within 30 minutes, as their neurons adapted, the subjects began to “see” with their tongue. Amazing!

      Although there are branches of neural network research that attempt to mimic the underlying biological processes in detail, most neural networks do not try to be biologically realistic.

      In a seminal paper with the rather understated title “A logical calculus of the ideas immanent in nervous activity,” McCulloch and Pitts (1943) gave birth to the field of artificial neural networks. The fundamental element of a McCulloch-Pitts network is called, unsurprisingly, a McCulloch-Pitts neuron. As in real neurons, each input (xi) is first weighted (wi) and then summed. To mimic a neuron’s threshold functionality, a bias value (w0) is added to the weighted sum, predisposing the neuron to either a positive or negative output value. The result is known as the neuron’s net input:

      net input = w0 + w1x1 + w2x2 + … + wnxn

      Notice that this is the classic linear regression equation, where the bias term is the y-intercept and the weight associated with each input is the input’s slope parameter.

      The original McCulloch-Pitts neuron’s final output was determined by passing its net input value through a step function (a function that converts a continuous value into a binary output 0 or 1, or a bipolar output -1 or 1), turning each neuron into a linear classifier/discriminator. Modern neurons replace the discontinuous step function used in the McCulloch-Pitts neuron with a continuous function. The continuous nature permits the use of derivatives to explore the parameter space.
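      To make the distinction concrete, here is a minimal Python sketch (illustrative code, not from the book; the input values, weights, and the choice of tanh are assumptions) that computes a neuron’s net input and contrasts the original step-function output with a modern continuous activation:

```python
import numpy as np

def net_input(x, w, w0):
    """Bias plus the weighted sum of the inputs (the neuron's net input)."""
    return w0 + np.dot(w, x)

def mcculloch_pitts_neuron(x, w, w0):
    """Original McCulloch-Pitts output: a binary step applied to the net input."""
    return 1 if net_input(x, w, w0) > 0 else 0

def modern_neuron(x, w, w0):
    """Modern variant: a continuous, differentiable activation such as tanh."""
    return np.tanh(net_input(x, w, w0))

x = np.array([0.5, -1.0, 2.0])   # illustrative input values
w = np.array([0.8,  0.3, -0.5])  # one weight per input
w0 = 0.1                         # bias value

print(mcculloch_pitts_neuron(x, w, w0))  # binary output: 0 or 1
print(modern_neuron(x, w, w0))           # continuous output in (-1, 1)
```

      Because tanh is differentiable everywhere, gradient-based training can use its derivative, something the discontinuous step function does not allow.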

      The mathematical neuron is considered the cornerstone of a neural network. There are three layers in the basic multilayer perceptron (MLP) neural network:

      1. An input layer containing a neuron/unit for each input variable. The input layer neurons have no adjustable parameters (weights). They simply pass the positive or negative input to the next layer.

      2. A hidden layer with hidden units (mathematical neurons) that perform a nonlinear transformation of the weighted and summed input activations.

      3. An output layer that shapes and combines the nonlinear hidden layer activation values.

      A single-hidden-layer multilayer perceptron constructs a limited-extent region, or bump, of large values surrounded by smaller values (Principe et al. 2000). The intersection of the hyperplanes created by a hidden layer consisting of three hidden units, for example, forms a triangle-shaped bump.
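      The bump idea can be sketched numerically. In the following Python snippet (an illustration, not code from the book; the weights are hand-picked rather than learned), three steep tanh hidden units correspond to the three half-planes that bound a triangle, and the output unit responds strongly only where all three agree:

```python
import numpy as np

def hidden_layer(p, W, b, steepness=10.0):
    """Three tanh hidden units; each approximates an indicator for one half-plane."""
    return np.tanh(steepness * (W @ p + b))

def output_unit(h, steepness=10.0):
    """Outputs close to 1 only when all three hidden units are active (inside the triangle)."""
    return 1.0 / (1.0 + np.exp(-steepness * (h.sum() - 2.0)))

# Half-planes x >= 0, y >= 0, and x + y <= 1 intersect to form a triangle.
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
b = np.array([0.0, 0.0, 1.0])

inside  = np.array([0.25, 0.25])  # point inside the triangle
outside = np.array([1.00, 1.00])  # point outside the triangle

print(output_unit(hidden_layer(inside,  W, b)))   # close to 1 (top of the bump)
print(output_unit(hidden_layer(outside, W, b)))   # close to 0 (outside the bump)
```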

      To act as separate layers, the hidden and output layers must not be connected by a strictly linear function; otherwise, the multilayer perceptron collapses into a linear perceptron. More formally, if matrix A is the set of weights that transforms the input matrix X into the hidden layer output values, and matrix B is the set of weights that transforms the hidden unit output into the final estimates Y, then the linearly connected multilayer network can be represented as Y = B[A(X)]. However, if a single-layer weight matrix C = BA is created, exactly the same output can be obtained from the single-layer network; that is, Y = C(X).
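      This collapse can be verified numerically. The following NumPy sketch (the matrix sizes are arbitrary and chosen only for illustration) shows that two linearly connected layers reproduce a single-layer network exactly, whereas inserting a nonlinear activation between the layers breaks the equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))   # 4 inputs, 5 observations
A = rng.normal(size=(3, 4))   # input-to-hidden weights
B = rng.normal(size=(2, 3))   # hidden-to-output weights

# With strictly linear connections, the two layers collapse into one: C = BA.
C = B @ A
print(np.allclose(B @ (A @ X), C @ X))         # True: the single layer gives identical output

# With a nonlinear activation between the layers, no single matrix reproduces the mapping.
print(np.allclose(B @ np.tanh(A @ X), C @ X))  # False in general
```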

      In a two-layer perceptron with k inputs, h1 hidden units in the first hidden layer, and h2 hidden units in the second hidden layer, the number of parameters to be learned is h1(k + 1) + h2(h1 + 1) + (h2 + 1), assuming a single output unit.

      The number 1 represents the bias weight w0 in the combination function of each neuron.
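      As a quick check of this count, the following sketch (assuming a single output unit; the layer sizes are arbitrary) evaluates the formula and verifies it by summing the sizes of the corresponding weight and bias arrays:

```python
k, h1, h2 = 3, 4, 2  # illustrative layer sizes: 3 inputs, 4 and 2 hidden units

# Parameter count from the formula (single output unit assumed).
formula_count = h1 * (k + 1) + h2 * (h1 + 1) + (h2 + 1)

# The same count obtained by sizing each layer's weight matrix and bias vector directly.
layers = [(k, h1), (h1, h2), (h2, 1)]
direct_count = sum(n_in * n_out + n_out for n_in, n_out in layers)

print(formula_count, direct_count)  # 29 29
```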

      Figure 1.1: Multilayer Perceptron

      Note: The “number of parameters” equations in this book assume that the inputs are interval or ratio level. Each nominal or ordinal input increases k by the number of classes in the variable, minus 1.

      The term deep learning refers to neural networks that use numerous hidden layers. However, the true essence of deep learning is the set of methods that enable the increased extraction of information from a neural network with more than one hidden layer. Adding more hidden layers to a neural network provides little benefit without the deep learning methods that underpin the efficient extraction of information. For example, SAS software has had the capability to build neural networks with many hidden layers using the NEURAL procedure for several decades. However, a case can be made that SAS did not have deep learning, because the key elements that enable learning to persist in the presence of many hidden layers had not yet been discovered. These elements include the use of the following:

      ● activation functions that are more resistant to saturation than conventional activation functions

      ● fast-moving gradient-based optimizations such as stochastic gradient descent (SGD) and Adam

      ● weight initializations that consider the amount of incoming information

      ● new regularization techniques such as dropout and batch normalization

      ● innovations in distributed computing.

      The elements outlined above are included in today’s SAS software and are described below. Needless to say, deep learning has shown impressive promise in solving problems that were previously considered infeasible.
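      As a minimal illustration of the first element, the following Python sketch (illustrative only, not SAS code) compares the derivative of tanh with that of the rectified linear unit (ReLU) as the net input grows; tanh saturates, so its gradient vanishes, whereas ReLU’s gradient stays at 1 for any positive input:

```python
import numpy as np

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2, which shrinks toward 0 for large |z| (saturation)."""
    return 1.0 - np.tanh(z) ** 2

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (no saturation for z > 0)."""
    return (z > 0).astype(float)

z = np.array([0.5, 2.0, 5.0, 10.0])  # increasingly large net inputs
print(tanh_grad(z))   # rapidly approaches 0
print(relu_grad(z))   # stays at 1.0
```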

      Deep learning formulates an outcome by engineering new glimpses of the input space and then re-engineering these projections with the next hidden layer. This process is repeated for each hidden layer until the output layer is reached. The output layer reconciles the final layer of incoming hidden-unit information to produce a set of outputs. The classic example of this process is facial recognition. The first hidden layer captures shades of the image. The next hidden layer combines the shades to form edges. The next hidden layer combines these edges to create projections of ears, mouths, noses, and other distinct aspects that define a human face. The next layer combines these distinct formulations to create a projection of a more complete human face. And so on. A brief comparison of traditional neural networks and deep learning is shown in Table 1.1.

      Table 1.1: Traditional Neural Networks versus Deep Learning

Aspect | Traditional | Deep Learning
Hidden activation function(s) | Hyperbolic Tangent (tanh) | Rectified Linear (ReLU) and other variants
Gradient-based learning | Batch GD and BFGS | Stochastic GD, Adam, and LBFGS
Weight initialization | Constant Variance | Normalized Variance
Regularization | Early Stopping, L1, and L2 | Early Stopping, L1, L2, Dropout, and Batch Normalization
Processor | CPU | CPU or GPU

      Deep learning incorporates activation