Vivienne Sze

Efficient Processing of Deep Neural Networks


Скачать книгу

or rules designed by experts.

      The superior accuracy of DNNs, however, comes at the cost of high computational complexity. To date, general-purpose compute engines, especially graphics processing units (GPUs), have been the mainstay for much DNN processing. Increasingly, however, in these waning days of Moore’s law, there is a recognition that more specialized hardware is needed to keep improving compute performance and energy efficiency [11]. This is especially true in the domain of DNN computations. This book aims to provide an overview of DNNs, the various tools for understanding their behavior, and the techniques being explored to efficiently accelerate their computation.

      In this section, we describe the position of DNNs in the context of artificial intelligence (AI) in general and some of the concepts that motivated the development of DNNs. We will also present a brief chronology of the major milestones in the history of DNNs, and some current domains to which it is being applied.

      DNNs, also referred to as deep learning, are a part of the broad field of AI. AI is the science and engineering of creating intelligent machines that have the ability to achieve goals like humans do, according to John McCarthy, the computer scientist who coined the term in the 1950s. The relationship of deep learning to the whole of AI is illustrated in Figure 1.1.

image

      Figure 1.1: Deep learning in the context of artificial intelligence.

      Within AI is a large sub-field called machine learning, which was defined in 1959 by Arthur Samuel [12] as “the field of study that gives computers the ability to learn without being explicitly programmed.” That means a single program, once created, will be able to learn how to do some intelligent activities outside the notion of programming. This is in contrast to purpose-built programs whose behavior is defined by hand-crafted heuristics that explicitly and statically define their behavior.

      The advantage of an effective machine learning algorithm is clear. Instead of the laborious and hit-or-miss approach of creating a distinct, custom program to solve each individual problem in a domain, a single machine learning algorithm simply needs to learn, via a process called training, to handle each new problem.

      Within the machine learning field, there is an area that is often referred to as brain-inspired computation. Since the brain is currently the best “machine” we know of for learning and solving problems, it is a natural place to look for inspiration. Therefore, a brain-inspired computation is a program or algorithm that takes some aspects of its basic form or functionality from the way the brain works. This is in contrast to attempts to create a brain, but rather the program aims to emulate some aspects of how we understand the brain to operate.

      Although scientists are still exploring the details of how the brain works, it is generally believed that the main computational element of the brain is the neuron. There are approximately 86 billion neurons in the average human brain. The neurons themselves are connected by a number of elements entering them, called dendrites, and an element leaving them, called an axon, as shown in Figure 1.2. The neuron accepts the signals entering it via the dendrites, performs a computation on those signals, and generates a signal on the axon. These input and output signals are referred to as activations. The axon of one neuron branches out and is connected to the dendrites of many other neurons. The connections between a branch of the axon and a dendrite is called a synapse. There are estimated to be 1014 to 1015 synapses in the average human brain.

image

      Figure 1.2: Connections to a neuron in the brain. xi, wi, f (.), and b are the activations, weights, nonlinear function, and bias, respectively. (Figure adapted from [4].)

      A key characteristic of the synapse is that it can scale the signal (xi) crossing it, as shown in Figure 1.2. That scaling factor can be referred to as a weight (wi), and the way the brain is believed to learn is through changes to the weights associated with the synapses. Thus, different weights result in different responses to an input. One aspect of learning can be thought of as the adjustment of weights in response to a learning stimulus, while the organization (what might be thought of as the program) of the brain largely does not change. This characteristic makes the brain an excellent inspiration for a machine-learning-style algorithm.

image

      Figure 1.3: Simple neural network example and terminology. (Figure adapted from [4].)

      Figure 1.3a shows a diagram of a three-layer (non-biological) neural network. The neurons in the input layer receive some values, compute their weighted sums followed by the nonlinear function, and propagate the outputs to the neurons in the middle layer of the network, which is also frequently called a “hidden layer.” A neural network can have more than one hidden layer, and the outputs from the hidden layers ultimately propagate to the output layer, which computes the final outputs of the network to the user. To align brain-inspired terminology with neural networks, the outputs of the neurons are often referred to as activations, and the synapses are often referred to as weights, as shown in Figure 1.3a. We will use the activation/weight nomenclature in this book.

image

      Figure 1.4: Example of image classification using deep neural networks. (Figure adapted from [15].) Note that the features go from low level to high level as we go deeper into the network.

      Figure 1.3b shows an example of the computation at layer 1: