Vivienne Sze

Efficient Processing of Deep Neural Networks



spatial height/width
R/S    Filter spatial height/width (= H/W in FC)
P/Q    Ofmap spatial height/width (= 1 in FC)

image

      o, i, f, and b are the tensors of the ofmaps, ifmaps, filters, and biases, respectively. U is a given stride size.
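      The roles of o, i, f, b, and U can be made concrete with a naive loop nest. The following is a minimal sketch, not the book's reference implementation: the function name conv_layer, the nested-list tensor layout, and the output size P = (H - R + U)/U (and likewise for Q), which corresponds to an unpadded convolution, are assumptions made for illustration.

```python
def conv_layer(i, f, b, U=1):
    """Naive CONV layer on nested lists (illustrative sketch).

    i: ifmaps  indexed as i[n][c][h][w]
    f: filters indexed as f[m][c][r][s]
    b: biases  indexed as b[m]
    U: stride
    Returns the ofmaps indexed as o[n][m][p][q].
    """
    N, C, H, W = len(i), len(i[0]), len(i[0][0]), len(i[0][0][0])
    M, R, S = len(f), len(f[0][0]), len(f[0][0][0])
    # Assumption: no padding, so the output size follows from H, R, and U.
    P = (H - R + U) // U
    Q = (W - S + U) // U
    o = [[[[0.0] * Q for _ in range(P)] for _ in range(M)] for _ in range(N)]
    for n in range(N):
        for m in range(M):
            for p in range(P):
                for q in range(Q):
                    acc = b[m]  # start from the bias of filter m
                    for c in range(C):
                        for r in range(R):
                            for s in range(S):
                                acc += i[n][c][U * p + r][U * q + s] * f[m][c][r][s]
                    o[n][m][p][q] = acc
    return o


# One 3x3 single-channel ifmap of ones, one 2x2 filter of ones, bias 0.5.
ifmap = [[[[1.0] * 3 for _ in range(3)]]]
filt = [[[[1.0] * 2 for _ in range(2)]]]
ofmap = conv_layer(ifmap, filt, [0.5], U=1)
print(ofmap)  # [[[[4.5, 4.5], [4.5, 4.5]]]]
```

      Each output value here is the sum of four products plus the bias. With U = 2 the same inputs would produce a single output per filter, because only one 2x2 window fits.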

      Figure 2.2b shows a visualization of this computation (ignoring biases). As much as possible, we will adhere to the following coloring scheme in this book.

      • Blue: input activations belonging to an input feature map.

      • Green: weights belonging to a filter.

      • Red: partial sums. Note: since there is no formal term for an array of partial sums, we will sometimes label such an array as an output feature map and color it red (even though, technically, output feature maps are composed of activations, which are derived from partial sums that have passed through a nonlinear function, and therefore should be blue).

image image

      In this calculation, each output at a point (n, m, p, q) is calculated as a dot product taken across the index variables c, r, and s of the specified elements of the input activation and filter weight tensors. Note that this notation attaches no significance to the order of the index variables in the summation. The relevance of this will become apparent in the discussion of dataflows (Chapter 5) and mapping computations onto a DNN accelerator (Chapter 6).
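      The point that the summation order carries no significance can be checked directly. The sketch below is an illustration under stated assumptions rather than anything taken from the book: conv_in_order is a hypothetical helper, biases are ignored, the tensors are nested lists, and the output size assumes no padding. It runs the same accumulation with the seven loops nested in two different orders.

```python
import itertools


def conv_in_order(i, f, U, order):
    """Accumulate CONV partial sums with the seven loops nested in `order`.

    `order` is any permutation of the index letters "nmpqcrs"; the first
    letter is the outermost loop. Biases are ignored.
    """
    N, C, H, W = len(i), len(i[0]), len(i[0][0]), len(i[0][0][0])
    M, R, S = len(f), len(f[0][0]), len(f[0][0][0])
    P, Q = (H - R + U) // U, (W - S + U) // U  # assumed unpadded output size
    extent = {"n": N, "m": M, "p": P, "q": Q, "c": C, "r": R, "s": S}
    o = [[[[0] * Q for _ in range(P)] for _ in range(M)] for _ in range(N)]
    for point in itertools.product(*(range(extent[k]) for k in order)):
        v = dict(zip(order, point))
        n, m, p, q, c, r, s = (v[k] for k in "nmpqcrs")
        o[n][m][p][q] += i[n][c][U * p + r][U * q + s] * f[m][c][r][s]
    return o


# Small integer tensors so that every loop order gives identical sums.
i = [[[[(h * 4 + w + c) % 5 for w in range(4)] for h in range(4)] for c in range(2)]]
f = [[[[m + c + r - s for s in range(2)] for r in range(2)] for c in range(2)] for m in range(3)]

reduction_innermost = conv_in_order(i, f, 2, "nmpqcrs")
reduction_outermost = conv_in_order(i, f, 2, "crsnmpq")
print(reduction_innermost == reduction_outermost)  # True
```

      Integer data is used so that the comparison is exact. With floating-point data, different loop orders can differ in the last bits because of rounding, although they compute the same mathematical sum.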

      Finally, to align the terminology of CNNs with the generic DNN,

      • filters are composed of weights (i.e., synapses), and

      • input and output feature maps (ifmaps, ofmaps) are composed of input and output activations (i.e., input and output neurons), where an activation is a partial sum after application of a nonlinear function.

image

      Figure 2.3: Fully connected layer from convolution point of view with H = R, W = S, P = Q = 1, and U = 1.
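      The shape constraints in the caption of Figure 2.3 can be checked numerically. The following sketch assumes NumPy 1.20 or later (for sliding_window_view); the helper name conv_einsum, the random test data, and the unpadded, bias-free form of the convolution are assumptions for illustration. When the filter is as large as the ifmap (R = H, S = W) and U = 1, only one window fits, so the ofmap collapses to P = Q = 1.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_einsum(i, f, U=1):
    """CONV layer (no bias, no padding) as a single sum over c, r, and s."""
    R, S = f.shape[2:]
    # windows[n, c, p, q, r, s] == i[n, c, U*p + r, U*q + s]
    windows = sliding_window_view(i, (R, S), axis=(2, 3))[:, :, ::U, ::U]
    return np.einsum("ncpqrs,mcrs->nmpq", windows, f)


rng = np.random.default_rng(0)
N, C, H, W, M = 2, 3, 4, 5, 6
i = rng.standard_normal((N, C, H, W))
f = rng.standard_normal((M, C, H, W))  # filters as large as the ifmap: R = H, S = W

o_conv = conv_einsum(i, f, U=1)
# The same computation written without any spatial output indices.
o_fc = np.einsum("nchw,mchw->nm", i, f)

print(o_conv.shape)                           # (2, 6, 1, 1)
print(np.allclose(o_conv[:, :, 0, 0], o_fc))  # True
```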

      In an FC layer, every value in the output feature map is a weighted sum of every input value in the input feature map (i.e., it is fully connected). Furthermore, FC layers typically do not exhibit weight sharing, and as a result the computation tends to be memory-bound. FC layers are often processed in the form of a matrix multiplication, which will be explained in Chapter 4. This is the reason why matrix multiplication is often associated with DNN processing.
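      As a rough illustration of both points, the sketch below writes an FC layer as one matrix multiplication and counts how often each weight is used. It assumes NumPy; the names fc_layer and uses_per_weight, the flattening of each ifmap into a row vector, and the example layer sizes are choices made for this example, not details from the book.

```python
import numpy as np


def fc_layer(i, f, b):
    """FC layer as one matrix multiplication.

    i: ifmaps  with shape (N, C, H, W)
    f: filters with shape (M, C, H, W)
    b: biases  with shape (M,)
    """
    N, M = i.shape[0], f.shape[0]
    x = i.reshape(N, -1)  # (N, C*H*W): one row per input
    w = f.reshape(M, -1)  # (M, C*H*W): one row per output channel
    return x @ w.T + b    # (N, M)


def uses_per_weight(N, P, Q):
    """MACs per weight: (N*M*C*R*S*P*Q) / (M*C*R*S) = N*P*Q."""
    return N * P * Q


out = fc_layer(np.ones((2, 1, 2, 2)), np.ones((3, 1, 2, 2)), np.zeros(3))
print(out.shape)  # (2, 3)

# FC layer (P = Q = 1) with a batch of one: each weight is used once.
print(uses_per_weight(N=1, P=1, Q=1))    # 1
# A CONV layer with a 56x56 ofmap reuses each weight at every output position.
print(uses_per_weight(N=1, P=56, Q=56))  # 3136
```

      With a single use per weight, every multiply-accumulate in the FC layer needs a fresh weight from memory unless a batch of inputs is processed together, which is consistent with the computation tending to be memory-bound.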

      An FC layer can also be viewed as a special case of a CONV layer: specifically, a CONV layer in which the filters are the same size as the input feature maps. It therefore does not have the local, sparsely connected, weight-sharing property of CONV layers. Equation (2.1) still holds for the computation of FC layers, with a few additional constraints on the shape parameters: H = R, W = S, P = Q = 1, and U = 1. Figure 2.3 shows a visualization of this computation, and in the tensor index notation from Section 2.3.1 it is:

image image

      Figure 2.4: Various forms of nonlinear activation functions. (Figure adapted from [62].)

      Reducing the spatial resolution of a feature map is referred to as pooling or, more generically, downsampling. Pooling, which is applied to each channel separately, enables the network to be robust and invariant to small shifts and distortions. Pooling combines, or pools,