Computational Statistics in Data Science

Max Pooling and Average Pooling. The Max Pooling layer returns the maximum value from the portion of the image covered by the kernel matrix. The Average Pooling layer returns the average of all values covered by the kernel matrix. The convolution and pooling process can be repeated by adding additional convolutional and pooling layers. Deep convolutional networks have been successfully trained and used in image classification problems.

Figure 2 Convolution operation with stride size 1.

4.2 Convolutional Layer

The convolution operation is illustrated in Figure 2. The weight matrix of the convolutional layer is usually called the kernel matrix. The kernel matrix ($\boldsymbol{W} \in \mathbb{R}^{d \times d}$) shifts over the input matrix ($\boldsymbol{X} \in \mathbb{R}^{n \times m}$) and, at each position, performs elementwise multiplication between $\boldsymbol{W}$ and the covered portion of $\boldsymbol{X}$ and sums the products. With a stride of 1, the result is a feature matrix ($\boldsymbol{h} \in \mathbb{R}^{(n-d+1) \times (m-d+1)}$). The stride of the kernel matrix determines the amount of movement in each step. In the example in Figure 2, the stride size is 1, so the kernel matrix moves one unit in each step. In total, the kernel matrix is applied at 9 positions, resulting in a $3 \times 3$ feature matrix. The stride size does not have to be 1, and a larger stride size means fewer shifts and a smaller feature matrix.
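The sliding-window computation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `conv2d_valid`, the $5 \times 5$ input, and the all-ones $3 \times 3$ kernel are assumptions chosen only so that the output is $3 \times 3$, as in the text; they are not the values shown in Figure 2. Each output entry is taken to be the sum of the elementwise products, which is the usual convention.

```python
import numpy as np

def conv2d_valid(x, w, stride=1):
    """Slide a square kernel w over x (no padding) and, at each position,
    sum the elementwise products of w and the covered patch of x."""
    n, m = x.shape
    d = w.shape[0]
    out_rows = (n - d) // stride + 1
    out_cols = (m - d) // stride + 1
    h = np.zeros((out_rows, out_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            r, c = i * stride, j * stride      # top-left corner of the patch
            patch = x[r:r + d, c:c + d]
            h[i, j] = np.sum(patch * w)
    return h

# Hypothetical data: a 5x5 input and a 3x3 kernel of ones.
x = np.arange(25, dtype=float).reshape(5, 5)
w = np.ones((3, 3))

h = conv2d_valid(x, w, stride=1)
print(h.shape)                               # (3, 3): nine kernel positions
print(conv2d_valid(x, w, stride=2).shape)    # (2, 2): larger stride, fewer shifts
```

With stride 1 the output shape is $(n-d+1) \times (m-d+1)$; with a general stride $s$ and no padding, each dimension becomes $\lfloor (n-d)/s \rfloor + 1$.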

Another commonly used structure in a CNN is the pooling layer, which is good at extracting dominant features from the input. Two main types of pooling operation are illustrated in Figure 3. Similar to a convolution operation, the kernel shifts over the input matrix with a specified stride size. If Max Pooling is applied to the input, the maximum of the covered portion is taken as the result. If Average Pooling is applied, the mean of the covered portion is calculated and taken as the result. The example in Figure 3 shows the result of pooling with a kernel size of $2 \times 2$ and a stride of 1 on a $3 \times 3$ input matrix.
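Both pooling variants can be sketched with the same sliding window, changing only the reduction applied to each covered patch. The function name `pool2d` and the input values below are illustrative assumptions and are not the numbers shown in Figure 3; only the setup ($2 \times 2$ kernel, stride 1, $3 \times 3$ input) follows the text.

```python
import numpy as np

def pool2d(x, size=2, stride=1, mode="max"):
    """Apply max or average pooling with a square window and no padding."""
    if mode == "max":
        reduce = np.max
    elif mode == "mean":
        reduce = np.mean
    else:
        raise ValueError("mode must be 'max' or 'mean'")
    rows = (x.shape[0] - size) // stride + 1
    cols = (x.shape[1] - size) // stride + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            r, c = i * stride, j * stride
            out[i, j] = reduce(x[r:r + size, c:c + size])
    return out

# Hypothetical 3x3 input matrix.
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

max_out = pool2d(x, size=2, stride=1, mode="max")    # [[5, 6], [8, 9]]
avg_out = pool2d(x, size=2, stride=1, mode="mean")   # [[3, 4], [6, 7]]
print(max_out)
print(avg_out)
```

Both outputs are $2 \times 2$, since a $2 \times 2$ window fits at two positions along each side of a $3 \times 3$ input when the stride is 1.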

4.3 LeNet‐5

LeNet‐5 is a CNN introduced by LeCun et al. [8]. It is one of the earliest CNN architectures and was initially introduced for handwritten digit recognition on the MNIST dataset [9]. The structure is simple to understand, and its details are shown in Figure 4.

The LeNet‐5 architecture consists of seven layers: three convolutional layers, two pooling layers, and two fully connected layers. LeNet‐5 takes images of size $32 \times 32$ as input and outputs a 10‐dimensional vector of predicted scores, one for each class.

Figure 3 Pooling operation with stride size 1.

Figure 4 LeNet‐5 of LeCun et al. [8]. Source: Modified from LeCun et al. [8].

The first layer (C1) is a convolutional layer, which consists of six kernel matrices of size $5 \times 5$ and stride 1. Each kernel matrix scans over the $32 \times 32$ input image and produces a feature matrix of size $28 \times 28$. The six different kernel matrices therefore produce six different feature matrices. The second layer (S2) is a Max Pooling layer, which takes these six feature matrices as input. The kernel size of this pooling layer is $2 \times 2$, and the stride size is 2. Therefore, the outputs of this layer are six $14 \times 14$ feature matrices.
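The sizes quoted for C1 and S2 follow from the standard no-padding rule: a window of size $k$ moved with stride $s$ over $n$ entries fits at $\lfloor (n-k)/s \rfloor + 1$ positions. The short check below applies that rule to the numbers in the text; the helper name `output_size` is an assumption introduced for illustration.

```python
def output_size(n, kernel, stride=1):
    """Number of window positions along one dimension, with no padding."""
    return (n - kernel) // stride + 1

input_size = 32                                    # LeNet-5 input is 32 x 32
c1_size = output_size(input_size, kernel=5, stride=1)
s2_size = output_size(c1_size, kernel=2, stride=2)

print(c1_size)   # 28 -> C1 produces six 28 x 28 feature matrices
print(s2_size)   # 14 -> S2 produces six 14 x 14 feature matrices
```

Pooling acts on each feature matrix separately, so the number of matrices (six) is unchanged by S2 and only the spatial size shrinks.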

Table 1 Connection between input and output matrices in the third layer of LeNet‐5 [8].

Source: LeCun et al. [8].

Index of input matrix	Indices of output matrices

1	1	5	6	7	10	11	12	13	15	16
2	1	2	6	7	8	11	12	13	14	16
3	1	2	3	7	8	9	12	14	15	16
4	2	3	4	7	8	9	10	13	15	16
5	3	4