et al. [19, 48] for AMC. The proposed network comprises five convolutional (Conv) layers and three fully connected (FC) dense layers, including the output layer. The TF spectral image, of size 128 × 128, is fed to the first Conv layer (Conv1) of the model, which has 128 filters of size 5 × 5 and uses the rectified linear unit (ReLU) activation function. Appropriate zero padding is employed so that the output of the first layer has the same size as the input image. The second through fourth Conv layers are designed identically to the first. The fifth Conv layer differs only in filter size, which is 7 × 7. The sixth and seventh layers are FC dense layers, each with 256 neurons and ReLU activation. The output layer is also an FC dense layer, with the number of neurons equal to the number of output classes and SoftMax as the activation function. Average pooling of size 4 × 4 is applied after the Conv1 and Conv2 layers, and of size 4 × 2 after the Conv3 and Conv4 layers; no pooling is performed after Conv5.
The main objective of the 2D filters is to allow the kernels to adapt to the I and Q data separately. Most of the layer settings are kept nearly the same as in [19, 48]; the filter sizes and pooling layers are optimized for our preprocessed input by trial and error. As in [48], the Adam optimizer, a stochastic gradient-based method, is used for updating the weights, which takes care of tuning hyperparameters such as the learning rate. Table 5.1 presents the details of the proposed CNN layout for AMC.
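As a concrete illustration of the preprocessing stage, the sketch below computes a 128 × 128 STFT log-magnitude image from a complex baseband I/Q record. The helper name, the window choice (Hann), the hop-length rule, and the toy linear-FM test signal are illustrative assumptions, not the exact settings used in this work.

```python
import numpy as np

def tf_spectral_image(iq, n_fft=128, out_size=128):
    """Sketch of a TF spectral image: frame the I/Q record, apply a
    Hann window, FFT each frame, and keep the normalized log-magnitude.
    (Hypothetical helper for illustration only.)"""
    # choose the hop so that exactly `out_size` frames fit in the record
    hop = max(1, (len(iq) - n_fft) // (out_size - 1))
    frames = np.stack([iq[i * hop: i * hop + n_fft] for i in range(out_size)])
    window = np.hanning(n_fft)
    spec = np.fft.fftshift(np.fft.fft(frames * window, axis=1), axes=1)
    img = 20.0 * np.log10(np.abs(spec) + 1e-12)  # log-magnitude in dB
    img -= img.min()                             # normalize to [0, 1]
    img /= img.max() + 1e-12                     # for the CNN input
    return img.T                                 # (frequency, time)

# toy complex linear-FM sweep as a stand-in for a received I/Q sample
n = 8192
t = np.arange(n)
iq = np.exp(2j * np.pi * (0.05 + 0.20 * t / n) * t)
img = tf_spectral_image(iq)
print(img.shape)  # (128, 128)
```

The resulting array matches the 128 × 128 single-channel input shape expected by the first Conv layer.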
Table 5.1 CNN architecture layout for TF images using RML synthetic data set.
| Layer | Output | Parameters |
|---|---|---|
| Input | 128 × 128 × 1 | – |
| Conv 1 (128 × 5 × 5), ReLU | 128 × 128 × 128 | 3,328 |
| Average Pooling (4 × 4) | 64 × 64 × 128 | – |
| Conv 2 (128 × 5 × 5), ReLU | 64 × 64 × 128 | 409,728 |
| Average Pooling (4 × 4) | 32 × 32 × 128 | – |
| Conv 3 (128 × 5 × 5), ReLU | 32 × 32 × 128 | 409,728 |
| Average Pooling (4 × 2) | 16 × 16 × 128 | – |
| Conv 4 (128 × 5 × 5), ReLU | 16 × 16 × 128 | 409,728 |
| Average Pooling (4 × 2) | 8 × 8 × 128 | – |
| Conv 5 (128 × 7 × 7), ReLU | 8 × 8 × 128 | 802,944 |
| FC Dense 6 (256), ReLU | 256 | 2,097,408 |
| FC Dense 7 (256), ReLU | 256 | 65,792 |
| FC Dense 8 (90), Softmax | 90 | 23,130 |
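The parameter counts in Table 5.1 can be checked directly from the standard formulas for convolutional and dense layers (weights plus one bias per output filter or neuron); the short sketch below reproduces each entry using the filter sizes and channel counts from the table.

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Conv layer: (kernel area x input channels + 1 bias) per filter."""
    return (k_h * k_w * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """Dense layer: weight matrix plus one bias per output neuron."""
    return (n_in + 1) * n_out

print(conv_params(5, 5, 1, 128))       # Conv1: 3,328
print(conv_params(5, 5, 128, 128))     # Conv2-Conv4: 409,728 each
print(conv_params(7, 7, 128, 128))     # Conv5: 802,944
print(dense_params(8 * 8 * 128, 256))  # Dense6 (8x8x128 flattened): 2,097,408
print(dense_params(256, 256))          # Dense7: 65,792
print(dense_params(256, 90))           # Dense8: 23,130
```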
The notion of EOCs is motivated by transfer learning, in which a novel labeling technique is adopted for the estimated output type. Each sample in the training data set is given two labels: the modulation label and the received SNR label. The common approach in CNN-based AMC is to use only the modulation class label. With the EOC method, the CNN is instead trained to estimate both the modulation label and the SNR label of the input sample. This is done by defining the output classes with [Modulation, SNR] labels rather than [Modulation] labels alone. Our data contains ten modulation types; combined with nine SNR labels each, this yields 90 output classes. Accordingly, the last FC dense layer in Table 5.1 consists of 90 neurons.
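As a sketch of the EOC labeling, each sample's [Modulation, SNR] pair can be flattened into one of 90 extended-class indices. The modulation and SNR names below are placeholders for illustration; only the 10 × 9 = 90 class count comes from the text.

```python
# Placeholder label sets: 10 modulations x 9 SNR levels = 90 EOC classes
MODS = [f"mod{i}" for i in range(10)]  # hypothetical modulation names
SNRS = [f"snr{j}" for j in range(9)]   # hypothetical SNR levels

def eoc_label(mod, snr):
    """Map a [Modulation, SNR] pair to a single extended-class index."""
    return MODS.index(mod) * len(SNRS) + SNRS.index(snr)

print(eoc_label("mod0", "snr0"))  # 0  (first modulation, first SNR)
print(eoc_label("mod9", "snr8"))  # 89 (last modulation, last SNR)
```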
The motive behind the extended-class approach is to make the network more adaptable to signal features at different SNRs, and to prepare the CNN for the unpredictable SNR conditions that may be encountered when testing an unknown sample. The network should therefore learn to identify the approximate SNR scenario from the input sample and adapt itself accordingly to achieve superior classification accuracy. Finally, a many-to-one mapping function block is implemented to extract only the modulation type.
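One simple realization of the many-to-one mapping is to marginalize the 90-way softmax output over the SNR labels and pick the most likely modulation. The sketch below assumes, for illustration, that the 90 classes are ordered as ten modulation groups of nine SNR labels each.

```python
import numpy as np

def collapse_to_modulation(probs, n_mods=10, n_snrs=9):
    """Many-to-one mapping sketch: sum the EOC softmax probabilities
    over SNR labels, then return the most likely modulation index."""
    per_mod = probs.reshape(n_mods, n_snrs).sum(axis=1)
    return int(per_mod.argmax())

# toy softmax output concentrated on EOC class 37,
# which belongs to modulation group 37 // 9 = 4
p = np.full(90, 0.001)
p[37] = 1.0 - 0.001 * 89
print(collapse_to_modulation(p))  # 4
```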
5.3.1.3 Results and Discussion
It is customary and vital in ML to have standard benchmarks and open data sets for performance comparison [19]. That is the rule in computer vision, voice recognition, and other applications in which DL techniques have achieved remarkable success. Accordingly, a group of researchers in [7] generated synthetic and over-the-air (OTA) data sets for modulation classification to enable reproducible research in wireless communication [19, 7]. The publicly available synthetic data set RADIOML 2016.10A is used as a benchmark for training and evaluating the performance of the proposed classifier. The CNN architecture was designed in the Keras framework, and network training, validation, and testing were carried out on this benchmark data set. Each sample is a TF I-Q image of size 128 × 128 for the CNN, and the data set contains a total of 368,640 samples. Here, 85% (313,344) of the samples are used for training and validation, and the remaining 15% (55,296) are reserved for testing. Training and prediction for the proposed network are implemented in Keras running on top of TensorFlow using Google Colaboratory.
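The quoted split sizes follow directly from an 85/15 partition of the 368,640 samples; a minimal sketch, with a fixed random seed assumed purely for reproducibility:

```python
import numpy as np

total = 368_640
n_train = total * 85 // 100   # 313,344 training/validation samples
n_test = total - n_train      # 55,296 test samples

rng = np.random.default_rng(0)            # illustrative fixed seed
idx = rng.permutation(total)              # shuffle sample indices
train_idx, test_idx = idx[:n_train], idx[n_train:]
print(n_train, n_test)  # 313344 55296
```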
The CNN is trained with STFT-based TF spectral images using EOCs. The classification accuracies achieved by the network, overall and across the modulations of the Master data set, are plotted in Figure 5.3 for different received SNR levels. The STFT-CNN module achieves a classification accuracy of 84%, compared to the 75% to 80% achieved by the benchmark network of [19], at the highest SNR of 16 dB. This is because the TF spectral images can capture the joint TF energy density, IF, and phase information independently. Without preprocessing, the classification accuracy of both DL networks lagged behind that of the networks based on joint TF energy density preprocessed images.
Figure 5.3 Comparison of overall classification accuracy with benchmark network.
Figure 5.4 shows the confusion matrix of the CNN classifier for all ten classes at 16 dB SNR. At this high SNR, the confusion matrix for 1,000 test samples shows an almost clean diagonal of predicted versus true labels. Slight confusion is observed between AM-DSB and WB-FM, QPSK and 8PSK, and 16 QAM and 64 QAM. This may be due to the short observation window; under harsh, noisy conditions, some features of QAM, PSK, and WB-FM may resemble those of lower-order modulations.
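A confusion matrix like the one in Figure 5.4 is tallied directly from paired true and predicted labels, with rows indexing the true class and columns the predicted class. The sketch below uses a small toy label set for illustration, not the actual test results.

```python
import numpy as np

def confusion_matrix(true, pred, n_classes):
    """Count predictions per (true, predicted) class pair."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true, pred):
        cm[t, p] += 1
    return cm

# toy labels: 6 samples over 3 classes, 4 classified correctly
true = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(true, pred, 3)
print(cm.diagonal().sum() / cm.sum())  # overall accuracy: 4/6
```

A clean diagonal (large entries at `cm[i, i]`) corresponds to high per-class accuracy; off-diagonal mass, as between AM-DSB and WB-FM here, reveals which class pairs are being confused.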