the input weights are adjusted so as to minimize the error. The error value is evaluated after every pass and is used to update the weights at the nodes. At every hidden node, functions called activation functions are also applied. Some of them are described below. Please see Figure 1.2.
1.4.1.1.1 Sigmoid
The sigmoid function is used because its output ranges between 0 and 1.
The advantage of the sigmoid is that its output lies between 0 and 1, so it can be used whenever we need to predict a probability, since a probability always lies in (0, 1). There are two major disadvantages of the sigmoid activation function. The first is that its outputs are not zero centered. The second is that its gradients become nearly 0 when the input is large in magnitude. Please see Figure 1.3.
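As a minimal illustrative sketch (not part of the chapter's model; NumPy and the sample values here are assumed only for demonstration), the sigmoid and its derivative can be written as follows; the derivative makes the vanishing-gradient behavior visible:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid; at most 0.25, and close to 0 for large |z|."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))       # outputs lie strictly between 0 and 1
print(sigmoid_grad(z))  # gradients nearly vanish at the extremes
```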
1.4.1.1.2 Tanh
Tanh is quite similar to the sigmoid, but its output ranges from −1 to 1.
The main advantage of tanh is that its outputs can be both positive and negative, which helps when the values to be predicted can take either sign. The slope of the tanh curve is also steeper than that of the sigmoid. The choice between tanh and sigmoid depends on the dataset being used and the values to be predicted. Please see Figure 1.4.
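Continuing the same hedged sketch (again assuming NumPy; the test values are illustrative), tanh produces zero-centered outputs and has a steeper slope near the origin than the sigmoid:

```python
import numpy as np

def tanh_grad(z):
    """Derivative of tanh; equals 1 at z = 0, four times the sigmoid's 0.25."""
    return 1.0 - np.tanh(z) ** 2

z = np.array([-2.0, 0.0, 2.0])
print(np.tanh(z))    # outputs lie in (-1, 1) and are centered around 0
print(tanh_grad(z))  # steeper gradient around 0 than the sigmoid
```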
Figure 1.3 Sigmoid function.
Figure 1.4 Tanh function.
1.4.1.1.3 ReLU
ReLU ranges from 0 to infinity.
Using ReLU can rectify the vanishing gradient problem. It also requires much less computation than the sigmoid and tanh. The main problem with ReLU is that when Z < 0, the gradient becomes 0, which leads to no change in the corresponding weights. To tackle this, ReLU is used only in the hidden layers, not in the input or output layers.
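As another illustrative sketch (same assumptions as above: NumPy and sample values chosen only for demonstration), ReLU and its gradient make the Z < 0 behavior explicit:

```python
import numpy as np

def relu(z):
    """ReLU activation: passes positive inputs through, clips negatives to 0."""
    return np.maximum(0.0, z)

def relu_grad(z):
    """Gradient of ReLU: 1 for z > 0, 0 for z < 0 (those weights stop updating)."""
    return (z > 0).astype(float)

z = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu(z))       # [0.  0.  0.5 3. ]
print(relu_grad(z))  # [0.  0.  1.  1. ]
```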
These activation functions, together with forward and backward propagation, are the key features that distinguish artificial neural networks from other models. Please see Figure 1.5.
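To connect these pieces, the following is a minimal, hedged sketch of one forward pass and one backward (weight-update) pass for a network with a single hidden layer. It is not the chapter's actual model: the layer sizes, learning rate, squared-error loss, and the use of the sigmoid everywhere are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: 4 inputs, 5 hidden units, 1 output.
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(1, 4))   # one training example
y = np.array([[1.0]])         # its target value

# Forward propagation: input -> hidden -> output.
h = sigmoid(x @ W1 + b1)
y_hat = sigmoid(h @ W2 + b2)

# Backward propagation: propagate the error back through the network and
# update the weights so as to reduce it (gradient descent on squared error).
d_out = (y_hat - y) * y_hat * (1 - y_hat)
d_hid = (d_out @ W2.T) * h * (1 - h)
W2 -= lr * h.T @ d_out
b2 -= lr * d_out.sum(axis=0)
W1 -= lr * x.T @ d_hid
b1 -= lr * d_hid.sum(axis=0)
```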
Figure 1.5 ReLU function.
Figure 1.6 Basic Bernoulli’s restricted Boltzmann machine.
1.4.2 Bernoulli’s Restricted Boltzmann Machines
A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network which learns a probability distribution over its set of inputs. Please see Figure 1.6.
Bernoulli’s RBM has binary hidden and visible units, $h_j$ and $v_i$, respectively, and a matrix of weights $w_{ij}$. It also has bias weights $a_i$ for the visible units and $b_j$ for the hidden units. With these, the energy equation can be written as follows:
$$E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j v_i w_{ij} h_j \quad (1.1)$$
The probability distribution over the hidden and visible layers in terms of energy is as follows:
$$P(v, h) = \frac{1}{Z} e^{-E(v, h)} \quad (1.2)$$
Z is a normalizing constant just to make the sum of all probabilities equal to 1.
The conditional probability of h given v is as follows:
$$P(h \mid v) = \prod_{j} P(h_j \mid v) \quad (1.3)$$
The conditional probability of v given h is as follows:
$$P(v \mid h) = \prod_{i} P(v_i \mid h) \quad (1.4)$$
The individual activation probabilities are as follows:
$$P(h_j = 1 \mid v) = \sigma\left(b_j + \sum_i w_{ij} v_i\right) \quad (1.5)$$
$$P(v_i = 1 \mid h) = \sigma\left(a_i + \sum_j w_{ij} h_j\right) \quad (1.6)$$
where σ denotes the sigmoid function introduced earlier.
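As a hedged illustration of Equations (1.1), (1.5), and (1.6) (a sketch only; the layer sizes and randomly initialized parameters are assumptions, not the chapter's trained model), the energy and the conditional activation probabilities can be computed, along with one Gibbs sampling step:

```python
import numpy as np

rng = np.random.default_rng(0)

n_visible, n_hidden = 6, 3                              # hypothetical toy sizes
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))   # weights w_ij
a = np.zeros(n_visible)                                 # visible biases a_i
b = np.zeros(n_hidden)                                  # hidden biases b_j

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def energy(v, h):
    """Equation (1.1): E(v, h) = -a.v - b.h - v^T W h."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

def p_h_given_v(v):
    """Equation (1.5): P(h_j = 1 | v) = sigmoid(b_j + sum_i w_ij v_i)."""
    return sigmoid(b + v @ W)

def p_v_given_h(h):
    """Equation (1.6): P(v_i = 1 | h) = sigmoid(a_i + sum_j w_ij h_j)."""
    return sigmoid(a + h @ W.T)

# One Gibbs step: sample binary h from v, then reconstruct v from h.
v = rng.integers(0, 2, size=n_visible).astype(float)
h = (rng.random(n_hidden) < p_h_given_v(v)).astype(float)
v_recon = (rng.random(n_visible) < p_v_given_h(h)).astype(float)
print(energy(v, h), v_recon)
```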
1.5 Results
For the ANN models, the results are as follows.
For the ANN model with one hidden layer, the accuracy vs. epochs plot is shown in Figure 1.7.
For the ANN model with two hidden layers, the accuracy vs. epochs plot is shown in Figure 1.8.
For the ANN model with three hidden layers, the accuracy vs. epochs plot is shown in Figure 1.9.
For the ANN model with four hidden layers, the accuracy vs. epochs plot is shown in Figure 1.10.
The accuracy vs. number of hidden layers plot for the ANN is shown in Figure 1.11.
Figure 1.7 Accuracy plot for one hidden layer–based ANN.
Figure 1.8 Accuracy plot for two hidden layer–based ANN.
Figure 1.9 Accuracy plot for three hidden layer–based ANN.
Figure 1.10 Accuracy plot for four hidden layer–based ANN.