the same structure and weights. A simple example of the process can be written as
$$h_t = f\big(W_{xh} x_t + W_{hh} h_{t-1} + b_h\big), \qquad y_t = g\big(W_{hy} h_t + b_y\big), \tag{9}$$
where $x_t$ is the input at time step $t$, $h_t$ is the hidden state, $y_t$ is the output, $W_{xh}$, $W_{hh}$, and $W_{hy}$ are the weight matrices shared across all time steps, $b_h$ and $b_y$ are bias vectors, and $f$ and $g$ are activation functions (for example, tanh for the hidden state and softmax for the output).
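The following NumPy sketch illustrates how Eq. (9) is unrolled over a sequence with a single set of shared weights; the function and variable names are illustrative and assume a tanh hidden activation with a linear readout.

import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    # Unroll a simple RNN over the inputs x_1..x_T, reusing the same weights at every step.
    h = np.zeros(W_hh.shape[0])                      # initial hidden state h_0
    ys = []
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)     # hidden-state update from Eq. (9)
        ys.append(W_hy @ h + b_y)                    # per-step output y_t (linear readout here)
    return np.array(ys), h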
However, a drawback of RNN is that it has problems “remembering” remote information. In RNN, long‐term memory is reflected in the weights of the network, which memorize remote information via shared weights. Short‐term memory is in the form of information flow, where the output from the previous state is passed into the current state. However, when the sequence length grows, the influence of an early input must propagate through many repeated applications of the same transition, and the gradient of the loss with respect to an early hidden state becomes a long product of per‐step Jacobians,
$$\frac{\partial \mathcal{L}}{\partial h_k} = \frac{\partial \mathcal{L}}{\partial h_T} \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}}, \tag{10}$$
where each factor is the Jacobian of one recurrent step,
$$\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}\!\big(f'(W_{xh} x_t + W_{hh} h_{t-1} + b_h)\big)\, W_{hh}. \tag{11}$$
Therefore, when $T - k$ is large, the product in Eq. (10) shrinks toward zero if the norms of these Jacobians stay below one (the vanishing gradient) and grows without bound if they exceed one (the exploding gradient), so the contribution of remote time steps to the weight update is either lost or unstable.
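A toy numerical check of this effect is sketched below; the recurrent weight matrix, its scale, and the number of steps are made up purely for illustration, and a larger weight scale would make the gradient norm explode rather than vanish.

import numpy as np

# Multiply 50 Jacobians of the form diag(f'(a_t)) @ W_hh, as in Eqs. (10)-(11).
rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.1, size=(16, 16))          # small recurrent weights (illustrative)
grad = np.ones(16)                                    # stand-in for dL/dh_T
for t in range(50):
    a_t = rng.normal(size=16)                         # stand-in pre-activations
    jac = np.diag(1.0 - np.tanh(a_t) ** 2) @ W_hh     # Jacobian of one tanh RNN step
    grad = jac.T @ grad                               # backpropagate one step further in time
    if (t + 1) % 10 == 0:
        print(f"after {t + 1} steps: |grad| = {np.linalg.norm(grad):.2e}")  # shrinks toward 0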
6.3 Long Short‐Term Memory Networks
To solve the problem of losing remote information, researchers proposed long short‐term memory (LSTM) networks. The idea of LSTM was introduced by Hochreiter and Schmidhuber [19], but it did not see widespread application until much later. The basic structure of LSTM is shown in Figure 9. It solves the problem of the vanishing gradient by introducing another hidden state, the cell state $C_t$, which is passed along the sequence through element‐wise operations only, so that information and gradients can flow across many time steps with little attenuation.
Since the original LSTM model was introduced, many variants have been proposed. The forget gate was introduced in Gers et al. [20]; it has proven effective and is now standard in most LSTM architectures. The forward pass of an LSTM with a forget gate can be divided into two steps. In the first step, the following values are calculated:
$$f_t = \sigma\big(W_f [h_{t-1}, x_t] + b_f\big), \quad i_t = \sigma\big(W_i [h_{t-1}, x_t] + b_i\big), \quad o_t = \sigma\big(W_o [h_{t-1}, x_t] + b_o\big), \quad \tilde{C}_t = \tanh\big(W_C [h_{t-1}, x_t] + b_C\big), \tag{12}$$
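A minimal NumPy sketch of this first step is given below, assuming the gates act on the concatenated vector $[h_{t-1}, x_t]$ as in Eq. (12); the function and weight names are illustrative.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step_one(x_t, h_prev, W_f, W_i, W_o, W_c, b_f, b_i, b_o, b_c):
    # First step of the LSTM forward pass (Eq. (12)): gate activations and candidate cell state.
    z = np.concatenate([h_prev, x_t])      # concatenated [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)           # forget gate
    i_t = sigmoid(W_i @ z + b_i)           # input gate
    o_t = sigmoid(W_o @ z + b_o)           # output gate
    c_tilde = np.tanh(W_c @ z + b_c)       # candidate cell state, combined with C_{t-1} in the second step
    return f_t, i_t, o_t, c_tilde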