of node v′s neighbors ordered by their node degree as inputs. As suggested by Eq. (5.66), DRNE implicitly learns network embeddings via an LSTM network rather than by using the LSTM network to generate network embeddings. It avoids the problem that the LSTM network is not invariant to the permutation of node sequences.Network representations with adversarially regularized autoencoders (NetRA) [58] proposes a graph encoder‐decoder framework with a general loss function, defined as
(5.67)
where dist (·) is the distance measure between the node embedding z and the reconstructed z. The encoder and decoder of NetRA are LSTM networks with random walks rooted on each node v ∈ V as inputs.
Graph generation (decoding): With multiple graphs, GAEs are able to learn the generative distribution of graphs by encoding graphs into hidden representations and decoding a graph structure given the hidden representations. These methods either propose a new graph in a sequential manner or in a global manner.
Sequential approaches generate a graph by proposing nodes and edges step by step.
Deep Generative Model of Graphs (DeepGMG) [59] assumes that the probability of a graph is the sum over all possible node permutations:
(5.68)
where π denotes a node ordering. It captures the complex joint probability of all nodes and edges in the graph. DeepGMG generates graphs by making a sequence of decisions, namely, whether to add a node, which node to add, whether to add an edge, and which node to connect to the new node. The decision process of generating nodes and edges is conditioned on the node states and the graph state of a growing graph updated by an RecGNN.
Global approaches output a graph all at once.
Graph variational autoencoder (GraphVAE) [60] models the existence of nodes and edges as independent random variables. By assuming the posterior distribution qφ (z∣ G) defined by an encoder and the generative distribution pθ(G ∣ z) defined by a decoder, GraphVAE optimizes the variational lower bound:
(5.69)
where p(z) follows a Gaussian prior, φ and θ are learnable parameters. With a ConvGNN as the encoder and a simple multi‐layer perception as the decoder, GraphVAE outputs a generated graph with its adjacency matrix, node attributes, and edge attributes
5.2.4 STGNNs
These networks are designed to capture the dynamics of graphs. STGNNs follow two directions, recurrent neural network (RNN)‐based methods and CNN‐based methods. Most RNN‐based approaches capture spatial‐temporal dependencies by filtering inputs and hidden states passed to a recurrent unit using graph convolutions. To illustrate this (see Figure 5.7), suppose a simple RNN takes the form
where X(t) ∈ Rn × d is the node feature matrix at time step t. After inserting graph convolution, Eq. (5.70) becomes
(5.71)
where Gconv (·) is a graph convolutional layer. The Graph Convolutional Recurrent Network (GCRN) combines a LSTM network with ChebNet. DCRNN incorporates a proposed DGC layer (Eq. (5.50)) into a GRU network.
Previous methods all use a predefined graph structure. They assume the predefined graph structure reflects the genuine dependency relationships among nodes. However, with many snapshots of graph data in a spatial‐temporal setting, it is possible to learn latent static graph structures automatically from data. To realize this,
Figure 5.7 A STGNN for spatial‐temporal graph forecasting. A graph convolutional layer is followed by a 1D‐CNN layer. The graph convolutional layer operates on A and X(t) to capture the spatial dependency, while the 1D‐CNN layer slides over X along the time axis to capture the temporal dependency. The output layer is a linear transformation, generating a prediction for each node, such as its future value at the next time step.
Source: Wu et al. [38].
Graph WaveNet [61] proposes a self‐adaptive adjacency matrix to perform graph convolutions. The self‐adaptive adjacency matrix is defined as
(5.72)
where the softmax function is computed along the row dimension, E1 denotes the source node embedding, and E2 denotes the target node embedding with learnable parameters. By multiplying E1 with E2, one can get the dependency weight between a source node and a target node. With a complex CNN‐based spatial‐temporal neural network, Graph WaveNet performs well without being given an adjacency matrix.
5.3 Complexity of NN
In order to analyze the network complexity, we need a more detailed definition of both network topologies and network parameters. For this, we will use the original definitions introduced in [1], one of the first papers in this field. For this reason, in this section we will revisit the description of the GNN with more details on the specific model of the network, computation of the states, the learning process, and specific transition and output function implementations, and to facilitate a better understanding of these processes, we will provide a comparison with random walks and recursive neural nets. When discussing computational complexity issues, we will look at the complexity of instructions and the time complexity of the GNN mode. The illustrations will be based on the results presented in [1].
5.3.1 Labeled Graph NN (LGNN)
Here, a graph G is a pair (N, E), where N is the set of nodes and E is the set of edges. The set ne [n] stands for the neighbors of n, that is, the nodes connected to n by an arc, while co[n] denotes the set of arcs having n as a vertex. Nodes and edges may have labels represented by real vectors. The labels attached to node n and edge (n1, n2) will be represented by
If y is a vector that contains data from a graph and S is a subset