of relations is very large, r‐GCN (Relational Data with Graph Convolutional Networks) [7] introduces two kinds of regularization to reduce the number of parameters for modeling relations: basis‐ and block‐diagonal‐decomposition. With the basis decomposition, each Wr is defined as follows:(5.7)
Here each Wr is a linear combination of basis transformations
Dynamic graphs: These have a static graph structure and dynamic input signals. To capture both kinds of information, diffusion convolutional recurrent NN (DCRNN) [8] and spatial‐temporal graph convolutional networks (STGCN) [9] first collect spatial information by GNNs, then feed the outputs into a sequence model like sequence‐to‐sequence model or convolutional neural networks (CNNs). On the other hand, structural‐recurrent NN [10] and STGCN [11] collect spatial and temporal messages at the same time. They extend the static graph structure with temporal connections so that they can apply traditional GNNs to the extended graphs.
5.1.2 Propagation Types
The propagation step and output step in the model define the hidden states of nodes (or edges). Several major modifications have been made to the propagation step from the original GNN model, whereas in the output step a simple feedforward neural network setting is the most popular. The variants utilize different aggregators to gather information from each node’s neighbors and specific updaters to update nodes’ hidden states.
Convolution has been generalized to the graph domain as well. Advances in this direction are often categorized as spectral approaches and non‐spectral (spatial) approaches. Spectral approaches work with a spectral representation of the graphs.
Spectral network: The spectral network was proposed in Bruna et al. [12]. The convolution operation is defined in the Fourier domain by computing the eigen decomposition of the graph Laplacian. For the basics of these techniques, see Appendix 5.A. The operation can be defined as the multiplication of a signal x ∈ ℝN (a scalar for each node) with a filter gθ = diag(θ) parameterized by θ ∈ ℝN:
(5.8)
where U is the matrix of the eigenvectors of the normalized graph Laplacian
ChebyshevNet [13] truncates gθ(Λ) in terms of Chebyshev polynomials Tk(x) up to Kth order as
(5.9)
Graph convolutional network (GCN) [14] : This limits the layer‐wise convolution operation to K = 1 to alleviate the problem of overfitting on local neighborhood structures for graphs with very wide node degree distributions. By approximating λmax ≈ 2, the equation simplifies to
(5.10)
with two free parameters
Non‐spectral approaches define convolutions directly on the graph, operating on spatially close neighbors. The major challenge with non‐spectral approaches is defining the convolution operation with differently sized neighborhoods and maintaining the local invariance of CNNs.
Neural fingerprints (FP) [15] use different weight matrices for nodes with different degrees,
(5.12)
where
Diffusion‐convolutional neural networks (DCNNs) were proposed in [16].Transition matrices are used to define the neighborhood for nodes in DCNN. For node classification, it has the form
(5.13)
where X is an N × F tensor of input features (N is the number of nodes, and F is the number of features). P* is an N × K × N tensor that contains the power series {P, P2, …, PK} of matrix P, and P is the degree‐normalized transition matrix from the graph adjacency matrix A. Each entity is transformed to a diffusion convolutional representation that is a K × F matrix defined by K hops of graph diffusion over F features. It will then be defined by a K × F weight matrix and a nonlinear activation function f. Finally, H (which is N × K × F) denotes the diffusion representations of each node in the graph. As for graph classification, DCNN simply takes the average of nodes’ representation,
(5.14)
and 1N here is an N × 1 vector of ones. DCNN can also be applied to edge classification tasks, which requires converting edges to nodes and augmenting the adjacency matrix.
Dual graph convolutional network (DGCN) [17]: This jointly considers the local consistency and global consistency on graphs. It uses two convolutional networks to capture the local/global consistency and adopts an unsupervised loss function to ensemble them. As the first step, let us note that stacking the operator in Eq. (5.11) could lead to numerical instabilities and exploding/vanishing gradients, so [14] introduces the renormalization trick: