rel="nofollow" href="#fb3_img_img_8317e7c1-3dc7-585c-b186-a7c079f2d1b7.png" alt="images"/> otherwise. Vectors bn and matrices An,u do not depend on the state x, but only on the node and edge labels. Thus, ∂Fw/∂x = A, and, by simple algebra we have
which implies that Fw is a contraction map (w.r. t. ‖ ‖1 ) for any set of parameters w.
1 Nonlinear (nonpositional) GNN. In this case, hw is realized by a multilayered feedforward NN. Since three‐layered neural networks are universal approximators [67], hw can approximate any desired function. However, not all the parameters w can be used, because it must be ensured that the corresponding transition function Fw is a contraction map. This can be achieved by adding a penalty term to Eq. (5.79), that is where the penalty term L(y) is (y − μ)2 if y > μ and 0 otherwise, and the parameter μ ∈ (0, 1) defines the desired contraction constant of Fw . More generally, the penalty term can be any expression, differentiable with respect to w, that is monotone increasing with respect to the norm of the Jacobian. For example, in our experiments, we use the penalty term , where Ai is the i‐th column of ∂Fw/∂x. In fact, such an expression is an approximation of L(‖∂Fw/∂x‖1) = L(maxi‖Ai‖1).
5.3.2 Computational Complexity
Here, we derive an analysis of the computational cost in GNN. The analysis will focus on three different GNN models: positional GNNs, where the functions fw and gw of Eq. (5.74) are implemented by FNNs; linear (nonpositional) GNNs; and nonlinear (nonpositional) GNNs.
First, we will describe with more details the most complex instructions involved in the learning procedure (see Table 5.2 reproduced from [1]). Then, the complexity of the learning algorithm will be defined. For the sake of simplicity, the cost is derived assuming that the training set contains just one graph G. Such an assumption does not cause any loss of generality, since the graphs of the training set can always be merged into a single graph. The complexity is measured by the order of floating point operations. By the common definition of time complexity, an algorithm requires O (l(a)) operations, if there exist α>0,
We will assume that there exist two procedures FP and BP, which implement the forward phase and the backward phase of the back propagation procedure, respectively. Formally, given a function lw : Ra → Rb implemented by an FNN, we have
Table 5.2 Time complexity of the most expensive instructions of the learning algorithm. For each instruction and each GNN model, a bound on the order of floating point operations is given. The table also displays the number of times per epoch that each instruction is executed.
Source: Scarselli et al. [1].
Instruction | Positional | Nonlinear | Linear | Execs. |
---|---|---|---|---|
z(t + 1) = z(t) ⋅ A + b | s2 ∣ E∣ | s2 ∣ E∣ | s2 ∣ E∣ | itb |
o = Gw(x(t), lw) |
|
|
|
1 |
x(t + 1) = Fw(x(t), l) |
|
|
s2 ∣ E∣ | itf |
|
1 | |||
|
|
|
– | 1 |
|
∣N∣ | ∣N∣ | ∣N∣ | 1 |
|
|
|
– | 1 |
|
|
|
|
1 |
|