The generic data analysis algorithms, including statistical data processing and classification tasks, are classified as computer science and data mining algorithms; other high-level algorithms are included among AI algorithms (Figure 1.21). The main function of the engine processor is to manage big data and data processing. As previously mentioned, a basic criterion of algorithm classification is learning supervision: in a supervised learning model, the algorithm learns on a pre-selected dataset with specific labeled attributes filtered by the user; in an unsupervised model, all the attributes are unlabeled and the algorithm tries to extract features and patterns without guidance. Supervised algorithms mainly support the user in finding a solution to a specific problem, such as identifying a specific defect category or a specific system failure.
Figure 1.21 Algorithm classification and Industry 5.0 facilities.
A simple way to analyze a data trend is regression analysis, where a linear approach is enough to model the relationship between dependent and independent variables (see the example plot in Figure 1.22a). Typically, linear regression provides information about a linear trend prediction. Classification is based on the concept of data categorization: data are classified by considering a generic classification pattern curve, as shown in Figure 1.22b, where all data above and below the curve belong to a particular class. Data classification is often used in supervised learning models. Finally, data clustering is based on grouping the dataset into clusters with similar features and is commonly used in unsupervised learning models. All analyses can be performed in a multidimensional domain by taking into account the time variable, which is fundamental for forecasting approaches.
Figure 1.22 (a) Regression analysis, (b) data classification, and (c) data clustering.
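To make the three analyses concrete, the following Python sketch (an illustrative example, not taken from the source; the synthetic data, NumPy, and scikit-learn are assumptions) fits a linear regression to a noisy trend and groups the same points into clusters:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative data: a noisy linear trend between an independent
# variable x and a dependent variable y.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

# Regression analysis (Figure 1.22a): least-squares fit of y = a*x + b.
a, b = np.polyfit(x, y, deg=1)
trend = a * x + b  # linear trend prediction

# Data clustering (Figure 1.22c): group points with similar features.
points = np.column_stack([x, y])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(points)
```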
The ensemble approach is an alternative method for data classification. An ensemble is a set of classifiers that learn a target function. By combining the outputs of several classifiers, the risk of selecting a single poorly performing classifier is reduced. The typical ensemble procedure is given by the following pseudocode, where T denotes the original training dataset, k the number of base classifiers, and B the test data:
1. for i = 1 to k do
2.   Create training set Ti from T
3.   Build a base classifier Ci from Ti
4. end for
5. for each test record x ∈ B do
6.   C*(x) = Vote(C1(x), C2(x), …, Ck(x))
7. end for
The individual predictions of classifiers are combined to classify new samples, thus optimizing the classifier performance on a specific domain. Figure 1.23 shows an example of an ensemble approach based on the best classification of a single feature.
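As a concrete illustration of the pseudocode above, the following is a minimal Python sketch (the use of scikit-learn DTs, bootstrap sampling for the Ti, and integer-encoded labels are assumptions, not the source's implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def ensemble_classify(X_train, y_train, X_test, k=25, seed=0):
    """Steps 1-7 of the ensemble pseudocode: train k base classifiers
    on bootstrap training sets Ti and vote on each test record."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    classifiers = []
    for _ in range(k):                           # step 1: for i = 1 to k
        idx = rng.integers(0, n, size=n)         # step 2: create Ti from T
        clf = DecisionTreeClassifier(random_state=seed)
        classifiers.append(clf.fit(X_train[idx], y_train[idx]))  # step 3: build Ci
    votes = np.stack([c.predict(X_test) for c in classifiers])   # steps 5-6
    # C*(x) = Vote(C1(x), ..., Ck(x)): majority vote per test record,
    # assuming non-negative integer class labels.
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```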
Figure 1.23 Ensemble method and classification.
The RFo method [73] is a class of ensemble methods specifically designed for DT classifiers. Its main property is to combine the predictions of multiple DT models: each tree is generated from a part of the training dataset, and the values are grouped into a set of random vectors. The algorithm is structured as follows (Figure 1.24):
F input features are randomly selected to split at each node (step 1: creation of the random vectors).
A linear combination of the input features is created to split at each node (step 2: use of a random vector to build multiple DTs).
A combination of DTs is created (step 3), as shown in the sketch below.
Figure 1.24 Random forest construction steps.
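A minimal sketch of the RFo construction using scikit-learn follows (an illustrative choice, not the source's code; note that this implementation performs single-feature splits as in step 1, while the linear-combination splits of step 2 correspond to a variant that scikit-learn does not provide):

```python
from sklearn.ensemble import RandomForestClassifier

# Each of the n_estimators trees is grown on a bootstrap sample of the
# training data, and at every node only a random subset of F input
# features (max_features) is considered for the split (step 1); the
# trees are then combined by voting (step 3).
rfo = RandomForestClassifier(
    n_estimators=100,      # number of DTs in the combination
    max_features="sqrt",   # F randomly selected features per split
    bootstrap=True,        # each tree sees a random part of the dataset
    random_state=0,
)
# Hypothetical arrays: X_train, y_train, and X_test are assumed to exist.
rfo.fit(X_train, y_train)
y_pred = rfo.predict(X_test)
```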
The RFo classification technique is also applied in image processing to detect defect features. The logic of the DT algorithm is given by the following pseudocode:
Decision_Tree function:
1. Compute the Gain value for each attribute and select the attribute with the highest value, creating a node for that attribute.
2. Make a branch from this node for every value of the attribute.
3. Assign all possible values of the attribute to the branches.
4. Follow each branch, partitioning the dataset to retain only the instances in which the branch value (or a similar value) is present, and then return to step 1.
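Step 1 hinges on the Gain computation. The following Python sketch (illustrative, assuming categorical attributes and a Shannon-entropy-based gain, which the source does not specify) shows one way to score attributes:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(attribute, labels):
    """Gain used in step 1: parent entropy minus the weighted
    entropy of the partitions induced by the attribute's values."""
    gain = entropy(labels)
    for v in np.unique(attribute):
        mask = attribute == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain
```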
A particular family of neural network algorithms is the long short-term memory (LSTM) network, an artificial recurrent neural network (RNN) architecture [74] used for DL applications. The basic architecture is shown in Figure 1.25a: the structure consists of a cell (cell state), an input gate (input gate state activated at time step t), an output gate (output gate state at time step t), and a forget gate (forget gate state at time step t). The gates calculate their activations at time step t by taking into account the activation of the memory cell C at time step t − 1. Figure 1.25b shows the basic model of a network composed of LSTM nodes.
Figure 1.25 (a) LSTM unit cell. (b) LSTM network and its memory.
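As a hedged illustration of Figure 1.25, the PyTorch sketch below (the library choice and layer sizes are assumptions, not from the source) wires an LSTM layer, whose gates and cell state implement the unit just described, into a simple sequence model:

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Minimal sketch of an LSTM-based forecaster for univariate sequences."""
    def __init__(self, input_size=1, hidden_size=32, num_layers=1):
        super().__init__()
        # nn.LSTM implements the input, output, and forget gates described
        # above; the hidden and cell states carry memory across time steps.
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                 # x: (batch, time, features)
        out, (h_t, c_t) = self.lstm(x)    # c_t is the cell state C at the last step t
        return self.head(out[:, -1, :])   # predict from the final hidden state
```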
Another important parameter used for choosing the training dataset is the correlation coefficient, which indicates the relationship between two attributes. The correlation coefficient takes a value between −1 and 1 (−1 when the attributes are inversely correlated, +1 when they are perfectly correlated, and 0 when there is no correlation). The coefficient accounts for the magnitudes of the deviations from the mean value. In particular, the Pearson–Bravais correlation coefficient is estimated as the ratio between the covariance of the two variables and the product of their standard deviations, as follows [75]:
$$r_{xy} = \frac{\operatorname{cov}(x,y)}{\sigma_x\,\sigma_y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad (1.32)$$
The correlation coefficients are arranged in a correlation matrix with the structure shown in Table 1.9.
Table 1.9 Example of a correlation matrix for a model with five attributes (v1, v2, v3, v4, v5).
|      | v1       | v2       | v3       | v4       | v5       |
|------|----------|----------|----------|----------|----------|
| v1   | 1        | r(v1,v2) | r(v1,v3) | r(v1,v4) | r(v1,v5) |
| v2   | r(v1,v2) | 1        | r(v2,v3) | r(v2,v4) | r(v2,v5) |
| v3   | r(v1,v3) | r(v2,v3) | 1        | r(v3,v4) | r(v3,v5) |
| v4   | r(v1,v4) | r(v2,v4) | r(v3,v4) | 1        | r(v4,v5) |
| v5   | r(v1,v5) | r(v2,v5) | r(v3,v5) | r(v4,v5) | 1        |
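Such a matrix can be computed directly. The following NumPy sketch (with hypothetical attribute data; the sample size and induced correlation are illustrative) evaluates Equation (1.32) for every attribute pair via np.corrcoef:

```python
import numpy as np

# Hypothetical dataset: 100 samples of five attributes v1..v5.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
data[:, 1] = 0.8 * data[:, 0] + 0.2 * data[:, 1]  # make v2 correlate with v1

# Pearson-Bravais coefficients (Eq. 1.32) for every attribute pair:
# np.corrcoef expects variables on rows, hence the transpose.
corr_matrix = np.corrcoef(data.T)  # 5x5, symmetric, unit diagonal

for i in range(5):
    print("  ".join(f"{corr_matrix[i, j]:+.2f}" for j in range(5)))
```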