are “extracted” by more accurate inspections and by the application of specific intelligent image-processing algorithms that enhance anomalous features. Such intelligent algorithms are commonly referred to as machine learning (ML). Table 1.6 classifies the main unsupervised and supervised ML algorithms [42]. A supervised learning algorithm processes a known input dataset together with the corresponding outputs to learn a regression/classification model. In supervised approaches, training is performed on “labelled” data, selecting specific variables on which to focus the analysis: the data are already tagged with the expected answer, and these labelled examples drive the self-learning of the algorithm, which predicts outcomes for the labelled variables. Unsupervised learning is the training modality in which the algorithm processes a dataset that is not classified or labelled. In unsupervised approaches the model does not need supervision: it discovers information and common features of the variables (attributes) and finds unknown patterns in the data. The learning phase is structured in the following sequential steps:
Training dataset construction.
Features vector extraction.
Algorithm application setting data processing parameters.
Training model construction.
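As a hedged illustration only, the four steps above can be sketched with a minimal k-nearest-neighbors classifier in plain Python; the dataset, feature values, labels, and the parameter K are all invented for the example and do not come from the text:

```python
import math

# Step 1: training dataset construction (hypothetical labelled samples).
# Each sample: (feature vector, label).
training_set = [
    ((1.0, 1.2), "sound"),
    ((0.9, 1.0), "sound"),
    ((3.1, 2.9), "defect"),
    ((3.0, 3.2), "defect"),
]

# Step 2: feature vector extraction (here the vectors are given directly).
features = [x for x, _ in training_set]
labels = [y for _, y in training_set]

# Step 3: algorithm application, setting the data processing parameter K.
K = 3

def predict(sample):
    """Step 4: the 'trained model' is the stored set plus the kNN rule:
    take the K closest training samples and vote on their labels."""
    dists = sorted(
        (math.dist(sample, f), lab) for f, lab in zip(features, labels)
    )
    nearest = [lab for _, lab in dists[:K]]
    return max(set(nearest), key=nearest.count)

print(predict((2.9, 3.0)))  # a point near the "defect" group
```

In this sketch the "model" is simply the stored labelled data plus the distance rule, which makes the correspondence between the four construction steps and the code explicit.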
Table 1.6 Classification of machine learning algorithms.
Machine learning algorithm class | Unsupervised | Supervised |
---|---|---|
Continuous | Clustering: K‐means; mean shift clustering; density‐based spatial clustering of applications with noise; expectation maximization clustering using Gaussian mixture models; agglomerative hierarchical clustering. Dimensionality reduction: principal component analysis; singular value decomposition | Linear regression; polynomial regression; artificial neural network; random forests; decision trees |
Categorical | Association analysis: Apriori; FP‐growth; hidden Markov model | Classification: k‐nearest neighbors; decision trees; logistic regression; naïve Bayes; artificial neural network; support vector machine |
Both supervised and unsupervised algorithm classes are typically applied in image-processing data pipelines for feature classification.
A typical class of unsupervised algorithms is the clustering family. Clustering methods group objects into homogeneous classes: a cluster is a set of objects that are similar to each other but dissimilar to objects in other clusters. The input of a clustering algorithm consists of elements, while the output is a set of clusters into which the elements are divided according to a similarity measure. Clustering algorithms also provide a description of the characteristics of each cluster, which is essential for decision-making processes. Concerning the continuous class of unsupervised algorithms, the K‐means algorithm is the classical approach, estimating the centroid of each group of data, named a cluster. Mean shift clustering (MSC) is a sliding-window-based algorithm that finds dense areas of data points to define the centroids. Density-based spatial clustering of applications with noise (DBSCAN) has the main characteristic of finding arbitrarily sized and arbitrarily shaped clusters. Expectation maximization (EM) clustering uses the Gaussian mixture model (GMM) approach, assuming data points distributed as Gaussian functions characterized by a mean and a standard deviation. Agglomerative hierarchical clustering (AHC) groups clusters following a hierarchy represented by a tree or dendrogram: the root of the tree is the main cluster grouping all the samples, and the leaves are clusters containing only one sample. Principal component analysis (PCA) reduces the dimensionality of a dataset made up of many, more or less correlated, variables. The singular value decomposition (SVD) technique is a particular factorization of a matrix based on the use of eigenvalues and eigenvectors. Concerning the categorical class of unsupervised algorithms, the Apriori algorithm is adopted to derive association rules for frequent itemset mining.
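The K-means idea described above can be sketched in a few lines of plain Python; the two blobs of 2-D points, the naive initialization, and the iteration count are invented for the illustration:

```python
import math

def kmeans(points, k, iters=10):
    """Minimal K-means sketch: repeatedly assign each point to its
    nearest centroid, then move each centroid to its cluster mean."""
    centroids = list(points[:k])  # naive initialization on the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assignment step: nearest centroid by Euclidean distance.
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (kept unchanged if the cluster happens to be empty).
        centroids = [
            tuple(sum(v) / len(c) for v in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

# Two invented blobs of 2-D points, around (0, 0) and (5, 5).
pts = [(0.1, 0.2), (0.0, -0.1), (0.2, 0.0), (5.1, 5.0), (4.9, 5.2), (5.0, 4.8)]
print(sorted(kmeans(pts, k=2)))  # centroids near (0.1, 0.03) and (5.0, 5.0)
```

Production implementations add smarter initialization and convergence checks; this sketch only shows the alternating assignment/update structure the text refers to.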
The FP‐growth algorithm completes a set of frequent patterns by pattern-fragment growth, using previously found frequent patterns. The hidden Markov model is a statistical Markov model with hidden states. Concerning the continuous class of supervised algorithms, regression is a statistical process that tries to establish a relationship between two or more variables: given an x value, a regression model returns the corresponding y value generated by processing x. Linear regression differs from classification in that classification is limited to discriminating the elements into a given number of classes (labels), whereas linear regression receives data as input and returns a real-valued output. Polynomial regression uses the same method as linear regression but assumes that the function that best describes the data trend is not a straight line but a polynomial. Artificial neural networks (ANNs) are able to classify and predict data. Specifically, ANNs are composed of three types of layers: the input layer, the hidden layers, and the output layer. In the input layer, the neural network receives the data as inputs, activates and processes them according to the classification capacity for which it is trained, and passes the information obtained to the next layer, as in neuron propagation. At each step, the initial information takes on an increasingly refined meaning due to the interpretations of the different nodes. Finally, the processed data arrive at the output layer, which collects the results. The network is structured to learn automatically, in self-learning modality. Each perceptron has the task of categorizing objects by referring to common characteristics, following a scoring system calculated on each analyzed element.
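The perceptron's error-driven self-learning can be sketched with the classic Rosenblatt update rule in plain Python; the AND-gate dataset, the learning rate, and the epoch count are invented for the example:

```python
# Minimal Rosenblatt perceptron: a binary classifier that combines a
# weight vector with the feature vector and learns by error correction.
def train_perceptron(samples, epochs=20, lr=0.1):
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in samples:          # target is 0 or 1
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - out             # self-learning by error feedback
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def classify(w, b, x):
    """Linear predictor function followed by a hard threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Invented linearly separable data: label 1 only when both inputs are 1.
data = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0), ((1.0, 1.0), 1)]
w, b = train_perceptron(data)
print([classify(w, b, x) for x, _ in data])  # [0, 0, 0, 1]
```

A single perceptron can only separate classes with a straight line (or hyperplane); stacking them into hidden layers, as described for ANNs and the MLP below, removes that limitation.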
In AI, the perceptron represents a binary classifier that selects the data input and provides a feature vector. The classifier makes its predictions with a linear predictor function combining a set of weights with the feature vector. In machine learning algorithms, perceptrons are important for their self-learning capacity, which makes these tools suitable for auto-adaptive production solutions. A multilayer perceptron (MLP) is a particular class of feedforward ANN characterized by multiple layers of perceptrons with a threshold activation. The MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. Except for the input nodes, each node is a neuron implementing an activation function. The MLP uses a supervised learning technique called backpropagation for model training. In the class of categorical supervised algorithms, the random forest (RFo) represents a type of ensemble model that uses bagging (which aims to create a set of classifiers of equal importance) as the ensemble method and the decision tree (DT) as the individual model algorithm. DTs are themselves a supervised learning tool, mainly solving classification or regression problems, capable of learning nonlinear associations, and very easy to interpret and apply. DT algorithms work on both numeric and categorical data, and are categorized with respect to the output variable as categorical DTs and continuous DTs. Also in the categorical class of supervised algorithms, the k‐nearest neighbors (KNN) algorithm is used for pattern recognition and for classification based on the characteristics of the objects closest to the one considered. The logistic regression algorithm generates a result representing the probability that a given input value belongs to a certain class. In binomial logistic regression problems, the probability that the output belongs to one class is P, while the probability that it belongs to the other class is 1 − P (where P is a number between 0 and 1, since it expresses a probability). Naïve Bayes is a supervised learning algorithm suitable for solving binary and multi-class classification problems, and is based on Bayes' theorem, which defines the conditional probability: let A and B be two events, with B a possible event having probability of occurrence P(B) ≠ 0; if A∩B indicates the intersection of the two events (both events occurred), the conditional probability P(A|B) (probability of A conditioned on B) is defined as:
P(A|B) = P(A∩B) / P(B) (1.1)
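As a quick numeric check of Eq. (1.1), with probabilities invented for the example:

```python
# Numeric check of Eq. (1.1) with invented probabilities:
# if P(B) = 0.4 and P(A ∩ B) = 0.1, then P(A|B) = 0.1 / 0.4 = 0.25.
p_b = 0.4
p_a_and_b = 0.1
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 0.25
```

That is, once event B is known to have occurred, the probability mass of B becomes the new reference, so the joint probability P(A∩B) is rescaled by P(B).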
The