Group of authors

Reservoir Characterization



Figure 3.5 Median of posterior AUC for three AD classifiers as a function of the size of the training set.

      According to Figure 3.6, the divergence classifier has the narrowest AUC quantile region; the quantile regions of the two other AD classifiers are significantly wider. Their width decreases as the training set grows, but even for training sets of 20 and 25 records it is still about three times the AUC quantile width of divergence.

      The previous section showed that a divergence AD classifier is very efficient at detecting an anomaly with known properties. It also noted that universal classifiers are less efficient than divergence.

      1. Detection of a part of the anomaly using universal classifiers, such as the distance or the sparsity.

      2. Optimization of the aggregated classifier on the detected part of the anomaly.

      The structure of the aggregated classifier is defined by a set of coefficients s_m (Eq. 3.7). These coefficients are chosen so that they maximize the ratio:

      where s_m, 1 ≤ m ≤ M, are weights; aggr() is defined by Eq. 3.7; classifierAnomalyRecords are the records identified by a universal classifier as anomalous; and trainSetRecords are the records from the training set.

      (3.9) [equation image not available]

      The efficiency criterion (3.8) is calculated for each combination of coefficients s_m at the individual grid nodes. An adaptive aggregated AD classifier is the one that maximizes the efficiency criterion on the grid. Once the aggregated classifier is synthesized, it can be applied to the test set for anomaly detection.
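      The grid search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: Eqs. 3.7 and 3.8 are not reproduced here, so the sketch assumes that aggr() is a weighted sum of per-classifier scores and that the efficiency criterion is the ratio of the mean aggregated score on the detected anomaly records to the mean aggregated score on the training set records. The function names, the grid values, and the toy records are all hypothetical.

```python
from itertools import product
from statistics import mean


def aggr(scores, weights):
    """Assumed stand-in for Eq. 3.7: weighted sum of per-classifier scores."""
    return sum(w * s for w, s in zip(weights, scores))


def efficiency(weights, anomaly_records, train_records):
    """Assumed stand-in for Eq. 3.8: mean aggregated score on the detected
    anomaly records divided by mean aggregated score on training records."""
    numerator = mean(aggr(r, weights) for r in anomaly_records)
    denominator = mean(aggr(r, weights) for r in train_records)
    return numerator / denominator


def grid_search(anomaly_records, train_records, grid_values, n_weights):
    """Evaluate the criterion at every grid node and keep the best node."""
    best_weights, best_value = None, float("-inf")
    for weights in product(grid_values, repeat=n_weights):
        if all(w == 0 for w in weights):
            continue  # all-zero weights give an undefined 0/0 ratio
        value = efficiency(weights, anomaly_records, train_records)
        if value > best_value:
            best_weights, best_value = weights, value
    return best_weights, best_value


# Toy data (hypothetical): each record holds the scores of M = 2 classifiers.
anomaly_records = [(0.9, 0.2), (0.8, 0.3)]  # flagged by a universal classifier
train_records = [(0.1, 0.2), (0.2, 0.3)]    # records from the training set

best_weights, best_value = grid_search(
    anomaly_records, train_records, grid_values=[0.0, 0.5, 1.0], n_weights=2
)
print(best_weights, round(best_value, 3))
```

      In this toy case only the first classifier separates the flagged records from the training records, so the search puts all the weight on it. Under the assumed ratio form the criterion does not change when all weights are scaled by the same factor, so a practical version would probably fix a normalization of the weights.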

      We illustrate construction of an adaptive aggregated classifier using the sparsity classifier at the first optimization step. The classifier to be synthesized is of the following form:

      (3.10) [equation image not available]

      Therefore, the search is done on a two-dimensional grid.

Schematic illustration of histograms of posterior true discovery rate (TDR) for two values of expected false discovery rate.
Classifier    Mean    Quantiles                           Width of quantile region
                      P=0.05   Median (P=0.5)   P=0.95
Divergence    0.897   0.89     0.895            0.909     0.019
Aggregated    0.866   0.862    0.87             0.901     0.039
Distance      0.795   0.633    0.818            0.885     0.252
Sparsity      0.765   0.576    0.786            0.868     0.292
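
The last column of the table appears to be the difference between the P=0.95 and P=0.05 quantiles. The short check below recomputes it from the tabulated values; the dictionary layout and the helper name quantile_width are illustrative choices, not part of the source.

```python
# Values copied from the table: (P=0.05 quantile, P=0.95 quantile, reported width)
table = {
    "Divergence": (0.89, 0.909, 0.019),
    "Aggregated": (0.862, 0.901, 0.039),
    "Distance": (0.633, 0.885, 0.252),
    "Sparsity": (0.576, 0.868, 0.292),
}


def quantile_width(p05, p95):
    """Width of the quantile region, rounded to the table's three decimals."""
    return round(p95 - p05, 3)


# Recompute the width for every classifier from its two outer quantiles.
widths = {name: quantile_width(p05, p95) for name, (p05, p95, _) in table.items()}

# Classifiers ordered from the narrowest to the widest quantile region.
ranking = sorted(widths, key=widths.get)

for name in ranking:
    print(f"{name:<12}{widths[name]:.3f}")
```

The recomputed widths match the reported column for all four rows, and the ordering from narrowest to widest is divergence, aggregated, distance, sparsity.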