as an alternative solution, where an anomaly‐based IDS builds only normal profiles from the normal data that is collected over a period of “normal” operations. However, the main drawback of this learning method is that comprehensive and “purely” normal data are not easy to obtain. The collection of normal data requires that a given system operate under normal conditions for a long time, and intrusive activities may occur during this data collection period. On the other hand, relying only on abnormal data for building abnormal profiles is infeasible, since the abnormal behavior that may occur in the future cannot be known in advance. Alternatively, to prevent threats that are new or unknown, an anomaly‐based IDS can use unsupervised learning methods to build normal/abnormal profiles from unlabeled data, where no prior knowledge about normal/abnormal data is available. This is a cost‐efficient method, because human expertise is not required to identify the behavior (whether normal or abnormal) of each observation in a large training data set. However, it suffers from low efficiency and poor accuracy.
This book provides the latest research and best practices of unsupervised intrusion detection methods tailored for SCADA systems. In Chapter 3, a framework for a SCADA security testbed based on virtualisation technology is described for evaluating and testing the practicality and efficacy of any proposed SCADA security solution. The proposed testbed is a salient part of evaluation and testing, because actual SCADA systems cannot be used for such purposes: availability and performance, which are the most important issues, are most likely to be affected when analysing vulnerabilities, threats, and the impact of attacks. In the literature, the k‐Nearest Neighbour (k‐NN) algorithm was found to be one of the top ten most interesting and best algorithms for data mining in general, and in particular it has demonstrated promising results in anomaly detection. However, the traditional k‐NN algorithm suffers from high computational cost and the “curse of dimensionality”, since it needs a large number of distance calculations. Chapter 4 describes a novel k‐NN algorithm that works efficiently on high‐dimensional data of various distributions, together with an extensive experimental study and a comparison with several algorithms on benchmark data sets. Chapters 5 and 6 introduce the practicality and possibility of unsupervised intrusion detection methods tailored for SCADA systems, and demonstrate the accuracy of unsupervised anomaly detection methods that build normal/abnormal profiles from unlabeled data. Finally, Chapter 7 describes two authentication protocols to efficiently protect SCADA systems, and Chapter 8 concludes with the various solutions/methods described in this book, with the aim of outlining possible future extensions of these methods.
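To make the cost of the traditional k‐NN approach concrete, the following is a minimal sketch of brute‐force k‐NN anomaly scoring in Python. It is not the algorithm of Chapter 4; the function name knn_scores, the choice of the mean distance to the k nearest neighbours as the score, and the two‐feature readings are all assumptions made for illustration. Every observation is compared with every other observation, which is where the large number of distance calculations comes from.

```python
import math


def knn_scores(points, k):
    """Score each point by its mean Euclidean distance to its k nearest neighbours.

    Brute force: each point is compared with every other point, so the
    number of distance calculations grows quadratically with the data size.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to all other points, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical two-feature readings (for example a flow rate and a pressure).
readings = [
    (10.0, 5.0), (10.1, 5.1), (9.9, 4.9), (10.2, 5.0), (9.8, 5.1),
    (25.0, 1.0),  # deliberately placed far from the other readings
]

scores = knn_scores(readings, k=2)
# Index of the reading with the highest score, i.e. the most isolated one.
most_anomalous = max(range(len(readings)), key=scores.__getitem__)
print(most_anomalous, [round(s, 2) for s in scores])
```

No labels are used at any point: the score depends only on how isolated an observation is, which is why distance‐based scoring of this kind suits the unsupervised setting.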
PREFACE
Supervisory Control and Data Acquisition (SCADA) systems have been integrated to control and monitor industrial processes and our daily critical infrastructures, such as electric power generation, water distribution, and waste water collection systems. This integration adds valuable input to improve the safety of the process and the personnel, as well as to reduce operation costs. However, any disruption to SCADA systems could result in financial disasters or, in a worst‐case scenario, lead to loss of life. In the past, such systems were considered secure by virtue of their isolation, and only proprietary hardware and software were used to operate them. In other words, these systems were self‐contained and totally isolated from the public network (e.g., the Internet). This isolation created the myth that malicious intrusions and attacks from the outside world were not a big concern, and that attacks were expected to come only from the inside. Consequently, when SCADA protocols were developed, the security of the information system was given no consideration.
In recent years, SCADA systems have begun to shift away from using proprietary and customized hardware and software to using Commercial‐Off‐The‐Shelf (COTS) solutions. This shift has increased their connectivity to public networks using standard protocols (e.g., TCP/IP) and has decreased reliance on specific vendors. This increases productivity and profitability, but it also exposes these systems to cyber threats. Only a low percentage of companies carry out security reviews of the COTS applications they use. A high percentage of other companies do not perform security assessments and thus rely only on the vendor's reputation or on legal liability agreements, while some may have no policies at all regarding the use of COTS solutions.
The adoption of COTS solutions is a time‐ and cost‐efficient means of building SCADA systems. In addition, COTS‐based devices are intended to operate on traditional Ethernet networks and the TCP/IP stack. This feature allows devices from various vendors to communicate with each other, and it also helps to remotely supervise and control critical industrial systems from any place and at any time using the Internet. Moreover, wireless technologies can efficiently be used to provide mobility and local control for multivendor devices at a low cost for installation and maintenance. However, the convergence of state‐of‐the‐art communication technologies exposes SCADA systems to all the inherent vulnerabilities of these technologies.
Awareness of the potential threats to SCADA systems, and of the need to reduce risk and mitigate vulnerabilities, has recently made this a hot research topic in the security area. The increase of SCADA network traffic makes the manual monitoring and analysis of traffic data by experts time‐consuming, infeasible, and very expensive. For this reason, researchers have begun to employ Machine Learning (ML)‐based methods to develop Intrusion Detection Systems (IDSs) by which normal and abnormal behaviors of network traffic are automatically learned with no or limited domain expert interference. In addition to being accepted as a fundamental piece of security infrastructure for detecting new attacks, IDSs are cost‐efficient solutions for monitoring network behaviors with high‐accuracy performance. Therefore, IDSs have been adopted in SCADA systems. The type of information source and the detection method are the salient components that play a major role in developing an IDS. Network traffic and events at system and application levels are examples of information sources. Detection methods are broadly categorized into two types: signature‐based and anomaly‐based. The former can detect only an attack whose signature is already known, while the latter can detect unknown attacks by looking for activities that deviate from an expected pattern (or behavior). The differences between the nature and characteristics of traditional IT and SCADA systems have motivated security researchers to develop SCADA‐specific IDSs. Recent research on this topic has found that the modelling of measurement and control data, called SCADA data, is promising as a means of detecting malicious attacks intended to jeopardize SCADA systems. However, the development of efficient and accurate detection models/methods is still an open research area.
Anomaly‐based detection methods can be built in three modes, namely supervised, semi‐supervised, or unsupervised. Class labels must be available for the first mode; however, this type of learning is costly and time‐consuming because domain experts are required to label hundreds of thousands of data observations. The second mode is based on the assumption that the training data set represents only one behavior, either normal or abnormal. There are a number of issues pertaining to this mode. The system has to operate for a long time under normal conditions in order to obtain purely normal data that comprehensively represent normal behaviors. However, there is no guarantee that no anomalous activity will occur during the data collection period. On the other hand, it is challenging to obtain a training data set that covers all possible anomalous behaviors that can occur in the future. Alternatively, the unsupervised mode, which can be the most popular form of anomaly‐based detection, addresses the aforementioned issues, since its models can be built from unlabeled data without prior knowledge about normal/abnormal behaviors. However, low efficiency and accuracy are challenging issues for this type of learning.
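The weakness of the semi‐supervised mode can be illustrated with a small Python sketch. It assumes a deliberately simple normal profile (the mean plus or minus three standard deviations of a single measurement); this profile, the function names, and the water‐level readings are hypothetical and are not taken from the methods described in this book. The sketch trains the same profile twice, once on purely normal data and once on data in which an intrusion went unnoticed during collection.

```python
import statistics


def build_normal_profile(training_values, width=3.0):
    """Return (low, high) bounds: mean +/- width * standard deviation."""
    mean = statistics.fmean(training_values)
    std = statistics.pstdev(training_values)
    return mean - width * std, mean + width * std


def is_anomalous(value, profile):
    """Flag a value that falls outside the learned normal bounds."""
    low, high = profile
    return value < low or value > high


# Hypothetical water-level readings collected under purely normal operation.
clean = [50.0, 50.5, 49.5, 50.2, 49.8, 50.1, 49.9, 50.3, 49.7, 50.0]
# The same collection period, but with two readings injected by an
# intrusion that nobody noticed while the data were being gathered.
contaminated = clean + [80.0, 82.0]

suspicious = 75.0
flag_clean = is_anomalous(suspicious, build_normal_profile(clean))
flag_contaminated = is_anomalous(suspicious, build_normal_profile(contaminated))
print(flag_clean, flag_contaminated)
```

Under these assumptions, the profile learned from the clean data flags the suspicious reading, while the profile learned from the contaminated data has bounds wide enough to accept it. Two unnoticed intrusive readings in the training set are enough to hide a later attack of the same kind.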