Example 3.11 (Airbag system in a car)
A new car model was launched and a person driving such a car crashed into another car. The airbags did not operate as intended and the driver was critically injured. After the accident, it was found that the airbag system was not correctly installed. Later, it was found that the same error was made for all cars of the same type. The airbag failure was due to a systematic cause and all the cars of the same type had the same systematic fault. All these cars had to be recalled for repair and modification. There was nothing wrong with the airbag system as such and the airbag system manufacturer could not be blamed for the accident (unless the installation instructions were misleading or ambiguous). The car manufacturer had to cover the consequences of the failure. For drivers and passengers, the cause of the failure does not matter. A systematic failure has the same consequences as a primary (random hardware) failure.
Example 3.12 (Failure causes of a gas detection system)
A heavy (i.e. heavier than air) and dangerous gas is used in a chemical process. If a gas leakage occurs, it is important to raise an alarm and shut down the process as fast as possible. For this purpose, a safety‐instrumented system (SIS) is installed, with one or more gas detectors. The SIS has three main parts: (i) gas detectors, (ii) a logic solver that receives, interprets, and transmits signals, and (iii) a set of actuating items (e.g. alarms, shutdown valves, door closing mechanisms). The purpose of the SIS is to give an automatic and rapid response to a gas leakage. Many more details about SIS may be found in Chapter 13.
Assume that a gas leak has occurred without any response from the SIS. Possible causes of the failure may include the following:
A primary (i.e. random hardware) failure of the SIS.
The installed gas detectors are not sensitive to this particular type of gas, or have been mis‐calibrated.
The gas detectors have been installed high up on walls or in the ceiling (remember, the gas is heavier than air).
The gas detectors have been installed close to a fan (no gas will reach them).
The gas detectors have been inhibited during maintenance (and the inhibits have not been removed).
The gas detector does not raise an alarm due to a software bug. (Most modern gas detectors have software‐based self‐testing features.)
The gas detector is damaged by, for example, sand‐blasting. (This has happened several times in the offshore oil and gas industry.)
Security Failures
A security failure is a failure caused by a deliberate human action. Many systems are exposed to a number of threats. The threats may be related to physical actions or cyberattacks. Physical threats include arson, sabotage, theft, and many more. A cyberattack is only relevant for systems that are connected to a cyber network (e.g. the Internet or a mobile phone network). A threat may be used by a threat actor to attack the system. The system may have a number of vulnerabilities (i.e. weaknesses) that may be exploited by the threat actor to make a “successful” attack.
With the development of new technologies, such as cyber‐physical systems, the Internet of Things (IoT), smart‐grids, smart cities, remote operation and maintenance, and many more, cyberattacks have become more frequent, and we can now hardly open a newspaper without finding articles about them. Many of these attacks are directed toward critical infrastructure and industrial control and safety systems.
The structure of a security failure is illustrated in Figure 3.11. A threat, a threat actor, and a vulnerability are required “inputs” for a security failure. The threat actor uses a threat to attack the system, and the threat inspires the threat actor. The attack can only be successful if the system has one or more vulnerabilities.
A security failure is not a random event, but the consequence of a deliberate action made by the threat actor. To reduce the likelihood of security failures, vulnerabilities should be identified and removed during system design.
Figure 3.11 The structure of a security failure.
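The structure in Figure 3.11 can be sketched in a few lines of Python. This is a minimal illustration, not a security model: the class names, the function is_security_failure, and the example values are all hypothetical, and the sketch makes the simplifying assumption that each attack needs one specific vulnerability in order to succeed.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class Attack:
    """A threat actor applying a threat (hypothetical model)."""
    threat_actor: str
    threat: str
    required_vulnerability: str  # assumption: the one weakness the attack exploits


@dataclass
class System:
    """A system with a set of known vulnerabilities (hypothetical model)."""
    name: str
    vulnerabilities: Set[str] = field(default_factory=set)


def is_security_failure(system: System, attack: Attack) -> bool:
    """The attack succeeds only if the system has the vulnerability it exploits."""
    return attack.required_vulnerability in system.vulnerabilities


controller = System("shutdown controller", {"default password"})
attack = Attack("intruder", "cyberattack", "default password")
print(is_security_failure(controller, attack))  # True

# Removing the vulnerability during design blocks the same attack.
controller.vulnerabilities.discard("default password")
print(is_security_failure(controller, attack))  # False
```

The second call shows why removing vulnerabilities is effective: the threat and the threat actor are unchanged, but the attack no longer has anything to exploit.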
Additional Types of Failures
When an item fails, the failure is often claimed to be caused by the control of the item, the input/output to/from the item, or misuse of the item. These causes are usually outside the boundary of the item and not something the manufacturer of the item can be responsible for.
Control failures. A control failure is an item failure caused by an improper control signal or noise, that is, due to factors outside the boundary of the item. A repair action may or may not be required to return the item to a functioning state. Failures caused by operating procedures that are inadequate, or that are not followed, may also be classified as control failures.
Input/output failures. An input/output failure is a failure caused by inadequate or lacking item inputs or outputs, that is, due to factors outside the boundary of the item. For a washing machine, for example, the washing service is stopped by an inadequate or lacking supply of electricity, water, or detergent, or by inadequacies of the drainage system. Input/output failures will stop the service provided by the item but will usually not leave the item in a failed state. The item may not need any repair after an input/output failure. Input/output failures tell very little about the reliability of the item as such.
Misuse/mishandling failures. A misuse/mishandling failure is a failure that occurs because the item is used for a purpose that it was not designed for, or is mishandled. The mishandling may be due to a human error or a deliberate action such as sabotage. Some laws and standards (e.g. the EU Machinery Directive 2006/42/EC) require that foreseeable misuse shall be considered and compensated for in the design and development of the item, and be covered in the operating context of the item.
The categories of failures listed above are not fully mutually exclusive. Some control failures may, for example, also be due to systematic causes.
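Because the categories overlap, a failure record in a database should be able to carry more than one category at a time. The following Python sketch shows one possible way to do this with a flag enumeration. The enumeration FailureCategory and the example record are hypothetical and are not taken from any standard.

```python
from enum import Flag, auto


class FailureCategory(Flag):
    """Failure categories that may be combined (hypothetical encoding)."""
    PRIMARY = auto()
    SYSTEMATIC = auto()
    SECURITY = auto()
    CONTROL = auto()
    INPUT_OUTPUT = auto()
    MISUSE = auto()


# A control failure that is also due to a systematic cause (hypothetical record).
record = FailureCategory.CONTROL | FailureCategory.SYSTEMATIC

print(FailureCategory.CONTROL in record)     # True
print(FailureCategory.SYSTEMATIC in record)  # True
print(FailureCategory.SECURITY in record)    # False
```

A plain enumeration with one value per record would force an arbitrary choice between the two categories, whereas the flag keeps both.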
Remark 3.2 (Functionally unavailable)
The US Nuclear Regulatory Commission (NRC) introduces the term functionally unavailable for an item that is capable of operation, but where the function normally provided by the item is unavailable due to lack of proper input, lack of support function from a source outside the component (i.e. motive power, actuation signal), maintenance, testing, the improper interference of a person, and so on.
The NRC term covers failures/faults of several of the categories above, most notably input/output and control failures.
Failures Named According to the Cause of Failure
Failures are sometimes named according to (i) the main cause of the failure, such as corrosion failure, fatigue failure, aging failure, calibration failure, systematic failure, and so forth, (ii) the type of technology that fails, such as mechanical failure, electrical failure, interface failure, and software bug, and (iii) the life cycle phase in which the failure cause originates, such as design failure, manufacturing failure, and maintenance failure.
When using this type of labeling, we should remember that the failure description does not tell how the failure is manifested, that is, which failure mode occurs. The same failure mode may occur due to many different failure causes.
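The distinction between a cause‐based label and a failure mode can be illustrated with a small Python sketch. The class FailureLabel, its three fields (one per naming dimension listed above), and the failure reports are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FailureLabel:
    """Cause-based label; any of the three dimensions may be left unspecified."""
    main_cause: Optional[str] = None        # e.g. "corrosion", "fatigue"
    technology: Optional[str] = None        # e.g. "mechanical", "electrical"
    life_cycle_phase: Optional[str] = None  # e.g. "design", "maintenance"


# Hypothetical failure reports: (observed failure mode, cause-based label).
reports = [
    ("fail to close", FailureLabel(main_cause="corrosion", technology="mechanical")),
    ("fail to close", FailureLabel(main_cause="calibration", life_cycle_phase="maintenance")),
    ("fail to close", FailureLabel(technology="electrical")),
]

# Group the labels by the failure mode they were reported with.
labels_by_mode = defaultdict(set)
for mode, label in reports:
    labels_by_mode[mode].add(label)

print(len(labels_by_mode["fail to close"]))  # 3
```

The single failure mode in this example is reported with three different labels, and none of the labels by itself identifies the failure mode.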
3.6.3