first three — identification, quantification and probability — are sometimes grouped together under Risk Analysis or Risk Assessment. With these four functions completed, the last step is to exercise:
• risk vigilance
Risk vigilance is simply the recognition of the risk conditions, the ongoing response to those conditions on the ground, and the implementation of the appropriate risk responses. Vigilance requires you to identify an appropriate trigger, and this trigger defines the parameters for your vigilance. We say “appropriate risk responses” because different risks require different approaches. For example, you might have a smoke detector to monitor for the risk of fire and machine guards to manage the risk of injury.
Here is an example of this approach:
• Risk identification: Failure of the bearings on a turbine is a risk.
• Risk quantification: In the event of a failure, will anyone get hurt, and how badly? How much money will the event cost in downtime per hour or per day? What is the cost if the repair takes a few weeks or months instead of a few hours or days?
• Risk probability: What is the chance this risk will happen? Has it happened before? Does the manufacturer warn us about the risk? Do we have statistics about MTBF (Mean Time Between Failures) for the bearings in question?
• Risk response: Can we eliminate, mitigate, or anticipate the failure? What PM will extend life? What parts will be needed if the breakdown does occur? How costly is the kit? Can the risk be transferred to someone else (by using supply contracts or buying insurance)? Does the waiting time for the part introduce any unanticipated risk?
• Risk vigilance: How do we organize our team and maintenance strategy so that an event becomes apparent quickly enough that we have time to respond? In addition to vigilance, this aspect includes responding to changes in the character of the risks over the life span of the plant.
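The quantification and probability questions in the turbine bearing example can be combined into a rough expected-loss figure. The sketch below is one way to do that, not a prescribed method: it assumes a constant failure rate (an exponential model), which the text does not state, and every number in it (the MTBF, the downtime cost, the repair time) is hypothetical.

```python
import math


def failure_probability(mtbf_hours: float, exposure_hours: float) -> float:
    """Chance of at least one failure during the exposure window.

    Assumes a constant failure rate (exponential model). Real bearings
    often wear out, so treat this as a first approximation only.
    """
    return 1.0 - math.exp(-exposure_hours / mtbf_hours)


def expected_loss(probability: float, consequence_cost: float) -> float:
    """Expected loss = probability of the event x cost if it happens."""
    return probability * consequence_cost


# Hypothetical inputs for illustration only.
mtbf_hours = 40_000.0            # from manufacturer data or plant history
exposure_hours = 8_760.0         # one year of continuous running
downtime_cost_per_hour = 5_000.0
repair_hours = 72.0              # waiting for parts stretches this quickly

p = failure_probability(mtbf_hours, exposure_hours)
consequence = downtime_cost_per_hour * repair_hours
annual_expected_loss = expected_loss(p, consequence)

print(f"Probability of failure this year: {p:.1%}")
print(f"Cost of one failure:              ${consequence:,.0f}")
print(f"Expected annual loss:             ${annual_expected_loss:,.0f}")
```

Rerunning the calculation with a longer repair time (weeks instead of days) shows how strongly the waiting time for parts drives the result, which is the point of the risk response questions.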
RISK MANAGEMENT OPTIONS
In all cases of risk, you have four options for managing that risk. As you evaluate each risk, you need to adopt a management strategy based on the option you choose. The risk management options are presented here in the order in which they should be considered.
You can mix and match options to get the loss level you want. For example, the probability of losing small tools when you let outsiders into the plant is fairly high, but the loss consequence is generally more of a nuisance than a catastrophe. A typical strategy is to place a guard and an entry gate (partial mitigation) and stop there; you don’t scan or search the workers. You simply accept the loss of some items. If the loss level gets too high, you can add weight to the mitigation strategy, perhaps by locking up more tools or adding surveillance cameras. Your response can escalate until the losses are reduced to an acceptable level.
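The escalation idea in the small-tools example can be sketched as a simple loop: keep adding mitigation measures until the remaining loss falls to the level you are willing to accept. This is only an illustration. The loss figures and the reduction fractions are hypothetical, and the `Mitigation` class and `escalate` function are names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Mitigation:
    name: str
    loss_reduction: float  # fraction of the remaining loss removed (0 to 1)


def escalate(current_loss, acceptable_loss, candidates):
    """Apply candidate measures in order until the loss is acceptable.

    Returns the names of the measures applied and the remaining loss.
    """
    applied = []
    loss = current_loss
    for measure in candidates:
        if loss <= acceptable_loss:
            break  # already at an acceptable level; accept the rest
        loss *= 1.0 - measure.loss_reduction
        applied.append(measure.name)
    return applied, loss


# Hypothetical numbers: annual tool losses with only a guard and gate.
current_annual_loss = 6_000.0
acceptable_annual_loss = 4_000.0
candidates = [
    Mitigation("lock up more tools", 0.30),
    Mitigation("add surveillance cameras", 0.20),
]

applied, remaining = escalate(
    current_annual_loss, acceptable_annual_loss, candidates
)
print("Measures added:", applied)
print(f"Remaining annual loss: ${remaining:,.0f}")
```

In practice each measure also has a cost, so a fuller version would stop escalating when the next measure costs more than the loss it removes.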
1. Avoid the risk. One way to avoid risk is to re-design the work. In many circumstances, this might involve reengineering, choosing long-lived assets, or even replacing the asset. The best way to avoid the risk of an iatrogenic failure (failure caused by the mechanic or electrician) is to design the system to not break down! Of course, that is tough, but improvements in reliability that are based on equipment design are made every day. If you can’t eliminate the risk, the next step is to mitigate it.
Most plants have an Environmental Health and Safety (EHS) department that reviews requests from contractors to bring specialized chemicals on-site. If a chemical is exceptionally risky, EHS can reject the request and literally eliminate the risk posed by that chemical completely. However, that chemical might be the best of the bunch; the others may add other problems.
2. Mitigate the risk. Mitigation involves reducing the probability of the risk happening (for example, by using existing technology instead of new technology), reducing the consequence of the risk, or some combination of both.
Most types of PPE are mitigation in action. Hard hats, steel-toe boots, safety glasses, and arc flash suits are all examples of mitigation of various (relatively common) hazards.
In the aircraft industry, the risk of incorrect repairs has both safety and economic consequences. The industry mitigates this risk through rigorous repair procedures, certification of operators and mechanics, and close-in inspection. While these actions mitigate the risk, they do not eliminate it.
In an industrial situation, one way to minimize the consequence of a breakdown risk is to have backup systems in place (such as a backup generator on a truck) or to have a trusted vendor on call. Another way to reduce risk is to follow the concept of precision maintenance. This is maintenance with all the specifications included for a proper repair (such as torque, belt tension, tolerance allowances, etc.).
3. Insure the risk. Insurance is a form of risk mitigation in that it minimizes the economic consequence of the risk.
Modern insurance started around 1691 in a coffeehouse frequented by ship owners, sailors, and investors in London. When a ship went on a voyage, the owner could insure the value. Different investors would insure the amount of the ship they were comfortable losing until the ship’s value was covered.
The investors literally wrote the risk they were willing to underwrite by writing it under the ship’s name on a chalkboard. The premium was collected and split in proportion to the amount of risk each investor carried. The coffee shop, which was called Lloyd’s, is still in operation as an insurance market (although it no longer serves coffee). The partners covering the risk are still called Names. The Lloyd’s syndicates work together to pool and spread risk.
Insurance is included here as a separate option; its purpose is to shift the financial impact of the risk from you to the insurer. Here are some common types of insurance:
a. Fire insurance for fires
b. Liability insurance for accidents to visitors or users of the product
c. Workmen’s compensation insurance for employee injuries
d. Business continuity insurance to cover catastrophic interruptions to business activity
e. Stop loss insurance (see below)
4. Accept the risk. You decide that the risk probability or consequence is sufficiently low that you can handle it without help, active mitigation, or additional systems. Sometimes this approach is referred to as “self-insurance.”
As an example, companies with large vehicle fleets often do not take out external collision insurance. They accept that they will need to repair or replace vehicles involved in an accident on the basis that, in the long run and across a large population of vehicles, doing so costs less than the insurance would. The greatest risk with self-insurance is the possibility of disaster; for example, if the trucks are parked in one yard, a fire could destroy all the trucks in one fell swoop. To limit your risk, you might buy a “stop loss policy” that pays only after $1 million of loss. You also might not want to park the trucks close together (a risk reduction strategy).
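The fleet example comes down to simple arithmetic, sketched below. Only the $1 million stop-loss threshold comes from the text; the fleet size, accident count, repair cost, and premiums are hypothetical figures chosen for illustration, and the function name is invented for this sketch.

```python
def retained_loss(total_loss: int, stop_loss_threshold: int) -> int:
    """Loss the company bears itself when a stop-loss policy
    pays for everything above the threshold."""
    return min(total_loss, stop_loss_threshold)


# Hypothetical fleet figures.
fleet_size = 200
accidents_per_year = 10            # expected, from fleet history
average_repair_cost = 8_000
collision_premium_per_truck = 1_200
stop_loss_premium = 15_000
stop_loss_threshold = 1_000_000    # the policy pays only above this
truck_replacement_cost = 90_000

# Ordinary year: compare buying collision cover with self-insuring.
external_insurance_cost = fleet_size * collision_premium_per_truck
expected_repairs = accidents_per_year * average_repair_cost
self_insurance_cost = expected_repairs + stop_loss_premium

# Disaster year: a yard fire destroys the whole fleet.
fire_loss = fleet_size * truck_replacement_cost
fire_loss_retained = retained_loss(fire_loss, stop_loss_threshold)

print(f"External collision insurance: ${external_insurance_cost:,}")
print(f"Self-insurance plus stop loss: ${self_insurance_cost:,}")
print(f"Yard fire loss:                ${fire_loss:,}")
print(f"Retained with stop loss:       ${fire_loss_retained:,}")
```

With these assumed figures, self-insurance is cheaper in an ordinary year, and the stop-loss policy caps the company’s exposure in the disaster year at the threshold instead of the full value of the fleet.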
Another reason to accept the risk is that the chance of the problem has been examined and found to be so remote that mitigation might not be cost-effective. Geological events such as earthquakes are remote risks for people living in Pennsylvania or New Jersey. If we were planning a refinery in that region, we might not worry too much about earthquake risks. On the other hand, New Jersey is close to the Atlantic Ocean; therefore, storm events must be included in the risk review.
WHAT IS THE RISK IN RISK MANAGEMENT?
Understand the consequences of a loss of function. For our purposes, loss of function can be and usually is the result of a breakdown. Other causes could be operational error, problems with raw material, etc. If a spare part is used to “fix” the loss of function, then we assume something broke down (or wore out and had to be replaced).
Loss of function has several consequences, usually classified as safety, environmental, and economic. Some loss of function — for example, Bhopal India’s