Machine Damage
Since the compressor is rated at over 2000 hp, the economic consequences of a catastrophic failure are high.
We can now summarize our risk assessment, as shown in Table 1.2, and conclude that the highest consequences are associated with process losses. The summary contains only the significant consequences of interest.
Machines associated with potential high-consequence events are considered critical machines, while those associated with low-consequence events are considered non-critical machines. In our example, the extreme consequences are related to extended outages and catastrophic machine damage, so it makes sense to label this compressor a high-criticality machine in your listing of plant assets. When you have completed your site assessment, you should have a list of all your machines along with their criticality ratings. If you force-rank the list, the machines at the top should be your highest priorities in terms of potential impact on the plant.
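A criticality listing can be kept as a simple register and force-ranked by sorting. The sketch below is one possible way to do this, assuming a hypothetical 1 (low) to 5 (extreme) severity score per consequence category, a "worst-case consequence" rating rule, and a hypothetical cutoff of 4 for "critical." The machine tags and scores are illustrative only, not taken from Table 1.2.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    tag: str  # hypothetical asset tag
    consequences: dict[str, int]  # consequence category -> severity (1 = low, 5 = extreme, assumed scale)

    @property
    def criticality(self) -> int:
        # Assumed rule: a machine is rated by its worst credible consequence.
        return max(self.consequences.values())


def force_rank(machines: list[Machine]) -> list[Machine]:
    """Return machines sorted from highest to lowest criticality."""
    return sorted(machines, key=lambda m: m.criticality, reverse=True)


CRITICAL_THRESHOLD = 4  # hypothetical cutoff between critical and non-critical

plant = [
    Machine("P-101", {"process loss": 2, "machine damage": 1}),
    Machine("K-201", {"process loss": 5, "machine damage": 4}),
    Machine("F-301", {"process loss": 1, "machine damage": 1}),
]

for m in force_rank(plant):
    label = "critical" if m.criticality >= CRITICAL_THRESHOLD else "non-critical"
    print(f"{m.tag}: {m.criticality} ({label})")
```

Rating by the worst consequence keeps a single extreme outcome, such as catastrophic machine damage, from being averaged away by benign categories.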
Equipment History
The second step of a rotating machinery reliability assessment is to review the machine’s operating history. A historical review involves 1) reviewing the history of site failures and 2) discussing operating issues with plant personnel. The goal of this step is to determine the predominant failure modes associated with the machine so you can understand which failure modes are key to controlling process reliability.
Let’s go through a simple example. Say you are evaluating an electric motor-driven centrifugal pump that has two predominant failure modes: mechanical seal failures and bearing failures. Through your research you discover that 70% of the failures are related to mechanical seals, 20% to bearings, and the remaining 10% to other components. The failure history also indicates that the motor can run 5 years between repairs. Clearly, if you want to improve the pump’s reliability, most of your efforts should be focused on the mechanical seal.
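The percentages in this example are a Pareto breakdown of the repair records. A minimal sketch of that tally follows, assuming the records have already been reduced to one failure-mode label per repair. The list of repairs is hypothetical and was built to reproduce the 70/20/10 split; "coupling" stands in for the unspecified "other components."

```python
from collections import Counter

# Hypothetical repair records for one pump: one failure-mode label per repair.
repairs = ["mechanical seal"] * 7 + ["bearing"] * 2 + ["coupling"] * 1


def pareto(records: list[str]) -> list[tuple[str, float]]:
    """Return (failure mode, share of all failures), most frequent first."""
    counts = Counter(records)
    total = sum(counts.values())
    return [(mode, n / total) for mode, n in counts.most_common()]


for mode, share in pareto(repairs):
    print(f"{mode:16s} {share:5.0%}")
```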
A historical review is a great starting point for an assessment because it allows you to identify the vital issues and then focus on them. Going back to our centrifugal pump example, if you discovered that there had been 2 seal failures per year over the last 4 years, it would be abundantly clear where to direct your analysis efforts. During the historical review, however, it is important to maintain a broad view of the equipment and system rather than focusing on a single issue, because hidden issues may be affecting reliability. For example, a seal problem might actually stem from a control issue rather than a seal design deficiency: a poorly designed tank level control scheme may be causing pump cavitation, which in turn leads to seal failure. This hidden control issue highlights why you also need to talk to plant personnel about what is going on around the equipment you’re assessing. An operator might tell you, for instance, that there are times the pump seems to be pumping gravel, a common description of the noise cavitation makes. The key at this step of your analysis is to take a holistic approach to your evaluation.
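The "abundantly clear" conclusion can be made concrete with a quick mean-time-between-failures (MTBF) estimate. This sketch assumes the pump runs continuously, so calendar time approximates operating time, and it treats the motor's 5 years between repairs as its MTBF. Both are simplifying assumptions for illustration.

```python
def observed_mtbf(failures: int, period_years: float) -> float:
    """Observed mean time between failures: observation period / number of failures."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF this way")
    return period_years / failures


# Pump example: 2 seal failures per year over the last 4 years = 8 failures.
seal_mtbf = observed_mtbf(failures=2 * 4, period_years=4.0)
motor_mtbf = 5.0  # years between motor repairs (assumed to approximate its MTBF)

print(f"Seal MTBF: {seal_mtbf:.1f} years ({seal_mtbf * 12:.0f} months)")
print(f"Motor runs about {motor_mtbf / seal_mtbf:.0f}x longer between repairs than the seal")
```

This prints a seal MTBF of 0.5 years (6 months), roughly one tenth of the motor's interval.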
Figure 1.2 Machinery failure mode distributions.
To assign a risk level to a failure mode, you must also understand the type of failure distribution associated with it. Figure 1.2 shows the failure distributions commonly seen in machinery: bathtub, wear-out, fatigue, initial break-in period, random, and infant mortality. A failure mode can also follow a combination of these distributions.
When reviewing the failure distributions of a given machine, the question to ask is: What is the failure distribution associated with the failure mode of interest? Knowing the failure distribution is key to selecting the best maintenance mitigation strategy for each failure mode, because it tells you whether the probability of failure is time dependent. If the failure mode is time dependent, a time-based activity aimed at servicing or replacing that component may be appropriate. If it is not time dependent, a time-based activity may not make sense. Your overall maintenance strategy is the combination of the best maintenance strategies for each historical failure mode.
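One common way to express "time dependent or not" is the hazard (instantaneous failure) rate of a two-parameter Weibull distribution, h(t) = (β/η)(t/η)^(β−1). A shape parameter β < 1 gives a falling rate (infant mortality), β = 1 gives a constant rate (random), and β > 1 gives a rising rate (wear-out). The sketch below illustrates this rule of thumb. The characteristic life η = 3 years and the tolerance used to call β "close to 1" are hypothetical choices. The Weibull shape only captures monotonic patterns, so a bathtub curve would have to be modeled as a combination.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull hazard h(t) = (beta/eta) * (t/eta)**(beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def is_time_dependent(beta: float, tol: float = 0.1) -> bool:
    """Treat beta within tol of 1 as a constant (random) failure rate; tol is an assumed cutoff."""
    return abs(beta - 1.0) > tol


eta = 3.0  # hypothetical characteristic life, in years
for beta, label in [(0.5, "infant mortality"), (1.0, "random"), (3.0, "wear-out")]:
    rates = [round(weibull_hazard(t, beta, eta), 3) for t in (0.5, 1.5, 3.0)]
    print(f"{label:16s} hazard at t = 0.5, 1.5, 3.0 yr: {rates}  time-dependent: {is_time_dependent(beta)}")
```

In practice, the decision would rest on confidence bounds for β rather than a fixed tolerance.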
Let’s go back to our centrifugal pump example. Since the mechanical seal failure mode is dominant, we must determine what its failure distribution looks like. Let’s say that after studying the data, you conclude that the failure distribution is random, i.e., not time dependent. (A technique called Weibull analysis is commonly used to identify the failure distribution. This analysis method is beyond the scope of our discussion. To learn more about Weibull distributions and other reliability tools, visit http://www.barringer1.com/Contents.shtml.) It doesn’t make sense to manage random failures with time-based activities, since they are not time related. One way to address machines with random failure distributions is to install spare units so that operators can swap pumps in the event of an unannounced failure.
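For readers who want a feel for Weibull analysis, the sketch below shows one simple variant: median rank regression on complete (uncensored) failure data. It uses Benard's approximation F_i = (i − 0.3)/(n + 0.4) for the plotting positions and a least-squares fit of ln(−ln(1 − F)) against ln(t). The slope of that line is β, and η follows from the intercept. This is only one of several estimation methods, and it is not necessarily the one the reference above recommends. It does not handle suspensions (units removed before failure). The seal times to failure are hypothetical.

```python
import math


def weibull_mrr(times: list[float]) -> tuple[float, float]:
    """Estimate Weibull shape (beta) and scale (eta) by median rank regression.

    Assumes complete failure data (no suspensions). Uses Benard's approximation
    for median ranks and regresses y = ln(-ln(1 - F)) on x = ln(t).
    """
    t = sorted(times)
    n = len(t)
    if n < 2:
        raise ValueError("need at least two failure times")
    xs = [math.log(ti) for ti in t]
    ys = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                   # slope of the Weibull plot
    intercept = y_bar - beta * x_bar   # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


# Hypothetical seal times to failure, in months.
seal_ttf = [2.0, 4.5, 5.0, 7.5, 9.0, 13.0, 16.0, 21.0]
beta, eta = weibull_mrr(seal_ttf)
print(f"beta = {beta:.2f}, eta = {eta:.1f} months")
```

A fitted β near 1 would be consistent with the random seal failures assumed in this example, while a β well above or below 1 would point toward a time-dependent strategy.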
Here are some common strategies for the various failure distributions (see Figure 1.2):
1 Bathtub failure distributions: a) incorporate detailed start-up procedures to maximize the probability of a successful start-up, b) build in spare capacity to allow for random failures, c) incorporate a condition monitoring program that can detect the early stages of end-of-life failures.
2 Wear-out failure distributions: Use time-based inspections and replacements to minimize the risk of end-of-life failure.
3 Fatigue failure distributions: Use time-based replacements to minimize the risk of failure.
4 Initial break-in period failure distributions: Incorporate a detailed break-in procedure to maximize the probability of a successful start-up.
5 Random failure distributions: a) build in spare capacity to allow for unannounced failures, b) incorporate a condition monitoring program that can detect early failures.
6 Infant mortality failure distributions: a) incorporate detailed start-up procedures to maximize the probability of a successful start-up, b) develop detailed repair procedures to minimize the probability of premature failures.
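Because the overall maintenance strategy is the combination of the best strategy for each failure mode, the list above can be encoded as a lookup table and applied mode by mode. The sketch below condenses the list into short labels. The pump's failure-mode assignments are hypothetical, except that the seal is treated as random, as in the example.

```python
# Strategies condensed from the list above, keyed by failure distribution type.
STRATEGIES: dict[str, list[str]] = {
    "bathtub": [
        "detailed start-up procedures",
        "spare capacity for random failures",
        "condition monitoring for end-of-life failures",
    ],
    "wear-out": ["time-based inspections and replacements"],
    "fatigue": ["time-based replacements"],
    "break-in": ["detailed break-in procedure"],
    "random": [
        "spare capacity for unannounced failures",
        "condition monitoring for early failures",
    ],
    "infant mortality": [
        "detailed start-up procedures",
        "detailed repair procedures",
    ],
}


def maintenance_plan(failure_modes: dict[str, str]) -> dict[str, list[str]]:
    """Map each failure mode (name -> distribution type) to its candidate strategies."""
    return {mode: STRATEGIES[dist] for mode, dist in failure_modes.items()}


# Hypothetical pump: seal failures judged random; bearing distribution assumed wear-out.
pump = {"mechanical seal": "random", "bearing": "wear-out"}
for mode, actions in maintenance_plan(pump).items():
    print(f"{mode}: {', '.join(actions)}")
```

An unrecognized distribution label raises a `KeyError`, which is arguably preferable to silently assigning no strategy.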
Detectable versus Undetectable Defects
A key concept in preventing potential machine failures is understanding whether an internal machinery defect is detectable or undetectable.
In Figure 1.3, the iceberg analogy is used to represent possible internal issues that might be found in rotating machinery. Notice that some failure modes represent detectable issues, such as high vibration or high temperature. I define detectable issues as those that can be sensed externally and easily monitored. However, there are also hidden issues, such as internal corrosion, erosion, and stress corrosion cracking, that can only be assessed through disassembly and inspection. During a machinery assessment, the analyst must tabulate both detectable and hidden failure modes and then determine the best strategy to address them so that surprise failures do not occur.
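The tabulation step can be sketched as a small failure-mode register with a detectable/hidden flag. This sketch assumes a deliberately coarse routing rule: detectable modes go to condition monitoring, and hidden modes go to disassembly and inspection. A real assessment would weigh cost, consequence, and failure distribution as well. The failure modes listed are the examples from the text.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    detectable: bool  # True if the issue can be sensed externally and easily monitored


def assign_detection_strategy(modes: list[FailureMode]) -> dict[str, str]:
    """Route detectable modes to condition monitoring and hidden modes to inspection (assumed rule)."""
    return {
        m.name: "condition monitoring" if m.detectable else "disassembly and inspection"
        for m in modes
    }


register = [
    FailureMode("high vibration", True),
    FailureMode("high temperature", True),
    FailureMode("internal corrosion", False),
    FailureMode("erosion", False),
    FailureMode("stress corrosion cracking", False),
]

for name, strategy in assign_detection_strategy(register).items():
    print(f"{name:26s} -> {strategy}")
```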
Figure 1.3 Machinery failure modes can be broken down into those that are readily detectable and those that are hidden from view. Note: failure modes related to high temperatures, e.g., creep, oxidation, etc., are related