we must evaluate the cause-and-effect relationships between adverse machine conditions and the various failure modes. We need to ask ourselves: Are these cause-and-effect relationships understood and repeatable? If they are, then we can begin to design proper safeguards to avoid future failures as a way of improving availability. For example, if we know that a low pump flow causes high vibration due to internal recirculation, which in turn causes a mechanical seal failure, then we may need to consider an automatic flow spillback valve to preclude low flow conditions. Keep in mind that every safeguard must be cost justified, i.e., the cost of each safeguard must return value in the form of reduced risk associated with the improvement.
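The cost-justification test in the preceding paragraph comes down to simple arithmetic. The sketch below is one way to frame it. It assumes that risk is expressed as expected annual loss (failure frequency multiplied by the cost of each failure) and that the safeguard's installed cost is spread evenly over its service life, with no discounting. The names `Scenario` and `safeguard_net_benefit` and every number in the usage example are hypothetical, not data from the text.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Failure exposure for one failure mode (all values are assumptions)."""
    failures_per_year: float   # expected failure frequency
    cost_per_failure: float    # repair cost plus lost-production cost per event

    def annual_risk(self) -> float:
        # Expected annual loss = frequency x consequence
        return self.failures_per_year * self.cost_per_failure


def safeguard_net_benefit(before: Scenario, after: Scenario,
                          installed_cost: float, service_years: float,
                          annual_upkeep: float = 0.0) -> float:
    """Annual risk reduction minus the annualized cost of the safeguard.

    Spreads the installed cost evenly over the service life (no discounting).
    A positive result means the safeguard pays for itself under these assumptions.
    """
    if service_years <= 0:
        raise ValueError("service_years must be positive")
    risk_reduction = before.annual_risk() - after.annual_risk()
    annualized_cost = installed_cost / service_years + annual_upkeep
    return risk_reduction - annualized_cost


if __name__ == "__main__":
    # Hypothetical seal failures caused by low-flow internal recirculation
    without_valve = Scenario(failures_per_year=2.0, cost_per_failure=15_000.0)
    # Hypothetical failure rate after adding an automatic spillback valve
    with_valve = Scenario(failures_per_year=0.5, cost_per_failure=15_000.0)
    net = safeguard_net_benefit(without_valve, with_valve,
                                installed_cost=60_000.0, service_years=10.0,
                                annual_upkeep=1_000.0)
    print(f"Net annual benefit: ${net:,.0f}")
```

With these illustrative numbers, the risk drops from $30,000 to $7,500 per year. The valve's annualized cost is $7,000, so the net benefit is $15,500 per year and the safeguard passes this simple test.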
Reliable operation requires: 1) proper physical safeguards, such as surge control, temperature and vibration monitors, load monitors, speed sensors, etc., to prevent and detect unwanted operating conditions, 2) well-written operating procedures, and 3) judicious design improvements. We must be ever vigilant if we wish to keep our rotating machinery out of harm’s way and operating reliably by avoiding undesirable operating conditions at all stages of a machine’s lifetime. Before writing compressor operating and monitoring procedures, you need to clearly understand your machinery’s operating limits. If in doubt, talk to the original equipment manufacturer about any concerns you may have. Always try to be conservative when setting operating limits. If you faithfully protect your process machinery with effective safeguards, you will be rewarded with many years of safe and reliable service.
2
Useful Analysis Tools for Tracking Machinery Reliability
By Robert X. Perez
Figure 2.1 Machinery reliability metrics are essential to ensure that a site’s reliability efforts are effective.
Reliable machinery is essential to efficient and profitable plant operations. Most sites achieve world-class reliability through steady, continuous improvements in design, repair, and operating procedures. If we want to keep improving the mechanical reliability of our machinery, we need to adopt, and then maintain, metrics and reporting tools that help us track and analyze how our equipment is performing with respect to reliability (Figure 2.1). Carefully selected machinery reliability metrics and reports can:
1 Assist in identifying problem areas or applications.
2 Assist in identifying significant changes in reliability performance.
3 Document the impact of design improvements.
4 Document the impact of reliability programs.
5 Provide a visual trend of machinery failures.
6 Allow us to compare the reliability performance of different sites and benchmark our overall performance against industry data.
7 Assist users in making economic evaluations of design modifications.
8 Assist in evaluating maintenance department productivity.
9 Assist in maintenance budgeting and planning.
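Item 5, a visual trend of machinery failures, can start from nothing more than a list of failure dates. The Python sketch below is one possible approach; the function names are illustrative assumptions, not from the text. It counts failures per calendar month and fills in months with zero failures so that quiet periods still appear in the trend. It then prints a crude text bar chart.

```python
from collections import Counter
from datetime import date


def monthly_failure_counts(failure_dates: list[date]) -> dict[str, int]:
    """Return failures per month as {'YYYY-MM': count}, including zero months."""
    counts = Counter((d.year, d.month) for d in failure_dates)
    if not counts:
        return {}
    year, month = min(counts)
    last = max(counts)
    trend: dict[str, int] = {}
    # Walk month by month from the first to the last failure month
    while (year, month) <= last:
        trend[f"{year:04d}-{month:02d}"] = counts.get((year, month), 0)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return trend


def print_trend(trend: dict[str, int]) -> None:
    # One '#' per failure gives a quick text-mode trend chart
    for month, n in trend.items():
        print(f"{month} {'#' * n} ({n})")


if __name__ == "__main__":
    # Hypothetical failure dates for a population of spared pumps
    dates = [date(2023, 1, 5), date(2023, 1, 20), date(2023, 3, 2), date(2023, 4, 11)]
    print_trend(monthly_failure_counts(dates))
```

Filling the zero months matters. Omitting them would compress the time axis and make failures look more frequent than they were.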
When we talk about machinery reliability tracking, we typically divide the discussion into two categories: spared machinery and unspared, or critical, machinery. Spared machines are less critical installations with installed backup units (see Figure 2.2) that keep the process running if one machine fails. Critical machines, on the other hand, are unspared machines that are essential to the operation of the process. If a critical machine fails or shuts down for an extended period of time, the entire process has to shut down, resulting in significant economic losses.
Figure 2.2 A main and spare pump arrangement is used to ensure reliable pumping service. If one pump fails, a standby pump is ready for service.
Because these machinery categories expose the plant to different types of risks, their reliability performance is measured and reported differently:
1 Spared machines usually make up 90 to 95% of the plant population, which means they tend to consume most of the day-to-day maintenance resources. Therefore, the primary aim of tracking the reliability performance of spared machinery is to control maintenance resources, i.e., costs. When dealing with these large machine populations, we use reliability metrics such as MTBF, MTTR, failure trends, bad actor lists, and planned maintenance percentage.
2 Critical machines make up a small percentage (less than 10%) of the population but tend to have a huge impact on the plant’s operational reliability. Instead of maintenance costs, management is more concerned with production losses, environmental releases, safety events, etc., whose consequences can be an order of magnitude larger than repair costs. Therefore, when dealing with critical machines, we tend to use metrics such as availability, trends of process outages, cumulative downtime (hours), production loss reports, Pareto charts of downtime causes (machinery, exchangers, controls, etc.), Pareto charts of root causes, etc.
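To make the two sets of metrics concrete, the Python sketch below computes one example from each list. For spared machines, it builds a bad actor list, ranking equipment tags by failure count. For critical machines, it calculates time-based availability and a Pareto of downtime hours by cause. It assumes availability is simply available hours divided by period hours. The tag names, data layout, and function names are hypothetical.

```python
from collections import Counter


def bad_actor_list(failed_tags: list[str], top_n: int = 5) -> list[tuple[str, int]]:
    """Spared machines: rank equipment tags by failure count, worst first."""
    return Counter(failed_tags).most_common(top_n)


def availability(period_hours: float, downtime_hours: list[float]) -> float:
    """Critical machines: fraction of the period the machine was available."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    return (period_hours - sum(downtime_hours)) / period_hours


def downtime_pareto(events: list[tuple[str, float]]) -> list[tuple[str, float, float]]:
    """Critical machines: total downtime hours per cause, largest first,
    with the cumulative percentage used to draw a Pareto chart."""
    totals: dict[str, float] = {}
    for cause, hours in events:
        totals[cause] = totals.get(cause, 0.0) + hours
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return [(cause, hours, 0.0) for cause, hours in ranked]
    pareto, running = [], 0.0
    for cause, hours in ranked:
        running += hours
        pareto.append((cause, hours, 100.0 * running / grand_total))
    return pareto


if __name__ == "__main__":
    # Hypothetical pump tags, one entry per recorded failure
    print(bad_actor_list(["P-101A", "P-102B", "P-101A", "P-103A", "P-101A", "P-102B"], top_n=2))
    # Hypothetical year of operation (8,760 h) with two outages
    print(f"Availability: {availability(8760.0, [48.0, 24.0]):.2%}")
    # Hypothetical downtime events: (cause category, hours)
    for row in downtime_pareto([("machinery", 30.0), ("controls", 10.0),
                                ("machinery", 20.0), ("exchangers", 40.0)]):
        print(row)
```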
Figure 2.3 The ultimate goal of machinery reliability metrics is to give reliability professionals a way to condense reliability data and present it in an easy-to-understand format.
There is an endless number of machinery reliability metrics and reporting methods that can be maintained and reviewed (see Figure 2.3). However, in this brief survey, I will present tracking methods that I have used and believe have merit. The best metrics and reporting methods are those that are frequently used and have been proven to be useful in understanding the state of reliability at your site.
Commonly Used Metrics for Spared Machinery:
Mean Time to Repair (MTTR)
MTTR, a measure of maintainability, reflects the ability of a maintenance organization to restore failed equipment to a serviceable condition. Using MTTR, you can determine the average time your maintenance staff needs to prepare, mobilize, and repair a failed machine and then return it to service. MTTR is calculated as follows:
$$\text{MTTR} = \frac{\text{Total repair time}}{\text{Number of repairs}}$$
Tracking this metric establishes your site’s current MTTR. If your current average repair time is unacceptable, look for ways to shorten restoration time. Reducing MTTR helps decrease production losses caused by maintenance downtime.
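A minimal MTTR calculation divides total repair time by the number of repairs. This sketch assumes each repair is logged with the time the machine came out of service and the time it was returned. The `RepairRecord` structure and its field names are hypothetical, so records exported from a maintenance system would need to be mapped onto them.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RepairRecord:
    """One repair event; field names are hypothetical."""
    out_of_service: datetime   # when the failed machine was removed from service
    back_in_service: datetime  # when the repaired machine was returned to service

    def repair_hours(self) -> float:
        return (self.back_in_service - self.out_of_service).total_seconds() / 3600.0


def mttr_hours(records: list[RepairRecord]) -> float:
    """MTTR = total repair time / number of repairs, in hours."""
    if not records:
        raise ValueError("MTTR is undefined when there are no repairs")
    total = sum(r.repair_hours() for r in records)
    return total / len(records)


if __name__ == "__main__":
    repairs = [
        RepairRecord(datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 8)),    # 8 h
        RepairRecord(datetime(2024, 2, 3, 6), datetime(2024, 2, 3, 18)),   # 12 h
        RepairRecord(datetime(2024, 3, 5, 10), datetime(2024, 3, 5, 14)),  # 4 h
    ]
    print(f"MTTR = {mttr_hours(repairs):.1f} h")
```

Measuring from out-of-service to back-in-service captures preparation and mobilization time as well as hands-on repair time. For the three hypothetical repairs shown, MTTR works out to 8.0 hours.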
Mean Time Between Failure (MTBF)
MTBF forecasts the average time from one machine failure to the next under normal operating conditions. In other words, this metric can be used to predict the average life expectancy of a piece of equipment. To calculate MTBF, use the following formula: