Peter M. Curtis

Maintaining Mission Critical Systems in a 24/7 Environment



performed in the system acceptance test procedures used during the acceptance phase, and use the original data to trend any changes in the system. Reliability Assurance Testing should be performed after the vendor has provided Preventive Maintenance (PM). The reason for testing after the vendor's PM routine is that the vendor has just interacted with a commissioned system and disassembled portions of it. In some cases, the vendor has also installed updated software or control boards. The system must now be certified through Reliability Assurance Testing as worthy of carrying critical load. Remember that vendor‐provided PM does not measure performance or track system degradation, so without a Reliability Assurance Testing program, the quality control process is compromised.

      Before the facility goes on‐line, it is crucial to resolve all potential equipment problems (Technology, Operations, etc.). Because of the facility's 24/7 mission critical status, this is the construction team's sole opportunity to integrate and commission all the systems. By this point in the project, all installed systems have been tested at the factory, with the tests witnessed by a competent Commissioning Authority (CxA) familiar with the equipment processes and procedures.

      Once the equipment is delivered, set in place, and wired, it is time for the second phase of certified testing and integration. The purpose of this phase is to verify and certify that all components work together and to fine‐tune, calibrate, and integrate the systems. This phase requires a tremendous amount of preparation. The facilities engineer must work with the factory, field engineers, and independent test consultants to coordinate testing and calibration. Critical circuit breakers must be tested and calibrated before any critical electrical load is placed on them. When all the tests are completed, the facilities engineer must compile the certified test reports, which establish a benchmark for all future testing. The last phase is to train the staff on each major piece of equipment and prepare for the transition to operations.
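
To illustrate how certified test reports can serve as a benchmark for future testing, the following is a minimal sketch in Python. It assumes readings are stored as simple name/value pairs; the function name `compare_to_benchmark`, the reading names, the values, and the 5% tolerance are all hypothetical and chosen only for illustration. Actual acceptance limits should come from the equipment manufacturer and the applicable standards.

```python
from dataclasses import dataclass


@dataclass
class Deviation:
    name: str
    baseline: float
    measured: float
    percent_change: float


def compare_to_benchmark(baseline, measured, tolerance_pct=5.0):
    """Return readings that differ from the benchmark by more than tolerance_pct.

    baseline and measured are dicts mapping a reading name to a numeric value.
    The 5% default is an assumed placeholder, not an industry limit.
    """
    flagged = []
    for name, base_value in baseline.items():
        if name not in measured:
            continue  # reading was not taken in this test round
        if base_value == 0:
            continue  # percent change is undefined for a zero baseline
        change = (measured[name] - base_value) / abs(base_value) * 100.0
        if abs(change) > tolerance_pct:
            flagged.append(
                Deviation(name, base_value, measured[name], round(change, 1))
            )
    return flagged


# Illustrative values only, not real test data.
benchmark = {"ups_output_voltage_v": 480.0, "breaker_trip_time_ms": 50.0}
latest = {"ups_output_voltage_v": 478.0, "breaker_trip_time_ms": 56.0}

for d in compare_to_benchmark(benchmark, latest):
    print(f"{d.name}: {d.baseline} -> {d.measured} ({d.percent_change:+}%)")
```

Running the same comparison after each test round, always against the original commissioning values, is one way to make gradual degradation visible before it becomes a failure.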

      Many decisions regarding how and when to service a facility’s mission critical electrical/mechanical equipment are going to be subjective. The objective is easy to state: a high level of safety and reliability from the equipment, components, and systems. But discovering the most cost‐effective and practical methods to accomplish this can be challenging. Network with colleagues, consult knowledgeable sources, and review industry and professional standards and best practices before choosing the approach best suited to your maintenance goals. Also, keep in mind that the individuals performing the testing and service should have the best training and experience available. You depend on their conscientiousness and decision‐making ability to avoid potential problems with perhaps the most crucial equipment in your building. Most importantly, learn from your experiences and those of others. Maintenance programs should improve continuously. If a scheduled procedure has never identified a problem, consider adjusting the schedule accordingly. Examine your maintenance programs on a regular basis and make appropriate adjustments.
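
The idea of revisiting a schedule when a procedure keeps finding nothing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `suggest_interval_review`, the true/false record of findings, and the threshold of three consecutive clean rounds are all hypothetical. The result is a prompt for an engineer to review the interval, not an automatic schedule change.

```python
def suggest_interval_review(findings, clean_streak=3):
    """Return True when the most recent rounds of a procedure all came back clean.

    findings is a list of booleans ordered oldest first, where True means the
    procedure identified a problem in that round. clean_streak is an assumed
    threshold; choose it to suit the equipment and its criticality.
    """
    streak = 0
    for found_problem in reversed(findings):
        if found_problem:
            break  # the clean run ends at the most recent problem
        streak += 1
    return streak >= clean_streak


# Illustrative history: one early finding, then three clean rounds.
history = [True, False, False, False]
if suggest_interval_review(history):
    print("Flag this procedure for a schedule review.")
```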

      The importance of taking every opportunity to perform preventive maintenance thoroughly and completely, especially in mission critical facilities, cannot be stressed enough. If an opportunity is missed, the next one will come at a much higher price: downtime, lost business, and lost potential clients, not to mention the safety issues that arise when technicians rush to fix a maintenance problem. So do it correctly ahead of time and avoid shortcuts, because it will be very difficult to do it again.

      The mission critical industry’s focus on physical infrastructure enhancements dates from the early stages of the trade, when all efforts were placed solely on design and construction techniques to enhance mission critical equipment.

      Twenty‐five years ago, the technology supporting mission critical loads was simple. There was little sophistication in the electrical load profile; at that time, the industry was in its infancy. Over time, data centers have grown from a few mainframes supporting minimal software applications to server farms that can occupy 100,000 ft² or more, with Google and Microsoft being prime examples.


      Figure 1.2 Typical screenshot of SmartWALK™ dashboard

      (Courtesy of PMC Group One, LLC.)

The Issues: Employee Turnover, Retirement, Sick Leave or Vacation
Was knowledge lost?
Where is existing documentation?
How are new employees trained?
What risks are faced during the transition?
The Issue: Traditional documentation systems are inconsistent, inaccessible, and unstructured.
How is information shared?
Is system data readily available?
Where is the documentation?
How are revisions approved and made available to all users?
The Threats: Fires, Natural Disasters, Blackouts,