Intelligent Failures
Intelligent failures occur when answers cannot be known in advance because the exact situation has not been encountered before, so experimentation is necessary. Examples include testing a prototype, designing a new type of machinery, or operating a machine under different operating conditions. In these settings, “trial and error” is the common term for the kind of experimentation needed. These types of failures can be considered “good” because they provide valuable insight and new knowledge that help an organization learn from past mistakes and grow. The lesson here is clear: If something works, do more of it. If it doesn’t, go back to the drawing board.
Building a Learning Culture
Leaders can create and reinforce a culture in which people feel comfortable surfacing and learning from failures, rather than playing the blame game. When things go wrong, they should insist on finding out what happened rather than “who did it.” This requires consistently reporting failures, small and large; systematically analyzing them; and proactively taking steps to avoid recurrence.
Most organizations engage in all three kinds of work discussed above – routine, complex, and intelligent. Leaders must ensure that the right approach to learning from failure is applied in each of them. All organizations learn from failure through four essential activities: detection, analysis, learning, and sharing.
Detecting Failure
Spotting big, painful, expensive failures is easy. But many failures stay hidden as long as they are unlikely to cause immediate or obvious harm. The goal should be to surface them early, before they can combine with other lapses in the system to create a disaster. High‐reliability organizations (HROs) help prevent catastrophic failures in complex systems such as nuclear power plants and aircraft through early detection.
At one large petrochemical company, top management rigorously tracks each plant for anything even slightly out of the ordinary, immediately investigates whatever turns up, and informs all its other plants of any anomalies. Often, however, these methods are not widely employed because senior executives remain reluctant to convey bad news to bosses and colleagues.
Analyzing Failure
Most people avoid analyzing failure altogether because it is often emotionally unpleasant and can chip away at our self‐esteem. Another reason is that analyzing organizational failures requires inquiry and openness, patience, and a tolerance for causal ambiguity. Hence, managers should be rewarded for thoughtful reflection; that is how the right culture percolates through the organization.
Once a failure has been detected, it’s essential to find the root causes rather than relying on the obvious and superficial reasons. This requires the discipline to use sophisticated analysis to ensure that the right lessons are learned and the right remedies are employed. Engineers need to see that their organizations don’t just move on after a failure but stop to dig in and discover the wisdom contained in it.
A team of leading physicists, engineers, aviation experts, naval leaders, and even astronauts devoted months to an analysis of the Columbia disaster. They conclusively established not only the first‐order cause – a piece of foam had hit the shuttle’s leading edge during launch – but also second‐order causes: A rigid hierarchy and schedule‐obsessed culture at NASA made it especially difficult for engineers to speak up about anything but the most rock‐solid concerns.
Motivating people to go beyond first‐order reasons (procedures weren’t followed) to understand the second‐ and third‐order reasons can be a major challenge. One way to do this is to use interdisciplinary teams with diverse skills and perspectives. Complex failures in particular are the result of multiple events that occurred in different departments or disciplines or at different levels of the organization. Understanding what happened and how to prevent it from happening again requires detailed, team‐based discussion and analysis.
Here are some common root causes and their corresponding corrective actions:
Design deficiency caused failure → Revisit in‐service loads and environmental effects, modify design appropriately.
Manufacturing defect caused failure → Revisit manufacturing processes (e.g. casting, forging, machining, heat treat, coating, assembly) to ensure design requirements are met.
Material defect caused failure → Implement raw material quality control plan.
Misuse or abuse caused failure → Educate user in proper installation, use, care, and maintenance.
Useful life exceeded → Educate user in proper overhaul/replacement intervals.
There are various methods that failure analysts use – for example, Ishikawa “fishbone” diagrams, failure modes and effects analysis (FMEA), or fault tree analysis (FTA). Methods vary in approach, but all seek to determine the root cause of failure by looking at the characteristics and clues left behind.
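As a rough illustration of how one of these methods turns observations into priorities, the sketch below ranks failure modes by a risk priority number (RPN), a convention used in many FMEA worksheets: severity, occurrence, and detection are each rated from 1 to 10 and multiplied together. The rating scale, the class and function names, and the example failure modes with their ratings are all assumptions made for illustration; they are not taken from any particular standard or from a real analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a simplified, hypothetical FMEA worksheet."""
    name: str
    severity: int    # 1 (negligible effect) to 10 (catastrophic effect)
    occurrence: int  # 1 (very unlikely) to 10 (almost certain)
    detection: int   # 1 (almost always caught) to 10 (almost never caught)

    def __post_init__(self):
        # Reject ratings outside the assumed 1-10 scale.
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(
                    f"{field_name} must be between 1 and 10, got {value}"
                )

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda mode: mode.rpn, reverse=True)


# Made-up ratings, for illustration only.
modes = [
    FailureMode("casting porosity", severity=7, occurrence=4, detection=6),
    FailureMode("coating wear", severity=4, occurrence=6, detection=3),
    FailureMode("assembly misalignment", severity=8, occurrence=2, detection=5),
]

for mode in rank_by_rpn(modes):
    print(f"{mode.name}: RPN = {mode.rpn}")
```

A ranking like this only suggests where to look first. A failure mode with a low RPN but a very high severity rating may still deserve attention, so the numbers support the team discussion rather than replace it.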
Once the root cause of the failure has been determined, it is possible to develop a corrective action plan to prevent recurrence of the same failure mode. Understanding what caused one failure may allow us to improve upon our design process, manufacturing processes, material properties, or actual service conditions. This valuable insight may allow us to foresee and avoid potential problems before they occur in the future.
Share the Lessons
Failure is less painful when you extract the maximum value from it. If you learn from each mistake, large and small, share those lessons, and periodically check that these processes are helping your organization move more efficiently in the right direction, your return on failure will skyrocket. While it’s useful to reflect on individual failures, the real payoff comes when you spread the lessons across the organization. As one executive commented, “You need to build a review cycle where this is fed into a broader conversation.” When the information, ideas, and opportunities for improvement gained from one failure incident are passed on to other teams, their benefits are magnified. The findings of root cause failure analysis should be made available to others in the organization so that they can learn too.
Benefits of Failure Analysis
The best way to get risk‐averse managers and employees to accept higher risks and their associated failures is to educate them on the many positive aspects and benefits of failure. Some of those benefits include:
Failure tells you what to stop doing – Failure reveals what doesn’t work, so you can avoid using similar unmodified approaches in the future. Over time, by continually eliminating failure factors, you increase the probability of future success.
Failure is the best teacher – Failure is only valuable if you use it to identify what worked and what didn’t, and then use that information to minimize future failures. In the corporate and engineering worlds, learning from failure starts with failure analysis. This is a process that helps you identify specifically what failed and then understand the “root causes” of that failure (i.e. critical failure factors). Because failure and success factors are often closely related, identifying the failure factors will likely help you identify the critical success factors that cause an approach to succeed. The famous auto innovator Henry Ford revealed his understanding of learning from failure in this quote: “The only real mistake is the one from which we learn nothing.”
A failure factor in one area may apply to another area – Failure analysis tells you what failed and why. But the best corporations develop processes that “spread the word” and warn others in the organization about what clearly doesn’t work so that they don’t need to learn the hard way. On the positive side, lessons learned from both