Can we rely on Wikipedia to help us define Root Cause Analysis?
Andrew Morton, Senior Tutor at Kelvin TOP-SET, explores the definition of Root Cause Analysis.
The definition of Root Cause Analysis below is taken directly from Wikipedia. It is worth considering, because an appreciation of the errors made in the definition should strengthen our understanding of the fundamentals of RCA. What is remarkable about this definition is how badly wrong it seems to be.
Root cause analysis (RCA) is a method of problem solving used for identifying the root causes of faults or problems. A factor is considered a root cause if removal thereof from the problem-fault-sequence prevents the final undesirable event from recurring; whereas a causal factor is one that affects an event's outcome, but is not a root cause. Though removing a causal factor can benefit an outcome, it does not prevent its recurrence within certainty.
For example, imagine an investigation into a machine that stopped because it overloaded and the fuse blew. Investigation shows that the machine overloaded because it had a bearing that wasn't being sufficiently lubricated. The investigation proceeds further and finds that the automatic lubrication mechanism had a pump which was not pumping sufficiently, hence the lack of lubrication. Investigation of the pump shows that it has a worn shaft. Investigation of why the shaft was worn discovers that there isn't an adequate mechanism to prevent metal scrap getting into the pump. This enabled scrap to get into the pump, and damage it. The root cause of the problem is therefore that metal scrap can contaminate the lubrication system. Fixing this problem ought to prevent the whole sequence of events recurring. Compare this with an investigation that does not find the root cause: replacing the fuse, the bearing, or the lubrication pump will probably allow the machine to go back into operation for a while. But there is a risk that the problem will simply recur, until the root cause is dealt with.
We would disagree with this definition in a number of significant ways:
1. Fixing the Root Cause will not necessarily prevent the same incident happening again. This is because the two events, the incident and its Root Cause, are too far removed from one another for there to be any certainty that fixing the Root Cause will prevent recurrence of that incident. Why bother fixing the Root Cause then? Simply because it is likely to improve the whole workplace environment and reduce the likelihood of many other related incidents occurring. Its impact will be wide but more diffuse. On the other hand, fixing an Immediate Cause should almost certainly prevent that particular event recurring, but is unlikely to have a wider impact.
2. We don't agree with the definition of a 'causal factor'. In fact we don't use the term because we feel there's no need for it. Root Causes are described above as not being 'causal'. We would disagree, and assert that Immediate, Underlying and Root causes are all 'causal'. After all they are 'CAUSES'.
3. If causal factors are defined only as Immediate and Underlying Causes, as they are above, then we would claim that fixing a causal factor would be much more likely to prevent recurrence, and not less likely. So, our view is totally opposite to that described in the first paragraph.
The Root Cause is described as 'metal scrap can contaminate the lubrication system'. This may well be the deepest technical failure, but it isn't remotely close to the Root Cause of the failure. In TOP-SET, we would describe this as a technical Root Cause, which we would not allow. The reason for this is very simple and cannot be refuted:
To 'fix' the problem we need to know why metal got into the lubrication system; we can't leave the machine to repair itself! We are forced inevitably to ask another question:
"Why did metal scrap get into the lubrication system?"
The 'metal in the lubrication system' now becomes an Underlying Cause which is being questioned further to reach a Root Cause.
Was there a fault in the filter? If so, what was it and why did it occur?
Was the maintenance schedule skipped? If so, why was that?
Did the engineer forget to check the bearing and filters? If so, why was that?
There are many possible explanations, but all lead inevitably to questions about human activity.
There's no debate about that - it's a hard fact.
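The reasoning above can be sketched in a few lines of Python. This is only an illustration, not part of the TOP-SET method: the `Cause` class, the `involves_people` flag, and the rule that the first link in the chain is labelled Immediate are all assumptions made for the example, and the final "skipped maintenance" answer is hypothetical (the article only poses it as a question).

```python
from dataclasses import dataclass


@dataclass
class Cause:
    description: str
    # Assumption: marks a cause that concerns human activity
    # (maintenance, design, procedures) rather than a purely technical failure.
    involves_people: bool = False


def needs_another_why(chain):
    """A chain that ends on a purely technical cause has not reached a Root Cause."""
    return not chain or not chain[-1].involves_people


def label_chain(chain):
    """Label a linear why-chain, ordered from the incident outwards."""
    last = len(chain) - 1
    labels = []
    for i, cause in enumerate(chain):
        if i == 0:
            level = "Immediate"
        elif i == last and cause.involves_people:
            level = "Root"
        else:
            level = "Underlying"
        labels.append((level, cause.description))
    return labels


# The chain from the Wikipedia example: every link is technical.
wikipedia_chain = [
    Cause("Machine overloaded and the fuse blew"),
    Cause("Bearing was not sufficiently lubricated"),
    Cause("Lubrication pump was not pumping sufficiently"),
    Cause("Pump shaft was worn"),
    Cause("Metal scrap can contaminate the lubrication system"),
]

# Hypothetical answer to "Why did metal scrap get into the lubrication system?"
extended_chain = wikipedia_chain + [
    Cause("Maintenance schedule was skipped", involves_people=True),
]

for level, description in label_chain(extended_chain):
    print(f"{level}: {description}")
```

With only the Wikipedia chain, `needs_another_why(wikipedia_chain)` is true and the metal scrap is labelled an Underlying Cause; only once a question about human activity has been answered does the chain end on something labelled a Root Cause.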
In this, we are not trying to apportion blame. Someone does not need to 'be in trouble', but we do at least bring it to the attention of the people involved, because it is very likely they will have to undertake the remedial work. Another considerable advantage in taking it further than 'scrap metal in the system' is that we are looking at wider issues here: if the maintenance schedule is at fault, or the engineer is taking short-cuts, then we want to know, because this may affect the maintenance of other machines in the factory. Likewise, if the QC system employed by the manufacturer is defective, then other manufacturing errors could be made. Is there a design fault? If so, do other similar machines have the same flaw?
The Forth Road Bridge was recently closed because of a crack in a girder. Is that the Root Cause? Of course not. The Root Cause lies in the design and manufacture of the bridge 50 years ago, when load-bearing predictions were wildly out. Are we going to 'blame' the designers and engineers? Well, yes, but obviously they should not be punished or 'be in trouble'. They did the best they could at the time, given the information they had to hand. Would you like to predict road use over the new Forth Road Bridge in 50 years' time? And what about the prediction of freak weather conditions over the next 50 years?
Hopefully by taking the time to consider how the definition of Root Cause Analysis provided by Wikipedia can be improved, we can all strengthen our understanding of the fundamentals of RCA.
It is also important to remember that, before carrying out any Root Cause Analysis, the incident must be thoroughly investigated: all available information should be gathered and all avenues explored.