Root cause analysis (RCA) is a process for finding the fundamental cause of a problem, issue, or incident. It’s also how you determine corrective actions, then outline and implement steps to reduce the risk of future occurrences. If you only ever treat the symptoms without addressing their source, you waste time and money reacting to the same failures.

What is the definition of root cause analysis (RCA)?  

Just like the name implies, it’s the process of finding the original cause for an incident. To fully understand that definition, you need to know the answers to two questions:   

  1. What’s the difference between accidents and incidents? 
  2. What’s the difference between a cause and a symptom?  

Incidents vs accidents 

Strictly speaking, an incident is something that happened, usually as the result of an earlier event. So, someone gets arrested after committing a crime. Or someone gets promoted after boosting uptime and cutting costs in a facility.  

In one case, it’s something negative. In the other, it’s positive. But both are incidents, and this fact is important to remember when looking at RCA. Your root cause can be the reason why something didn’t work, but it can also be the reason why something did.

Accidents, though, are always bad. Separate from questions about who’s to blame or if it was avoidable or not, you don’t want accidents. 

Causes and symptoms  

Causes are the reasons why. Symptoms are the results. The classic example is when you get sick and suffer from: 

  • Headache  
  • Chills  
  • Fever  
  • Cough  
  • Sore throat  

Everything on that list is a symptom, a result of your being sick. But they are not the reason you’re sick. The reason is likely a bacterial or viral infection. Why is the distinction important?

If you only ever treat symptoms, you always run the risk of getting sick again. Sure, you can use all kinds of medicine to lessen the effects of the symptoms, but because you aren’t attacking the root of the problem, it can come back.  

What are the benefits of root cause analysis?  

RCA helps you reveal the real reasons something happened, which helps you both now and in the future. First, you can more easily fix a problem when you know what’s causing it. For example, if a piece of equipment has a leak, you can patch it, dealing only with the symptoms. But if you dig deeper and discover the leak is the result of a broken internal seal, you can replace the seal, both stopping the leak now and avoiding leaks in the future.   

The second benefit comes from how you can take what you learned with the broken seal and apply it to other equipment. You might decide, after a bit more digging for the root cause, that the team is not inspecting the seals often enough, leading you to add more preventive maintenance inspections and tasks to the schedule. Or it might be the case that your seals are generally low quality, and moving forward, you want to change suppliers. In either case, RCA helps you develop insights into your operations, which leads to better decision-making.     

What are the common strategies and examples of RCA? 

There are various methods, but they’re all connected.  

5-whys method 

The first method for root cause analysis is to ask yourself why something happened. Sounds easy at first, but it can become challenging as you drill down.  

There’s a story about a Ph.D. candidate defending his doctoral thesis in astrophysics. Everything was going fine until close to the end, when one of the professors asked him what seemed like a simple question: “Why is the sky blue?”

Every time he answered, he was met with, “Could you be more specific?” And so, he kept digging deeper, until he was at the level of explaining molecular energies, optics, and the inner workings of the human eye. 

RCA works the same way. You keep asking yourself why something happened until you’re about five or so levels down. Five is the general rule of thumb for root cause analysis. In some cases, you only need to dig down twice, while in others, it’s deeper.  

Examples of the 5-whys method for RCA  

So, back to our earlier example of the leaking equipment. How could you use the 5 whys?  

Why is it leaking? The connection between the two pieces is not tight enough to keep the liquid from coming out. 

Why is it not tight enough? The rubber seal is damaged, preventing a tight fit.  

Why is the seal damaged? It was not installed properly. When the techs tightened the pieces together, the threads bit into the seal, damaging it.  

Why did the techs install it improperly? The seals are different from the ones they’re used to working with, and they did not receive new training.

So, why is there a leak? The root cause of the leak is a lack of proper training on how to install the new seals. 
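The chain above can also be written down as data, which makes it easier to review with the team later. The following is a minimal sketch, not a prescribed tool: the `Why` class and the `root_cause` helper are hypothetical names introduced here, and treating the last answer in the chain as the root cause is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Why:
    """One question-and-answer step in a 5-whys chain (hypothetical type)."""
    question: str
    answer: str


def root_cause(chain: list[Why]) -> str:
    """Return the deepest answer in the chain, treated here as the root cause."""
    if not chain:
        raise ValueError("a 5-whys chain needs at least one step")
    return chain[-1].answer


# The leak example from the text, one entry per "why".
leak_chain = [
    Why("Why is it leaking?", "The connection is not tight enough."),
    Why("Why is it not tight enough?", "The rubber seal is damaged."),
    Why("Why is the seal damaged?", "It was not installed properly."),
    Why("Why did the techs install it improperly?",
        "They did not receive training on the new seals."),
]

# Print the chain from symptom down to root cause.
for depth, step in enumerate(leak_chain, start=1):
    print(f"{depth}. {step.question} {step.answer}")
print("Root cause:", root_cause(leak_chain))
```

Note that this chain stops after four questions, which fits the rule of thumb: you dig until the answer is something you can act on, not until you hit exactly five.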

Change analysis and event analysis  

In the leaking example, we’re drilling down from one connected cause to the next, looking for the root cause. 

For change and event analysis, we’re sorting through different changes, looking for the one that was the root cause. We must decide if each change leading up to the incident was unrelated, correlated, contributing, or the root cause.  

Unrelated means there is no relationship, and the change did not cause the incident. Correlated means there is a relationship, but the change did not cause the incident. How is this possible?  

The classic example of “correlation is not causation” is murder and ice cream. Whenever ice cream sales increase, there is a corresponding increase in the murder rate. For example, a small increase in sales is followed by a small increase in murder. But ice cream is not affecting the crime rate. Instead, there is a lurking variable in the background pushing up both numbers: heat. When the temperature rises, people eat more ice cream. And they have much shorter fuses. 
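To see how a lurking variable can create a correlation, here is a small sketch using partial correlation, a standard statistical technique that measures how two series relate once a third is held constant. Every number below is invented purely for illustration and is not real sales or crime data. The function names `pearson` and `partial_correlation` are chosen here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys after controlling for zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented monthly figures, constructed so that temperature drives both series.
temperature = [15, 17, 19, 21, 23, 25]          # degrees Celsius
ice_cream_sales = [105, 135, 180, 220, 255, 305]
murders = [11, 15, 16, 20, 27, 31]

raw = pearson(ice_cream_sales, murders)
controlled = partial_correlation(ice_cream_sales, murders, temperature)

print(f"Raw correlation: {raw:.2f}")
print(f"Correlation controlling for temperature: {controlled:.2f}")
```

In this constructed data set, the raw correlation is strong, but it effectively vanishes once temperature is taken into account. Real data is rarely this clean, so treat a result like this as a prompt for further investigation, not as proof.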

Contributing means the change helped but wasn’t the only cause.     

Example of change analysis and event analysis for RCA  

The maintenance manager notices an increase in the monthly close-out rate for preventive maintenance inspections and tasks. Hoping to recreate the success, they come up with the following list of recent changes that could explain it:  

  • A new tech started two months ago 
  • The maintenance department switched suppliers for some parts and materials
  • A different tech has been taking care of the PMs while the regular tech is on vacation 

Which change is the root cause? The first one turns out to be unrelated. The new tech was hired specifically for a maintenance project that’s separate from the PM program. Looking at the parts and materials, it’s hard to say they’re affecting close-out rates. They might last longer and cost less, but that wouldn’t make them easier to use. 

But are they easier to find? The manager notices that the packaging is better. The writing is nice and clear, and many of the boxes are color coded. It’s a small change, and likely only saving the techs a few minutes per PM, so the manager decides it’s only a contributing cause.  

That leaves the fact that a different tech had been doing the PMs, which looks great for that tech. But is it the root cause of the better close-out numbers? When the maintenance manager compares the substitute tech to the regular one, they’re very similar, with roughly the same amounts of experience and time with the company.

Digging deeper and asking between two and five whys, the manager finds the root cause. The regular tech tends to work a later shift, which means they’re constantly being called away to deal with on-demand work orders. The substitute tech usually works the earlier shift, before any equipment has had a chance to break down. They’re able to get more PMs closed out because no one is reprioritizing the work order schedule.

Now that the manager has found the root cause, they can recreate that success by actively scheduling more PMs for the earlier shift.
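A comparison like the one the manager makes between shifts can be checked against work order records. The sketch below assumes each PM work order is stored as a pair of shift name and whether it was closed out. Both the record layout and the sample data are hypothetical, and a real maintenance platform would expose this differently.

```python
from collections import defaultdict

# Hypothetical PM work orders as (shift, was_closed_out). Invented for illustration.
work_orders = [
    ("early", True), ("early", True), ("early", True), ("early", False),
    ("late", True), ("late", False), ("late", False), ("late", False),
]


def close_out_rate_by_shift(orders):
    """Return the share of PMs closed out for each shift."""
    totals = defaultdict(int)
    closed = defaultdict(int)
    for shift, was_closed in orders:
        totals[shift] += 1
        if was_closed:
            closed[shift] += 1
    return {shift: closed[shift] / totals[shift] for shift in totals}


rates = close_out_rate_by_shift(work_orders)
for shift, rate in rates.items():
    print(f"{shift} shift: {rate:.0%} of PMs closed out")
```

A gap between shifts in a report like this would not prove the cause on its own, but it points the manager toward the right whys to ask.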

In this example, the maintenance manager was able to find a root cause they could control. That’s not always the case. There are times when you can easily identify the root cause, but you can’t easily do anything about it. What if the difference between the two techs was that one had a young baby at home and the other one didn’t? Because the baby’s up all night, the tech’s not getting enough sleep, and it’s affecting their performance? RCA can tell us why something happened, but that doesn’t mean it also always tells us the best way to fix it. 

Ishikawa or fishbone diagrams (aka Fishikawa)  

When you’re first brainstorming possible causes, there’s no such thing as a bad idea. But once you have all the ideas out in front of you, it’s time to start organizing them, deciding which are the best.  

Ishikawa diagrams, named after Kaoru Ishikawa, a key figure in Japanese quality management innovations, show the causes leading up to an event. The name fishbone diagram comes from their resemblance to a fish skeleton with the effect at the head.  

By building out the diagram, you can get a better understanding of the causes, their relationships to one another, and their relative contribution to the final effect. From there, you can work on finding ways to either reinforce or remove them.
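Before drawing the diagram, it can help to capture the same structure as plain data. This is a rough sketch using the causes from the earlier leak example. The category names (People, Methods, Materials) are an assumption borrowed from commonly used fishbone groupings, and the `render` function is a hypothetical helper.

```python
# Effect at the "head" of the fish, causes grouped along the "bones".
# Category names are an assumption; the causes come from the leak example.
fishbone = {
    "effect": "Equipment is leaking",
    "causes": {
        "People": ["Techs not trained on the new seals"],
        "Methods": [
            "Seal installed improperly",
            "Seals not inspected often enough",
        ],
        "Materials": ["Low-quality seals"],
    },
}


def render(diagram) -> str:
    """Return a plain-text outline of the diagram, one cause per line."""
    lines = [f"Effect: {diagram['effect']}"]
    for category, causes in diagram["causes"].items():
        lines.append(f"  {category}")
        lines.extend(f"    - {cause}" for cause in causes)
    return "\n".join(lines)


print(render(fishbone))
```

An outline like this is only a starting point. The value of the diagram comes from the team discussing which branches matter most.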

What connects these RCA methods is the need for good data and careful thinking.  

“What you really want people to do is think through in a very logical, evidence-based fashion as to why things happen,” explains Bob Latino, principal at Prelical Solutions, LLC, in an episode of the Asset Champion podcast. 

How can maintenance departments implement root cause analysis?  

Now that you know what it is, it’s time to get it working for you. One of your most important goals needs to be getting accurate information. If you want to prevent a problem from popping up again, you need to know why it happened in the first place. But we know that there are times when getting to the bottom of things can be challenging.  

Make reporting incidents easier  

When something goes wrong, is there a process in place for reporting it? How comfortable are the techs using this process? Does it encourage them to be open and honest?  

There are different ways to approach the situation. For example, you can have a frank discussion with the team, assuring them that you’re more interested in avoiding problems than punishing people. And when you have the chance, go out of your way to show that you mean it. 

You can also look at ways to allow for anonymous reporting. Techs might feel more comfortable if their name is not attached to the report. 

Get standardized maintenance processes with an asset management solution 

Before you can reliably use RCA, the maintenance team needs to be performing inspections and tasks the same way every time. If your processes are not standardized, there’s no way for you to look back for changes. Remember, with change and event analysis, you’re looking for what was different. If the team does the work differently every time, it’s harder to find root causes. 

Modern asset and facility management platforms help you standardize processes with work orders packed with step-by-step instructions and customizable checklists. Now, instead of techs winging it when they must complete an unfamiliar task, they can easily access the department’s best practices.  

And if they have any questions, they can quickly reach out from anywhere by using the software to add comments directly to work orders.     

Capture accurate data 

With paper- and spreadsheet-based methods, there are too many chances for bad data to creep in. Old-fashioned paperwork makes it hard to create copies, and the copies you do make are easy to lose. And with spreadsheets, you can make many copies quickly, but you don’t have any way to keep them all connected and up to date.

Modern maintenance management solutions keep everything in a central database your team can access from any connected device, from desktops to smartphones. And because everyone is working from the same data, it’s always accurate and up to date.  

And that means when you go back and start looking for causes, you know you can trust your data. 

Jonathan writes about asset management, maintenance software, and SaaS solutions in his role as a digital content creator at Eptura. He covers trends across industries, including fleet, manufacturing, healthcare, and hospitality, with a focus on delivering thought leadership with actionable insights. Earlier in his career, he wrote textbooks, edited NPC dialogue for video games, and taught English as a foreign language. He holds a master's degree in journalism.