Reliability-centered maintenance (RCM) is the ongoing, systematic process of matching critical systems with the most cost-effective maintenance strategy to maximize overall reliability. 

There’s no one-size-fits-all solution in maintenance management, because assets can fail in so many ways. Every asset can have its own ways of failing, its own reasons behind those failures, and its own consequences, which means you need specific strategies for predicting and avoiding them.

RCM is all about finding what works best and then getting it working for you. 

What is reliability-centered maintenance (RCM)?  

Reliability-centered maintenance is the process of finding the best possible maintenance strategy for every asset in your organization. The guiding principle is that different assets require different styles of maintenance management. Some demand continuous high-tech monitoring, while others are best left to the run-to-failure model. For a lot of your assets, your best bet is preventive maintenance.    

Remember, run-to-failure often has a bad name in maintenance, but there are times when it’s your best choice. The classic example is light bulbs, which almost always have the lowest level of criticality. They are cheap to buy and carry in inventory. When they fail, there’s little to no safety risk and you’re not running the risk of lowered productivity. And even the most inexperienced tech can replace them. So, you simply run them until they burn out. Then you replace them.  

The process of finding the best strategy begins with looking at your history of breakdowns and the steps you’ve been taking to maintain and repair your assets. From there, you choose the best maintenance strategy. The goal of reliability-centered maintenance is achieving consistently high levels of reliability at the lowest possible cost. The non-technical expression is “getting the most bang for your buck.”

What are the reliability-centered maintenance principles?  

Reliability-centered maintenance started in the aviation industry, which is unsurprising given all the parts and components that make up aviation equipment, their heavy use, and the risks and potentially catastrophic consequences of aviation equipment failures. Over time, organizations across industries have implemented the minimum criteria set out for RCM methods in technical standard SAE JA1011 — Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes.  

The principles take the form of a set of seven questions.

1. What is the asset or equipment supposed to do, and what are the associated performance standards? 

Here, you’re trying to identify the functions of the system or equipment. In other words, you need to know how the equipment performs and whether it can meet company needs within the limits set by safety, environmental, and government standards. You can find this information in the manufacturer documentation. You want to know the scope of the functions as well as their limitations and the methods of use that relate to safety and environmental measures.

For example, an industrial scale may have a weight limit. As soon as you exceed it, that scale starts becoming inaccurate or stops functioning. The documentation also explains how to use the asset to ensure both safety and accuracy. There could be instructions on how to place or handle the items you want to weigh and where to keep the scale. 

2. What does this asset do, how much of that is it doing currently, and how much of that would I like it to be doing?  

For example, you have a conveyor belt that moves boxes. Currently, it’s moving 5000 boxes between breakdowns, and each of those breakdowns lasts about three hours. Based on a combination of what the belt’s manufacturer says, what your maintenance team says, and data in your asset management software, you think you can get that number up to 7000 boxes between breakdowns. You can also reduce each breakdown from three to two hours. 
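To see what that target is worth, here is a rough calculation using the numbers above. The 35,000-box production run is an assumed figure, picked only because both 5000 and 7000 divide it evenly, and the function name is hypothetical.

```python
def downtime_hours(total_boxes, boxes_between_breakdowns, hours_per_breakdown):
    """Estimate hours lost to breakdowns over a production run."""
    breakdowns = total_boxes / boxes_between_breakdowns
    return breakdowns * hours_per_breakdown

RUN_SIZE = 35_000  # assumed production run, not a figure from the article

current = downtime_hours(RUN_SIZE, 5000, 3)  # today: 5000 boxes, 3-hour breakdowns
target = downtime_hours(RUN_SIZE, 7000, 2)   # goal: 7000 boxes, 2-hour breakdowns
saved = current - target

print(f"Current downtime: {current} hours")
print(f"Target downtime:  {target} hours")
print(f"Hours recovered:  {saved}")
```

Under those assumptions, the same run goes from seven breakdowns to five, and downtime drops from 21 hours to 10.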

3. In what ways can equipment fail to provide the required functions?  

Simply put, this means being able to identify failure modes in a piece of equipment. In other words, it involves determining the nature of the equipment failure. For example, does the failure relate to one part or is it a systemic failure? The key is to identify exactly how a piece of equipment has failed, how often, and if it involves the same equipment part. In companies with several pieces of the same types of equipment, it is important to determine if a particular failure is occurring systematically on all pieces or if the failure is limited to only one piece. For example, is the same issue affecting all or most of the vehicles in a fleet? 
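If your failure records live in asset management software, you can sketch the fleet question in a few lines. The log entries, the fleet size, and the 50% cut-off below are all made up for illustration; pick a threshold that fits your own operation.

```python
from collections import defaultdict

# Hypothetical failure log: (asset ID, failed component)
failure_log = [
    ("truck-01", "alternator"),
    ("truck-02", "alternator"),
    ("truck-03", "alternator"),
    ("truck-02", "door latch"),
    ("truck-02", "door latch"),
]
FLEET_SIZE = 4  # assumed number of identical vehicles

def classify_failures(log, fleet_size, threshold=0.5):
    """Label each component's failures as 'systemic' or 'isolated'.

    A component counts as systemic when the share of the fleet affected
    is at least `threshold` (an assumed cut-off, not a standard value).
    """
    assets_by_component = defaultdict(set)
    for asset_id, component in log:
        assets_by_component[component].add(asset_id)
    return {
        component: "systemic" if len(assets) / fleet_size >= threshold else "isolated"
        for component, assets in assets_by_component.items()
    }

result = classify_failures(failure_log, FLEET_SIZE)
print(result)
```

Here the alternator has failed on three of four trucks, so it gets flagged as systemic. The door latch has failed twice, but only on one truck, so it stays isolated.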

4. What are the events that cause each failure?  

Closely related to finding failure modes is identifying the causes of the failures. It’s important to determine why, when, and how equipment failures most typically happen. This is particularly true of heavy-use equipment, which could suffer from operating fatigue.

For example, you might run a water pump continuously, and at some point, the equipment starts to fatigue from constant use. Another common type of equipment stress leading to failure is exposure to harsh environmental conditions such as heat, cold, or moisture. Human error and inherent design or manufacturing flaws can also cause equipment failure.

Finding out the cause of the failure is important to understanding how to prevent or minimize it. For example, you can solve part-based problems by switching suppliers, while flaws from the factory might mean participating in a recall.

5. What happens when each failure occurs?  

To improve your operations, you need to do more than just identify equipment failures. You also need to know their effects, which can range from nearly undetectable to complete loss of function. For example, a failing piece of equipment might lead to a decrease in output speed or quality. Or it might smoke, stutter, and seize. In the end, all forms of equipment failure impact productivity, operations, and capital costs. They also lead to unplanned disruptions in production and expensive repairs you wish you had avoided. 

6. In what way does each failure matter?  

Here you’re looking for failure fallout. Apart from the financial and logistic consequences of equipment failure, you need to think about safety risks for operators as well as possible environmental impacts. You also need to consider how a failure affects the integrity and condition of an asset overall. Consider how even something as simple as a flat tire can quickly damage brake lines, rotors, calipers, suspension components, wheels, and the fender.  

7. What systematic task can I do proactively to prevent or diminish the consequence of the failure?  

The answer to this question is hiding inside the asset’s maintenance and repair history. By looking at who did what and when they did it, you can start to see breakdown patterns. Once you have the pattern, you can start to slot in proactive preventive measures between breakdowns. For example, the conveyor belt generally runs fine for about 5000 boxes before requiring some sort of repairs. If you add visual inspections after every 4500 boxes, you have a good chance of stretching out your uptime.  
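As a minimal sketch of turning a breakdown pattern into an inspection schedule, you could average the run lengths in the asset’s history and inspect a little before the typical failure point. The history values and the 90% safety margin are assumptions for illustration, not recommended settings.

```python
from statistics import mean

# Hypothetical history: boxes moved between consecutive breakdowns
boxes_between_breakdowns = [5100, 4900, 5000, 5200, 4800]

def inspection_interval(history, safety_margin=0.9):
    """Suggest how many boxes to run before the next visual inspection."""
    typical_run = mean(history)
    # Inspect before the typical failure point, not at it.
    return round(typical_run * safety_margin)

interval = inspection_interval(boxes_between_breakdowns)
print(f"Inspect every {interval} boxes")
```

With this history the average run is 5000 boxes, so the suggested inspection point is 4500. If your runs vary a lot, basing the interval on the shortest run instead of the average would be the more cautious choice.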

But be careful. The wording of this question can be a bit misleading. It’s about what you can do, but you also must consider what you should do. There are situations when you should take steps to avoid breakdowns. But there are also situations where it’s going to be better to simply continue to use the run-to-failure maintenance strategy. When the cost and trouble of avoiding breakdown are more than the value of the increased uptime, it makes more sense just to let things run until they fail.  
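The comparison can be sketched as a simple cost check. Every number below is invented, and the function only looks at money. Safety and environmental consequences, covered in question six, would need to be weighed separately.

```python
def cheaper_strategy(preventive_cost_per_year, failures_avoided_per_year, cost_per_failure):
    """Return 'preventive' if avoiding breakdowns pays for itself, else 'run-to-failure'."""
    value_of_avoided_failures = failures_avoided_per_year * cost_per_failure
    if preventive_cost_per_year < value_of_avoided_failures:
        return "preventive"
    return "run-to-failure"

# Hypothetical figures: bulb inspections cost 200 a year and avoid 10 failures at 5 each.
bulbs = cheaper_strategy(200, 10, 5)
# Hypothetical figures: belt inspections cost 3000 a year and avoid 4 failures at 2500 each.
belt = cheaper_strategy(3000, 4, 2500)

print(f"Light bulbs:   {bulbs}")
print(f"Conveyor belt: {belt}")
```

For the bulbs, the inspections cost more than the failures they prevent, so run-to-failure wins. For the belt, the avoided breakdowns are worth more than the inspections.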

What should I do if I can’t find a suitable preventive task?  

Here we’re dealing with a specific situation: Although the best maintenance strategy is not run-to-failure, we can’t find a good preventive maintenance plan to apply. Imagine you have an old air conditioning unit in your facility. In fact, it’s so old that you can’t source parts for it anymore. And it runs on a coolant that used to be common but is now being phased out through environmental-protection legislation. You can’t maintain it by refilling the coolant and you can’t repair it by swapping in new parts. 

Because you can’t set up a maintenance strategy, all you can do is have a plan in place for when the unit inevitably dies. That might mean having money already set aside in the budget to buy a replacement. It might mean borrowing a unit from another department’s inventory. There’s no perfect answer, but you want a solution that can be implemented quickly, with the least amount of disruption. Most of all, you want to set up a solution before you need it.       

As you work through the seven questions, you find the best possible maintenance strategy for each asset. But it’s important to remember that answers can change over time. Any given asset can move up or down in criticality, and the costs associated with different maintenance strategies can increase or drop due to many factors, both internal and external. For example, buying asset-based sensors might not have made sense five years ago, but drops in prices might now make them a more attractive option.   

What are the differences between risk-based maintenance and reliability-centered maintenance? 

Now that we’ve firmly established a definition of reliability-centered maintenance, we can quickly clear up any confusion between it and risk-based maintenance management.

With RCM, we’re choosing the best maintenance strategy for each asset. So, with light bulbs, it’s run-to-failure, but for a forklift, it’s more likely preventive maintenance.  

With risk-based maintenance, we start with some unavoidable truths about maintenance:  

  • There are always more things to do than time to do them.  
  • We have more work than workers. “Short-handed” is often the default setting for the operations and maintenance departments.  
  • No matter how generous, the maintenance budget has its limits.  

That means we must prioritize which assets get our time and attention. We can’t do it all. Risk-based maintenance is a process of deciding how we use our limited resources by:

  1. Establishing criticality for each asset  
  2. Developing a risk-based maintenance program  
  3. Planning maintenance based on risk reduction  
  4. Allocating parts and repairs based on risk  

Assets that carry a higher consequence of failure get more attention. Assets with lower criticality get less. There’s a spectrum between the highest and lowest, and each has a corresponding level of maintenance.
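One way to picture this kind of prioritizing is to score each asset and spend your limited hours from the top of the list down. The sketch below assumes risk is likelihood multiplied by consequence on 1-to-5 scales, which is a common convention rather than anything prescribed here. The assets, scores, and hours are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    failure_likelihood: int  # 1 (rare) to 5 (frequent), assumed scale
    consequence: int         # 1 (minor) to 5 (severe), assumed scale
    hours_needed: int        # maintenance hours requested for this period

    @property
    def risk(self):
        return self.failure_likelihood * self.consequence

def allocate_hours(assets, hours_available):
    """Fund maintenance in descending risk order while hours remain."""
    plan = []
    for asset in sorted(assets, key=lambda a: a.risk, reverse=True):
        if asset.hours_needed <= hours_available:
            plan.append(asset.name)
            hours_available -= asset.hours_needed
    return plan

assets = [
    Asset("light bulbs", 4, 1, 2),
    Asset("forklift", 3, 5, 10),
    Asset("boiler", 2, 5, 12),
    Asset("conveyor belt", 4, 3, 8),
]

plan = allocate_hours(assets, hours_available=30)
print(plan)
```

With 30 hours to spend, the forklift, conveyor belt, and boiler use up the whole budget, and the light bulbs are left to run until they fail.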

Reliability-centered maintenance implementation  

Organizations need to begin by looking at their assets in terms of criticality. Basically, you should ask, “How bad is it if this asset fails?” Then start to look at other factors, such as costs for maintenance and labor, risk of injury, environmental damage, lost productivity, and compliance-related fines. Once you’ve determined criticality, rank your assets from most to least critical.  

Then, starting from the top of your list, use the seven RCM questions on each asset. Based on the answers, you can determine the best maintenance strategy for each asset. 

Crucially, RCM is an ongoing process. Organizations need to periodically revisit earlier decisions, ensuring that their maintenance strategies change as business goals, asset criticality, and failure histories evolve. For example, the best maintenance strategy for an asset early in its useful life is different from the one that’s the ideal fit 15 years later.

Jonathan writes about asset management, maintenance software, and SaaS solutions in his role as a digital content creator at Eptura. He covers trends across industries, including fleet, manufacturing, healthcare, and hospitality, with a focus on delivering thought leadership with actionable insights. Earlier in his career, he wrote textbooks, edited NPC dialogue for video games, and taught English as a foreign language. He holds a master's degree in journalism.