Understanding Risk vs. Criticality Can Make All the Difference in Rail

Posted by Jason Kasper on Wed, Apr 27, 2016 @ 10:34 AM

Healthy assets are the foundation of a healthy business. So how can organizations make sure their assets are healthy? There are two complementary ways. The first, much discussed lately, is using data and the promise of the Industrial Internet of Things (IIoT) to create prescriptive analytics that reveal the state of asset infrastructure. The second, just as important, is the regular, physical inspection of critical assets through operator rounds. Skipping this step will be detrimental to any organization looking to improve Operational Excellence.

A recent example is DC Metro, the United States' second-busiest transit system. It serves over 700,000 daily riders, including about 75% of the federal workforce, and comprises 91 stations in Virginia, Maryland, and the District of Columbia. Unfortunately, this 40-year-old system has deteriorated from years of deferred maintenance, a common theme among many transit systems in the United States.

The Impact of Failure is Clear

Recently, the entire DC Metro system was shut down after a frayed cable caused an electrical fire (see photo). Unfortunately, the warning signs were already there: a similar incident the previous year had resulted in a fatality.


Image: www.metro-magazine.com, a frayed cable discovered when Metro shut down the rail system

The incident last year led the NTSB to issue an urgent safety recommendation to inspect all power cables to ensure the connection assemblies met design specifications and were installed correctly. All indications are that this recommendation was not implemented, which resulted in another shutdown of the entire system.

After a second incident within a year, DC Metro shut down the entire network to inspect all power cables and connections. The inspection uncovered 27 problems requiring remediation. It is unfortunate that it took a second shutdown to put a plan of action in place. There is a better way.

Use a Criticality vs. Risk Approach to Assets

One of the big issues in transit is a huge backlog of deferred maintenance that can seem impossible to tackle efficiently and effectively. A potential solution is to understand the criticality and risk each asset represents to the organization's Operational Excellence goals.

Understanding an asset's risk of failure can help rail organizations prepare for an event, understand how to resolve it, and reduce the likelihood of it happening in the first place. Assigning criticality to an asset allows organizations to understand the impact of its failure and its relationship to the entire asset infrastructure.

Example: Criticality vs. Risk Matrix

In the case of the DC Metro power cable failures, the risk and criticality would both be ranked high. Therefore, the planning and inspection of this asset would take priority over other deferred maintenance. 
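To make the prioritization step concrete, here is a minimal sketch of how a Criticality vs. Risk matrix might be turned into an ordered backlog. It assumes a 1 to 5 rating for each dimension, combines them by multiplication (one common convention, not necessarily the one LNS uses), and applies hypothetical band thresholds. The class and function names (`Asset`, `prioritize`) and all asset ratings are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    risk: int         # assumed scale: likelihood of failure, 1 (rare) to 5 (almost certain)
    criticality: int  # assumed scale: impact of failure, 1 (minor) to 5 (severe)

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-5 scale.
        for field_name in ("risk", "criticality"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Assumption: combine the two ratings by multiplying them.
        return self.risk * self.criticality

    @property
    def band(self) -> str:
        # Hypothetical thresholds for a 5x5 matrix; stakeholders would tune these.
        if self.score >= 15:
            return "high"
        if self.score >= 6:
            return "medium"
        return "low"


def prioritize(assets: list[Asset]) -> list[Asset]:
    """Order a backlog: highest score first, then highest criticality, then name."""
    return sorted(assets, key=lambda a: (-a.score, -a.criticality, a.name))


if __name__ == "__main__":
    # Hypothetical backlog entries and ratings, for illustration only.
    backlog = [
        Asset("Station signage", risk=3, criticality=1),
        Asset("Traction power cable connectors", risk=5, criticality=5),
        Asset("Escalator drive", risk=4, criticality=2),
        Asset("Track circuit", risk=2, criticality=5),
    ]
    for asset in prioritize(backlog):
        print(f"{asset.name:35} score={asset.score:2} band={asset.band}")
```

Ranking on criticality as a tie-breaker reflects the idea that, between two equally scored assets, the one whose failure hurts the organization more should be inspected first.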

To build an effective Criticality vs. Risk matrix, LNS recommends getting input from key stakeholders, including the maintenance, engineering, operations, and materials groups. These teams can help identify which asset failures would most hinder the organization from meeting its goals.

The Physical Data Source: Operator Rounds

As the NTSB recommended after the original incident at DC Metro, inspections are key. They can also help tackle the deferred maintenance backlog. Once criticality and risk are assigned to assets, operations and maintenance can prioritize which maintenance to perform and when, and understand the associated costs.

Used effectively, operator rounds provide a rich source of data for decision-making and can help avoid unplanned downtime. One common weakness of this type of inspection process is that it relies on paper-based systems (logbooks, clipboards). To be most effective, rounds should be performed on mobile devices with easy-to-use interfaces that integrate with back-end systems, so the information can be used to take immediate action when necessary.
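As a rough illustration of "take immediate action when necessary", the sketch below checks readings captured on a round against acceptable ranges and produces a work-request message for anything out of range. Everything here is an assumption: the names `CheckPoint`, `Reading`, `needs_action`, and `evaluate_round`, the temperature limits, and the idea that messages would be forwarded to a maintenance system. It is not modeled on any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CheckPoint:
    """One step on a hypothetical operator round, with an assumed acceptable range."""
    asset: str
    measurement: str
    low: float
    high: float


@dataclass(frozen=True)
class Reading:
    """A value recorded at a checkpoint, e.g. entered on a mobile device."""
    checkpoint: CheckPoint
    value: float
    taken_at: datetime


def needs_action(reading: Reading) -> bool:
    """True when the reading falls outside its checkpoint's range (inclusive bounds)."""
    cp = reading.checkpoint
    return not (cp.low <= reading.value <= cp.high)


def evaluate_round(readings: list[Reading]) -> list[str]:
    """Return work-request messages for out-of-range readings.

    In a real deployment these might be pushed to a maintenance system;
    here they are simply returned as strings.
    """
    requests = []
    for r in readings:
        if needs_action(r):
            cp = r.checkpoint
            requests.append(
                f"{r.taken_at:%Y-%m-%d %H:%M} {cp.asset}: "
                f"{cp.measurement}={r.value} outside [{cp.low}, {cp.high}]"
            )
    return requests


if __name__ == "__main__":
    # Hypothetical checkpoint and limits, for illustration only.
    connector = CheckPoint("Cable connector A", "temperature_c", 0.0, 60.0)
    now = datetime.now(timezone.utc)
    round_readings = [Reading(connector, 45.0, now), Reading(connector, 78.0, now)]
    for message in evaluate_round(round_readings):
        print(message)
```

Compared with a clipboard entry, a structured reading like this can be checked the moment it is recorded rather than when someone later reviews the logbook.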

Including operators in the detection and elimination of problems is an excellent way to support all reliability efforts, because it makes reliability a joint effort between operations and maintenance. Most importantly, it makes a great deal of sense, since many basic equipment inspections benefit greatly from being performed once per shift or more. When these are combined with routine process inspections carried out by operators, maintenance can stay focused on critical repairs.

This Isn’t the First, Nor the Last

DC Metro is the latest in a continuing series of rail organizations overwhelmed by years of deferred maintenance. In its case, that backlog resulted in a failure that affected thousands of riders who expect on-time performance.

To tackle an ever-mounting deferred maintenance backlog, rail organizations should turn to risk vs. criticality analysis conducted together with key internal groups. This rates each asset from every relevant angle. The results can then support effective operations and maintenance planning, including operator rounds that complement any real-time data being received by correlating it with physical observations from the field.

