This paper was written by Jim Gavigan and Scott Larson for the 2016 SMRP annual conference
Scott Larson - Sr. Operational Intelligence Consultant, Logical Systems, LLC
Jim Gavigan - Manufacturing Intelligence Consultant, Logical Systems, LLC
Many companies today recognize the importance of, and the opportunity in, making better use of operational and non-operational data in their maintenance and reliability programs. This paper discusses where to begin by exploring the infrastructure and building blocks that need to be in place to move data capture to the next level. It then explores the different ways that time series data, both automatically and manually collected, can be used to support and enhance a company’s maintenance and reliability programs by utilizing new technologies such as GIS, along with other data and process analytical methods.
PI Data in a Maintenance Reliability Program
When we look at the Total Asset Management strategy pyramid in Figure 1, many of the components are supported and enhanced with intelligent use of data. Manufacturing data comes from many different sources and in many formats, which complicates the collection process and hinders accessibility for the people who know how to use it. From the viewpoint of a maintenance and reliability program, data can be generally categorized as Operational or Non-Operational. Operational data may be collected in real time through a SCADA, DCS, PLC, or data historian system, or it may be recorded by operational technicians and manually entered into spreadsheets or databases. Non-Operational data is typically collected through manual inspections performed at scheduled intervals, but may also be collected automatically.
Establishing a path for the data to a maintenance system, providing easy access to the data, offering powerful analytic tools, and providing simple ways to visualize the data and analytic results are all critical to fully realizing the value that this data can provide.
Discussion Points Found in the Paper are:
- Capturing ALL the data is important (Manual entry, remote assets, etc.)
- How the data is collected is important (e.g., tablets for manual entry save time, and automation reduces human interaction and prevents collection mistakes)
- Easy access to the data once it is stored is important (CMMS for PM)
- Viewing the data correctly is important (Composite health index)
- Having tools to analyze the data and use it to improve reliability is important
- Increasing speed of collection and analysis of data
Where does data come from?
This data can be collected and stored in a number of ways. A SCADA system may historize data that can be utilized; however, it can sometimes be difficult to view and use this data outside the context of the SCADA system itself. There are also many data historians available that perform additional data contextualization, data visualization, and data analysis that can provide additional value to a maintenance and reliability program.
The key to realizing the full value from any data historian is to automate the collection and storage of data from operations equipment such as PLCs, DCSs, and/or RTUs. Collecting this data manually, and then compiling and analyzing it manually, does not provide real-time data access and introduces human error. Manual data collection is still more common than one would think. The data must also be captured with appropriate fidelity so that it is actionable. For instance, collecting data that changes rapidly at a one-minute interval may not provide insight into what is happening in the process. Conversely, collecting data that changes very slowly at a one-second interval may provide more data than is needed for proper analysis.
Taking advantage of the analytics, visualization, and data access tools that many data historian software vendors now offer is a key aspect of utilizing this data in your maintenance and reliability program. There are many tools available to run reports, perform analytics, evaluate trends, and pass this data to external systems, such as a CMMS, an ERP, or a Business Intelligence platform. Passing data from data historians to external systems is one of the largest growth opportunity areas in the market today.
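The effect of the collection interval can be illustrated with a small sketch. The signal, the ten-second spike, and the `downsample` helper are all hypothetical, and real historians often use exception or compression logic rather than simple fixed-interval snapshots, so this is only meant to show how a coarse interval can miss a short excursion.

```python
# Hypothetical one-second signal: steady at 100 with a ten-second spike to 180.
signal = [100.0] * 300
for t in range(125, 135):
    signal[t] = 180.0


def downsample(values, interval_s):
    """Keep one snapshot every interval_s seconds (assumed sampling scheme)."""
    return values[::interval_s]


one_second = downsample(signal, 1)
one_minute = downsample(signal, 60)

print(max(one_second))  # the spike is captured
print(max(one_minute))  # the spike falls between snapshots and is missed
```

In this made-up case the one-minute snapshots land at seconds 0, 60, 120, 180, and 240, so the excursion between seconds 125 and 134 never appears in the stored data.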
Information captured on maintenance inspection records is a good example of non-operational data that can be difficult to collect and integrate into a database, and it is the form of non-operational data that we will discuss in this paper.
Traditionally, maintenance inspections have been performed by a maintenance or inspection technician taking a clipboard with a route checklist and walking an equipment route. The data is collected on paper as the required inspections are performed. Following the inspection, the technician may manually enter the data into a spreadsheet or possibly a database after the route has been completed. More commonly, the paper is placed in a basket and not looked at for several days, weeks or potentially not at all.
Modern data historians offer a means to manually collect data either with handheld devices used during the inspection route or by using the paper method and a user data entry interface at a central location to enter the inspection data for the defined route and data points. This allows the user to visualize the non-operational inspection data in context of the real-time data from the data historian.
Manual Data Collection
At a previous company, I was a team member on a project that replaced a paper-based inspection process with a digital program utilizing handheld devices.
We began the conversion by replicating the existing paper-based inspections as tags in a data historian. We assigned each tag a digital state so the inspector would be able to choose a predetermined value from a drop-down list on their handheld device. Data entry on the handheld device was designed to be simple and easy to use in an industrial environment, which was an important requirement since inspectors were required to wear full PPE, including thick thermal gloves, while performing the inspection. We chose drop-down boxes for manual entry so that the technician could choose the proper values easily and not fight with the data entry technology.
After all of the manual data points were replicated, they were grouped into “routes” within the data historian manual data entry configuration tool. A technician could then follow a predetermined route, ensuring all points were collected and saving time in the inspection process. While the routes were being developed, our team also evaluated mobile devices for manual data collection. Maintenance technicians were heavily involved in the selection process, which developed a sense of ownership in the program, a key part of ensuring the success of the project.
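The idea of inspection points with predetermined digital states, grouped into routes, can be sketched as a small data structure. This is a minimal illustration only: the class names, tag names, and state labels are hypothetical and do not represent any particular historian's configuration tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class InspectionPoint:
    tag: str                  # hypothetical historian tag name
    states: Tuple[str, ...]   # allowed values offered in the drop-down list
    value: Optional[str] = None

    def record(self, choice: str) -> None:
        """Accept only one of the predetermined digital states."""
        if choice not in self.states:
            raise ValueError(f"{choice!r} is not a valid state for {self.tag}")
        self.value = choice


@dataclass
class Route:
    name: str
    points: List[InspectionPoint] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Tags on the route that have not been recorded yet."""
        return [p.tag for p in self.points if p.value is None]


route = Route("Unit 1 walkdown", [
    InspectionPoint("UNIT1.SEAL.CONDITION", ("Good", "Fair", "Poor")),
    InspectionPoint("UNIT1.GUARD.IN_PLACE", ("Yes", "No")),
])
route.points[0].record("Fair")
print(route.missing())  # points still to be collected on this route
```

Restricting entries to a fixed set of states is what makes the drop-down approach workable with gloves on, and the `missing` check reflects how a route helps ensure that every point is collected.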
When the program with the handheld collection devices first rolled out, there was some question as to the additional time that performing the inspection with a handheld device would take versus performing the same task with paper. When the first route was completed, the inspector was asked his opinion of the process. He indicated that with the paper method, he could inspect 7 units per hour and was able to perform 6 units per hour the first time using the handheld device.
Working with this inspector, we identified several improvements that could speed the process up. These included optimizing the inspection order and improving the methods used to collect several of the inspection points.
Not long into the program, the company decided that one of the plants with 360 units needed to be fully inspected. Three weeks were allocated for all of the inspectors to complete all of the inspections. With the newly optimized routes and data entry fields, all 360 units were inspected within a week and a half, saving 50% of the allotted time to perform the inspections.
While the time and efficiency gains were welcome, the real value of the program was realized in the analysis process.
When inspections were recorded on paper, the inspectors would give the records to maintenance planners who would sift through and determine what assets needed immediate attention. Alternatively, inspectors would manually go through each page and place a “star” next to areas that needed attention. This process didn’t take into account the production performance of the units, which in turn created a very narrow view of the work that needed to be completed and why it was being performed.
Once the handheld units started being used on inspection routes, the data immediately became available and we were able to calculate inspection scores and overlay the results with operational performance and financial data. Having immediate access to inspection scores also allowed us to combine them with unit production rates to prioritize our repair schedule based on the composite health index for each asset.
An example of how this was applied can be seen in Figure 2. The lower left chart shows a prioritization analysis based purely on inspection scores. Although you cannot read all of the asset numbers on this chart, the first ten (10) assets prioritized for repair would be: 25, 16, 13, 15, 41, 5, 77, 10, 8, and 30.
The lower right chart in Figure 2 shows asset prioritization based on the composite health index, which includes performance and financial data along with the inspection scores. Based on the composite health index, the first ten (10) assets prioritized for repair would be: 13, 30, 78, 44, 1, 11, 28, 51, 47, and 37.
The change in priority was substantial, as only two (2) assets (13 and 30) appeared in both lists. By utilizing and compositing key data sources, only assets above a defined rank prioritization would require repair. In this use case, executing the revised repair plan and reallocating funds to the assets ranked highest by the composite health index produced a 34% reduction in cost by avoiding unnecessary repairs. It should be noted that any assets that ranked high by inspection score but not by the composite health index, and that were determined to be at risk of failure, were prioritized and addressed along with the assets in the composite health index ranking. Having a complete picture of each asset’s health was critical in determining the proper repairs.
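The comparison of the two rankings can be checked directly from the asset lists given in the text. The second half of the sketch shows one possible way to build a composite index as a weighted sum; the weights, the 0-to-1 score scale, and the two example assets are assumptions made for illustration, since the paper does not state how its index was calculated.

```python
# Top-ten lists as reported in the text.
by_inspection = [25, 16, 13, 15, 41, 5, 77, 10, 8, 30]
by_composite = [13, 30, 78, 44, 1, 11, 28, 51, 47, 37]

common = sorted(set(by_inspection) & set(by_composite))
print(common)  # assets that appear in both lists

# Hypothetical composite: weighted sum of scores where 0 is best and 1 is worst.
WEIGHTS = {"inspection": 0.4, "performance": 0.3, "financial": 0.3}  # assumed


def composite_index(scores):
    """Combine the individual scores into a single number (assumed formula)."""
    return sum(WEIGHTS[key] * scores[key] for key in WEIGHTS)


# Made-up assets: "A" inspects poorly but performs well; "B" is the reverse.
assets = {
    "A": {"inspection": 0.9, "performance": 0.1, "financial": 0.1},
    "B": {"inspection": 0.5, "performance": 0.8, "financial": 0.9},
}
ranked = sorted(assets, key=lambda a: composite_index(assets[a]), reverse=True)
print(ranked)  # worst composite health first
```

With these made-up numbers, asset "A" would be first on an inspection-only list, but asset "B" moves ahead once performance and financial data are included, which is the kind of reordering described above.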
CMMS – Preventive Maintenance
In a preventive maintenance (PM) program, the traditional method has been to schedule PM tasks based on calendar dates. This method has been the standard for many companies for many years. The problem with this method is that unless you have equipment that is running 24x7x365, you may be over- or under-maintaining your assets.
When a manufacturer provides recommended maintenance intervals, they are most commonly provided in run hours. For example, a pump may have a set of maintenance tasks to be performed at 50 hours, a broader set at 100 hours, additional tasks at 500 and 1,000 hours, and so on. By utilizing actual run hour data, these PM tasks can be generated and scheduled based on the actual run hours rather than a date, increasing the accuracy of the PMs being performed.
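A run-hour trigger of this kind can be sketched in a few lines. The intervals come from the pump example above; the function name and the assumption that each interval recurs (every 50 hours, every 100 hours, and so on) are illustrative choices, not a manufacturer's or a CMMS vendor's rule.

```python
PM_INTERVALS_HOURS = [50, 100, 500, 1000]  # intervals from the pump example


def pm_tasks_due(run_hours_at_last_check, run_hours_now):
    """Return the intervals whose next multiple was crossed since the last check.

    Assumes each interval recurs, e.g. the 50-hour tasks repeat every 50 hours.
    """
    due = []
    for interval in PM_INTERVALS_HOURS:
        if run_hours_now // interval > run_hours_at_last_check // interval:
            due.append(interval)
    return due


print(pm_tasks_due(90, 105))  # the pump passed 100 run hours since the last check
print(pm_tasks_due(10, 20))   # nothing crossed, so no PM is generated
```

An idle asset accumulates no run hours, so under this scheme it generates no PM tasks, whereas a calendar-based schedule would keep generating them.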
The common element between these two methods is they both use measures of time (days vs hours) to schedule and perform the maintenance tasks. If an asset is scheduled for a three month PM according to manufacturer’s recommendations and the asset has not been utilized for two of the three months, then an organization is incurring maintenance costs that could be avoided.
Many times, a company has the run hours of an asset in its data historian, but this data is not made available to the CMMS system. To maximize the potential value when utilizing operational or non-operational data in a CMMS, creating an interface between these two systems is critical.
There are several methods and solutions for interfacing systems to provide data to a CMMS. The simplest method is to create a one-way path to send data to the CMMS on a regularly scheduled basis. This can be accomplished in a number of technical ways, with the key concept being that data is sent to the CMMS with no acknowledgement of receipt to the data historian.
When configuring data such as run hours to be sent to the CMMS, this one-way solution is the most common and effective method. Other more complex interface methods, such as query and bi-directional solutions, are discussed in more detail in other sections of this paper.
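As one hedged illustration of a one-way path, the sketch below formats run hours as CSV text that a scheduled job could drop where the CMMS picks it up. The column names, asset identifiers, and the choice of CSV are assumptions; an actual interface would follow whatever import format the CMMS in question supports.

```python
import csv
import io
from datetime import datetime, timezone


def build_run_hours_export(run_hours_by_asset, as_of):
    """Format run hours as CSV text for a scheduled one-way transfer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["asset_id", "run_hours", "as_of_utc"])  # assumed columns
    for asset_id, hours in sorted(run_hours_by_asset.items()):
        writer.writerow([asset_id, f"{hours:.1f}", as_of.isoformat()])
    return buffer.getvalue()


# Hypothetical assets and timestamp, for illustration only.
export = build_run_hours_export(
    {"PUMP-101": 512.5, "PUMP-102": 48.0},
    datetime(2016, 1, 1, tzinfo=timezone.utc),
)
print(export)
```

Nothing in this flow waits for or checks a response from the CMMS, which is what makes it a one-way interface.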
Condition Based Maintenance (CBM)
When examining the role operational and non-operational data can play in a maintenance and reliability program, one of the largest opportunity areas for this data is in a condition based maintenance program. We will look at a recent example that shows some of the potential opportunities around CBM.
Recently, a customer asked us to look for an application that would use their data historian in a maintenance function to track the condition of a piece of equipment. While browsing through the HMI system, we noticed that a loadout screw motor (Loadout Screw A on Loadout Station B) appeared to be in an overcurrent condition for a significant amount of the time that it was running. We took several screenshots for later review. We found that the motor was a 125HP motor rated for a maximum of 155A, and it was pulling over 210A at certain times during the loadout sequence. Knowing the process, and knowing that the customer ran three different feed stocks, we asked the customer whether they were running feed stock from a certain plant during the timeframe in which we noticed the overcurrent conditions. The customer confirmed that the feed stock being run in that timeframe was in fact from this plant, which we will call “Plant A.” Plant A’s feed stock had caused them maintenance and reliability issues in the past because it is difficult to convey. This material was much wetter and “stickier” than feed stock from “Plant B,” and it was unknown whether “Plant C’s” feed stock was more similar to that of Plant A or Plant B.
So, we began an investigation to ask the following:
- Do Plant B’s and Plant C’s material exhibit the same effects on the same loadout screw?
- Do the other loadout screws see the same issue of overcurrent with any of the feed stocks and which ones pose the most problems?
- Was other equipment affected in the same way as Loadout Screw A, with overcurrent conditions from this feed stock?
- Does Loadout Screw A have a history of premature failure and can we potentially predict a failure?
We first added a few buttons on the HMI screen to allow the operator to tell us whether the feed stock was from Plant A, B, or C (this could also be classified as Non-Operational Data, and we commonly refer to this as “metadata”). We then tracked this variable in the data historian to be able to compare output current of major motors in the mixing and loadout areas when running the feed stocks from the three plants.
In looking at trends, there was a clear distinction: feed stock from Plant A and Plant C caused overcurrent conditions on Loadout Screw A, while Plant B’s feed stock did not. The red flat line on the motor current trend denotes the 155A limit for the motor in the figures below.
Plant A to Plant B Comparison - Figure #3
Plant A to Plant C Comparison - Figure #4
It was found that Plant A’s and Plant C’s feed stock caused overcurrent conditions not only in this particular loadout screw, but also in the four mixer motors. The other loadout screw on Loadout Station B also saw overcurrent conditions, but not as often. Loadout Screw A on Station A saw very few issues with overcurrent.
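A comparison like this can be reduced to a simple calculation once the feed stock source is stored alongside the motor current: the fraction of running samples above the motor limit, grouped by source. The sample values below are invented purely for illustration, and only the 155A limit comes from the text.

```python
RATED_AMPS = 155.0  # motor limit stated in the text

# Hypothetical samples: (feed stock source, motor current in amps) while running.
samples = [
    ("A", 210.0), ("A", 160.0), ("A", 150.0), ("A", 175.0),
    ("B", 120.0), ("B", 140.0), ("B", 130.0), ("B", 151.0),
    ("C", 158.0), ("C", 149.0), ("C", 190.0), ("C", 145.0),
]


def overcurrent_fraction(samples, limit):
    """Fraction of samples above the limit for each feed stock source."""
    totals, over = {}, {}
    for source, amps in samples:
        totals[source] = totals.get(source, 0) + 1
        if amps > limit:
            over[source] = over.get(source, 0) + 1
    return {s: over.get(s, 0) / totals[s] for s in totals}


fractions = overcurrent_fraction(samples, RATED_AMPS)
print(fractions)
```

Running the same calculation per motor would show which pieces of equipment are most affected by each feed stock.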
Several conclusions were noted:
- Loadout Screw A on Loadout Station B does have a history of premature failure and it appears that these two feed stocks from Plants A and C could contribute to the failure
- There could be a potential design flaw with Loadout Screw A on Station B, as it bears the brunt of the extra load caused by Plant A’s and Plant C’s feed stock
- We did prove that if we could understand the maintenance history of these motors – i.e. when they were replaced and/or maintained, we could deliver a notification to proper personnel to maintain or change the motors after a certain amount of time in an overcurrent condition
- Other equipment is affected and the following needs to be investigated:
  - Can the process at Plant A or Plant C be changed to minimize the effects seen? In other words, can the material sent be less wet and sticky?
  - How much more energy is required to run feed stock from Plant A and Plant C versus Plant B?
  - Is there a difference in production rate of the plant between the three feed stocks?
Having operational data available, and even adding some non-operational data, to investigate these types of issues was invaluable, and plant personnel at our customer’s site didn’t realize any of this was going on. The education on true running conditions was and is extremely valuable to them. Another opportunity for this customer would be to drive a work order to the CMMS based on actual running conditions and/or the projected failure time based on historical data. This would need to be a bi-directional interface so that the application doesn’t generate multiple work orders based on the same set of conditions.
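The reason a bi-directional interface is needed can be sketched as a check-before-create step: the application asks the CMMS whether a work order is already open for the same asset and condition before requesting a new one. `FakeCmms` and its methods are hypothetical stand-ins; a real CMMS would be reached through its own interface.

```python
class FakeCmms:
    """Hypothetical stand-in for a CMMS that can be queried as well as written to."""

    def __init__(self):
        self.open_orders = {}  # (asset, condition) -> work order id
        self._next_id = 1

    def find_open_order(self, asset, condition):
        return self.open_orders.get((asset, condition))

    def create_order(self, asset, condition):
        order_id = self._next_id
        self._next_id += 1
        self.open_orders[(asset, condition)] = order_id
        return order_id

    def close_order(self, asset, condition):
        self.open_orders.pop((asset, condition), None)


def request_work_order(cmms, asset, condition):
    """Create a work order only if none is already open for this condition."""
    existing = cmms.find_open_order(asset, condition)
    if existing is not None:
        return existing
    return cmms.create_order(asset, condition)


cmms = FakeCmms()
first = request_work_order(cmms, "Loadout Screw A", "overcurrent")
second = request_work_order(cmms, "Loadout Screw A", "overcurrent")
print(first == second)  # the repeated condition reuses the open work order
```

With a one-way interface the second call would have no way of knowing that the first work order exists, and every recurrence of the condition would generate another one.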
Root Cause Analysis & Root Cause Failure Analysis (RCFA)
Root cause analysis (RCA) is a method of problem solving used for identifying the root causes of faults or problems. RCA is applied to methodically identify and correct the root causes of events, rather than to simply address the symptomatic result. Focusing correction on root causes has the goal of entirely preventing problem recurrence. Conversely, RCFA (Root Cause Failure Analysis) recognizes that complete prevention of recurrence by one corrective action is not always possible.
Rather than one sharply defined methodology, RCA comprises many different tools, processes, and philosophies. However, several very broadly defined approaches or "schools" can be identified by their basic approach or field of origin: safety-based, production-based, assembly-based, process-based, failure-based, and systems-based.
Despite the different approaches among the various schools of root cause analysis, all share some common principles and can all benefit by leveraging data in the analysis process.
A leading data historian manufacturer in the market offers a tool that can trigger an event start and stop, and group multiple data points together. A valuable option that this tool provides is the ability to set a root cause time. This will retrieve the data for x number of minutes before the start of an event, allowing for examination of the data leading up to the event. We have found this to be a valuable feature in examining the data as part of an RCA or RCFA process. We can often better understand what causes a downtime event, a potential equipment failure, or other similar events. We can then use the data from before the event to recognize the conditions that lead to the problem and proactively notify the proper personnel before the problem occurs again.
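The root cause time idea amounts to pulling a window of data that ends at the event start. The sketch below shows that windowing on an in-memory list of samples; the function name, timestamps, and values are hypothetical, and it does not represent the vendor tool's actual interface.

```python
from datetime import datetime, timedelta


def pre_event_window(samples, event_start, minutes_before):
    """Return the samples in the minutes_before leading up to event_start."""
    window_start = event_start - timedelta(minutes=minutes_before)
    return [(ts, value) for ts, value in samples if window_start <= ts < event_start]


# Hypothetical one-minute samples over an hour; the value is just the minute number.
start = datetime(2016, 1, 1, 12, 0)
samples = [(start + timedelta(minutes=m), float(m)) for m in range(60)]

event_start = datetime(2016, 1, 1, 12, 30)
window = pre_event_window(samples, event_start, minutes_before=10)
print(len(window), window[0][0], window[-1][0])
```

Here a ten-minute root cause time returns the ten samples from 12:20 through 12:29, which is the data an analyst would review for conditions that preceded the event.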
Predictive Maintenance (PdM)
PdM uses data with predictive tools and algorithms to predict when an asset will fail, and generates a work order to address the predicted failure before an unscheduled downtime event occurs. The ultimate goal of PdM is to perform maintenance at a planned and scheduled point in time when the maintenance activity is most cost-effective and before the equipment loses performance within a threshold, or experiences an unscheduled outage.
PdM evaluates the condition of equipment by performing periodic or continuous (online) analytics based on operational data. The "predictive" component of predictive maintenance stems from the goal of predicting the future trend of the equipment's condition. This approach uses principles of statistical process control to determine at what point in the future maintenance activities will be appropriate.
Configuring a PdM solution requires a number of more advanced tools. A bi-directional interface that can send and receive data between a CMMS and data historian or analytic package is a good choice in a PdM solution.
For example, a company has several Heat Recovery Steam Generators (HRSG) that take heat from the process and convert it to steam that is then either used for generating electricity or sold to other industries. On occasion, the HRSG will have steam tube leaks that will shut the unit down, resulting in lost steam capacity and in turn reduced power generation or steam sales.
By applying an RCFA process to an outage, utilizing operational data, we were able to calculate that the issue could have been identified seven hours earlier. We then took the process to the next step and created a predictive analysis to predict when future leaks may appear. We looked at the historical trend of the HRSG makeup water for the previous 24 hours and compared it to the previous three hours. If there was more than a 3 sigma change, we notified maintenance, engineering and operations of the possibility of a leak.
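One possible reading of this check is sketched below: the mean of the most recent three hours is compared with the mean of the 24-hour baseline, and a deviation of more than three baseline standard deviations raises a flag. The exact statistic used in the project is not stated in the text, so the formula, the function name, and the hourly makeup water values are all assumptions.

```python
from statistics import mean, pstdev


def leak_suspected(baseline, recent, n_sigma=3.0):
    """Flag when the recent mean departs from the baseline mean by > n_sigma.

    baseline: makeup water readings for the 24-hour reference period (assumed)
    recent:   readings for the most recent three hours (assumed)
    """
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        return mean(recent) != mu
    return abs(mean(recent) - mu) > n_sigma * sigma


# Hypothetical hourly readings: the baseline alternates between 100 and 102.
baseline = [100.0, 102.0] * 12

print(leak_suspected(baseline, [101.0, 102.0, 100.0]))  # normal variation
print(leak_suspected(baseline, [106.0, 107.0, 108.0]))  # sustained increase
```

In the second case the recent mean is six units above a baseline whose standard deviation is one unit, so the check flags a possible leak and a notification would be sent.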
In this case we chose not to automatically generate a work order as we would do in a CBM application, as there could have been maintenance being performed on the unit or another known reason that caused the condition and triggered the notification. So, we simply chose to send an email notification of a potential leak.
There was at least one documented instance in which the PdM calculation accurately detected and predicted a leak. With a seven-hour advance warning, the maintenance department was able to properly mobilize equipment and resources and schedule a planned outage to correct the leak. Operations was able to plan the outage, adjust electrical output, notify downstream customers of the planned outage, and bring the unit down in a planned fashion. The repair was made more quickly and efficiently, and the unit was brought back on line much sooner than if it had been an emergency shutdown and repair. As a result, the cost savings of the planned repair versus an emergency repair were much higher than the cost of implementing the predictive analytic solution.
As you can see from the example above, adoption of PdM on a broader basis has the potential to result in substantial cost savings and higher system reliability if done strategically across a large asset base.
Operational and non-operational data can support a maintenance and reliability program in many different ways. We have attempted to capture a few of these areas and provide some examples of how companies can realize additional value from this data in an RCM program.
The maintenance and reliability concepts discussed in this paper are not new. What we hope to have provided is a way to improve and enhance these concepts by utilizing data that is most likely already being captured in your organization.
We did not touch on several key areas that have been around for many years but are becoming more common in organizations, such as vibration analysis, infrared thermography, ultrasonic analysis and oil analysis.
As we move forward into the world of big data, look for more opportunities to leverage data in your maintenance and reliability area.
Keywords: Asset Reliability; CMMS; Condition Monitoring; Data Analysis; Equipment Reliability; Failure Analysis; Metrics; PdM; PM; Preventive Maintenance; Proactive Maintenance; RCA; RCFA; RCM; Reliability; Work Orders