Getting Started with AI-Based Predictive Maintenance

POSTED 07/16/2021  | By: Kristin Lewotsky, Contributing Editor

Predictive maintenance is not a new concept. The technology for implementing predictive maintenance has been available for years. The benefits are well-known, particularly given the high cost of downtime in most industries. Despite these points, adoption has lagged, particularly in the end-user community. It hasn’t been simply a question of capital outlay: data capture and logging/transfer capabilities are already built into many components, for example drives with current and temperature monitoring capabilities or HMIs that double as data loggers. Security concerns aside, the sticking point often has been a lack of the skills needed to convert data into actionable information, along with the cost and time associated with either bringing those skills in-house or dealing with third-party services. AI, and particularly machine learning (ML), provides effective tools for implementing predictive maintenance and saving big. Indeed, according to McKinsey & Company, AI-based predictive maintenance can boost availability by up to 20% while reducing inspection costs by 25% and annual maintenance fees by up to 10% [1].

One of the primary challenges of predictive maintenance is combing through massive volumes of data to extract only meaningful, actionable information. Particularly given the rapid growth in the Industrial Internet of Things (IIoT), organizations can find themselves data rich, yet information poor. ML is an organized methodology for extracting insights that can be used to detect developing defects before they become major problems, determine the remaining useful life (RUL) of even troubled assets, allow repairs to be scheduled during minimally disruptive windows, and conduct a root-cause analysis to prevent similar failures in the future.

The Basics of AI

Certain types of predictive maintenance modeling can be easily addressed with fairly simple, even manual, computations. The true value of ML is its ability to take into account large volumes and diverse types of data in the context of complex machine dynamics and real-world operations to arrive at a greater understanding of asset operation and health.

ML is part of a class of applications known as narrow AI. This refers to functions that are written and trained to perform specific tasks. That chatbot you just interfaced with during your online banking session, for example, was probably a narrow AI application set up to present a specific response to a specified set of inputs (and to escalate to a human being in the case of other requests). In the same way, ML can run sensor data through a statistical model to detect conditions defined as corresponding to a developing defect. “Machine learning is not intelligence in the true sense,” says Scott Genzer, data scientist at RapidMiner (Boston, Massachusetts). “It’s really nothing more than the good old fashioned mathematical modeling we’ve been doing for decades. The difference is that we have the computing power to mung massive amounts of data to find patterns, to find signal in a lot of noise that we used to do by hand.”

ML solutions are already in broad deployment for use cases like fraud alerts and predictive maintenance. In contrast, general AI, which encompasses the types of sentient machines that are the mainstays of pop culture, is enormously complex and will most likely remain a laboratory curiosity for some time to come.

Figure 1: In supervised machine learning, algorithms operate on a manually labeled training data set to create a model. This model can be used with production data to return results or predictions.

In ML, one or more algorithms operate on a set of training data intended to describe factors like asset condition and performance, failures, maintenance processes, environment, records of failure or maintenance, etc. Using this data, the algorithm creates a mathematical model that describes the complex system and its interactions. The data is organized such that there is a dependent “target” variable to be predicted, in this case to describe asset health, remaining lifetime, etc. The goal is that when fresh data is put into the model, the model will return a status, prediction, etc. (see figure 1).

Machine learning can be divided into supervised learning and unsupervised learning. In supervised ML, a function (model) is trained to act on new input in a defined manner using large amounts of manually categorized data. Supervised learning for predictive maintenance is commonly addressed using classification or regression. In classification, discrete input maps to discrete output; with enough of the right kind of data, the model can classify an asset as healthy or not healthy, for example, or a product as acceptable or not acceptable (see figure 2). Output is not always binary; the model may return a range of possible outcomes. Regression takes quasi-continuous input, like time series temperature or vibration data, and returns a continuous output value in the form of a trend that can be used to predict future function. While classification might be used to determine whether an asset has a defect that can lead to unscheduled downtime, regression would draw on historic behavior plus current data to predict the remaining useful lifetime of the asset and estimated time to failure.
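
To make the distinction concrete, here is a minimal Python sketch of both tasks. Everything in it is an illustrative assumption rather than anything from the sources quoted here: the vibration readings, the failure level, and the helper names (`train_threshold`, `classify`, `fit_line`, `days_until_failure`) are invented, the classifier is a deliberately simple midpoint rule learned from labeled examples, and the regression is an ordinary least-squares line extrapolated to an assumed failure level.

```python
# Hypothetical labeled training data: (vibration in mm/s RMS, label). Invented values.
labeled = [(2.0, "healthy"), (2.5, "healthy"), (3.0, "healthy"),
           (5.0, "not healthy"), (5.5, "not healthy"), (6.0, "not healthy")]

# Hypothetical daily readings from one asset in production.
days = [0, 1, 2, 3, 4, 5]
vibration = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

FAILURE_LEVEL = 7.0  # assumed vibration level at which the asset fails


def train_threshold(examples):
    """Supervised training: place the decision boundary midway between class means."""
    healthy = [v for v, label in examples if label == "healthy"]
    unhealthy = [v for v, label in examples if label != "healthy"]
    return (sum(healthy) / len(healthy) + sum(unhealthy) / len(unhealthy)) / 2


def classify(reading, threshold):
    """Classification: map a reading to a discrete label."""
    return "not healthy" if reading >= threshold else "healthy"


def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_failure(xs, ys, failure_level=FAILURE_LEVEL):
    """Extrapolate the fitted trend to the failure level; None if the trend is not rising."""
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    return (failure_level - intercept) / slope - xs[-1]


threshold = train_threshold(labeled)
label = classify(vibration[-1], threshold)       # discrete answer
remaining = days_until_failure(days, vibration)  # continuous answer
print(label, remaining)
```

A production model would be trained on far more data and many more variables, but the shape of the two outputs is the same: a label in one case, a number in the other.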

Figure 2: Supervised machine learning is generally used for two tasks: classification (left) and regression (right). In classification problems, the algorithms take input data to produce discrete output data (e.g., healthy asset or not healthy asset?). In regression problems, the model takes input data and produces a continuous output value that can be used, for example, for predictions (how soon will this asset fail?).

In unsupervised ML, the algorithms work with unlabeled data to seek out patterns through clustering (what information belongs together? See figure 3) and correlations (what events happen together?). Consider a bottling line. A supervised machine-learning classifier might detect an elevated temperature on a motor and, based on the model, send an alert to maintenance to investigate it as a potential developing defect. An unsupervised ML model might discover that the temperature always rises on this motor when the machine is packaging a more viscous liquid during August, so maybe there is no developing defect after all. Unsupervised ML can uncover unexpected patterns that lead to valuable insights. For example, maybe the temperature doesn’t rise as much on days when Joe is running the machine. Now, the company not only avoids unnecessary replacement of a healthy asset but can review Joe’s techniques to potentially discover a way to improve machine operations across shifts, production lines, and even facilities.
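
As a rough illustration of clustering on unlabeled data, the sketch below runs a tiny one-dimensional k-means over motor temperatures. The readings are invented, and `kmeans_1d` is a hypothetical helper written for this example (with a deterministic starting point so the result is repeatable); real projects would normally rely on an established library and many more variables.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means (requires k >= 2); centers start spread across the sorted data."""
    ordered = sorted(values)
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical, unlabeled motor temperatures in degrees C (invented values).
temperatures = [61, 62, 60, 63, 74, 75, 73, 62, 76, 61]
centers, clusters = kmeans_1d(temperatures, k=2)
print(centers)
```

The algorithm only reports that the readings fall into two groups. Connecting the warmer group to, say, runs of a more viscous product still requires matching the clusters against production records.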

Both approaches can be useful for predictive maintenance. Sometimes one can be used to inform the other, such as when correlations uncovered by an unsupervised learning model are used to update the supervised learning model.

Begin with the Business Case

One of the biggest mistakes with ML in predictive maintenance is diving directly into data gathering and model building. For success, projects need to start with the business understanding. What is the pain point in terms of costs and business impact? How is it addressed today and how would ML improve on that? What are the objectives, e.g., decrease unscheduled downtime, optimize product quality, boost throughput, etc.? Clearly articulating the goal is the first step to achieving it. “AI is a tool, not an outcome,” says Saar Yoskovitz, CEO of Augury (Haifa, Israel). “Focus on the use cases that you want to address and decide what is the best technology stack you need to use to do that.”

Be sure to quantify objectives–it’s important to not only define success but to also understand the cost basis to evaluate return on investment (ROI). Finally, don’t forget the human factors. Starting off with simple projects likely to provide rapid benefit will streamline approvals. “The computer side is not difficult,” says Genzer. “The hard part is getting the engineers to buy in, getting a manager to say yes, and getting it on the plant floor.”

Understand the Data

Figure 3: In clustering, an unsupervised machine learning approach, the algorithms seek out similarities and dissimilarities in unlabeled data.

Once the business case is established, the next step is to gather and investigate the available data. Can the current data be used to answer the questions identified during the business analysis? Is there sufficient detail and context? Here, OT needs to work with IT and the data scientists to clarify how data maps to physical phenomena and make sure that input is coming from across the breadth of the organization. For example, motion control is essential to product quality in additive manufacturing. As a result, ripple and discontinuities showing up in a run can indicate systematic motion problems that can be tracked back to problems with components, even if sensors don’t directly reveal the concern.

Because of the data crunching capabilities of ML algorithms and models, the data can, and should, come from a wide variety of sources, so long as they are relevant. “Machine learning works best when you have a lot of data,” says Genzer. “That is absolutely the golden rule. Storage is cheap. Start with the big data lake and cull it from there.”

“Data isn’t free, but it is much, much cheaper than down time,” observes Paul Ardis, technology manager for machine learning at GE Research (Niskayuna, New York). “So, if we consider the trade-offs and the decision-making process, it is often worth the time, trouble, and cost to set up a design that polls as much data as possible; even if we don’t necessarily know what it is going to be capturing from a failure-modes perspective.”

One of the challenges of predictive maintenance in discrete automation is that industrial equipment is built to last. That means even older machines have a limited failure history, while new designs, of course, will have little or none. That’s good for operations but can be problematic when the goal is to capture large amounts of data on machine behavior around degradation and failure. Running equipment to failure just to gather data isn’t practical. Fortunately, there are alternatives.

  • Historic records: Digitizing and formatting historic records on machine condition and maintenance can be time and labor intensive but is absolutely essential. Launching an ML project from scratch squanders all of the expertise and insights that the organization has accumulated over time. Moving legacy records into the digital domain should be a major priority from the beginning of the project.
  • Development data: For OEMs wanting to field ML-based predictive maintenance solutions, either as a feature for customers or a way to maintain their equipment “fleets,” data may be available from initial designs. Prototype testing can serve as effective run-to-failure, or at least run-to-degradation, exercises.
  • Surrogate modeling: Development data can be useful but limited, particularly in its ability to generalize conclusions to other assets. The solution is surrogate modeling, a data-driven process for streamlining computer simulation. Conventional computer simulations are expensive and time-consuming because they require extensive training to closely approximate real-world performance. In surrogate modeling, the process works back from simulation results from a baseline model to generate the “surrogate” model, which is then trained by running a full computation for only a limited set of conditions rather than comprehensively. “The intent is to come up with an effective model that can be developed at reasonable speed for whatever processing is needed,” says Ardis.
  • Transfer learning: Transfer learning builds a training set for new assets by finding a way to modify or map data from similar assets, even like assets within a particular manufacturing batch. It still requires a small amount of data to understand the actual transfer mechanism but users no longer need to train a new model from scratch. “We focus in particular on transductive transfer learning where we have some information about the difference in task,” says Ardis. “So we’re not just trying to match distributions but very explicitly looking for the same relationships, the same sorts of connections from a physics-based perspective that we should expect from the original design.”

“We don't need to build and train a model for one specific pump because we've seen over 20,000 pumps before,” says Yoskovitz. “We know what cavitation looks like, what bearing wear looks like. We’ve already built these models using a generic physics-based approach.” Of course, building a standard diagnostic model for a fixed-speed rotating asset is quite different than for a custom machine with complex dynamics. Here, mapping from asset to asset is more complex. “For anomaly detection, we build a baseline for a specific machine because the different recipes change the behavior, the environment from one site to another changes the behavior. We build a model for baseline machine behavior, conceptualized into operations, as well. Then we can detect anomalies if anything goes wrong.”
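
The per-machine baseline idea can be sketched in a few lines of Python. This is not Augury's method, only a generic illustration under stated assumptions: the readings are invented, the helpers `build_baseline` and `find_anomalies` are hypothetical, and "anomalous" is taken to mean more than three standard deviations from the baseline mean.

```python
from statistics import mean, stdev


def build_baseline(readings):
    """Summarize known-good operation of one machine as (mean, standard deviation)."""
    return mean(readings), stdev(readings)


def find_anomalies(readings, baseline, z_limit=3.0):
    """Return indexes of readings more than z_limit standard deviations from the mean."""
    mu, sigma = baseline
    return [i for i, r in enumerate(readings) if abs(r - mu) > z_limit * sigma]


# Hypothetical vibration amplitudes recorded while the machine was known to be healthy.
healthy_run = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
baseline = build_baseline(healthy_run)

# Hypothetical new readings from the same machine running the same recipe.
new_readings = [10.1, 9.9, 11.5, 10.2]
anomalies = find_anomalies(new_readings, baseline)
print(anomalies)
```

Because recipes and sites change machine behavior, one way to extend the sketch is to keep a separate baseline per machine and recipe, for example in a dictionary keyed by both.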

Timing Matters

Make sure that data acquisition and processing are fast enough to keep up with the industrial process and assets being monitored. This requires evaluating the data and its relationship to the physical entities.

  • Define the forecast window: This refers to the time elapsed between the indicator and failure. In other words, how much lead time is required to not only prevent catastrophic failure but to enable the least disruptive, least expensive repair? A small gearbox with a replacement on-site may only need minutes of notice for a repair. Replacing a custom kilowatt-class motor located on the roof of a factory might require days or even weeks of lead time both to order a new motor and to rent a crane to move it. What data and what ML model will deliver this?
  • Define the target window: Even before failure, machine performance will begin to degrade. The model should account for this. The goal is not just to continue making product or delivering services; the goal is to make salable product and meet service level agreements.
  • Define the feature window: If we are tracking a specific type of data, logging an average or an FFT, for example, what is the window covered by that data set?
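
One possible reading of these windows in code is sketched below. It is an assumption-laden illustration: the sample values are invented, `rolling_mean` and `label_lead_time` are hypothetical helpers, the feature window is treated as the number of samples averaged into each feature, and the forecast window is treated as the number of samples of lead time before a recorded failure.

```python
def rolling_mean(values, feature_window):
    """Mean of the most recent `feature_window` samples (None until the window fills)."""
    out = []
    for i in range(len(values)):
        if i + 1 < feature_window:
            out.append(None)
        else:
            window = values[i + 1 - feature_window:i + 1]
            out.append(sum(window) / feature_window)
    return out


def label_lead_time(n_samples, failure_index, forecast_window):
    """Label 1 where a failure occurs within the next `forecast_window` samples, else 0."""
    return [1 if 0 < failure_index - i <= forecast_window else 0
            for i in range(n_samples)]


# Hypothetical hourly temperature samples; the asset fails at the last sample.
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = rolling_mean(samples, feature_window=3)
labels = label_lead_time(len(samples), failure_index=5, forecast_window=2)
print(features)
print(labels)
```

Widening the forecast window gives maintenance more notice but asks the model to recognize trouble from earlier, weaker symptoms, which is why the required lead time should be settled before the model is chosen.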

If a predictive-maintenance solution can’t provide results quickly enough for the user to take action in a timely fashion, then there is not a good business case to pursue it. “A model is only as good as the data available to it at any point in time,” says Ardis. “The ability to update the model to reflect the latest sense of continuing operation–is it normal or trending abnormal?–is highly limited by the update rate of the data. So it is not so much a question of how much data we are able to store as how quickly we are refreshing the data stream to make sure that we are able to take action.” Conversely, if the timeframe of the activity is slow, an ML-based solution will probably be overengineered (and overly expensive) for the need.

Data Preparation

As with most types of computer modeling, data needs to be prepared before it can be applied to ML. This goes beyond simply cleansing and formatting. Given the sheer volumes involved, the raw data needs to be broken out into subsets of data that satisfy a particular business question or manipulated to deliver new insights. This takes place using a collection of techniques known as feature engineering, essential to the success of any ML project. “It’s actually more important than the modeling,” says Genzer. “You’re trying to find out which sensors and which parts are the most meaningful.”

Feature engineering consists of feature selection and feature generation:

  • Feature selection is the process of narrowing down the field by finding out which columns in the database are the most relevant, that is, which ones are most strongly correlated to the target variable. An installation might generate a thousand or more columns of data but only a few of those columns are necessary to identify developing defects.
  • Feature generation is a process of combining columns to create more useful attributes, for example by adding a pair of columns, multiplying them, etc.

Between feature selection and feature generation, it’s possible to create an optimized set of data columns that will have the best possible ability to find meaningful models.
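
The two steps can be illustrated with a short Python sketch. The column names, values, and target labels are invented, `pearson` and `select_features` are hypothetical helpers, and ranking columns by absolute Pearson correlation with the target is only one simple selection method among many.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(columns, target, top_n=2):
    """Feature selection: keep the top_n columns most correlated with the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], scores


# Hypothetical sensor columns (invented values), one row per observation.
columns = {
    "current_a": [4.0, 4.1, 5.0, 5.2, 6.1, 6.3],
    "voltage_v": [230.0, 229.0, 231.0, 230.0, 229.0, 231.0],
    "ambient_c": [21.0, 24.0, 20.0, 23.0, 22.0, 21.0],
}
target = [0, 0, 0, 1, 1, 1]  # hypothetical label: 1 = developing defect

# Feature generation: multiply two columns to create a new attribute.
columns["power_w"] = [i * v for i, v in zip(columns["current_a"], columns["voltage_v"])]

selected, scores = select_features(columns, target, top_n=2)
print(selected)
```

Letting the scores drive the ranking, rather than hand-picking columns up front, is one way to keep an open mind about which columns carry signal.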

Feature engineering is an essential step but should be approached with care. “The danger that engineers almost instinctively get into is they think they know which columns are going to drive the predictive class,” says Genzer. “This biases the model. You really want the computer to find what it thinks the columns are and have an open mind. You need to apply a little bit of judgment because you don’t want it to find things that are ridiculous, but we don’t want to limit the computer’s ability to find signal that you may not know exists.”


The whole point of ML is to use data and algorithms to develop a model that describes the physical system in operation and can be used on new data to deliver actionable insights in an ongoing fashion. A detailed discussion of ML algorithms and models for predictive maintenance is outside the scope of this article. Instead, we can focus on some big picture points. Let’s start with one of the most common user mistakes, which is to approach a project with a preconceived notion of which model to use before the data is ever captured. This gets the process backward. The key is to start from the data, from the business problem, and then find which algorithms best satisfy the criteria.

Another point to bear in mind is that building a model is not a case of one and done. Often, projects result in multiple models which then need to be evaluated to determine which model or models most effectively characterize the system.

“A lot of what we do in terms of machine learning in AI is explore a vast array of potential models simultaneously,” says Ardis. “There isn’t a simple answer to, ‘What are the models?’ because the answer is, ‘Everything that we can get our hands on.’ In addition, we look at dynamism of the ability to handle them together.” It’s not necessarily effective to train five models, for example, determine some relative effective weighting, then set that up as a constant model that will be used for the foreseeable future. A more dynamic approach might provide a better solution. “Can we have a setup that is actually selecting dynamically which models to include, which to exclude, and how to rebalance and utilize them best based upon their performance in terms of providing information over time that tracks with the real ground truth?” he asks.
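
A toy version of dynamic rebalancing might look like the Python sketch below. It does not represent GE Research's approach. The model names, predictions, and ground-truth values are invented, and the weighting scheme (inverse recent mean absolute error, with an optional cutoff that excludes poorly tracking models) is an assumption chosen for simplicity.

```python
def recent_error(predictions, actuals, window):
    """Mean absolute error over the most recent `window` observations."""
    pairs = list(zip(predictions, actuals))[-window:]
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def dynamic_weights(model_predictions, actuals, window=3, drop_above=None):
    """Weight models by inverse recent error; exclude any whose error exceeds drop_above."""
    errors = {name: recent_error(preds, actuals, window)
              for name, preds in model_predictions.items()}
    kept = {name: err for name, err in errors.items()
            if drop_above is None or err <= drop_above}
    if not kept:
        raise ValueError("every model exceeded the error cutoff")
    inverse = {name: 1.0 / (err + 1e-9) for name, err in kept.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Hypothetical ground truth and predictions from three candidate models.
actuals = [10, 12, 14, 16]
model_predictions = {
    "model_a": [10, 12, 15, 17],
    "model_b": [13, 15, 17, 19],
    "model_c": [20, 22, 24, 36],
}
weights = dynamic_weights(model_predictions, actuals, window=3, drop_above=5.0)
print(weights)
```

Re-running the weighting as each new batch of ground truth arrives lets the mix of models shift over time instead of being fixed at deployment.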

Part of the challenge lies in the fact that machines are constantly changing over time; “normal” operation will look different today than it will next year. The training data needs to be broad enough to show some of that progression. Model validation also needs to test whether the model is resilient enough or whether it is overfitting to the initial data set. “The challenge is in getting someone to understand that sometimes it is better to have a model whose training performance is 10% lower but validates to understand the whole general space much more effectively,” says Ardis.

Particularly for engineers and data scientists, it’s easy to get caught up in the hunt, trying to optimize the model to increase the figure of merit by a fraction of a percent. This type of improvement may be conceptually satisfying but in the business context, the more relevant question is whether the change saves money or increases profitability through higher throughput or product quality. “I think it’s a mistake to optimize the models based on recall or precision,” says Genzer. “It’s better to optimize based on profit or loss. Build the model such that either the savings is optimized or the profit is maximized.”
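
The difference between optimizing on a statistical metric and optimizing on money can be shown with a small Python sketch. All numbers are invented: the scores stand in for a model's output, the outcomes for what actually happened, and the two cost figures for a false alarm and a missed failure are assumptions that a real plant would replace with its own. `total_cost` and `best_threshold` are hypothetical helpers.

```python
def total_cost(threshold, scored_assets, false_alarm_cost, missed_failure_cost):
    """Total cost of acting on every score at or above the alert threshold."""
    cost = 0.0
    for score, failed in scored_assets:
        flagged = score >= threshold
        if flagged and not failed:
            cost += false_alarm_cost      # unnecessary inspection
        elif not flagged and failed:
            cost += missed_failure_cost   # unscheduled downtime
    return cost


def best_threshold(scored_assets, candidates, false_alarm_cost, missed_failure_cost):
    """Pick the candidate threshold with the lowest total cost."""
    return min(candidates,
               key=lambda t: total_cost(t, scored_assets,
                                        false_alarm_cost, missed_failure_cost))


# Hypothetical (model score, did the asset actually fail?) pairs.
scored_assets = [(0.1, False), (0.3, False), (0.4, True),
                 (0.6, False), (0.7, True), (0.9, True)]
candidates = [0.2, 0.5, 0.8]

# Assumed costs: $500 per needless inspection, $20,000 per missed failure.
best = best_threshold(scored_assets, candidates, 500.0, 20000.0)
print(best, total_cost(best, scored_assets, 500.0, 20000.0))
```

With these assumed costs, the lowest threshold wins even though it raises the most false alarms, because a missed failure costs forty times as much as a needless inspection.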


ML-based predictive maintenance applications can be deployed in a variety of ways. They are most commonly deployed in cloud-based applications accessed via web server or run as dedicated appliances on the shop floor.

Deployment doesn’t mean that the job is over, however. Anybody who has ever had a carefully curated Pandora channel go from playing John Lee Hooker to Justin Bieber while they are not paying attention understands the issue of model drift, or concept drift. Best practices call for deploying the chosen model(s), then continuously building so-called challenger models based on new training sets. The challenger models should mimic the model in deployment. If they don’t, it could mean that there is a problem with the deployed model that requires a rebuild. It could mean that the initial conditions may have changed, either as the (still healthy) machine changes, environmental conditions change, or operating requirements evolve. Once again, the situation requires a value judgment: Is there an actual problem with the model that needs to be corrected or is the model different but the assets are still turning out quality parts at the desired throughput?
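
A simple way to compare a challenger with the deployed model is to score both on the same recent data and measure how often they disagree, as in the Python sketch below. The predictions are invented, `disagreement_rate` and `needs_review` are hypothetical helpers, and the 10% tolerance is an arbitrary assumption.

```python
def disagreement_rate(deployed_predictions, challenger_predictions):
    """Fraction of cases where the challenger disagrees with the deployed model."""
    if len(deployed_predictions) != len(challenger_predictions):
        raise ValueError("prediction lists must be the same length")
    if not deployed_predictions:
        return 0.0
    differing = sum(1 for d, c in zip(deployed_predictions, challenger_predictions)
                    if d != c)
    return differing / len(deployed_predictions)


def needs_review(deployed_predictions, challenger_predictions, tolerance=0.1):
    """Flag possible drift when disagreement exceeds the tolerance."""
    return disagreement_rate(deployed_predictions, challenger_predictions) > tolerance


# Hypothetical predictions from both models on the same ten recent observations.
deployed = ["ok", "ok", "alert", "ok", "ok", "ok", "alert", "ok", "ok", "ok"]
challenger = ["ok", "ok", "alert", "ok", "alert", "ok", "alert", "alert", "ok", "alert"]

rate = disagreement_rate(deployed, challenger)
print(rate, needs_review(deployed, challenger))
```

A flag from a check like this is a prompt for the value judgment described above, not an automatic rebuild.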


ML is a powerful tool for implementing a predictive maintenance program on the shop floor. Starting with the business case and taking the time to develop a deep understanding of the data will help ensure quality results. In general, today’s ML tools for predictive maintenance are designed to streamline the process of extracting insights from data to improve business operations. “Machine learning isn’t magic and shouldn’t be treated as such,” says Ardis.

“You don’t need to be a data scientist to realize savings in your plant with [ML predictive maintenance tools],” says Genzer. “It is not rocket science. It’s math and anyone can learn it.”


1. McKinsey & Company, “Smartening up with Artificial Intelligence (AI) - What’s in it for Germany and its Industrial Sector?”