Industry Insights

How Synthetic Data and Digital Twins Are Changing Manufacturing

By Nick Cravotta, A3 Contributing Editor

10/03/2022

Artificial intelligence (AI) has the potential to fundamentally change industrial manufacturing.

AI is already being used on factory floors to identify inefficiencies in production lines, to predict when equipment needs maintenance so that downtime can be avoided, and to spot defects in products.

To do all of this, a model is created—or trained—using data collected from the process the AI is supposed to learn about. For defect identification, the model needs data about parts that are in-spec as well as out-of-spec. In general, the more data available, the more accurate the model.

Simulation allows training to take place faster than real time.

When simulation is used in conjunction with AI, the results can be staggering. Pepsi, for instance, uses Microsoft Project Bonsai, a low-code AI development platform, to achieve greater consistency on its Cheetos production line. Previously, infrequent sampling meant a line could produce out-of-spec product until someone noticed. Now sensors monitor the line almost continuously, leading to high-quality products while maximizing throughput. The use of simulation also significantly shortened AI model development time, since a day’s production on a line could be simulated in 30 seconds, or 2,880 times faster than real time.
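The speedup figure follows directly from the two durations quoted above. As a quick check, the arithmetic can be written out as below; the function name and the "days per hour" follow-on are illustrative and not part of the Pepsi project.

```python
# One day of line time replayed in 30 seconds of simulation.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds of real production


def speedup(real_seconds: float, simulated_seconds: float) -> float:
    """Return how many times faster than real time the simulation runs."""
    return real_seconds / simulated_seconds


factor = speedup(SECONDS_PER_DAY, 30)
print(factor)  # 2880.0

# Illustrative follow-on: simulated production days per hour of compute.
days_per_hour = 3600 / 30
print(days_per_hour)  # 120.0
```

At that rate, an hour of simulation covers roughly four months of daily production runs.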

Synthetic Data

One problem that can arise with training is a lack of sufficient data. Training an AI to identify defects requires examples of each defect in various situations. “Defects are rare and hard to get pictures of,” says Michael Naber, technical founder at Simerse, which helps customers train AI models using synthetic data. The idea behind Simerse is that when there isn’t enough data to train a model, you create it. With just five images, Simerse can generate tens of thousands of synthetic images that can be used for training. In addition, images are automatically labelled with pixel-perfect accuracy, which is a time-consuming task when it must be done manually (see Figure 1).

Figure 1: A) Synthetic data uses real-world data to create thousands of images that can be used to train AI models. B) Images are automatically labelled with pixel-perfect accuracy, eliminating the need to perform this task manually. (source: Simerse)
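To make the idea concrete, here is a minimal sketch of why generated images can come with exact labels: the generator decides where the defect goes, so it can write the matching mask at the same time. This is not Simerse's method, which the article does not describe. The tiny 8x8 "images", the square defect, and the function name are all assumptions chosen for illustration.

```python
import random


def make_synthetic_sample(base, defect_size=2, rng=random):
    """Paste a dark square 'defect' onto a copy of a grayscale image.

    Returns (image, mask). The mask is 1 exactly where the defect was drawn,
    so the label is pixel-accurate by construction.
    """
    height, width = len(base), len(base[0])
    top = rng.randrange(height - defect_size + 1)
    left = rng.randrange(width - defect_size + 1)
    image = [row[:] for row in base]
    mask = [[0] * width for _ in range(height)]
    for r in range(top, top + defect_size):
        for c in range(left, left + defect_size):
            image[r][c] = 0  # defect pixels are black
            mask[r][c] = 1   # label written in the same step
    return image, mask


rng = random.Random(0)
# Stand-ins for five real photos of a defect-free part (uniform brightness 200).
real_images = [[[200] * 8 for _ in range(8)] for _ in range(5)]
dataset = [make_synthetic_sample(rng.choice(real_images), rng=rng)
           for _ in range(1000)]
print(len(dataset))  # 1000
```

A production pipeline would render far more realistic defects, but the labelling principle is the same: no human has to outline anything.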

“Synthetic data can jumpstart AI training,” says Naber. “You can do 90% of training with synthetic data, then use 10% real-world data to tune the model. This is much more effective than waiting for enough real-world data to become available.” Synthetic data is a powerful tool, especially for applications like a new assembly line that have little, if any, real-world data available. “With synthetic data, you can build a defect identification AI before you even start up the pilot line,” says Naber. “Engineers can use CAD files to create images of in-spec and out-of-spec products. These images can then be used to generate thousands of synthetic images to train the AI before the first product is even assembled.”

Synthetic data can also quickly train an AI model to perform under different operating conditions. “There may be a wide range of environmental variation you have to consider, including lighting, position of the product, camera angle, and so on,” says Naber. “On a factory floor, you can control many of these factors. By creating a consistent environment, you can eliminate variation and narrow down the problem.”
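One way to picture this is as a set of ranges from which each synthetic scene is sampled. The sketch below is an assumption about how such ranges might be expressed; the parameter names and numeric limits are hypothetical. It shows how a controlled factory environment narrows the ranges the model has to cope with.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneRanges:
    brightness: tuple  # (min, max) lighting multiplier
    offset_px: tuple   # (min, max) horizontal shift of the part, in pixels
    camera_deg: tuple  # (min, max) camera tilt, in degrees


def sample_scene(ranges, rng):
    """Draw one set of rendering conditions from the allowed ranges."""
    return {
        "brightness": rng.uniform(*ranges.brightness),
        "offset_px": rng.randint(*ranges.offset_px),
        "camera_deg": rng.uniform(*ranges.camera_deg),
    }


# Hypothetical limits: wide for an uncontrolled setting, narrow for a fixed
# camera and fixed lighting on a factory floor.
uncontrolled = SceneRanges((0.3, 1.7), (-40, 40), (-30.0, 30.0))
factory_floor = SceneRanges((0.9, 1.1), (-5, 5), (-2.0, 2.0))

rng = random.Random(42)
scenes = [sample_scene(factory_floor, rng) for _ in range(100)]
```

Each sampled scene would then drive one synthetic render, so the training set covers exactly the variation expected on the line and no more.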

Digital Twins

One form of simulation is known as a ‘digital twin’. Digital twin technology simulates the behavior of an asset or a system and generates the outputs expected from the corresponding physical asset, or twin. Such a twin can be used to facilitate design, control, process optimization, or predictive maintenance planning.

For instance, a digital twin can help avoid downtime due to failure by predicting when equipment will require maintenance.

“There’s a lot of confusion about what a digital twin is,” says Abhinav Saxena, Principal Scientist, Machine Learning at GE. Images describing digital twins often show complex 3D models, perhaps of an entire factory floor. “This gives the impression that we have a complex 3D model of a physical system that can run like the real-world system.”

Saxena believes digital twins are simpler than this. “We create a model to predict a specific thing about a system. The model is defined by what you want to get from it. Say you want to predict how long a bearing is expected to last or if a bearing will last another three years as expected based on design reliability. Manufacturers will specify a predicted lifetime based on an average across a large population of the component. But lifetime depends upon how and where you are using it.”

Saxena continues, “In critical applications, we don’t care about average performance of thousands of components. We need to know how this particular component will perform so we can decide whether to continue using it or perform maintenance before the next operational cycle. A digital twin is a model that is tied to a specific asset that we want to know about, perhaps its performance, efficiency, degradation, or something else. It could be for a single component, a piece of equipment, or for a full system.”

Over the lifetime of the asset, data collected from the asset is added to the digital twin so the twin can learn as the physical asset evolves. In effect, the digital twin represents the specific asset over time.
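A minimal sketch of that idea follows, assuming a twin that tracks a single quantity (a wear rate). It starts from the fleet-average figure and drifts toward what this particular asset actually shows. The class name, the exponential-smoothing update, and all numbers are illustrative assumptions, not a description of GE's models.

```python
class AssetTwin:
    """Tracks one asset's wear rate, starting from the fleet-average value."""

    def __init__(self, fleet_wear_per_hour, smoothing=0.2):
        self.wear_per_hour = fleet_wear_per_hour
        self.smoothing = smoothing
        self.total_wear = 0.0

    def update(self, hours, measured_wear):
        """Blend a new inspection reading into the twin's wear-rate estimate."""
        observed_rate = measured_wear / hours
        self.wear_per_hour += self.smoothing * (observed_rate - self.wear_per_hour)
        self.total_wear += measured_wear


# Hypothetical asset that wears twice as fast as the fleet average.
twin = AssetTwin(fleet_wear_per_hour=0.001)
for _ in range(20):
    twin.update(hours=100, measured_wear=0.2)  # observed rate: 0.002 per hour

print(round(twin.wear_per_hour, 4))  # 0.002
```

After enough readings the estimate reflects the individual asset rather than the population average, which is the distinction Saxena draws.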

With complex systems, thousands of variables might impact an asset’s performance and health. With a digital twin, it becomes possible to simulate how the asset performs across these different variables. For example, the simulation could introduce a defect in the asset and collect operational data. “The simulation provides a ‘signature’ of the defect and identifies which variables are important to track. Now we know what to look for—and where to look for it—in the real data to predict if there is a defect.”
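As a rough sketch of how a defect signature might be extracted, the example below compares a simulated healthy run with a simulated defective run and ranks each variable by how far the defect shifts it. The variable names, the readings, and the scoring rule (shift in the mean, measured in healthy-run standard deviations) are assumptions for illustration.

```python
from statistics import mean, stdev


def defect_signature(healthy, defective):
    """Rank variables by how strongly the simulated defect shifts them."""
    scores = {}
    for name in healthy:
        spread = stdev(healthy[name]) or 1e-9  # guard against zero spread
        shift = abs(mean(defective[name]) - mean(healthy[name]))
        scores[name] = shift / spread
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical simulated readings for the same asset with and without a defect.
healthy = {
    "vibration": [1.0, 1.1, 0.9, 1.0],
    "temperature": [60.0, 61.0, 59.0, 60.0],
    "current": [5.0, 5.1, 4.9, 5.0],
}
defective = {
    "vibration": [1.8, 1.9, 1.7, 1.8],
    "temperature": [61.0, 60.0, 62.0, 61.0],
    "current": [5.0, 5.1, 4.9, 5.0],
}

ranking = defect_signature(healthy, defective)
print([name for name, _ in ranking])  # ['vibration', 'temperature', 'current']
```

In this made-up case vibration dominates the signature, so it is the variable to watch in the real data.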

Simulation also makes it possible to design effective analytics before an asset is deployed.

“Say you have a pipe sending steam from one end of the factory to the other,” says Saxena. “How many sensors should we have and where should we put them? With a digital twin, we can place (simulate) thousands of sensors on the pipe to determine where the ‘failure points’ are and to learn where one or two sensors in real life will be the most effective in observing the majority of failure situations.”
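One simple way to turn simulated sensor results into a placement decision is a greedy selection: repeatedly pick the location that observes the most failure modes not yet covered. The sketch below assumes the simulation has already produced a map from candidate locations to the failure modes each can detect. The locations, failure modes, and the greedy strategy itself are illustrative assumptions, not the approach the article attributes to anyone.

```python
def choose_sensors(coverage, budget):
    """Greedily pick up to `budget` locations that cover the most failure modes."""
    chosen, seen = [], set()
    for _ in range(budget):
        best = max(coverage, key=lambda loc: len(coverage[loc] - seen))
        if not coverage[best] - seen:
            break  # nothing new left to observe
        chosen.append(best)
        seen |= coverage[best]
    return chosen, seen


# Hypothetical simulation output: which failure modes each location can detect.
coverage = {
    "inlet": {"leak_A", "overpressure"},
    "elbow_1": {"leak_A", "leak_B", "corrosion"},
    "midspan": {"leak_B"},
    "elbow_2": {"corrosion", "leak_C", "overpressure"},
    "outlet": {"leak_C"},
}

chosen, seen = choose_sensors(coverage, budget=2)
print(chosen)     # ['elbow_1', 'elbow_2']
print(len(seen))  # 5
```

Here two real sensors, out of five simulated ones, are enough to observe all five failure modes.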

Because the digital twin represents the actual asset, it can also be used to track how that individual asset is degrading. This is the heart of predictive maintenance. If a digital twin shows how the asset is degrading and projects that degradation into the future, maintenance can be scheduled proactively, when the asset actually requires it. This maximizes resource utilization while minimizing downtime. It also helps prevent further disruption from failures cascading to other parts of the plant or system.

“Industrial equipment uses specialized parts,” says Saxena. “It might take several months for a part to come in. Alternatively, it is very expensive to maintain an inventory of every possible part you might need. With digital twins, you can generate early warnings, predict the exact time when maintenance would be needed, and order parts in advance. When you understand the regular behavior of a system and predict when something will go wrong, that gives you time to take care of problems before they result in production bottlenecks, delays, and loss of revenue.”
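The logic Saxena describes can be sketched in two steps: extrapolate the asset's degradation trend to the point where it crosses a maintenance threshold, then compare the remaining time with the part's lead time. The example below uses a plain least-squares line as the trend model. The vibration readings, threshold, lead times, and function names are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_threshold(hours, readings, threshold):
    """Extrapolate the trend; return hours left after the latest reading."""
    slope, intercept = fit_line(hours, readings)
    if slope <= 0:
        return None  # no upward degradation trend to extrapolate
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - hours[-1])


def should_order_part(remaining, lead_time_hours, margin_hours=0.0):
    """Order once the remaining life no longer exceeds lead time plus margin."""
    return remaining is not None and remaining <= lead_time_hours + margin_hours


# Hypothetical bearing vibration (mm/s) logged every 100 operating hours.
hours = [0, 100, 200, 300, 400]
vibration = [1.0, 1.2, 1.4, 1.6, 1.8]

remaining = hours_until_threshold(hours, vibration, threshold=3.0)
print(round(remaining))  # 600
print(should_order_part(remaining, lead_time_hours=2160))  # True
```

With roughly 600 operating hours left and a part that takes about three months (2,160 hours) to arrive, the order should already have been placed. A real twin would use a richer degradation model than a straight line.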

The value proposition varies depending upon the facility, but cost savings can quickly reach into the millions.

Digital twins also make it possible to identify failure points that might not have been identified otherwise. “The digital twin simulates how the asset should perform,” says Naber. “You can track predicted performance with actual performance. When they are different, that means there is a deviance.” You can then find and correct the deviance before it results in a failure.
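In its simplest form, this comparison is a residual check: subtract the twin's prediction from the measured value and flag any sample where the gap exceeds a tolerance. The temperatures, tolerance, and function name below are assumptions for illustration.

```python
def find_deviations(predicted, actual, tolerance):
    """Return indices where the measurement departs from the twin's prediction."""
    return [i for i, (p, a) in enumerate(zip(predicted, actual))
            if abs(a - p) > tolerance]


# Hypothetical motor temperatures (deg C): twin prediction vs. sensor readings.
predicted = [70.0, 70.5, 71.0, 71.5, 72.0]
actual = [70.1, 70.4, 71.2, 73.0, 74.1]

flags = find_deviations(predicted, actual, tolerance=1.0)
print(flags)  # [3, 4]
```

The first three samples track the prediction closely. The last two drift away from it, which is the cue to investigate before the drift turns into a failure.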

Figure 2: Synthetic data will provide an increasingly higher percentage of the data that is used for AI training. (source: Gartner)

Simulation, synthetic data, and digital twins are driving the future of AI and automation (see Figure 2). Simulation accelerates the training of AI models, while synthetic data enables AI models to be trained with less real-world data. Digital twins make it possible to know the expected performance and reliability of each asset individually. Together, they make it possible not just to improve the efficiency of industrial applications but also to maximize their uptime and reliability.