How AI Will Revolutionize, Not Replace, Machine Vision
By: Dan McCarthy, Contributing Editor
In just 10 years, artificial intelligence (AI) has evolved from a laboratory curiosity to an increasingly pervasive—if not always visible—part of our daily lives. Though many of us may not recognize the AI systems powering our daily interactions with smartphones, voice assistants, e-commerce sites, or streaming media services, those systems certainly recognize us, and they can predict our preferences with increasing accuracy.
Yet as eerie as that seems, many of these AI systems are simply comparing our choices to a structured database of similar options. Netflix AI, for example, can derive our preference for film noir by cross-referencing similarly labeled movies. But without those labels, it would need deep learning algorithms and a library of relevant images to distinguish a film noir scene from an Esther Williams swim sequence.
Generally defined as a subset of machine learning (itself a subset of AI), deep learning employs neural networks and algorithms that enable machines to learn from data without being explicitly programmed to perform a certain task. In addition to imagery, such data might include audio files, business documents, or weather models.
Within the consumer realm, most applications of deep learning to imagery concern recognition and classification, and they range from frivolous smartphone apps (e.g., Not Hotdog) to the powerful systems that enable Facebook to automatically identify images of the lump of fur on your couch as a cat for later search, reference, and possibly, targeted ad placement.
Deep learning also powers Google’s Vision AI API, which leverages the company’s native catalog of about 10,000 visually recognizable objects to perform the equivalent of a reverse image search across the World Wide Web. In addition to listing existing topics used to caption a given image wherever it has appeared, Google’s deep learning platform can generate new image labels as, for example, random events unfold on the daily news.
The AI Revolution Begins Here
While most consumers aren’t interested in what is inside deep learning’s black box so long as it works, the capabilities illustrated by Google’s Vision AI API have clear implications for the machine vision industry, which has relied for decades on fixed rule-based approaches and pass/fail interpretations of image data.
Where rule-based programming excels at measurement and alignment, deep learning tools enable classification of image data to perform complex cosmetic inspections, distinguish different materials, verify assembly, and generally adapt to unstructured image data. This isn’t to say that deep learning will one day replace traditional machine vision; rather, it will expand machine vision’s abilities.
“Deep learning is an easy and powerful solution in applications that are very easy to detect with the human eye but difficult when using a rule-based approach,” says Thomas Hünerfauth, Product Owner, HALCON Library at MVTec Software GmbH.
For example, deep learning solutions can help vision systems distinguish weeds from crops in an image to help farmers identify and appropriately scale countermeasures. In fact, Hünerfauth notes, virtually any food industry application that involves inspection of natural material can benefit from deep learning tools. “A rule-based approach designed for measuring or blob analysis finds this quite difficult, but such challenges can be solved with deep learning very easily,” he says.
As a subset of machine learning, deep learning technology doesn’t merely interpret image data; it helps expand that data, enabling image-processing systems to become even more precise. Unlike conventional machine vision solutions that rely on a developer to define and verify target features, deep learning software leverages neural networks that, like human intelligence, can be trained to distinguish features in an image yet tolerate variations. As the system captures new images, the software identifies objects and anomalies and assigns the new image data to the appropriate classes.
“If you want to train a neural network, and you only have 100 images but you need 1000 images, you can generate these artificially,” says Bruno Menard, Software Director at Teledyne DALSA. “It’s a form of data augmentation,” he adds.
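The kind of data augmentation Menard describes can be illustrated with a minimal sketch. It assumes grayscale images stored as nested Python lists and uses only flips, 90-degree rotations, and small brightness shifts. The function names and the choice of transforms are illustrative assumptions, not a description of Teledyne DALSA’s tools, and a production pipeline would typically use a dedicated imaging library.

```python
import random

Image = list[list[int]]  # grayscale image as rows of 0-255 pixel values


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def adjust_brightness(img: Image, offset: int) -> Image:
    """Add an offset to every pixel, clamping to the 0-255 range."""
    return [[min(255, max(0, p + offset)) for p in row] for row in img]


def augment(images: list[Image], target: int, seed: int = 0) -> list[Image]:
    """Return the originals plus randomly transformed copies, `target` images in total."""
    if not images:
        raise ValueError("need at least one source image")
    rng = random.Random(seed)  # seeded so the generated set is reproducible
    out = [[row[:] for row in img] for img in images]
    while len(out) < target:
        img = rng.choice(images)
        if rng.random() < 0.5:
            img = flip_horizontal(img)
        for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
            img = rotate_90(img)
        img = adjust_brightness(img, rng.randint(-20, 20))
        out.append(img)
    return out


# Usage: grow a set of 100 images to 1,000.
originals = [[[10, 20], [30, 40]] for _ in range(100)]
training_set = augment(originals, 1000)
print(len(training_set))
```

Which transforms are safe depends on the application: a flip or rotation helps only if the inspected part can plausibly appear in that orientation.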
Lifting the Lid
Deep learning is not without its challenges. Compiling image libraries and training the neural networks can be as laborious as programming a machine vision system for applications such as object detection or segmentation. In response, vision providers such as MVTec and Cognex are developing easier interfaces and pretrained reading tools that help streamline the image library required to deploy deep learning tools.
Another challenge is that machine vision engineers and end users are often less comfortable ignoring what’s inside the black box than consumers are. “If you train the system and get good results, it’s okay and all is good,” says Hünerfauth. “But if its results are wrong, it’s hard to explain why, and this is, in certain industries, very hard to accept. So, we have to make a gray box out of the black box to give better feedback to these customers and try to explain what happened inside.”
Here too, Google’s research may offer insights. The company recently partnered with OpenAI to explore how – or rather what – AI sees when it views the world through a machine vision system. Leveraging what they call “activation atlases,” the collaborators are mapping how individual neurons in a network activate together to convert abstract shapes, colors, and patterns into recognizable images. By effectively lifting the lid off the black box in which visual data algorithms derive conclusions, the research aims to support development of more robust algorithms. Such insights could prove beneficial in machine vision applications of deep learning, which exact a much higher standard for validating images.
For all of its power and adaptability, deep learning will revolutionize machine vision, not replace it. They are complementary technologies. Machine vision’s ability to discern geometric patterns and edges in image data remains the best way to achieve subpixel accuracy for high-precision measurements. Deep learning promises to extend the discipline’s capabilities by introducing a humanlike ability to judge and learn from image data. But deep learning still benefits from a human trainer – especially one knowledgeable in traditional machine vision techniques. Veteran engineers will often find that their application expertise is valuable in optimizing a deep learning system’s ability to learn.
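To make the subpixel measurement mentioned above concrete, here is a minimal rule-based sketch. It assumes a single edge along a one-dimensional intensity profile and refines the strongest gradient sample by fitting a parabola through it and its two neighbors. Parabolic interpolation is one common textbook technique, chosen here for illustration; the function name is hypothetical and this is not the method of any vendor named in this article.

```python
def subpixel_edge(profile: list[float]) -> float:
    """Locate the strongest edge along a 1-D intensity profile with subpixel precision."""
    if len(profile) < 5:
        raise ValueError("need at least 5 samples")
    # Central-difference gradient magnitude; grad[i] belongs to pixel position i + 1.
    grad = [abs(profile[i + 2] - profile[i]) / 2.0 for i in range(len(profile) - 2)]
    # Strongest gradient sample that has a neighbor on each side.
    k = max(range(1, len(grad) - 1), key=lambda i: grad[i])
    left, mid, right = grad[k - 1], grad[k], grad[k + 1]
    denom = left - 2.0 * mid + right
    # Vertex of the parabola through the three samples (no shift on a flat fit).
    offset = 0.0 if denom == 0 else 0.5 * (left - right) / denom
    return (k + 1) + offset


# Usage: a step from dark to bright between pixels 3 and 4.
print(subpixel_edge([10, 10, 10, 10, 200, 200, 200, 200]))  # → 3.5
```

No training images are involved: the rule is fixed in advance, which is why this style of measurement complements deep learning’s classification strengths rather than competing with them.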