Industry Insights
How Vision Technology Is Building Bridges Between Consumers and Brands
POSTED 11/08/2019 | By: Dan McCarthy, Contributing Editor
As many vision engineers have learned, the larger population isn’t always quick to grasp the concept or practical applications of programming a computer to “see.” Yet as consumers have become increasingly reliant on personal digital technology — including personal computers, smartphones, tablets, and the cameras embedded therein — it was only a matter of time before enterprising businesses found ways to apply and monetize vision technology within the consumer realm.
Many of these emerging applications have taken a familiar if innovative form, such as the use of cameras and software to actively enhance shoppers’ experience or optimize in-store advertising. But to translate to the consumer market, vision technology must often adopt a more user-friendly form — even to the point of becoming invisible — as most of us are not engineers.
In December, we will examine more closely how vision is helping to drive Retail 4.0. For now, a glance at the value chain just above and below the retail outlet shows ample evidence that vision technology is also empowering both brands and the consumers they seek to directly engage.
Image Recognition Gets Smarter
In the industrial realm, much attention is paid to narrowing and optimizing the image data captured by cameras. The consumer realm is more forgiving in this regard. Millions of consumers snap photos and videos with their smartphones, tablets, and computers every day, often with little consideration for lighting or composition. But the loss in image quality is compensated for by the sheer volume of image data captured — and often shared on social networks.
There is value in that massive library of image data for businesses able to automate search, sorting, and retrieval of relevant content. As artificial intelligence and deep-learning tools evolve, many businesses are leveraging these tools to equip computers to see and think more like consumers do.
Technology bellwethers such as Amazon, Google, IBM, and Microsoft, for example, have all invested heavily in AI-driven image recognition software able to automatically recognize and tag image content as well as or better than a human viewer. A study by Perficient Digital compared image recognition technology from all four of these companies with human operators to measure how many tags each returned per image, how accurate those tags were, and how confidence level correlated with accuracy.
Notably, the study found that all four software platforms tagged more items within images than the human operators but often at a cost to the accuracy of those tags.
For the leading platforms, however, the gap in accuracy was modest. Where Perficient calculated an overall accuracy score of 87.7% for human operators, it found that Google Vision, Amazon’s Rekognition, and Microsoft Azure labeled image content with accuracies of 81.7%, 77.7%, and 75.8%, respectively. IBM’s Watson achieved only 55.6%, though it demonstrated better natural language processing in the descriptive labels it favors.
Overall, the study indicated that image recognition software has a way to go before it can match or exceed the human capability for identifying raw image content. Nevertheless, it has advanced far enough to add significant value under controlled settings, such as automating tagging of massive product inventories on e-commerce servers or searching vast image libraries for very specific content.
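For a sense of what automated tagging looks like in practice, the sketch below shows how a retailer might label catalog photos with Amazon’s Rekognition API through the boto3 Python client. The bucket and object names are hypothetical, and a real pipeline would batch requests and persist the resulting tags.

```python
# Minimal sketch: auto-tagging product photos with Amazon Rekognition.
# Assumes AWS credentials are already configured; the bucket and key
# names below are hypothetical placeholders.
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

def tag_product_image(bucket: str, key: str, min_confidence: float = 70.0):
    """Return (label, confidence) pairs for one product photo stored in S3."""
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]

for name, confidence in tag_product_image("my-catalog-bucket", "sweaters/sku-1234.jpg"):
    print(f"{name}: {confidence:.1f}%")
```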
Engaging Consumers Through Imaging
As the social media sphere emerged, major brands quickly learned to monitor social networks to flag and respond to mentions about their products and services. The problem early on was that text formed only a part of what social media users posted online. Images and video of a product or logo are arguably just as important. But unless a consumer tagged a photo, brands were limited in their ability to track that image on popular networks such as Facebook and Twitter. They were virtually blind to consumer feedback on more visually oriented networks such as Instagram, Pinterest, and Snapchat.
Companies such as Talkwalker, Brandwatch, and LogoGrab saw an opportunity to change this by developing image detection software that can single out the use of brand logos in images shared across social networks. They now allow online community managers to enforce branding, track feedback, engage users, and potentially head off negative content before it goes viral.
Other companies, such as Clarifai and GumGum, also offer visually based social media listening tools, but they leverage deep-learning tools to deliver further intelligence. Clarifai offers custom and pretrained deep-learning models that enable brands to quickly apply image recognition to their internal image libraries. In addition to tagging logos, Clarifai’s solution can help identify discrete objects within an image or video, including specific faces. GumGum applies AI-based application programming interfaces (APIs) to scan images, videos, and importantly, the content that surrounds them to optimize ad placement and analyze the value of brand sponsorships.
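To illustrate the general shape of such services, here is a minimal sketch of an image-tagging request against Clarifai’s public v2 REST predict endpoint. The API key and model ID are placeholders, and the exact response fields may differ across API versions.

```python
# Hedged sketch: tagging an image via Clarifai's v2 REST predict endpoint.
# The endpoint and payload follow Clarifai's public v2 API conventions;
# the API key and model ID below are placeholders.
import requests

API_KEY = "YOUR_CLARIFAI_API_KEY"        # placeholder
MODEL_ID = "general-image-recognition"   # assumed ID for Clarifai's general model

def tag_image(image_url: str):
    response = requests.post(
        f"https://api.clarifai.com/v2/models/{MODEL_ID}/outputs",
        headers={"Authorization": f"Key {API_KEY}"},
        json={"inputs": [{"data": {"image": {"url": image_url}}}]},
        timeout=30,
    )
    response.raise_for_status()
    concepts = response.json()["outputs"][0]["data"]["concepts"]
    return [(c["name"], c["value"]) for c in concepts]

for name, score in tag_image("https://example.com/photo-with-logo.jpg"):
    print(f"{name}: {score:.2f}")
```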
Real-World Image Search and Beyond
Image recognition software not only helps brands to identify conversations about their products, it also enables consumers to identify brands. The traditional e-commerce model applied reductive filters to help potential buyers limit their search for a particular good, such as a sweater. Not only did this require a sweater vendor to manually assign tags to all of its products, it was also an imperfect solution for consumers who might not be familiar enough with sweater nomenclature to search for an athletic, ribbed, half-zip sweater in moss charcoal.
Visual recognition APIs from GumGum and Clarifai, as well as those driving Pinterest’s Lens feature, now allow consumers to snap a photo of a sweater they like and run it through a smartphone app. The app compares the image data to online databases of images and returns likely matches — along with information about where to buy them.
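Under the hood, this kind of visual search typically reduces to comparing image embeddings. The sketch below illustrates the general technique with an off-the-shelf ResNet-50 feature extractor and cosine similarity; it is not Pinterest’s, GumGum’s, or Clarifai’s actual pipeline, and the catalog paths are hypothetical.

```python
# Illustrative sketch of embedding-based visual search, the general technique
# behind "snap a photo, find the product." Not any vendor's actual pipeline.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Drop the classification head so the network outputs a 2048-d embedding.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    vector = backbone(image).squeeze(0)
    return vector / vector.norm()    # unit-normalize for cosine similarity

# Compare a shopper's snapshot against a tiny, hypothetical catalog.
query = embed("shoppers_sweater_photo.jpg")
catalog = {sku: embed(f"catalog/{sku}.jpg") for sku in ["sku-001", "sku-002"]}
best = max(catalog, key=lambda sku: float(query @ catalog[sku]))
print("Closest catalog match:", best)
```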
Augmented reality is yet another way that vision technology is visually pairing consumers with products. Sephora and Algoface, for example, both enable anyone with a smartphone camera to analyze their facial attributes, virtually match cosmetics to their skin tone, and try different looks before making a purchase.
Algoface, which offers its software development kit (SDK) to cosmetic brands, claims its app makes recommendations based on 20 different facial traits, including age, gender, ethnicity, skin tone, and hair color, as well as the size and shape of the face, eyes, nose, and lips. Its SDK further enables a smartphone camera to track 84 facial landmarks to capture the natural movement of a user’s face.
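Algoface’s SDK is proprietary, but the underlying idea, detecting and tracking facial landmarks, can be sketched with dlib’s freely available 68-point predictor. The pretrained model file must be downloaded separately from dlib.net.

```python
# Sketch of facial landmark detection using dlib's 68-point predictor.
# This illustrates the general technique, not Algoface's proprietary SDK.
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def face_landmarks(image_path: str):
    """Return a list of (x, y) landmark points for each face in the image."""
    image = dlib.load_rgb_image(image_path)
    faces = []
    for rect in detector(image, 1):    # upsample once to find smaller faces
        shape = predictor(image, rect)
        faces.append([(shape.part(i).x, shape.part(i).y)
                      for i in range(shape.num_parts)])
    return faces

for i, points in enumerate(face_landmarks("selfie.jpg")):
    print(f"Face {i}: {len(points)} landmarks, first point at {points[0]}")
```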
Learning Deep Learning
The potential applications for vision and deep learning in brand engagement have multiplied so quickly that enabling in-house application development has become a business all its own. Amazon Web Services (AWS) developed its programmable DeepLens deep-learning-enabled video camera for this purpose. The camera can be integrated with open source software in any industry to simplify development of branded computer-vision applications.
The system itself integrates a 4MP camera with H.264 encoding at 1080p resolution, a Gen9 graphics engine, and an Intel Atom processor. The DeepLens console provides users with a prepopulated object-detection project template and integration with Rekognition for advanced image analysis. The concept is to help brands unfamiliar with deep learning quickly learn, develop, and apply the technology to create vision applications, such as object and face detection, classification, automated tagging, and other brand-based apps. Amazon’s DeepLens website displays a growing community of users who share their experiences and successes to both inform and inspire future technology adopters.
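On the device, a DeepLens project boils down to a Lambda function that pulls camera frames and runs them through a locally deployed model. The sketch below follows the pattern in AWS’s published object-detection sample; the awscam module exists only on the DeepLens itself, and the model path and confidence threshold are illustrative.

```python
# Hedged sketch of a DeepLens inference loop, adapted from the pattern in
# AWS's published object-detection sample. awscam is only available on the
# DeepLens device; the model path and cutoff below are illustrative.
import cv2
import awscam

MODEL_PATH = "/opt/awscam/artifacts/mxnet_deploy_ssd_FP16_FUSED.xml"  # sample model
model = awscam.Model(MODEL_PATH, {"GPU": 1})   # run on the on-board GPU

while True:
    ret, frame = awscam.getLastFrame()         # grab the latest camera frame
    if not ret:
        continue
    resized = cv2.resize(frame, (300, 300))    # SSD input size in the sample
    inference = model.doInference(resized)
    detections = model.parseResult("ssd", inference)["ssd"]
    for obj in detections:
        if obj["prob"] > 0.5:                  # illustrative confidence cutoff
            print(f"Detected class {obj['label']} with p={obj['prob']:.2f}")
```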
One result of DeepLens is DermLens, which helps psoriasis patients monitor and manage their skin condition in conjunction with their physician and care team. Developed by independent startup Predictably Well, the app leveraged DeepLens’s algorithms to recognize psoriasis by feeding them 45 images of skin showing the condition’s typically red and scaly segments. Each image in the set came with a mask indicating the afflicted skin. Amazon’s device then sent data to Predictably Well’s companion app, enabling it to estimate the severity of a user’s condition.
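One simple way a companion app might turn a predicted mask into a severity estimate is the fraction of pixels flagged as afflicted. The sketch below illustrates that idea; it is an assumption for illustration, not DermLens’s published formula.

```python
# Illustrative only: severity as the fraction of pixels a segmentation
# model flags as afflicted. An assumed heuristic, not DermLens's formula.
import numpy as np

def severity_estimate(mask: np.ndarray) -> float:
    """mask: binary array, 1 where the model flags psoriasis, 0 elsewhere."""
    return float(mask.sum()) / mask.size

mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:200, 150:300] = 1        # pretend the model flagged this region
print(f"Estimated affected area: {severity_estimate(mask):.1%}")
```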
Consumers may remain largely ignorant of the nuts and bolts of how vision technology works today or how it is changing their lives. But major brands are quickly becoming savvier about its potential for driving engagement and sales. As long as there is money to be made, the technology is likely to become increasingly pervasive in the consumer realm.