Detecting patterns and generalizing
Before the advent of modern deep learning, images and sounds were unstructured data files, there was no way computers could interpret their contents. The computer vision area was dominated by algorithms focused on the use of hand-crafted functions that sought to interpret the geometric elements that made up the images. The assertiveness of these algorithms was very low, and the applications were restricted to environments with very controlled conditions.
Several research institutes promoted challenges for the development of new algorithms which could interpret the content of images, being the most famous of them the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which began in 2010, just before the advent and popularization of deep learning algorithms.
The Imagenet is a huge dataset, with more than 14 million images and 20,000 categories, and of course, has a huge diversity in the data. In the early years of the competition the error rates of algorithms were around 30%, but this changed drastically in 2012 with the advent of deep learning and Convolutional Neural Networks (CNN).
On 30 September 2012, a convolutional neural network (CNN) called AlexNet achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge, more than 10.8 percentage points lower than that of the runner up. (source Wikipedia)
We could talk about several aspects of why deep learning algorithms are far superior to traditional algorithms in the processing of unstructured data, but I would like to talk specifically about the ability of neural networks to recognize patterns and generalize.
Generalize is the ability of a Neural Network to learn to detect the data pattern in new situations after the training time with pre-defined examples.
This assertion is a bit weird for a procedural mind, accustomed to the classical if-then-else structure. However, this traditional procedural structured programming is very good to work with structured data, but when we get unstructured data, and the data of an object in an image never is exactly the same as before, we have an impossible problem for the procedural programming.
How can this be accomplished with Deep Learning Neural Networks?
It’s a complex matter, and Deep Learning is very hard to understand and master. Several articles can introduce it, and I’ll put some references below. But now, I want to show this in a practical example.
In the last post I show how can we get very good results with a Neural Network and very few examples. Now we’ll expand that detection to a new scenery and observe how the Neural Network will adapt.
My son’s toys are now on the floor (another backplane), and we have a different illumination, angles, and some manipulation. Let’s see how the last model detects on this video.
We can see that the detection fails in several frames, and we get a lot of false detections with low confidence.
But at Eyeflow the test process automatically sends random frames to the dataset and we can see frames with the failed detections.
The best strategy now is to select some random examples with failed detections and add them to our dataset. We take care to only add frames that present detection errors, and so we get the best result in expanding the dataset. That’s our semi-supervised learning.
And after we edit these new examples, correct the annotations and put them to train and after test again. We have done this 3 times, 20 minutes each train, and now we have 67 examples in the dataset, and the following result.
As we can see the detections have improved a lot, and the false positives and false negatives have disappeared. And this result was achieved only with 3 iterations, and minimal annotation effort, thanks to our integrated platform for test and improving datasets/models.
The key point here is that the neural network was able to generalize the detection of image patterns with very few examples, even though the scenario has changed a lot. This type of generalization would never be possible to be achieved with traditional computer vision algorithms.
We’ve been working with neural networks for five years now, and we’ve always been amazed at the results we’ve achieved. I believe neural networks are a major revolution in the way we build algorithms, but that’s a theme for another post.
Our platform is in beta period. Feel free to visit us at https://eyeflow.ai