Designing Machine Vision for Optical Character Recognition
| By: Winn Hardin, Contributing Editor
Optical character recognition (OCR) has come a long way since Gustav Tauschek filed the first related patent in 1929, describing how to use letter templates and a photo detector to allow a machine to ‘read’ text. Since that time, templates have been replaced by digital representations of letters, and photo detectors by cameras backed by computers running a sequence of image processing algorithms.
The world, and OCR’s place in it, has changed a lot since 1929, too. Today, OCR is at the heart of a great debate, known as the Google Library Project, that pits the literary world’s intellectual property rights against the need to make information available to the masses around the world.
But whether you’re trying to find a way to automatically sort mail or save the world by bringing the combined wisdom of the literary world to developing nations, automated OCR isn’t perfect. How well a machine reads typed or handwritten text depends heavily on the particulars of the system design and application. Let’s look at some ways vision and imaging companies have tackled the OCR application and developed systems that are as accurate as possible.
Clean, Crisp Characters
OCR starts with an image, identifies type within that image, and extracts the character string from the image for use in downstream applications. But the first thing you need is a good image of clean, crisp characters.
“There are specific fonts for OCR, such as OCR-A and OCR-B, but we typically don’t care what font the customer uses as long as it is a standard, clear font,” explains Olaf Hilgenfeld, Senior Product Manager for VIPAC, VITRONIC’s (Wiesbaden, Germany) high-end, high-throughput OCR and parcel sorting system. “Whether it’s a serif font or not isn’t an issue because our classifiers are font independent. We suggest that customers stay away from colored inks; otherwise, you’ll need to make sure you can illuminate the field of view correctly to create an image with sufficient contrast between letters and background. And the letters need to have space between them; OCR read rates on linked letters are significantly lower.”
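The spacing requirement is easy to see in code. Character segmentation typically labels connected components of ink pixels, so two letters that touch collapse into a single blob that no single-character classifier will recognize. The following is a simplified sketch of that segmentation step, not VITRONIC’s actual algorithm:

```python
def count_components(image):
    """Count 4-connected components of ink (1) pixels in a binary grid,
    a stand-in for the character segmentation step of an OCR pipeline."""
    h, w = len(image), len(image[0])
    seen = set()
    count = 0
    for r in range(h):
        for c in range(w):
            if image[r][c] == 1 and (r, c) not in seen:
                count += 1
                stack = [(r, c)]  # flood-fill one component
                while stack:
                    y, x = stack.pop()
                    if (y, x) in seen:
                        continue
                    seen.add((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and image[ny][nx] == 1:
                            stack.append((ny, nx))
    return count

# Two separated strokes segment as two candidate characters...
spaced = [[1, 0, 1],
          [1, 0, 1]]
# ...but the same strokes with one connecting pixel merge into a single blob.
touching = [[1, 1, 1],
            [1, 0, 1]]
print(count_components(spaced), count_components(touching))  # 2 1
```

This is why linked or kerned-together letters drag read rates down: the segmenter hands the classifier one unrecognizable multi-letter shape instead of two clean characters.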
Unlike barcode and Data Matrix readers, which have checksums and other error-correction features built in so that defective, faded, and damaged codes can still be read, machine vision OCR algorithms need a fairly complete letter to decipher, especially if the text is structured or follows defined rules, such as those used to group products within a family.
“It’s pretty much impossible to do OCR on cast parts, for example, without seeing gross errors because you can’t count on the letters being complete enough to read,” adds Nicholas Tebeau, Manager, Vision Solutions Business Unit, Industrial Solutions at LEONI Engineering Products and Services, Inc. (Lake Orion, Michigan).
The Camera-Character Relationship
Camera resolution, standoff distance, field of view, and depth of field are all critical considerations for a machine vision OCR solution. “We typically suggest images of 200 dpi or better for normal-sized writing to get sufficient resolution for OCR,” adds Hilgenfeld.
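That 200 dpi guideline translates directly into a minimum sensor resolution for a given field of view. The helper below is a hypothetical back-of-the-envelope sketch, not part of any vendor’s toolkit:

```python
import math

def min_pixels(fov_width_in: float, fov_height_in: float, dpi: int = 200):
    """Minimum sensor resolution (width, height in pixels) needed to
    image a field of view at the given dots-per-inch sampling density."""
    return (math.ceil(fov_width_in * dpi), math.ceil(fov_height_in * dpi))

# A 6 x 4 inch shipping label at 200 dpi needs at least 1200 x 800 pixels,
# comfortably within reach of a standard 1.3-megapixel camera.
print(min_pixels(6, 4))  # (1200, 800)
```

In practice you would also budget extra pixels for label skew, placement tolerance, and lens distortion at the edges of the field of view.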
“Label layout is critical,” Hilgenfeld continues. “It’s just not smart to let an OCR engine search the entire image for characters because you would need a lot of processing power to do that. We like to work with the customer to help them design the label when possible, or at least add some characteristic marks in a known, repeatable location so that the machine vision system can quickly find them and then find the OCR regions based on their location.”
Markings or other optical hints can be processed very fast. Orientation is usually simple to determine because most writing is left justified.
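The anchor-mark approach described above can be sketched as a two-step routine: find a small fiducial pattern, then crop the OCR region at a fixed offset from it rather than scanning the whole image. This is an illustrative toy on binary grids, with a naive exact-match search standing in for a real template matcher:

```python
def find_anchor(image, anchor):
    """Naive template search: return (row, col) of the first exact match
    of the small binary `anchor` pattern inside the binary `image` grid."""
    ah, aw = len(anchor), len(anchor[0])
    for r in range(len(image) - ah + 1):
        for c in range(len(image[0]) - aw + 1):
            if all(image[r + i][c + j] == anchor[i][j]
                   for i in range(ah) for j in range(aw)):
                return r, c
    return None

def ocr_region(image, anchor, row_offset, col_offset, height, width):
    """Locate the anchor mark, then crop the OCR region at a known,
    repeatable offset from it instead of searching the whole image."""
    hit = find_anchor(image, anchor)
    if hit is None:
        return None  # no anchor found: reject the label for rework
    r, c = hit[0] + row_offset, hit[1] + col_offset
    return [row[c:c + width] for row in image[r:r + height]]
```

Because the OCR engine only ever sees the small cropped region, the expensive character classification step runs on a fraction of the pixels, which is the processing-power saving Hilgenfeld describes.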
Handwriting: A Stylish Challenge
If cast iron letters aren’t dependable enough for OCR, it may come as a surprise that handwriting from hundreds of different sources is feasible for a machine vision system. Yet MVTec Software GmbH (Munich, Germany) and integrator ECKELMANN AG did just that a few years ago for a large mailing house.
After locating the handwritten text on a label, the system analyzes the characters against various handwritten character sets stored as vector sets. ECKELMANN designed the handwritten OCR analysis routine to use both a neural network classifier and a support vector machine (SVM) classifier, so that the results from the two analyses can check each other and improve the accuracy of the overall read.
The system first uses a neural network algorithm to identify the characters. The result is checked against stored “plausible” values, such as current supplier identification numbers or common numbers of cartons shipped. If the neural network result is judged unlikely, the system falls back on the SVMs built into MVTec’s HALCON image processing software. SVMs offer an alternative training method for polynomial, radial basis function, and multi-layer perceptron classifiers, solved as a quadratic programming problem with linear constraints rather than the non-convex, unconstrained minimization problem used in standard neural network training.
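The cross-checking flow described above can be summarized as a simple decision routine. The two classifier functions and the plausibility list here are hypothetical stand-ins for ECKELMANN’s trained HALCON classifiers, just to show the control logic:

```python
# Hypothetical stand-ins for the two trained classifiers; in the real
# system these would be HALCON's MLP and SVM character classifiers.
def neural_net_read(region):
    return region["nn_guess"]

def svm_read(region):
    return region["svm_guess"]

def read_field(region, plausible_values):
    """First try the neural network; if its result is not a plausible
    value (e.g. a known supplier ID), fall back on the SVM and let the
    two results check each other."""
    nn = neural_net_read(region)
    if nn in plausible_values:
        return nn  # NN result confirmed as plausible
    svm = svm_read(region)
    if svm in plausible_values:
        return svm  # SVM rescued an implausible NN read
    # Neither result is plausible: accept only if the classifiers agree,
    # otherwise reject the field for manual review.
    return nn if nn == svm else None

suppliers = {"SUP-1042", "SUP-2210"}
# The NN misreads '1' as 'I'; the SVM result passes the plausibility check.
print(read_field({"nn_guess": "SUP-I042", "svm_guess": "SUP-1042"}, suppliers))
# -> SUP-1042
```

The plausibility table does the same job a checksum does for a barcode: it rejects reads that are syntactically valid characters but semantically impossible values.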
It’s easy to see how an OCR application can require up to four times the processing power of a barcode or Data Matrix reader. But the true benefit of machine vision for automated tracking and sorting is its ability to do both simultaneously, using the same hardware. As online shopping continues to grow right along with globalization, the entire supply chain, from component and product manufacturers through fulfillment and returns, needs cost-effective, powerful solutions for automated sorting and tracking. Luckily, machine vision has what warehousing needs, and it doesn’t even have to wait for the holidays.