Industry Insights
Machine Vision Makes the Leap to Consumer Gaming
POSTED 12/08/2010
| By: Winn Hardin, Contributing Editor
It’s already happened. Machine vision has made the leap from obscure industrial quality-control technology to mainstream consumer technology thanks to advances in 3D triangulation and time-of-flight machine vision systems, system-on-a-chip (SOC) technology, and highly optimized image-processing algorithms.
And it’s only the beginning, according to leaders in the consumer machine vision industry.
Project Natal
In March 2009, Microsoft Corporation, maker of the Xbox 360 game console, purchased 3DV Systems, an Israeli company that specialized in developing time-of-flight sensors to create 3D depth maps using a single image sensor. Shortly thereafter, Microsoft announced Project Natal, an effort to create a new way of interacting with a video game by tracking the user’s body movements rather than relying on a handheld controller.
A little more than a year later, Microsoft had gone from initial prototype to finished product, releasing its first machine-vision-based product: Kinect, which is expected to be a big seller this holiday season. Kinect is based on a completely different technology that uses laser triangulation and a single image sensor to create 3D depth maps.
The machine vision industry has used TOF and laser triangulation for many years as relatively inexpensive ways to create 3D maps for robot guidance and many other applications. In many cases, the industry used stereovision, or multiple cameras instead of a single image sensor, to improve the resolution of the 3D measurements – a critical consideration for most industrial inspection and manufacturing processes.
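The baseline-and-calibration requirement mentioned above comes from the geometry of triangulation. As a rough illustration (the parameter values below are hypothetical, not from any system in the article), the pinhole stereo relation recovers depth from the disparity between two cameras separated by a known baseline:

```python
# Hedged sketch of the stereo triangulation relation: depth falls out of the
# disparity between two cameras a known baseline apart, which is why
# industrial stereo rigs need that baseline and careful calibration.
# All numeric values here are hypothetical.

def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Classic pinhole stereo relation: Z = f * B / d."""
    if disparity_px <= 0.0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# 600 px focal length, 10 cm baseline, 20 px disparity -> 3.0 m
print(depth_from_disparity(600.0, 0.10, 20.0))  # 3.0
```

Note how depth resolution degrades as disparity shrinks at long range, one reason industrial systems widen the baseline or add cameras.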
The gaming world is different, however. Each of the leading solutions for hands-free game console interaction uses a single sensor. Rather than relying on advances in image sensor technology, these systems use highly optimized SOCs, similar in some ways to the vision system on a chip (VSOC) recently released by industrial machine vision supplier Cognex (Natick, Massachusetts). This chip-scale integration lets consumer machine vision systems deliver next-generation, hands-free video game control below the all-important $100 price point expected by the video game industry.
Time and Phase for 3D Vision
The ZCam's time-of-flight camera system features a near-infrared (NIR) pulse illumination component, as well as an image sensor with a fast gating mechanism. Based on the known speed of light, ZCam coordinates the timing of NIR pulse wave emissions from the illuminator with the gating of the image sensor, so that the signal reflected from within a desired depth range is captured exclusively. The amount of pulse signal collected for each pixel corresponds to where within the depth range the pulse was reflected from, and can thus be used to calculate the distance to a corresponding point on the captured subject.
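The gating principle above can be sketched in a few lines. This is an illustration only: the linear mapping from captured pulse energy to depth is an assumption for clarity, not 3DV's actual pulse and gate shaping, and the window bounds are hypothetical.

```python
# Minimal sketch of range-gated time-of-flight (the linear energy-to-depth
# mapping is an assumption for illustration, not 3DV's actual algorithm).
# The gate admits only light returning from within a chosen depth window;
# the fraction of pulse energy that passes encodes where in the window
# the reflection occurred.

def gated_depth(captured_fraction: float, near_m: float, far_m: float) -> float:
    """Map the fraction of pulse energy passed by the gate (1.0 at the near
    edge of the window, 0.0 at the far edge) to a depth in meters."""
    if not 0.0 <= captured_fraction <= 1.0:
        raise ValueError("captured fraction must lie in [0, 1]")
    return near_m + (1.0 - captured_fraction) * (far_m - near_m)

# Half the pulse energy captured -> midpoint of a 1-3 m window:
print(gated_depth(0.5, 1.0, 3.0))  # 2.0
```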
Due to the fast timing required for light-based time-of-flight, the ZCam uses custom hardware for illumination and gating. The illuminator is a series of NIR laser diodes around the lens barrel, switched by special high-speed driver circuits that produce pulses with a rise time and fall time of less than one nanosecond. Initially, 3DV Systems used an image intensifier for gating the image sensor. They later developed a special solid-state image shutter, in the form of a gallium arsenide-based electro-optical chip mounted atop the image sensor. The company was reportedly developing a gating solution based on less-expensive CMOS-process fabrication.
PMDTechnologies GmbH later perfected the silicon gating mechanism. “Generally, a human-machine interface (HMI) based on machine vision requires a two-fold solution,” explains Jochen Penne, Business Development and Applications at PMDTechnologies. “First, there needs to be an appropriate way of ‘sensing’ the interactor, that is, determining which action, pose, or gesture he is currently performing. Second, there needs to be an easy-to-use and easy-to-understand transformation between the detected interaction and the triggered action at the machine site. The first part of the solution leads to a technological challenge; the second part leads to a design challenge…
“The PMD technology provides distance measurements according to the ToF (time-of-flight) principle by measuring the phase delay (i.e., the intensity differential of two outputs of one pixel) of an actively emitted reference signal, usually in the near-infrared range. A PMD system thus basically consists of two parts: the PMD chip and the illumination unit. The distance measurement is accomplished in each pixel individually, and these PMD pixels can be manufactured using standard CMOS structures. Consequently, PMD systems enable very small housings, can be integrated into various operating environments, and offer the potential for inexpensive mass-market product integration. PMDTec has been delivering mass-market products for over five years, in quantities of several hundred thousand units a year.”
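The phase-delay measurement Penne describes can be sketched with the textbook four-sample demodulation used in continuous-wave ToF. This is a hedged illustration: PMD's actual pixel circuit is not public, and the 20 MHz modulation frequency is an assumption.

```python
# Hedged sketch of continuous-wave phase-delay ToF (textbook four-sample
# demodulation; PMD's actual pixel circuit and modulation frequency are
# not known here, so 20 MHz is an assumption).
import math

C = 299_792_458.0  # speed of light, m/s

def tof_depth(a0: float, a1: float, a2: float, a3: float,
              f_mod_hz: float) -> float:
    """Depth from four correlation samples taken 90 degrees apart:
    phase = atan2(a3 - a1, a0 - a2); depth = c * phase / (4*pi*f_mod)."""
    phase = math.atan2(a3 - a1, a0 - a2) % (2.0 * math.pi)
    return C * phase / (4.0 * math.pi * f_mod_hz)

# A quarter-cycle phase delay at 20 MHz: the unambiguous range is
# c / (2 * f_mod), about 7.49 m, so a quarter cycle lands near 1.87 m.
print(tof_depth(0.0, -1.0, 0.0, 1.0, 20e6))
```

The unambiguous range shrinks as modulation frequency rises, which is the depth-of-field trade-off Berenson raises later in the article.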
PMD’s PhotonICs combines a CMOS sensor with gating and signal-processing electronics to create a 3D TOF imager on a chip. PMD uses a combination of optical filters and thresholding set by ambient light conditions to reduce background noise and further isolate objects in the sensor’s field of view. Unlike the stereoscopic systems used in industry, TOF systems do not require a baseline separation between two sensors, or the associated calibration, to provide accurate 3D data maps. PMD’s technology has already transferred to industrial applications through industrial supplier ifm electronic GmbH and will be used in future cars for safety and comfort features.
“There are actually many applications being investigated by our customers in a variety of markets,” adds Dr. Bernd Buxbaum, CEO of PMDTec. “Autonomous robot navigation, surveillance, people counting, automotive collision detection, assembly line monitoring, box full/empty recognition, touchless 3D game control, and many more. PMDTec’s CamCube, a modular and compact 200 x 200 pixel 3D range imaging system, lets potential customers investigate application-specific solutions, since it can be adapted to the required field of view and distance measurement range. The highest resolution of a PMD imager today is 352 x 288 (i.e., CIF), but we will go beyond this pretty soon. Moreover, no other 3D approach benefits from improvements in semiconductor technology the way ToF does: ToF depth resolution profits directly from miniaturization and bandwidth enhancement as long as Moore’s law holds. Active and passive triangulation systems cannot exploit this benefit because of geometrical constraints independent of that progress, so there is a bright future for PMD technology.”
Triangulation Enables High-Res 3D Interaction
“The problem with TOF solutions is that you need to operate your sensor at high frequency to collect a signal at the right depth,” notes Adi Berenson, Vice President of Marketing at PrimeSense Ltd. “This requires a highly sensitive pixel, which means large pixels, which translates to smaller sensor arrays at standard CMOS sensor sizes. You can also have problems with depth of field. TOF is good for a person standing 2 meters away or 6 meters away, but getting both at the same time is difficult.”
“The bottom line: Microsoft uses our technology and they’ve sold 1 million units. TOF has been around for 10 years or more, but it only sells thousands of units a year. Not a million,” says Berenson.
PrimeSense’s technology uses a standard CMOS sensor and continuous-wave (CW) rather than pulsed laser illumination to create a higher-resolution (VGA) 3D depth map. The NIR projector casts a coded light pattern onto the sensor’s field of view. The CMOS sensor collects the reflected NIR light and – based on distortions in the reflected pattern – uses on-chip algorithms to extract 3D data for each pixel in the array. “We don’t use a ‘grid,’ per se; our projected pattern uses information theory to project a much richer pattern for spatial 3D maps than a simple grid.”
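The pattern-distortion idea can be illustrated with the standard structured-light relation: a projected feature shifts laterally as scene depth departs from the reference depth at which the pattern was calibrated. This is a textbook sketch with hypothetical numbers, not PrimeSense's published algorithm.

```python
# Illustrative sketch of structured-light depth recovery (textbook relation
# with hypothetical parameters, not PrimeSense's actual algorithm). A pattern
# point shifts laterally as scene depth departs from the reference depth z0
# at which the pattern was calibrated.

def structured_light_depth(shift_px: float, focal_px: float,
                           baseline_m: float, z0_m: float) -> float:
    """Depth z from pattern shift d, with d = f*b*(1/z - 1/z0), i.e.
    z = 1 / (1/z0 + d/(f*b)); positive shift means closer than z0."""
    inv_z = 1.0 / z0_m + shift_px / (focal_px * baseline_m)
    if inv_z <= 0.0:
        raise ValueError("shift implies a point at or beyond infinity")
    return 1.0 / inv_z

# Zero shift returns the reference depth; 10.875 px of shift with a
# 580 px focal length and 7.5 cm projector-camera baseline moves a
# 2 m reference point to about 1.33 m.
print(structured_light_depth(0.0, 580.0, 0.075, 2.0))      # 2.0
print(structured_light_depth(10.875, 580.0, 0.075, 2.0))   # ~1.333
```

Because the projector replaces the second camera, a single image sensor suffices, which is the cost advantage the article attributes to these consumer systems.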
“We’re essentially a semiconductor company,” says Berenson. “Our product is a system on a chip (SOC) and we offer an entire reference design around it: sensors, optics, projecting elements, and so forth.”
While PrimeSense concedes that its technology could transfer to the industrial sector, the small start-up is focused exclusively on developing next-generation consumer interaction solutions. “Next year, you’ll see a variety of other launches using PrimeSense technology designed for hands-free interaction with other living room electronic devices. Then mobile devices, followed by automotive and domestic robot devices. It needs to be step by step so we can focus our resources.”
In the meantime, Berenson says the company is focused on improving its products and working with customers on related challenges such as generating efficient skeletal models, improving gesture recognition, hand detection, and people separation: “You’ll see great advances in these areas in the next year,” Berenson concludes.