Eye-Tracking Systems Turn Vision on the Viewer
By: Winn Hardin, Contributing Editor
Machine vision hasn’t yet cracked the code of what humans are thinking, but thanks to steady advances in eye-tracking technology, it’s getting better at guessing what has our attention.
Most eye-tracking systems today share the same basic mix of components as machine vision: camera, light source, and image processor. The difference is that eye-tracking systems turn the camera back on the viewer to track and interpret the activity of the eye, whether that be where and how long the eye fixates on something, what features attract its gaze, or how the eyes’ pupils react to visual stimuli. Eye tracking needn’t be a passive observer of eye movement either: The technology also promises to augment the computer mouse and keyboard by enabling navigation and control with a gaze.
Pioneered by Raymond Dodge and T. S. Cline in 1901, eye tracking predates machine vision by more than half a century. More recently, eye-tracking technology has caught the eye of tech industry bellwethers. During the months straddling 2016 and 2017, Google, Apple, and Facebook respectively purchased eye-tracking start-ups Eyefluence, SensoMotoric Instruments, and The Eye Tribe. Microsoft has not yet joined the buying frenzy, but it’s introduced new application programming interfaces that add eye-tracking support for Windows and further partnered with leading system integrators such as Tobii and EyeTech DS to publish a new industry standard for the technology.
If there is a common goal driving these recent acquisitions, it is likely the race to develop commercially successful headgear for virtual reality (VR) and augmented reality (AR) applications. Eye-tracking technology is critical to both.
It is a fundamental enabler of foveated rendering, which is a digital display technique that sharpens only that part of a VR image on which the eye is focused. As with real-life vision, the periphery remains blurry. Used in VR headsets, foveated rendering significantly reduces the load on the graphics processing unit (GPU), allowing current-generation GPUs to render 4K display quality and higher frame rates for a much lower cost and power budget.
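As a rough illustration of the idea, the sketch below maps a pixel's angular distance from the gaze point to a render-resolution scale. The function name, the 5- and 20-degree thresholds, the 0.25 minimum scale, and the small-angle pixels-per-degree conversion are all assumptions chosen for illustration, not values from any particular headset or rendering engine.

```python
import math


def resolution_scale(pixel, gaze, pixels_per_degree,
                     inner_deg=5.0, outer_deg=20.0, min_scale=0.25):
    """Return a render-resolution scale in [min_scale, 1.0] for one pixel.

    pixel, gaze: (x, y) screen coordinates in pixels.
    pixels_per_degree: assumed constant across the display (small-angle
    approximation; real optics vary toward the edges).
    inner_deg, outer_deg, min_scale: hypothetical tuning values.
    """
    dx = pixel[0] - gaze[0]
    dy = pixel[1] - gaze[1]
    eccentricity_deg = math.hypot(dx, dy) / pixels_per_degree

    if eccentricity_deg <= inner_deg:
        return 1.0  # full detail where the eye is focused
    if eccentricity_deg >= outer_deg:
        return min_scale  # coarsest detail in the far periphery

    # Linear falloff between the inner and outer thresholds.
    t = (eccentricity_deg - inner_deg) / (outer_deg - inner_deg)
    return 1.0 - t * (1.0 - min_scale)
```

A renderer could call such a function per tile rather than per pixel and shade each tile at the returned fraction of full resolution, which is where the GPU savings would come from.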
Eye-tracking technology supports a similar function — focused rendering — in AR headgear, where the challenge is to selectively display only the most relevant information over a user’s vision. Too much information overwhelms the user or occludes their view. By identifying what the eye is looking at, AR headsets can implement focused rendering to only provide data on the object of attention and provide the user with greater control.
Component specifications may vary by application, but most wearable eye-tracking systems rely on a pair of CMOS cameras positioned below the eyes and operating within the 850–900 nm range. While the eyes are illuminated by infrared LEDs, cameras are often equipped with a bandpass filter to minimize reflections off the cornea from visible light. The result is a high-contrast offset image of the eyes that enables a GPU or field-programmable gate array (FPGA) to easily resolve both pupils, calculate a sight vector for each, and by determining where those vectors converge, identify their direction and depth of focus.
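The convergence step can be sketched as follows, assuming each eye's position and sight vector have already been extracted from the camera images. Because two measured rays rarely intersect exactly, this sketch returns the midpoint of the shortest segment between them. The function name and coordinate conventions are hypothetical, and production systems will add calibration and filtering that are omitted here.

```python
def gaze_convergence_point(eye_left, dir_left, eye_right, dir_right):
    """Estimate the 3D point where two gaze rays converge.

    Each ray is given by an origin (eye position) and a direction vector,
    both as (x, y, z) tuples in the same units. Returns the midpoint of the
    shortest segment joining the two rays, or None if they are near-parallel
    (the "looking at infinity" case).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = tuple(l - r for l, r in zip(eye_left, eye_right))
    a = dot(dir_left, dir_left)
    b = dot(dir_left, dir_right)
    c = dot(dir_right, dir_right)
    d = dot(dir_left, w)
    e = dot(dir_right, w)

    denom = a * c - b * b
    if denom <= 1e-12 * a * c:
        return None  # rays are parallel or nearly so; no usable depth

    # Parameters of the closest points along each ray.
    t_left = (b * e - c * d) / denom
    t_right = (a * e - b * d) / denom

    p_left = tuple(o + t_left * v for o, v in zip(eye_left, dir_left))
    p_right = tuple(o + t_right * v for o, v in zip(eye_right, dir_right))
    return tuple((l + r) / 2.0 for l, r in zip(p_left, p_right))
```

The distance from the midpoint between the eyes to the returned point gives the depth of focus, and the vector to that point gives the gaze direction.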
“This typically only applies for [tracking] objects within arm’s length,” explains Andrew Duchowski, Professor of Visual Computing at Clemson University. “Beyond that range, the eye’s basically looking at infinity.”
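A quick calculation shows why depth from convergence fades with distance. The sketch below computes the angle between the two sight vectors for a target straight ahead. The 63 mm interpupillary distance is an assumed typical adult value, not a figure from the article, and the function name is hypothetical.

```python
import math


def vergence_angle_deg(target_distance_m, ipd_m=0.063):
    """Angle in degrees between the two eyes' sight vectors.

    Assumes the target lies straight ahead on the midline between the eyes.
    ipd_m is the interpupillary distance (assumed value of 63 mm).
    """
    return math.degrees(2.0 * math.atan((ipd_m / 2.0) / target_distance_m))


for distance_m in (0.3, 0.5, 1.0, 3.0, 6.0):
    print(f"{distance_m:>4} m: {vergence_angle_deg(distance_m):.2f} degrees")
```

Under these assumptions the angle is about 7 degrees at half a meter but falls below 1 degree by six meters, so small measurement errors in each sight vector translate into large depth errors at longer range.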
Consequently, most eye-tracking systems still operate at close range to the viewer. Beyond wearable headsets, eye-tracking systems might be embedded in stand-alone desktop configurations or computer monitors to support user interface design, advertising design, or medical research and diagnosis. Eye tracking is also a key component of automotive driver assistance systems that foster safe driving by detecting how attentive a driver is to the road.
Frame rates between 60 and 120 Hz are generally enough to accurately sample saccades, the rapid eye movements between fixation points, in most of these applications. For example, AR/VR headsets whose cameras operate within this range keep users from noticing a lag between eye fixation and image rendering. More sophisticated systems, such as those designed to track microsaccades for research or diagnosis of neurological conditions, must operate at frame rates of 300 Hz and higher.
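To make the frame-rate tradeoff concrete, the sketch below converts a camera frame rate into the time between samples and counts how many frames land within an eye movement of a given duration. The 50 ms duration is a hypothetical example value, not a figure from the article, and both function names are illustrative.

```python
def frame_interval_ms(frame_rate_hz):
    """Time between consecutive camera frames, in milliseconds."""
    return 1000.0 / frame_rate_hz


def samples_during(event_ms, frame_rate_hz):
    """Whole number of frames captured during an event of event_ms.

    Both arguments are integers so the count uses exact integer arithmetic.
    """
    return (event_ms * frame_rate_hz) // 1000


for hz in (60, 120, 300):
    print(f"{hz} Hz: {frame_interval_ms(hz):.1f} ms between frames, "
          f"{samples_during(50, hz)} samples in a 50 ms movement")
```

The shorter the movement of interest, the fewer frames a slower camera captures during it, which is one way to see why microsaccade research pushes toward 300 Hz and beyond.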
“Head-mounted systems generally demand precise synchronization between the two cameras,” says Mike Fussell, Product Manager for FLIR Systems. “Low latency triggering over GPIO [general purpose input output] makes this possible. For our Firefly product, the GPIO pins for external power and synchronization and 60 FPS specification were driven by the requirements of eye tracking for powering lighting and synchronizing two cameras in as small a package as possible.”
Increasing frame rates and processing performance also place high demands on custom image processing to handle the incoming sensor data, notes Dale Hitt, Director of Market Development at Xilinx. “However, there is also a transition from simplistic computer vision algorithms to AI inference processing to deliver more robust tracking and advanced capabilities like drowsiness and emotional-state detection,” Hitt says. “With Xilinx [FPGA] technology, eye-tracking system developers can deploy low-latency AI inference hardware acceleration, custom computer vision hardware pipelines, and high-speed image processing capabilities to meet the most demanding requirements.”
Image sensor resolution is less critical, though still important, for eye-tracking systems. For head-mounted tracking systems, Fussell says that 0.4 to 1.6 MP is the resolution range where most of FLIR’s eye-tracking customers sit.
“Higher pixel counts give a bigger head box, which is an important performance metric for eye tracking,” says Keith Jackson, Director of Sales and Marketing for EyeTech Digital Systems. The head box effectively defines the three-dimensional area in which a user’s head may move without significantly diminishing the performance of an eye-tracking system.
Higher pixel density also helps track the gaze within smaller target dimensions. EyeTech’s current eye-tracking systems, for example, are designed to provide between 0.5 and 1 degree of accuracy, which corresponds to an on-screen error of roughly 0.5 inch or less on a computer monitor viewed from just over two feet away. EyeTech also makes eye trackers that work at distances up to 10 feet.
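The relationship between angular accuracy and on-screen error is simple trigonometry: the error is the viewing distance times the tangent of the angle. The sketch below works it through, taking "just over two feet" to mean 28 inches. That distance is an assumption for illustration, and the function name is hypothetical.

```python
import math


def on_screen_error(viewing_distance, accuracy_deg):
    """Linear gaze error on a flat screen for a given angular accuracy.

    The result is in the same units as viewing_distance. Assumes the gaze
    point is near the screen center, so the screen is perpendicular to the
    line of sight.
    """
    return viewing_distance * math.tan(math.radians(accuracy_deg))


for degrees in (0.5, 1.0):
    inches = on_screen_error(28.0, degrees)
    print(f"{degrees} degree at 28 in: {inches:.2f} in")
```

At that assumed distance, 1 degree works out to just under half an inch and 0.5 degree to about a quarter inch. The same function shows why accuracy requirements tighten at longer range: at 10 feet, 1 degree spans roughly two inches.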
Were we able to peer into the future, what advances would we see in eye tracking? “Putting my futurist hat on, I’d say stacked, backside-illuminated CMOS sensors are going to enable smaller image sensors with higher-speed operation and onboard processing,” Fussell says. “At some point, some image sensors may be able to perform eye tracking on-sensor.”