Industry Insights
Specialized Chips Could Help Smart Cameras Leapfrog PC Host Systems
POSTED 05/06/2008 | By: Winn Hardin, Contributing Editor
The machine vision market owes its success to the development and proliferation of the microprocessor. Today, advances in electronic design automation (EDA) programs and semiconductor manufacturing equipment are greatly expanding the options available to machine vision system designers, and in particular to the fast-growing smart camera market.
Beyond x86
Since the advent of the microprocessor, the semiconductor industry has been kind enough to develop a host of specialized chips that complement or beat general-purpose CISC, RISC, and x86-based microprocessor architectures at specific functions. Digital signal processors (DSPs) are the most notable.
While general-purpose microprocessors can also perform DSP functions, DSPs with floating-point computational units and parallel accumulator/multipliers perform these calculations faster than microprocessors while consuming less energy and producing less heat, two key design considerations for the small-form-factor smart camera segment of the machine vision market.
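To make that comparison concrete, here is a minimal sketch, written in Go purely for illustration and not taken from any vendor's code, of the multiply-accumulate loop at the heart of a FIR filter. It is exactly this inner loop of multiplies feeding an accumulator that a DSP's parallel accumulator/multiplier hardware executes in very few cycles per tap, where a general-purpose CPU spends far more time and power.

package main

import "fmt"

// fir applies a finite impulse response filter to a block of samples.
// The inner multiply-accumulate (MAC) loop is the operation DSP hardware
// parallelizes: each tap is one multiply feeding a running accumulator.
func fir(samples, coeffs []float32) []float32 {
	out := make([]float32, 0, len(samples))
	for i := len(coeffs) - 1; i < len(samples); i++ {
		var acc float32
		for j, c := range coeffs {
			acc += c * samples[i-j] // multiply-accumulate
		}
		out = append(out, acc)
	}
	return out
}

func main() {
	signal := []float32{1, 2, 3, 4, 5, 6, 7, 8}
	kernel := []float32{0.25, 0.5, 0.25} // simple smoothing taps
	fmt.Println(fir(signal, kernel))
}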
“We’re working on a next-generation platform, in addition to our Wintel-based P and E Series smart cameras,” explains Fabio Perelli, Product Manager at Matrox Imaging (Dorval, Quebec, Canada). “While the details are proprietary, the objectives are simple: the next platform needs more computational power, faster networking, a smaller footprint and to consume less power than today’s smart cameras.”
Other architectures exist that can outperform both microprocessors and DSPs on particular workloads, such as application-specific integrated circuits (ASICs), which are designed to solve a unique set of computational requirements. However, “ASICs are still too expensive for the machine vision market,” said Matrox Imaging’s Perelli, “but you can do a lot more with FPGAs [field-programmable gate arrays] these days.”
Perelli continues, saying that the smart camera market can learn from how image processing daughter cards with additional DSPs and FPGAs were first used to increase the performance of high-end frame grabbers. “For point-to-point type operations, filters, and distortion correction with look-up tables [LUTs], FPGAs are a great way to do specific operations. The problem remains, however, that programming FPGAs is not easy and takes good software tools to do it right.”
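As an illustration of the kind of point-to-point, LUT-based operation Perelli describes, the Go sketch below precomputes a gamma-correction table and remaps each pixel through it. The function names and the gamma value are assumptions for illustration only; on an FPGA the same table would sit in block RAM and each pixel would be remapped as it streams through the pipeline.

package main

import (
	"fmt"
	"math"
)

// buildGammaLUT precomputes an 8-bit gamma-correction table once, up front.
func buildGammaLUT(gamma float64) [256]uint8 {
	var lut [256]uint8
	for i := 0; i < 256; i++ {
		v := math.Pow(float64(i)/255.0, gamma) * 255.0
		lut[i] = uint8(v + 0.5)
	}
	return lut
}

// applyLUT is the point-to-point operation itself: each output pixel depends
// only on the corresponding input pixel, which is why such operations map so
// well onto FPGA pipelines.
func applyLUT(pixels []uint8, lut [256]uint8) {
	for i, p := range pixels {
		pixels[i] = lut[p]
	}
}

func main() {
	lut := buildGammaLUT(0.45) // illustrative gamma value
	row := []uint8{0, 16, 64, 128, 200, 255}
	applyLUT(row, lut)
	fmt.Println(row)
}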
Multicores: Too Much Is Never Enough
As illustrated by the floating-point power of the DSP and the speed of the FPGA at specific operations, machine vision is benefiting from the development of specialized chips that do certain microprocessor functions better, faster, and cheaper. As these specialized chips find their way into more applications, as the FPGA has done over the past seven years or so, they become less expensive, better supported by software and hardware tools, and more attractive to ancillary consumers of technology such as machine vision.
As mentioned briefly in a recent article on 3D imaging, smart stereoscopic camera manufacturer TYZX (Menlo Park, California) has combined the power of a low-end microprocessor with an FPGA and an ASIC. The G2 Embedded Vision System is a PowerPC-based computer that runs Linux. At the component level, the G2 consists of an embedded PowerPC chip (666 MHz AMCC 440GX) connected through an FPGA to two imagers, a TYZX DeepSea II stereo processor, and memory. This basic configuration is designed to be extremely flexible: any component can be configured to talk to any other component by means of the FPGA configuration. Implementing a new FPGA configuration, while simple in principle, can take weeks or months of effort. The G2 is therefore designed around a particular dataflow architecture and FPGA configuration that allows the various components to be reconfigured at a coarse level of granularity. This approach allows the system to be applied to distinct tasks without modifying the firmware in the FPGA.
The dataflow for the G2 is a generalized version of that of any computer intended to perform vision tasks: on a frame-by-frame basis the CPU has access to the input imagery. Left and right rectified source images can be DMAed into main memory on the PowerPC. In principle, this data is enough to perform any vision or stereo-based task; in practice, the cost of computing stereo depth at frame rate alone would swamp the embedded PowerPC. To perform real-time 3D vision tasks, the PowerPC therefore also has access to several other input images on each frame: a range image, a foreground image, and a “ProjectionSpace” image computed directly by the hardware as the pixels arrive from the imagers. This hardware acceleration is key to real-time performance in an embedded 3D vision system.
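A rough software sketch of that per-frame dataflow appears below. The struct and function names are hypothetical, not TYZX's API; the point is simply that the CPU receives range, foreground and ProjectionSpace images already computed in hardware, so the application code reduces to comparatively cheap high-level logic.

package main

import "fmt"

// FrameInputs models the per-frame data the article says the G2's CPU can
// access: DMAed rectified imagery plus images computed in hardware.
// Names, types, and units here are illustrative only.
type FrameInputs struct {
	LeftRectified   []uint8  // DMAed into PowerPC main memory
	RightRectified  []uint8
	RangeImage      []uint16 // stereo depth, computed by the DeepSea II processor
	ForegroundImage []uint8  // foreground mask, computed in hardware
	ProjectionSpace []uint16 // "ProjectionSpace" image, computed as pixels stream in
}

// processFrame stands in for the application code: because depth and
// foreground arrive precomputed, the CPU does only the high-level 3D task
// instead of per-pixel stereo correlation.
func processFrame(f FrameInputs) int {
	near := 0
	for i, fg := range f.ForegroundImage {
		if fg > 0 && f.RangeImage[i] < 2000 { // count foreground pixels nearer than an assumed 2 m
			near++
		}
	}
	return near
}

func main() {
	// One dummy 4-pixel "frame" just to exercise the loop.
	f := FrameInputs{
		ForegroundImage: []uint8{0, 1, 1, 0},
		RangeImage:      []uint16{500, 1500, 3000, 800},
	}
	fmt.Println("near foreground pixels:", processFrame(f))
}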
Defense & Security Drive Image Processing
At a recent defense and security conference, two companies showed new computational designs that could move image processing significantly closer to the way the human brain operates.
The human brain has roughly 10^12 neurons that can both process and store information, plus some 10^15 synapses connecting those neurons, creating a massively parallel processing architecture. Compare this to the hundreds of millions of logic gates in today’s microprocessors, served by a few hundred I/O pipes, and it’s apparent that microprocessors still have a long way to go to match the human CPU.
Irvine Sensors’ (Costa Mesa, California) Silicon Brain Architecture was originally funded through SBIR grants and is now being turned into a smart camera design for DARPA. While much of the chip’s design remains secret because of its defense applications, Irvine Sensors’ concept is to stack FPGAs while providing a mechanism that greatly increases connectivity among the various processing units. The final stack would form a pyramid-shaped neural network in which values from lower chips are passed upward for increasingly powerful, higher-level image processing decisions.
Based on a related development at Caltech that emulates the visual cortex using FPGAs, a stacked-FPGA approach was selected for the Silicon Brain Architecture SBIR Phase II brain emulation. The 3D silicon stacking technology is important because it provides the very short-distance, completely parallel interconnectivity that enables the silicon brain to achieve peta-ops performance while consuming less than 10 watts.
Irvine Sensors’ design uses thinned and stacked integrated circuit chips to emulate the highly integrated neural circuitry of the brain. These circuits are weighted synapse arrays (WSAs) terminating in neurons that connect to other WSAs. Crossbar switches in each chip enable all possible interconnects within and between stacks. The outer peripheral stacks interface with sensor chips, such as imaging optical sensors. A bus and control plane, analogous to the brain’s mid-line plane, separates the two halves of the silicon brain and provides clock signals, ground, power, and interlobe communications. The use of FPGAs is reportedly important because many potential custom IC configurations can be implemented without committing funds to expensive IC design and development.
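As a purely illustrative software analogy, since the actual Irvine Sensors circuits are not public, the short Go sketch below shows what a weighted synapse array feeding threshold neurons computes: a weighted sum per neuron, with one layer's outputs passed up to the next, much as the pyramid description above suggests. All names, weights, and thresholds are assumptions.

package main

import "fmt"

// wsaLayer models one weighted synapse array: every output "neuron" sums its
// weighted inputs and fires through a simple threshold. Chaining calls to
// wsaLayer mimics values being passed up the stack for higher-level decisions.
func wsaLayer(inputs []float64, weights [][]float64, threshold float64) []float64 {
	out := make([]float64, len(weights))
	for n, synapses := range weights {
		var sum float64
		for i, w := range synapses {
			sum += w * inputs[i] // one synapse: weight times input
		}
		if sum > threshold {
			out[n] = 1 // neuron fires
		}
	}
	return out
}

func main() {
	pixels := []float64{0.9, 0.1, 0.8, 0.2}
	lower := [][]float64{{1, 0, 1, 0}, {0, 1, 0, 1}} // two neurons, four synapses each
	upper := [][]float64{{1, 1}}                     // one neuron combining the two below
	hidden := wsaLayer(pixels, lower, 1.0)
	fmt.Println(wsaLayer(hidden, upper, 0.5))
}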
The difficulty with conventional FPGA-based implementations of large electronic functions is that advanced FPGAs have several hundred pin-outs each, and obtaining enough gate count in a stack requires interconnecting several chips, resulting in several thousand interconnects. Combining hybrid 3DFET technology with FPGAs and field-programmable interconnect devices (FPIDs), using Irvine Sensors’ newly developed neochip technology, will allow the company to realize core processing functions in a very flexible Silicon Brain Architecture. Inserting the FPIDs makes the stack interconnections fully reconfigurable, so different interconnect schemes such as mesh, butterfly, and crossbar can be implemented on request and on the fly between the processing nodes formed in the FPGAs, without any hardware changes. This flexibility means the resulting stack can be used to rapidly prototype virtually any digital processor.
Another interesting development is the Ambric Am2000 massively parallel processor array (MPPA). Although the Am2000 hasn’t been designed into a camera yet, Ambric Business Development Director Paul Chen explains that the design delivers 5-20X the performance of a TI 64X DSP with high scalability. Compared with high-end FPGAs, it offers software programmability with 2X to 5X the performance at lower unit cost. The chip uses arrays of small computing processors with distributed memory. Each processor runs asynchronously, while globally all the processors are synchronized automatically via the programmable “channels” that interconnect them. A producer processor stalls when its output channel is full, and a consumer processor suspends execution when its input channel is empty. A blocked processor automatically resumes execution once the stalled condition clears, with no need for a real-time OS. MPPA processor code is typically two-thirds smaller than equivalent DSP code, and development time is 1/10th to 1/20th of doing the same work on an FPGA, according to Chen. A software engineer becomes productive after 1-2 days of tools training covering a structural object programming design tool, a functional simulator, and a multi-processor debugger.
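The producer/consumer stalling behavior Chen describes maps naturally onto bounded channels. The Go sketch below is a software analogy of that MPPA execution model, not Ambric’s actual tool flow: two “processors” are connected by a small FIFO, the sender blocks when it is full, the receiver blocks when it is empty, and both resume automatically with no operating system involved.

package main

import (
	"fmt"
	"sync"
)

func main() {
	// A small buffered channel stands in for the hardware FIFO between two
	// processing elements.
	ch := make(chan int, 4)
	var wg sync.WaitGroup
	wg.Add(2)

	// Producer processor: stalls on send whenever the channel is full.
	go func() {
		defer wg.Done()
		for i := 0; i < 10; i++ {
			ch <- i * i // e.g. a per-pixel result handed downstream
		}
		close(ch)
	}()

	// Consumer processor: suspends on receive whenever the channel is empty,
	// and resumes automatically when data arrives.
	go func() {
		defer wg.Done()
		for v := range ch {
			fmt.Println("consumed", v)
		}
	}()

	wg.Wait()
}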
Today, new microprocessor architectures that use arrays of computational units are becoming more common and more cost effective. Recently, several imaging companies have gone public with new IC and camera designs that could reveal the future of the machine vision industry, taking the next step from multi-core toward neural networks and moving closer to matching the architecture and capabilities of the human brain.