Comparing the Performance of Different Embedded Systems
Designers building embedded vision or edge computing systems can choose from a broad range of compact embedded computers. A large variety of processor architectures, memory options, and storage options is available in the sub-100 mm x 100 mm board-level form factor. This makes selecting the correct computer for a project more complicated than it is with traditional PCs.
Comparing the number and type of I/O connections of two embedded computers, their physical dimensions, and cost is straightforward; evaluating differences in computing power and power consumption requires an in-depth approach.
Selecting hardware by comparing the performance specifications published by hardware manufacturers can be difficult. The processors available in similarly sized board-level computers range from high-performance Intel Core i7 x86 CPUs on the COM Express platform to low-power ARM systems on chips (SOCs) designed for mobile applications. These systems are based on their manufacturers’ architectures and will likely run different operating systems. With so many variables, specifications alone are unlikely to provide much practical guidance.
Real-world performance testing is the best method for comparing different hardware and system configurations. Openbenchmarking.org provides a public archive of test data from a wide range of systems including high-performance PCs and low-power embedded systems. This allows designers to compare entire systems, or evaluate the performance impact of changes in hardware or software configuration.
Benchmark tests are designed to highlight different performance dimensions. Selecting a benchmark that accurately represents the needs of your vision system is critical.
CPU performance benchmarks, including x11perf (https://openbenchmarking.org/test/pts/x11perf) and dcraw (https://openbenchmarking.org/test/pts/dcraw), are useful starting points for system designers looking to evaluate relative CPU performance.
CPU performance is not the only important performance parameter. Certain computing tasks, including deep learning, are greatly accelerated when run across many parallel processor cores. The architecture of graphics processing units (GPUs) is ideal for accelerating these applications. Many GPU performance benchmarks focus on 3D gaming performance and are not representative of the type of computing used by vision systems. For Nvidia GPU hardware, CUDA-specific benchmarks like SHOC (Scalable HeterOgeneous Computing, https://openbenchmarking.org/test/pts/shoc) should be used to evaluate GPU computing performance.
Memory bandwidth, typically quoted in GB/s, is the rate at which data can be read from, and written to, a system’s memory. This is an important specification for applications performing complex CPU or GPU vision processing. Capturing high-resolution images at high speeds or from multiple cameras produces large amounts of image data that must be loaded into memory and acted upon by a vision application. The stream benchmark (https://openbenchmarking.org/test/pts/stream) is useful for evaluating a system’s memory performance.
Systems relying on GPU-accelerated computing require much higher memory bandwidth than systems using a CPU with a small number of cores. The roughly 36 GB/s of memory bandwidth available to an Intel Core i7 7700K would be insufficient for the 2560 CUDA cores of an Nvidia GTX 1080 GPU. To provide the CUDA cores with sufficient memory access, this GPU has eight gigabytes of dedicated high-speed memory, providing 320 GB/s of memory bandwidth. The cl-mem benchmark (https://openbenchmarking.org/test/pts/cl-mem) can be used to evaluate GPU memory bandwidth.
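To put memory bandwidth figures in context, it can help to estimate how much raw image data the cameras will deliver to host memory. The Python sketch below multiplies resolution, bytes per pixel, frame rate, and camera count. The function name and the example camera configuration are hypothetical, chosen only for illustration, and the result is reported in decimal megabytes per second.

```python
def image_data_rate_bytes_per_s(width, height, bytes_per_pixel, fps, cameras=1):
    """Estimate the raw image data rate arriving in host memory, in bytes/s.

    This counts each pixel once. Real memory traffic is usually higher,
    because every frame is written by the driver and then read (often
    several times) by the vision application.
    """
    return width * height * bytes_per_pixel * fps * cameras


if __name__ == "__main__":
    # Hypothetical setup: four 2448 x 2048, 8-bit cameras running at 75 fps.
    rate = image_data_rate_bytes_per_s(2448, 2048, 1, 75, cameras=4)
    print(f"{rate / 1e6:.0f} MB/s")  # prints "1504 MB/s"
```

An estimate like this is only a lower bound on the bandwidth a vision application needs; comparing it with a measured stream result gives a rough idea of how much headroom a candidate system has.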
System configuration, including operating systems, software, drivers, and cameras, can have a significant impact on the performance of a vision system. When using GigE Vision cameras, a filter driver can significantly reduce the system resources required to receive incoming Ethernet packets and assemble them into images. FLIR machine vision cameras have advanced onboard image processing including Bayer pattern demosaicing, image sharpening, color correction, and pixel format conversion. Offloading image processing tasks to the camera reduces the processing power required of the host system. This allows designers to increase performance or reduce costs by using less powerful hardware.
Not all performance criteria can be evaluated with common benchmarks
Embedded systems based on field programmable gate arrays (FPGAs) are also available. FPGAs enable system designers to carry out complex vision processing tasks in hardware, which can be much faster than executing them in software on a traditional CPU. Many new FPGAs, like the Intel Stratix 10, are hybrid SOCs that combine physical ARM cores with FPGA programmable logic. The suite of benchmarking tools available from openbenchmarking.org can be used to evaluate the performance of the physical ARM cores in a hybrid SOC. Standard benchmark tests will not be useful for measuring the performance of IP cores implemented in programmable logic. Comparing the performance of different IP cores, or of a specific IP core implemented on different hardware, requires real-world testing.
CPU manufacturers may specify the thermal design power (TDP) of their CPUs. TDP is an estimation of the power dissipation of a CPU under real-world conditions. It is not the maximum power a CPU can dissipate. While this is a useful starting point when comparing different CPUs, it should not be relied on heavily. The real-world applications CPU manufacturers test with are not necessarily representative of the applications a vision system will be running in the field. While Intel and AMD both specify the TDP of their processors, they define and measure it differently.
Evaluating power consumption is further complicated by SOCs equipped with low-power states designed to minimize their power consumption under light loads. The Samsung Exynos 5422 SOC uses ARM’s big.LITTLE architecture, which pairs low-power, low-performance CPU cores with high-power, high-performance cores. The power consumption of an Exynos SOC will differ significantly depending on which cores are being used.
When selecting a computer for an edge computing or embedded vision system, designers should consider how their cameras will be powered. For each USB 3.1 Gen 1 camera, the USB port must be able to supply 900 mA at 5 V. If the system will use Power over Ethernet (PoE) cameras, additional power must be supplied via the Ethernet ports. If a computer does not support PoE, GigE Vision cameras will require external power, supplied either through the camera’s GPIO connector or by a PoE injector.
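As a rough illustration of the arithmetic, the Python sketch below turns the 900 mA at 5 V figure into a per-camera power allowance (4.5 W) and adds it to a CPU's TDP. The function names, the 15 W TDP, and the allowance for other components are hypothetical values chosen for the example, not figures for any particular system.

```python
USB3_GEN1_CURRENT_A = 0.9  # 900 mA per port, as stated in the text
USB_VOLTAGE_V = 5.0        # nominal USB bus voltage


def usb_camera_power_w(num_cameras):
    """Power the host must be able to supply to its USB cameras, in watts."""
    return num_cameras * USB3_GEN1_CURRENT_A * USB_VOLTAGE_V


def first_pass_power_budget_w(cpu_tdp_w, num_usb_cameras, other_w=0.0):
    """First-pass budget: CPU TDP + USB camera supply + other components.

    TDP is only an estimate, so treat the result as a starting point.
    """
    return cpu_tdp_w + usb_camera_power_w(num_usb_cameras) + other_w


if __name__ == "__main__":
    # Hypothetical system: 15 W TDP CPU, two USB cameras, 5 W for the rest.
    budget = first_pass_power_budget_w(15.0, 2, other_w=5.0)
    print(f"{budget:.1f} W")  # prints "29.0 W"
```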
TDP and camera power requirements are a good starting point for evaluating the power requirements of an embedded computer. However, with the large number of variables involved, the best way to measure power consumption for an edge computing or embedded vision system is direct measurement with a power meter.
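If the power meter can log timestamped readings, the log can be reduced to an average power figure for comparing systems under the same workload. The Python sketch below assumes the readings are available as (time in seconds, power in watts) pairs and integrates them with the trapezoidal rule. The data format, function name, and sample values are assumptions for illustration and do not correspond to any specific meter.

```python
def average_power_w(samples):
    """Average power over a log of (time_s, power_w) samples.

    Energy is integrated with the trapezoidal rule and divided by the
    elapsed time, so unevenly spaced samples are handled correctly.
    Samples must be sorted by time.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    duration_s = samples[-1][0] - samples[0][0]
    if duration_s <= 0:
        raise ValueError("samples must span a positive time interval")
    energy_j = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy_j += 0.5 * (p0 + p1) * (t1 - t0)
    return energy_j / duration_s


if __name__ == "__main__":
    # Hypothetical readings taken while the vision application was running.
    log = [(0.0, 10.0), (1.0, 12.0), (2.0, 14.0), (3.0, 12.0)]
    print(f"{average_power_w(log):.2f} W")  # prints "12.33 W"
```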
In addition to power consumption, power dissipation is an important design criterion to consider. In compact embedded systems with many components sharing a small mechanical footprint, the heat given off by components may result in performance loss. If systems get too hot, most CPUs will enter a thermal protection state, reducing their processor speeds. It is essential that the CPU of a system be adequately heatsinked. In outdoor systems that must be enclosed for environmental protection and will be exposed to temperature extremes, other components may also require heatsinking or active cooling. The FLIR ETS320 thermal imaging system for electronics testing can easily identify additional components requiring heatsinking.
Vision system designers have access to a wide range of embedded computing platforms. The public archive of system test data available at Openbenchmarking.org can help system designers compare their embedded computing options before purchasing samples. The tools it provides can also be used to perform in-depth comparisons later in the evaluation process. Selecting benchmarks that are relevant to vision applications, and understanding their limitations, is crucial. Power consumption and dissipation are more difficult to estimate without direct measurements from systems running the production vision application.