Industry Insights
Safety Begins at the Silicon
POSTED 09/14/2011
By: Kristin Lewotsky, Contributing Editor
Self-testing, redundant MCUs support smart drives for safety applications.
With high-speed, high-torque motion applications proliferating, machine designers and OEMs increasingly focus on safety. Properly executed, safety not only prevents operator injury, it also minimizes machine damage and downtime, increasing revenues. Modern motion components have become ever more intelligent, boasting on-board memory and sophisticated processors. As a result, safety capabilities have propagated out from the programmable logic controller (PLC) to peripheral components like drives, providing machine builders and integrators with a range of options for integrating safety into their systems.
Although occupational health and safety underlie any safety discussion, a key motivator has been increasing uptime and, thus, productivity. Functions like Safe Limited Speed provide users a more productive way to work with a machine. In the event of a jam, the power of the machine itself can be harnessed to assist in the removal of blockages, for example. “Pretty much any application can benefit from Safe Torque Off, but safe-motion-related functions like Safe Speed, Safe Acceleration, and Safe Direction monitoring have been widely adopted in applications in which there is a significant amount of interaction between the operator and the machine,” says Matheus Bulho, Director of Product Management for the motion control business at Rockwell Automation (Mequon, Wisconsin). “Applications like printing and web handling operations are areas where they have been able to easily justify the use of safety and see the benefits earlier.”
There was a time when drives consisted of basic circuitry that merely carried out commands generated by the controller. No more. Today's drives integrate microcontroller units (MCUs) that feature multiple processing cores, on-board memory, and peripherals like analog-to-digital converters (ADCs). “The adoption has been significant,” says Bulho. “Today, a very large percentage of the drives that we ship go with some level of safety. In the next five to seven years we expect it to be almost 100%.”
At its most basic, safety encompasses three fundamental concepts: redundancy, diversity, and self-monitoring. Of course, when it comes to components like smart drives, safety begins at the silicon. “When you talk about motion control, you're talking about interrupts that are running 50 or 100 µs at a time,” says Jeff Stafford, C2000 MCU applications engineer for Texas Instruments. “You need to verify that the torque being applied to the motor is the correct torque and that entails a lot of things, from the software that calculated the torque to the pulse-width modulators that generate the waveform to the inverter, all the way to the signal coming from the ADC and diagnostic coverage of the processor. They're all the pieces of the puzzle for system-level compliance.”
Multicore MCUs can be divided into two classes: homogeneous, which feature two of the same processor; and heterogeneous, which integrate two different processors. A dual-core homogeneous MCU provides both redundancy and self-monitoring. The chip can provide dual-core, lockstep operation, which means that each of the two cores processes the same operation in parallel, and then the chip compares the results to look for any errors. As a result, at every clock cycle, the drive can confirm that the CPUs are executing their commands correctly, in addition to checking operation of the memory and other peripherals. The idea is to establish what Stafford calls a safe island: a known, working CPU that provides a trustworthy foundation for diagnostics that address the rest of the system. “When we think about system-level compliance to safety, we get all the way down to a granular level of looking at dual core CPUs during each instruction, each clock cycle,” says Stafford.
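The lockstep comparison can be illustrated with a small simulation. This is only a minimal Python sketch, not a description of any vendor's hardware: real lockstep comparison happens in silicon on every clock cycle, whereas here two hypothetical "cores" are ordinary functions and the comparison runs once per operation. All names (run_lockstep, healthy_core, make_faulty_core, LockstepFault) are illustrative assumptions.

```python
import operator

# Tiny instruction set shared by both simulated cores (illustrative only).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


class LockstepFault(Exception):
    """Raised when the two cores disagree on a result."""


def healthy_core(op, a, b):
    """A core that executes every operation correctly."""
    return OPS[op](a, b)


def make_faulty_core(bad_op):
    """Build a core that flips the lowest result bit for one operation type,
    standing in for a hardware fault."""
    def core(op, a, b):
        result = OPS[op](a, b)
        return result ^ 1 if op == bad_op else result
    return core


def run_lockstep(core_a, core_b, program):
    """Run each operation on both cores and compare before accepting it."""
    results = []
    for step, (op, a, b) in enumerate(program):
        result_a = core_a(op, a, b)
        result_b = core_b(op, a, b)
        if result_a != result_b:
            raise LockstepFault(f"step {step}: {op} gave {result_a} vs {result_b}")
        results.append(result_a)
    return results


PROGRAM = [("add", 2, 3), ("mul", 4, 5), ("sub", 9, 1)]
```

With two healthy cores the program runs to completion; pairing a healthy core with make_faulty_core("mul") stops at the first mismatching step, which is the point at which a drive would move to its safe state.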
The alternative is a heterogeneous MCU, which typically operates in asynchronous mode. Unlike homogeneous designs, the chip cannot perform the lockstep comparison. It does, however, offer the option of two separate CPUs running completely different instruction sets with completely different software. At the same time, one of the processors can host supervisory-level safety functions, although at a level below that of the main control processor. The approach introduces redundancy and removes a common failure mode. The drawback is that failure detection in a heterogeneous MCU is both more complex and slower. Instead of lockstep failure detection, the system scans the CPU via software or, more commonly, hardware, at a rate slower than once per clock cycle.
At first consideration, the reduced speed and elimination of lockstep failure detection seem like a significant drawback, but one of the truisms in engineering is that a solution does not need to be perfect; it just needs to be good enough for the application at hand. “What's happening in the PLC and programmable automation controller market is that instead of OEMs selling add-on safety boards that give high-level capabilities like Safe Torque Off, they're now integrating that board into the actual control board,” says Stafford. “Their failure response time is typically not in the tens or hundreds of nanoseconds, it's in the milliseconds range. Knowing that, with the dual-core, heterogeneous solution, you can still have that safe island approach as long as you're completing the CPU scan in the amount of time you have available.” Properly designed, a smart drive featuring a heterogeneous, asynchronous MCU can provide safety at a lower price point for less demanding applications.
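Stafford's condition, that the CPU scan must finish within the time available, amounts to a simple budget check. The Python sketch below assumes a deliberately simplified worst-case model: a fault appears just after a scan has passed it, so detection takes one scan interval plus one scan duration, followed by a fixed reaction time. The model, the function names, and every number are hypothetical, chosen only to contrast a millisecond-range budget with a nanosecond-range one.

```python
def worst_case_detection_us(scan_interval_us, scan_duration_us, reaction_us):
    """Worst-case time from fault to safe reaction, in microseconds,
    under the simplified model described in the lead-in."""
    return scan_interval_us + scan_duration_us + reaction_us


def scan_fits_budget(scan_interval_us, scan_duration_us, reaction_us, budget_us):
    """True if the periodic scan can meet the required failure response time."""
    worst_case = worst_case_detection_us(scan_interval_us, scan_duration_us, reaction_us)
    return worst_case <= budget_us


# Hypothetical figures, not taken from any datasheet.
SCAN_INTERVAL_US = 500.0   # time between the starts of successive scans
SCAN_DURATION_US = 150.0   # time one full scan takes
REACTION_US = 50.0         # time to trigger the safe state once a fault is seen

MILLISECOND_BUDGET_US = 2000.0   # a 2 ms failure response requirement
NANOSECOND_BUDGET_US = 0.1       # a 100 ns requirement

print(scan_fits_budget(SCAN_INTERVAL_US, SCAN_DURATION_US, REACTION_US, MILLISECOND_BUDGET_US))
print(scan_fits_budget(SCAN_INTERVAL_US, SCAN_DURATION_US, REACTION_US, NANOSECOND_BUDGET_US))
```

Under these assumed numbers the worst case is 700 µs, which fits comfortably inside a 2 ms requirement but could never satisfy a nanosecond-scale one.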
Conducting self-diagnostics is essential to confirming chip health, but testing data, processors, memory, etc. can take hundreds of microseconds. That can be a problem for high-speed operations like 300-bottle-per-minute packaging lines that might feature control loops of about the same duration. One way to avoid introducing latency is to interleave the diagnostics with the control routines. “To run the whole set of self-tests and get full coverage of the device could take anywhere up to 150 µs,” says Dev Pradhan, Product Line Manager at TI. “If you're running a control loop that is 100 µs or something like that, you can actually time slice some diagnostics within it so you do not have to take your system down to run the diagnostics on the microcontroller. This allows users to run periodic diagnostics even in tight control loops.”
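The time-slicing idea can be sketched as a small planning routine. The 150 µs self-test total and the 100 µs loop period come from Pradhan's figures; everything else is an assumption made for illustration, including the 50 µs of control work per loop, the individual test names and durations, and the simple in-order packing strategy. A real drive would schedule against measured execution times.

```python
def plan_time_slices(tests, loop_period_us, control_time_us):
    """Pack self-tests, in order, into the slack left in each control loop.

    tests: list of (name, duration_us) pairs.
    Returns a list of loops, each a list of the test names run in that loop.
    """
    slack_us = loop_period_us - control_time_us
    if slack_us <= 0:
        raise ValueError("control work leaves no slack for diagnostics")

    loops = [[]]
    remaining_us = slack_us
    for name, duration_us in tests:
        if duration_us > slack_us:
            raise ValueError(f"{name} does not fit in one slice; split it further")
        if duration_us > remaining_us:
            # Current loop is full: defer this test to the next loop.
            loops.append([])
            remaining_us = slack_us
        loops[-1].append(name)
        remaining_us -= duration_us
    return loops


# Hypothetical self-test breakdown totalling 150 µs.
SELF_TESTS = [
    ("cpu_registers", 20),
    ("flash_crc_block", 30),
    ("ram_march", 30),
    ("adc_reference", 25),
    ("pwm_readback", 20),
    ("watchdog", 25),
]

PLAN = plan_time_slices(SELF_TESTS, loop_period_us=100, control_time_us=50)
for loop_index, names in enumerate(PLAN):
    print(loop_index, names)
```

With these assumed numbers the full set of tests is spread over four consecutive loops, so complete coverage repeats every 400 µs while the control routine still runs in every 100 µs period.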
Planning for safety
Safety-enabled drives give designers the opportunity to use centralized safety components such as safety PLCs for simple machines and distributed safety via safe drives for more complex or larger machines. It's not enough to have the right hardware and software, however. Sometimes the biggest challenge to implementing safety lies in the overall approach.
Designing in safety from the beginning is essential to success, yet all too often, it's just tacked on to the finished design. “The biggest obstacle is making sure that people are stopping to understand the purpose and not just putting a contactor in front of a panel or putting in a disconnect and assuming that everything else is going to be safe,” says Bulho. “Safety has to be an integral part of the machine design process. It cannot just be an afterthought that people put in place to achieve what they think is compliance.”
Building a safe machine is something like the fabled 1:1 map that is perfectly detailed but too large to unfold anywhere. A completely safe machine would be shut off, or utterly enclosed. In reality, machines operate along a continuum ranging from perfectly safe to maximally productive. Safety introduces the opportunity to ensure operator and machine health, while dramatically increasing productivity. According to a study by the Aberdeen Group, for example, companies effectively implementing safety boast overall equipment effectiveness (OEE) 14% greater than lagging companies, logging an average of 12% less unscheduled downtime.
“You can just lock the machine and it will make it very safe, but it is also not very productive because [in the event of a fault or a jam], it may take you hours to get in and out and put it back into production,” says Bulho. “The biggest leap is to embrace safety as much more than just protecting people, to think of it as a way to improve productivity, to improve bottom-line results. It is one of the biggest opportunities that machine builders as well as the end-users have.”