Teleoperation has become a hot-button topic in the world of humanoid robots. Particularly controversial is the question of whether such control amounts to a legitimate tool or cheap, Wizard of Oz-style trickery. As with all other aspects of robotics demos, the truth boils down to transparency and context.
VR tele-op is an extremely valuable tool for training and deploying robots, as well as a fallback override for when things don’t work out as planned. It has proven especially helpful for tasks that still demand a human touch. Robotic surgery pioneer Intuitive, for instance, recently orchestrated a joint procedure involving two surgeons on opposite sides of the Atlantic. Working out of France and the United States, the medical professionals operated together on a tissue sample using a da Vinci robot.
Highly specialized — and delicate — tasks such as these seem destined to have humans in the loop for the foreseeable future. For now, surgeons can lend a guiding hand. In the future, they will likely continue to provide a well-tuned eye. Technologies like remote surgery hold the promise of democratizing access to those unique skillsets. The true efficacy of implementation, however, is a much longer conversation involving healthcare access, politics, economics, and regulatory bodies, among other massive topics.
Other clear applications for teleoperation include access to places where humans can’t go, or simply don’t want to. Search and rescue is already a key use case in the wake of earthquakes, fires, and severe weather, where it can be hugely valuable to offer remote human oversight by way of a robot or drone. Space exploration opens similar opportunities.
Teleoperation is also one of several tools being deployed to help close robotics’ massive physical AI data gap. NVIDIA, for instance, has begun to integrate mainstream VR headsets from Meta and HTC to help humans control system manipulation, in order to transfer skills from human to robot.
If tele-op is such a valuable tool, why has it gotten such a bad rap? The problem is (pardon the pun) a matter of perception. Back in 2022, I wrote a piece that noted the technology had “become something of a dirty word” in robotics. The quote in question was a reference to vehicle autonomy (last-mile delivery robots in that specific instance), which very much set the stage for contemporary humanoid conversations.
It was vitally important to loop humans into the testing, training, and implementation stages of self-driving cars. While an industrial robot can present a safety risk at any speed, most of us have — at least — second- or third-hand knowledge of the damage a car can do. Having a human behind the wheel in the early days was about more than just safety optics (though both can be true). Systems were simply nowhere near adept enough at dealing with edge cases to be trusted on the road without immediate human intervention.
For the past several years, discussions around self-driving cars have largely been about recalibrating expectations. We can, and still should, be impressed when genuinely impressive milestones are reached. Waymo getting the green light to cruise down California highways is, indeed, a big deal. Level 4 autonomy still limits driving to specific areas, but this news is nonetheless a major step.
Watching humanoid robots capture the public imagination in a very real way, I can’t shake the sense that the category is due its own recalibration. The mainstreaming of generative AI has no doubt primed the pump for unrealistic expectations around these technologies. People love to laugh off polls about public perception around AI sentience or the existence of human-level AI, but can you blame lay people for arriving at such conclusions when suddenly confronted with a magical program that can write a story or paint a picture from a simple prompt?
Something humans excel at is projecting humanness onto non-human objects. Doing so imbues automation with a sense of intentionality, fills in gaps, and risks dramatically overestimating what systems are capable of. The phenomenon is further muddied when the embodiment of these systems is expressly designed to look human. Add in conversations about teetering on the precipice of AGI and general-purpose systems, and you have an ecosystem perfectly positioned to overpromise and underdeliver.
I’ve written a good bit about the recent blurring of lines between robotics demos and ad spots more akin to car commercials. The “How to Fake a Robotics Demo for Fun and Profit” story I wrote last year, with an assist from Brad Porter, is probably overdue for an update. The piece lays out several ways in which more cynical robotics developers can juice demos to make it appear as though these systems are far more capable than reality bears out.
Porter cites the “Wizard of Oz demo,” specifically referring to the use of teleoperation. He writes, “Unfortunately, it’s really hard to know if someone is doing this or not, but it’s a really low-integrity thing to do to show a robot doing something and not reveal the human controller behind it. But people do it. If you’re considering a significant investment in a robotics company, never go just off the video... go see it first-hand.” That quote wasn’t specifically referring to humanoids, but it might as well have been. These videos are perfect fodder for the public imagination.
Pareidolia is the product of millennia of human evolution that have hardwired our brains to identify familiar objects like animals, people, and faces in otherwise random formations. Throughout their history, humans have personified and anthropomorphized animals, objects, nature, abstractions, and deities. Designing a robot that looks specifically like a human only further muddies the waters. If people are prone to ascribing intentionality to an LLM’s output, it’s not a particularly large leap to view a humanoid robot’s actions through a human lens.
Our last few conversations with Berkeley professor Ken Goldberg have attempted to unpack some of the complexities of Moravec’s paradox: that is to say, why some things that are easy for people are difficult for AI and machines, and vice versa. Viewed through a human lens, none of this is immediately apparent. It’s strange to watch a video of a robot do a running wall flip one day and then struggle to tie a tie the next. AI’s evolution is an intriguing and frustrating combination of outpacing some expectations while falling dramatically behind on others.
Viral videos have become a powerful tool for promoting the industry in recent decades, and in the process they’ve helped reframe the parameters of robot videos. Lab videos, which are so closely tied to published research, tend to be transparent by default, with the playback speed clearly labeled on screen. They are traditionally not the domain of short cinematic cuts. While it’s often not clear how many takes a certain shot required, those results should (hopefully) be reflected in supplementary research.
Tele-op has become an increasingly popular tool in the research space. It is, again, an important tool for controlling and training systems. Academia appears to have largely accepted its role as such, assuming that, like many other popular tools, its usage is disclosed. As far as I’m aware, there are no rules requiring its disclosure in product advertising.
Failing to disclose its use can lead to several issues. First, it can set unrealistic expectations about what a robot is currently capable of executing autonomously. If the technology is abused in that way, the public may, understandably, write it off as a way for manufacturers to overpromise on timelines and features. It certainly doesn’t scale to have a one-to-one human operator for every robot in the world, but using the tech as an arrow in the overall training quiver can be immensely useful.
This is particularly the case in unstructured environments, which, as it so happens, include the home. Barring some game-changing breakthrough, remote operation is likely to play a key role in making systems more robust and adaptable to these challenging and changing environments. It is fair, however, to ask whether a system that requires significant training of this nature ultimately qualifies as a commercial product, or just a glorified early beta. Some degree of customization is certainly acceptable, even expected. Warehouses deploying AMRs, for instance, still require scans, while advanced robot vacuums like Matic require an initial mapping run of your home, which is continually updated as objects (and people) move around.
Once again, this only functions with true transparency. That’s particularly true in the home. Not only is this an environment that requires extra mapping attention, but for many, if not most, it’s a place where privacy is paramount. Think of the various privacy scandals over the years emerging from microphones and cameras on smart speakers and robot vacuums, and then extrapolate that to a humanoid robot fully remotely controlled by a human on the other end.
Few would dispute that this is, at the very least, a tradeoff of early adoption. That’s not to say it’s entirely a non-starter. I suspect there are plenty who would consider the benefits of a properly encrypted system. There’s a reason robotics companies often look to older adults seeking to retain independence. As populations continue to age, in-person care becomes much harder to come by, and teleoperation offers potential for more efficient remote assistance, though not without its own risks when dealing with especially vulnerable populations.
The best way companies can address inevitable privacy issues is to disclose early and often. That includes betas, advertising material, and most significantly, when the product is out in the world.
Be clear what sensors are on-board, what they’re collecting, when, and why. Make it obvious to anyone within that scope that they are being recorded. Make data collection opt-in and off by default. Execute as much compute as possible on-device. Vet and properly train operators. Give users access to all data collected, including the ability to perform remote wipes at will. Collect data points, not sound and video.
That’s the tip of the iceberg, but it seems like a decent enough place to start.