Amazon FAR Humanoid Wall Flip

This story first appeared in A3's Automated newsletter. Subscribe here

I wanted to talk about Moravec’s Paradox, so I put the question to LinkedIn. A handful of responses suggested a handful of experts, including recent Automated guest Rodney Brooks and the eponymous Hans Moravec himself (thanks Helen G.). The Austrian computer scientist remains on my guest wish list at the moment. For the sake of keeping to my weekly newsletter cadence, I instead reached out to an expert in the field I’ve spoken with frequently over the past several years.

I emailed Berkeley professor and Ambi Robotics chief scientist Ken Goldberg a note with the pertinent paradox serving as the subject line. “Hi Brian,” Goldberg responds almost immediately, “that’s funny because I was just talking about him.” Before replying, I take a moment to consider the number of people on Earth who could have genuinely responded to my initial message in this manner. I imagine it’s a fairly small club. 

As for why I, specifically, was thinking about Mr. Moravec and his wonderful paradox, the short answer is humanoid robots. Humanoid robots, expectations, and the disconnect between what these unquestionably technologically impressive machines can and cannot do, right now.  

It’s difficult to pinpoint the origin of the paradox as an abstract thought, but the nameable phenomenon can be traced back to roughly the same time the Oakland Athletics won a World Series title (I’m not bitter). While serving as the director of CMU’s Mobile Robot Lab in the late 1980s, Moravec observed that it was far easier for artificial intelligence systems to master chess and checkers than mobility or perception. The paradox, of course, is that humans generally learn such basic motor skills at a younger age than abstract strategy games.

The details and timing of Moravec’s observations forever linked roboticist and concept, though he certainly wasn’t alone among contemporaries in arriving at such conclusions. Brooks and Marvin Minsky are also cited as key contributors. 

I believe I was first made aware of the phrase around 2019, in connection with MIT’s Mini Cheetah robot. As one CSAIL instructor succinctly put it, when it comes to robots, “Flipping is easier than walking.”

Along with locomotion, dexterity remains a major source of Moravecking. In September, Brooks addressed the latter with a highly publicized post simply titled, “Why Today’s Humanoids Won’t Learn Dexterity.”  

“[Y]ou see [humanoids] dancing and doing Kung Fu and they're incredible,” Goldberg tells me. “The mobility is really, really good. But the dexterity is still really far behind and people don't understand. They look at something dancing or doing backflips.” 

One of the major issues is the lack of data. Large language models like those from OpenAI operate as well as they do because they are essentially trained on the entirety of available human knowledge. There’s no equivalent for physicality. The last several decades have seen a number of researchers, including those on Goldberg’s Berkeley team, working to collect this data. But we’re still a long way off from a model robust enough to train robots to manipulate the world in a similar way. Simulation can help, but even synthetic data can only go so far when it comes to understanding the ins and outs of unstructured environments.

“[T]he vastness of data that's being used to train large language models is like 100,000 years’ worth,” says Goldberg. “[W]e're nowhere near getting that for robots and it's going to take a while. But there's some glimmers of hope because companies are showing you they can do some tasks.” 

Goldberg points to towel folding as an example. “[T]hink of the tasks like tying a shoe,” he adds as a counterpoint, “or putting keys on a keychain. Or peeling an orange. These are all these really nuanced tasks that are really subtle. We seem to do them pretty much instinctively. Buttoning a shirt is another great example. You never see a robot buttoning a shirt. You won't for a long time because it's very, very nuanced.” 

When it comes to setting realistic expectations for what robots can and cannot do, marketing only muddies the water. It lives on that razor’s edge between the possible and the improbable. As for what constitutes the latter, well, that’s somewhat subjective at this stage. Timelines tend to diverge wildly when it comes to certain tasks. Some, like those accelerated by the advent of LLMs, have developed much sooner than many expected. Others can feel like just as distant a possibility as they did decades ago.

The nature of the robot video has also significantly evolved, particularly when VC dollars are involved. The stodgy lab videos with full disclosure of speed and other metrics are mostly confined to research facilities these days. The early days of viral YouTube robot videos led by institutions like Boston Dynamics have since given way to something more Madison Avenue than U of W Madison.  

As robots are increasingly targeted toward consumers, many robot videos have come to resemble car commercials. It goes well beyond consumer products, too: if you’re venture-backed and have a vaguely interesting-looking robot, marketing firms figure it might as well be a sport utility vehicle.

If we weren’t already heading toward a bubble (we already were), this disconnect between expectations and reality would likely have assured one. Again, the next few years will be a mix of good and bad surprises, as some milestones are achieved earlier than assumed, while others continually see the goalposts shifting back. In the long run, transparency is in the best interest of all parties.

Agility offers a good template here, with its warts-and-all approach. Granted, the environments created on the floor at Automate are significantly less complex than what these robots will encounter in the field, but there's value in watching the systems operate for long stretches. I understand why many companies don’t want to show the safety barriers or the hiccups that find their systems toppling over. But I’ve always been a proponent of outtakes and bloopers, and not just because they’re good for website traffic. Let’s be realistic about how difficult these problems are.

That way when those breakthroughs do happen, they’ll get the reception they deserve.