Minding the Gap: Ai2 Commits to Simulation-based Training With MolmoBot and MolmoSpaces

By Brian Heater, Managing Editor, A3
03/11/2026

Ai2 MolmoBot test

Much of our recent coverage has focused on what we’ve informally termed the “liminal zone” of robot training. While there’s a consensus that real-world data collection is a critical piece of developing robust physical AI models, it’s unclear how we can deploy enough robots to collect that data at scale without the aid of such models. In other words, how do we kickstart the flywheel? 

In an interview tied to a recent launch from Ai2 (the Allen Institute for AI), Ranjay Krishna, the organization’s director of Perceptual Reasoning and Interaction Research, noted that while there are plenty of industrial robots currently deployed in the real world, many are fixed and tasked with highly repetitive jobs. These systems aren’t especially useful when it comes to building dynamic world models.

“What we really need is robots working in unstructured environments for this data flywheel to become useful, places and environments where it's not clear what's going to happen in those environments and your robot has to adjust its behavior,” says Krishna. “In your home, maybe you spilled something, maybe you fell down, you'd want your robot to do something in response to things that you're doing. We don't even have robots good enough or safe enough even that we can deploy them in these unstructured environments. For the flywheel, to get it even started, we need robots to be good enough where we can put them out in some unstructured environments and we're just so far away from that.”

How difficult and expensive will it be to collect real-world data at scale? Opinions vary. Physical Intelligence cofounder Sergey Levine recently told me that many in this space are overestimating such challenges.

“In the grand scheme of things, maybe sometimes something that we as researchers get a little bit mixed up about is easy for us versus easy in the context of human civilization,” he said. “And in the context of the world as a whole, actually getting real-world data and then deploying robots that are going to collect more experience and get better and better is a lot easier than inventing some other technologies just to avoid having to do that. And because it's a bootstrap problem, it's easier once things are out in the world.” 

In our conversation, Krishna addresses some of the key challenges facing real-world data gathering.

“Whenever we've had to go out and collect data ourselves, it's not just that you need the hardware and everything set up correctly, but you need to train people to give you data that's actually useful for training robots,” he notes. “And the training process isn't easy. You have to get people to move in ways that feel unnatural to them. They're usually tele-operating these robots. They have to learn the form factor of the robot that they're moving. It's quite a lot of actual setup that goes into the process before you even have a usable data collection process.” 

Simulation has become an increasingly popular approach to training and data collection. The process requires a considerably lighter lift in terms of both time and resources. The generation of synthetic data does, however, come with important caveats. Topping the list is the fact that simulation can get important factors wrong, and the inaccuracies inherent in the sim-to-real gap can compound over time.

“The hypothesis is that you can train a robot in simulation, but then it'll only work in that simulation,” says Krishna. “The moment you take it out and deploy them in the real world, they're suddenly not going to be able to adjust. The reason it's so difficult is because all the simulation engines we build, they're all an approximation of our real world. We approximate physics, we approximate forces, we approximate material properties. And so when your robots train in these environments, they learn the sort of idiosyncrasies of that simulator. If your gravity isn't exactly right or physics and collisions aren't exactly correct, it'll learn to model those kinds of errors. And when you go to the real world, suddenly those errors can compound over time. So you might expect a cup to be firmer than it actually is.”

Despite these potential pitfalls, Krishna considers himself a “big proponent of simulation.” The strategy forms the core of MolmoBot, a new suite of robotic manipulation models from Krishna’s Ai2 team.

In a release tied to the platform announcement, Ai2 poses the following rhetorical question: “Most approaches try to close the sim-to-real gap by adding more real-world data. But what if the gap shrinks when you dramatically expand the diversity of simulated environments, objects, and camera conditions instead? If training no longer depends on proprietary, manually collected data and is instead rooted in scalable simulation, robotics research becomes more reproducible and broadly accessible.”

In its present form, the system is designed for a pair of research robotics platforms: the mobile Rainbow Robotics RB-Y1 and the Franka FR3, a robotic arm designed to be mounted on a tabletop. The latter fits in with the fixed (and often repetitive) robotic systems currently deployed across a range of industrial settings. The former, on the other hand, represents the push toward “general purpose” systems embodied by the recent humanoid wave.

“In our evaluations, our best model achieves zero-shot transfer to real-world static and mobile manipulation tasks on unseen objects and environments without any fine-tuning, achieving competitive performance with prior methods including π0 under standard benchmarking protocols,” Ai2 writes.  

In one very real sense, the organization is putting its money where its mouth is, keeping the platform open and offering researchers, testers, and other interested parties access to data, data generation pipelines, code, and a technical report for vetting purposes.

“We want to build open models because our goal at the end of the day is to mobilize the entire scientific community,” says Krishna. “It's to make possible for other scientists to explore questions that they might not be able to explore because they don't have the infrastructure to do it or the data to do it or the models to do it. What we try to do is make sure that everything that we build is completely open. All our data sets get released publicly. All our models get released publicly.” 

Krishna applauds the robust work NVIDIA is doing in simulation, noting that the Allen Institute uses NVIDIA’s Isaac platform in-house.

“The difference is the diversity of simulation data,” he adds. “They've been working on a set of projects called RoboCasa, where they've built kitchen environments. These are single rooms with kitchen items where they train robots. The biggest difference is we're building simulation environments that are really large and very diverse. We have large houses with five to ten rooms. We have conference centers with multiple rooms. We have libraries. We're building hospitals and all of them have a wide diversity of objects and that diversity is what we're betting on. It's scale. It's scale and diversity.” 

That scale and diversity arrives in the form of MolmoSpaces, a massive simulation ecosystem featuring 230,000 indoor scenes, more than 130,000 object assets, and over 42 million robotic grasp annotations grounded in real-world physics.