Automated

With Brian Heater

April 1, 2026

Ranjay Krishna on Why Robots Still Fail in the Real World and the Data Problem Holding Them Back

Robotics is advancing quickly, but real world deployment is still far more difficult than most people expect.

In this episode of Automated, Brian Heater speaks with Ranjay Krishna about the fundamental challenges preventing robots from working reliably outside controlled environments, and why solving the data problem is key to unlocking the next wave of robotics.

Ranjay explains why today’s robots struggle with tasks that humans find intuitive, from learning by observation to understanding perspective and adapting to new environments. While AI models have made massive progress in language and vision, robotics introduces a new layer of complexity where actions change the world in real time and small errors compound over time.

The conversation explores the limitations of current approaches, including why training robots in simulation often fails to translate to the real world, and how the lack of diverse environments creates major gaps in performance. Ranjay shares how his team at the Allen Institute is addressing this by building large scale simulated environments designed to better reflect the variability of real world spaces.

They also discuss the concept of an ImageNet moment for robotics, and what it would take to create the kind of large, diverse datasets that transformed AI. By generating hundreds of thousands of simulated environments and scaling data collection, his team is exploring whether robots can learn more effectively in simulation and generalize those skills into the physical world.

The conversation also covers why robotics requires more than just better models, including challenges in hardware, sensing, and real world interaction. From embodiment and perception to reasoning and adaptation, it is a grounded look at why robotics remains one of the hardest problems in AI and what needs to happen next for the industry to move forward.

We’d love to hear from you! Have thoughts or guest suggestions? Reach us at [email protected].

You can find the transcript and more episodes of Automated at automated.fm.

You can find more episodes of Automated at automate.org/podcast.

Transcript

[00:00:00] Ranjay Krishna:

For us and our focus, it's really been about navigation. Can we get these robots to move around in spaces that they've never seen before? A couple of years ago, in 2023, we scaled up all of our simulation environments and the bet that we were making is the sim-to-real problem is only a problem because our simulation environments lack the diversity that the real world has.

[00:00:34] Brian Heater:

Hello, and welcome to another episode of Automated. My name is Brian Heater. I am the managing editor at the Association for Advancing Automation, recently back from GTC. As we are recording this, I talked and wrote a lot about physical AI in the past week. So it is fortuitous timing to be running an episode with Ranjay Krishna. In addition to serving as an assistant professor at the University of Washington, he's been working on the robotic data problem at the Allen Institute for AI for some time now. A lot of folks at the NVIDIA events seem to share his opinion that more training might actually be accomplished in simulation than had originally been thought. It's a great talk, thanks to Ranjay. Also, Automated is coming to Boston. We are taking our first ever road trip in a few weeks. We're recording a bunch of episodes in person, including some in front of a live audience. We're also hosting a happy hour from 4 to 6 PM at Mass Robotics on April 8th. Thank you to Maxon for sponsoring that. Thanks to you for tuning in. If you would like to support Automated, please like and subscribe, please tell a friend. Also check out the newsletter over at Automated.fm and with that, please enjoy this conversation with Ranjay Krishna.

[00:02:08] Brian Heater:

So we recently spoke with Sergey over at Physical Intelligence and he said something that I hadn't heard before from a physical AI company, which was he felt that a lot of companies are maybe overestimating how difficult the process of actually going out in the real world and collecting data is. Is that a fair assessment?

[00:02:16] Ranjay Krishna:

Oh, absolutely. Whenever we've had to go out and collect data ourselves, it's not just that you need the hardware and everything set up correctly, but you need to train people to give you data that's actually useful for training robots. And that training process isn't easy. You have to get people to move in ways that feel unnatural to them. They're usually teleoperating these robots. They have to learn the form factor of the robot that they're moving. It's quite a lot of actual setup that goes into the entire process before you even have a usable data collection process.

[00:02:55] Brian Heater:

One of the really interesting topics that we've been discussing quite a bit on the show recently is the data flywheel and this point that we're at right now where it seems very much like a chicken-and-egg problem. There aren't enough robots deployed out in the world, or at least potentially general purpose robots deployed in the world, to collect the physical data needed to deploy that many robots out in the world.

[00:03:19] Ranjay Krishna:

Mm-hmm.

[00:03:21] Brian Heater:

So yeah, how do we get to a point where we can actually start, and what sort of scale do we need to actually start collecting that kind of data?

[00:03:29] Ranjay Krishna:

So the way I look at it, we do have quite a few robots actually already deployed. But they're deployed in very structured environments, usually in factories where you know exactly what they're going to be doing. And so in those scenarios, the data flywheel gets you data that basically looks the same as each other. What we really need is robots working in unstructured environments for this data flywheel to become useful. And by unstructured I mean places and environments where it's not clear what's going to happen in those environments and your robot has to adjust its behavior. So in your home, for example, maybe you spilled something, maybe you fell down, you would want your robot to do something in response to things that you're doing. And we don't even have robots good enough, or safe enough even, that we can deploy them in these unstructured environments. And so to get the flywheel even started, we need these robots to be good enough where we can put them out in some unstructured environments. And we're just so far away from that at the moment.

[00:04:38] Brian Heater:

One of the things that we've seen really rise to prominence, especially recently, is synthetic data. NVIDIA has been a big proponent of this, which is actually training these robots in simulation. And obviously you can do that a lot faster and a lot more at scale. One of the things that always jumped out at me though is, are you risking the possibility of suddenly introducing hallucinations and introducing errors into the process by doing things that way?

[00:05:15] Ranjay Krishna:

I'm a big proponent of simulation. So NVIDIA's been doing a lot of great work in building out simulation engines and even environments in these simulation engines where you could train these robots. The biggest reason why a lot of roboticists have shied away from simulation at least recently is something they call the sim-to-real gap. The hypothesis is that you can train a robot in simulation, but then it'll only work in that simulation. The moment you take it out and deploy it in the real world, it's suddenly not going to be able to adjust. And this adjustment, the reason it's so difficult, is because all the simulation engines we build are an approximation of our real world. We approximate physics, we approximate forces, we approximate material properties. And so when your robots do train in these environments, they sort of learn the idiosyncrasies of that simulator. And so if for example, your gravity isn't exactly right or physics and collisions aren't exactly correct, it'll learn to model those kinds of errors. And when you go to the real world, suddenly those errors can compound over time. So you might expect a cup to be firmer than it actually is. Maybe you might break a glass because you held it too tight. All of these material changes or collision changes or physics changes actually makes it very difficult for these robots to adjust. Now what's interesting though is we at AI2 have been betting big on simulation for quite a while now. Since about 2017, we've been building out simulation engines very similar to NVIDIA's, but for the longest time for us and our focus, it's really been about navigation. Can we get these robots to move around in spaces that they've never seen before? And a couple of years ago, in 2023, we scaled up all of our simulation environments and the bet that we were making is that the sim-to-real problem is only a problem because our simulation environments lack the diversity that the real world has. 
And so we built out hundreds of thousands of environments. These are environments that look like museums, they look like schools or conference rooms or even people's homes.

[00:07:39] Brian Heater:

And you're talking about edge cases and things that might be difficult to otherwise predict in a standard simulation.

[00:07:47] Ranjay Krishna:

I'm actually talking about just the diversity of data that's available in simulation. When you look at a lot of simulation engines, they usually focus on one type of environment. For example, the kitchen. NVIDIA's built out a lot of simulation environments recently. But a lot of them are just kitchen environments because they want, at the end of the day, to be able to deploy robots in people's homes. But kitchens only give you some amount of diversity, right? It only has objects that you typically find in a kitchen. And so your robots, even if they do adjust to a set of kitchen environments, they might not be able to go to the living room and do anything meaningful there, because the objects that you see, the sofa, the TV, they might have never seen those before or interacted with them, and so suddenly they would get very confused about how to deal with that change. And so that's the bet we were making, is that if you can scale up the diversity and generate environments that represent all the places that we might ever want to deploy a robot and train these robot models across all of these environments, then suddenly they might just be able to work in the real world.

[00:08:54] Brian Heater:

To a certain extent it sounds like a different approach than what I've been hearing from a lot of companies lately, and this is especially coming from the robotics companies themselves, the humanoid companies, of let's get really good at doing a couple of tasks first and then sort of scale out from there. But it sounds like what you're suggesting here is, hey, let's just go for it. Let's build as big of a model as we can all at once.

[00:09:24] Ranjay Krishna:

That's maybe a fair assessment. If you look at what's worked really well with AI models today, if you look at language, for example, for the longest time linguists were really focused on solving very specialized tasks. Tasks like entailment, is what I'm saying going to imply the next thing I'm about to say, or sentiment analysis, is what I'm saying a good thought or a bad thought. A lot of those kinds of problems they were working on. At some point, around 2017 and 2018, we started basically throwing all of these tasks together and started training language models on all of the data that we had available. And we saw a lot of what I call complementary behavior where learning to do one task allows you to improve on your performance on something else. And so today you can ask these models to draft an email for you, to help you edit things, to give you ideas, to help you brainstorm, even talk through your problems. And all of them require a different set of skills, but because they've been exposed to everything, they learn from all of these different areas to improve on any specific skill. And I'd be willing to bet the same thing is true for robotics as well, that if you train these models across all different kinds of capabilities, in as many environments as possible, with as many form factors as possible, they will get better.

[00:10:55] Brian Heater:

In a sense, are people overestimating how dissimilar training for physical AI is versus these large language models that we've seen over the past several years?

[00:11:17] Ranjay Krishna:

In one sense, I think there are similarities. A lot of the problems that we're trying to solve are similar to the problems we've had with language understanding or visual understanding where, if you expose them to a lot of different kinds of data and train them with a general purpose machine learning model, they should learn useful skills. So that component I think is common. The component that's maybe not common is that robotics is fundamentally an active process. You train these robots to act in the world, and when they act in the world, the world itself changes. And that active process is something that's really hard to model. Whereas in language, when you feed a model an input, the input stays the same, and so they just respond to that one input. But in robotics, everything changes the moment the robot does anything or if anyone else in the environment around them does anything. And that active problem basically makes it so that you can't just make decisions in isolation. You have to make decisions sequentially. And that sequential decision making process means that a lot of errors, if you make a mistake on any step, they can compound over time. This is what makes deploying these robots so difficult, and why they're so unreliable even today. Even if you get to a robot model that's about 90% accurate, that 90% accuracy at one point in time means that over five minutes or over ten minutes, that accuracy is gonna go way down because that error is going to compound.
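The compounding described above can be made concrete with a little arithmetic. The following is a minimal sketch, not a model of any real robot: it assumes each decision is independent, that "90% accurate" means a 0.9 chance of getting a single step right, and that one mistake ends the task with no recovery. The function name `task_success_probability` is made up for illustration.

```python
def task_success_probability(per_step_accuracy: float, num_steps: int) -> float:
    """Chance that every one of num_steps sequential decisions is correct.

    Assumes independent steps and no recovery after a mistake, which is a
    simplification made for illustration only.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return per_step_accuracy ** num_steps


if __name__ == "__main__":
    # Show how quickly a 90%-accurate policy degrades over longer tasks.
    for steps in (1, 5, 10, 50):
        probability = task_success_probability(0.9, steps)
        print(f"{steps:>3} steps: {probability:.3f}")
```

Under these assumptions, a policy that is right 90% of the time on each step completes a ten-step task only about 35% of the time, which is why per-step accuracy that sounds high can still produce an unreliable robot.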

[00:12:46] Brian Heater:

And it strikes me too, you know, we will be talking about a lot of the work that you and your team have been doing more recently. Is it MoMo?

[00:12:57] Ranjay Krishna:

MoMo. Yeah.

[00:12:58] Brian Heater:

You've primarily been testing it with these two research arms. You've been testing it with a tabletop arm and a mobile manipulator. And again, it seems like those are like vastly different levels of complexity, right? A stationary arm on a table versus one that's actually moving around as it's interacting with the world.

[00:13:16] Ranjay Krishna:

Yeah. So maybe I can give you a little bit of background about what we've been doing. At the Allen Institute, like I was saying, we've been really focused on navigation for a long time. And a couple of years ago is when we finally decided that our environments were diverse enough where it now became possible for us to start thinking about more complex robotics tasks. So not just moving around in space in the case of navigation, but more complex tasks like being able to pick things up, move things around, open drawers, close doors, things that require you to simultaneously move your body as well as manipulate the objects around you. Even if you take the case of opening a door, that's actually quite a hard problem, something that we do so seamlessly. But you have to think about what to grab first. You need to know where the handle is. You need to move your arm to that handle, grab it. But as you're opening the door, the door is going to collide with your body, and so you need to move your entire body as you're opening that door. And so that process means that you need full control over the entire set of joints on the robot to be able to do that one simple task. And that's not easy at all because a lot of the robots that we have have so many joints. We're really hoping that if we were to train these models to do tasks that are vastly different from each other and with robots that are vastly different, like you're saying with some that are single arms that are static, not moving, and entire robot embodiments where you can move the entire robot body, training across all of these is basically going to allow us to generalize these models to handle new use cases.

[00:14:57] Brian Heater:

Reading through the release, that's something that jumped out at me that I appreciate here is, well, one, you're keeping it open, but two, you're saying, hey, check our work here. We're showing this off to you and we want you to work on this. We want to kind of crowdsource this to as many people as possible to see what parts of this are working and what parts aren't.

[00:15:22] Ranjay Krishna:

At the Allen Institute, our goal has always been to build breakthrough open AI, and we want to tackle the biggest challenges that are out there. Robotics is one of those challenges that's really been at the forefront for us for a really long time. And we want to build open models because our goal at the end of the day is to mobilize the entire scientific community. It's to make it possible for other scientists to explore questions that they might not be able to explore because they don't have the infrastructure to do it or the data to do it, or the models to do it. And so what we try to do is make sure that everything that we build is completely open. All of our datasets get released publicly. All of our models get released publicly. All of the code that we use to train these models, the infrastructure we use to train them, all of it gets publicly released. And this allows a lot of people to actually just build on top of our work. And of course, they get to test our hypotheses, they get to test our results. And that's really great as well for the research community because only through reproduction of scientific results can we gain a certain level of confidence that this is actually something that's meaningful. And that's really been the biggest driver for the researchers in our organization.

[00:16:42] Brian Heater:

Yeah, this is something I was curious about, because you currently have your feet in a lot of different worlds. You're still in the academic world, you're still teaching, you're at the Allen Institute, AI2 and the institute are connected. I don't know if you would say they're synonymous. Is AI2 sort of a more commercial side of that? How do the two parties interact with one another?

[00:17:08] Ranjay Krishna:

I guess I have my feet in two different places. One of them is, I'm a professor at the University of Washington. There I run a research lab with my PhD students and we're exploring novel ideas and tackling different kinds of problems across a wide range of AI spaces. A lot of it is focused primarily on computer vision and of course now language and robotics as well. And the second place where I spend a lot of time is the Allen Institute, or AI2. I would say the two places are distinct. The Allen Institute is a nonprofit organization. Its primary goal is to produce very exciting open AI research that other people can build on. Our goal is to mobilize the entire scientific community, and the problems we tackle there tend to be a little bit larger in scope than the problems we end up tackling at a university. And that's largely because there's a difference in the amount of resources you have allocated at a university versus at the institute. And so at the institute, we're really trying to take big bets and we take on large projects where we bring in multiple collaborators. We collaborate quite a lot with tons of other universities as well as researchers from private institutes as well, who are interested in dedicating some amount of their time to doing open-ended research with us.

[00:18:43] Brian Heater:

When you say big bets, to me that implies a payoff. I mean, is a payoff just that this thing is successful, or is there also some sort of fiscal payoff as well? Is there potentially money to be made by creating these successful, even though they're open source, models?

[00:19:00] Ranjay Krishna:

So for us, all of the researchers at the Allen Institute are really driven by impact. What we want to do is produce AI that can be for the social good. We want to build things that actually make the world better. And so for us, it's never been about making money from the work that we do, but about seeing our work being used by others and seeing the field progress as fast as possible. If you look at the state of AI research today, there used to be this golden era until about five years ago where everything that all of these companies were doing, they were publishing. And that meant that we could all move faster because we were all building on each other's work. Google and Meta, these companies were putting out research, they were putting out infrastructure to help us at universities do research that we couldn't do before. We were able to try out ideas using the kind of compute and scale that places like Google and Meta had. But over the last few years, a lot of these institutions have essentially closed off all of their research. The number of publications has gone down and for most of the most exciting ideas, the things that are actually working really well, they've stopped publishing a lot of these different ideas. And what that's meant is that progress in AI, although it might look like it's going fast, we could be going a lot faster today, but we're not because a lot of these institutions have closed off. So in a sense, we at the Allen Institute are filling that gap that all of these other companies originally had. And so we're really excited about being able to produce research and really go after problems that require more resources that maybe universities can't tackle, and hopefully through that process, make it more affordable, make it easier for other people to build on top of.

[00:21:00] Brian Heater:

Yeah. So you studied with Fei-Fei Li at Stanford.

[00:21:03] Ranjay Krishna:

That's right, yeah.

[00:21:04] Brian Heater:

And you know, obviously one of the primary things she's known for is ImageNet. And you know, in a sense, the work that you're doing here is kind of creating a physical three-dimensional version of that. And it strikes me, there's all of these companies out there that are collecting data that are making their siloed datasets. When you're keeping models open like this, do you feel like there could be some sort of centralized dataset of physical, real world data?

[00:21:34] Ranjay Krishna:

You know, naturally a lot of my thinking and a lot of my research interests have been largely shaped by Fei-Fei Li during my PhD years.

[00:21:43] Brian Heater:

I imagine it would be hard not to be around her and not have a lot of that rub off on you.

[00:21:48] Ranjay Krishna:

Of course. Yeah. And we all saw the impact that ImageNet had on the AI community. In a sense, you're not too far off. I am constantly thinking about what that moment looks like for robotics. And so we released earlier last month a project called MoMo Spaces. And this is our first attempt towards this ImageNet moment for robotics. What MoMo Spaces is trying to do is take all of these environments that we've created, and these environments are super interesting because scaling up these environments where you can train robots is really difficult because you want them to be as diverse as possible. And so the way we create them is by prompting LLMs to tell us, okay, I want to create a conference room, what kind of objects would you find in a conference room? It tells us. And then we use that to go find 3D assets from our library. And these 3D assets are assets of chairs, tables, whiteboards like the one behind me. And we then go back to the model and say, okay, what are the constraints for these objects in these 3D spaces? Where would you find a whiteboard? Where would you find a chair relative to a table? Then we take all of those constraints and we treat it as an optimization problem and create an environment with all of these assets in there. And then once we have it, we actually go ahead and manually verify whether that environment looks realistic, whether that's an actual place that might exist, and we also verify that the material properties of these objects are correct, that the cup has the right weight, the right kind of material properties as well. And then it's ready for us to start testing with real robots. We've created about 230,000 of these environments and they have millions of objects in them. And of course, it's not just that they have millions of objects. All of them also come with places where you can grasp them. 
So we've annotated all of that information as well, meaning that if you wanted to train a robot in this environment, we'll already tell you, hey, these are the places where you can grasp this object to pick it up and do something with it. And we're now building out what we're hoping we'll release soon, which is MoMo Bot. And MoMo Bot is a project that's got multiple different aspects to it. One aspect is data generation. Because we have these simulated environments, we know everything that's in those environments. We know exactly where that object is, we know what shape it has, we know where to grasp it. We can generate data very, very easily. All we need is some GPUs and then we can automatically scale up data generation. We've been generating over the last couple of months millions of robot trajectories. And to do this in the real world would be, like we were talking about, very very expensive. You'd have to train people to do it. You'd have to buy the robots. You'd have to put these robots in spaces, move them around, and that's a lot of effort. And if you wanted to take a bunch of robots to 230,000 different locations, that's going to take you years to do. Whereas for us, we can do this in a matter of a couple of months.
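The environment-building loop described above (ask a model for objects, ask for spatial constraints, then solve for a layout) can be sketched in a few lines. This is a toy illustration under heavy assumptions, not the team's actual pipeline: the object list and constraints are hard-coded stand-ins for what an LLM might return, the room is a flat 2D rectangle, and the optimizer is a simple random hill climb. Every name here (`OBJECTS`, `CONSTRAINTS`, `cost`, `solve`) is hypothetical.

```python
import random

ROOM_W, ROOM_D = 6.0, 4.0  # room width and depth in metres (illustrative)

# Stand-in for an LLM's answer to "what is in a conference room?"
OBJECTS = ["table", "chair", "whiteboard"]

# Stand-in for an LLM's answer to "how do these objects relate in space?"
CONSTRAINTS = [
    ("near", "chair", "table", 1.0),        # chair within 1 m of the table
    ("against_wall", "whiteboard"),         # whiteboard on the wall at y = 0
    ("apart", "table", "whiteboard", 1.5),  # table at least 1.5 m from it
]


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def cost(layout):
    """Total constraint violation; 0.0 means every constraint is satisfied."""
    total = 0.0
    for constraint in CONSTRAINTS:
        kind = constraint[0]
        if kind == "near":
            _, a, b, limit = constraint
            total += max(0.0, dist(layout[a], layout[b]) - limit)
        elif kind == "against_wall":
            total += layout[constraint[1]][1]  # distance from the y = 0 wall
        elif kind == "apart":
            _, a, b, limit = constraint
            total += max(0.0, limit - dist(layout[a], layout[b]))
    return total


def random_layout(rng):
    """Drop every object at a uniformly random spot inside the room."""
    return {name: (rng.uniform(0, ROOM_W), rng.uniform(0, ROOM_D))
            for name in OBJECTS}


def solve(seed=0, iterations=5000, step=0.3):
    """Hill climb: nudge one object at a time, keep moves that do not hurt."""
    rng = random.Random(seed)
    layout = random_layout(rng)
    best = cost(layout)
    for _ in range(iterations):
        name = rng.choice(OBJECTS)
        x, y = layout[name]
        new_x = min(ROOM_W, max(0.0, x + rng.uniform(-step, step)))
        new_y = min(ROOM_D, max(0.0, y + rng.uniform(-step, step)))
        candidate = dict(layout)
        candidate[name] = (new_x, new_y)
        candidate_cost = cost(candidate)
        if candidate_cost <= best:
            layout, best = candidate, candidate_cost
    return layout, best


if __name__ == "__main__":
    final_layout, final_cost = solve()
    print(final_layout, final_cost)
```

A real system would work with 3D assets, collision checks, and the manual verification pass mentioned in the conversation. The point of the sketch is only the shape of the idea: language-model output becomes constraints, and constraints become a cost to minimize.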

[00:25:25] Brian Heater:

Mm-hmm.

[00:25:25] Ranjay Krishna:

And now that we have all of this data, we've been training and coming up with new kinds of architectures to train these kinds of models, and we've been seeing some very, very exciting results. Something that's been very surprising for us is that we can actually train these models in simulation and only in simulation, and they actually generalize to the real world. Meaning that without ever seeing anything in the real world, we now have robots that can open doors, that can pick objects up, move them around, that can also move around themselves in real world spaces. That's a very exciting finding for us because I think now we have a proof of concept that simulation is a viable strategy for training these robot foundation models. And what we really needed was that ImageNet moment, that sort of diversity of data and environments from which we can train.

[00:25:58] Brian Heater:

It's always fascinating to me, and I was a humanities major, it's always fascinating to me to hear researchers say that they're positively surprised at an outcome, that it turned out better than expected. And I know that in and around Transformers and large language models, there's a lot of conversation around black boxes and not really quite understanding the inner workings of them. When you say you were surprised that it worked out better than you thought it would, what do you mean by that and what do you attribute that to?

[00:26:37] Ranjay Krishna:

So I mostly attribute that to public opinion. You mentioned your previous conversation with somebody at Physical Intelligence. Physical Intelligence put out a blog a few months ago where they were arguing that simulation is not a viable strategy. That the only way to train these robots is by collecting real world data and scaling up real world robot data. And they've been doing a lot of that. They've been hiring really talented people to come in and collect data across what I understand are hundreds of robots. And that's going well. Everyone's very excited by the models that they're putting out, everyone's using those models. What we're hoping to show is an alternative strategy. And it's a strategy that not a lot of people are betting on because it hasn't panned out in the past. We've had cases where people have tried to build things from simulation, and they've always run across this problem where things just don't work in the real world when you train in simulation. Meanwhile, we're showing that not only does it work, but it allows you to have robot models that have capabilities that existing models just do not have currently. So for example, in simulation we can train a robot, or collect training data of a robot doing something, and while it's doing it, we can move the camera that it's using to see the world around randomly in its environment. And that's really hard to do in the real world because usually the camera is mounted on top of the robot or on its arms or someplace, and so it's usually fixed relative to the robot. But in simulation, there's nothing stopping us from moving that camera around. Now the reason this is exciting is because what we're showing now is not only does this model that we train in simulation work in the real world, if the robot's cameras are kind of detached a little bit, maybe you didn't attach it to the body correctly, so it's wiggling a little bit or moving around, it still works. 
It's still able to generalize because we trained it with these cameras moving around. And so in the real world you can actually move the camera around and it'll still work and do what it's supposed to.
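Moving the camera around during simulated data collection, as described above, amounts to sampling a perturbed camera pose instead of using the fixed mount. Here is a minimal sketch of that idea, assuming a pose expressed as position in metres and orientation in degrees relative to the robot. The `CameraPose` type, the nominal mount values, and the jitter limits are all illustrative assumptions and do not come from any particular simulator's API.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraPose:
    """Camera pose in the robot's frame (hypothetical representation)."""
    x: float      # metres
    y: float      # metres
    z: float      # metres
    roll: float   # degrees
    pitch: float  # degrees
    yaw: float    # degrees


# Assumed nominal mount: 1.2 m up, tilted 15 degrees downward.
NOMINAL = CameraPose(x=0.0, y=0.0, z=1.2, roll=0.0, pitch=-15.0, yaw=0.0)


def randomize_camera(nominal, rng, max_offset_m=0.05, max_tilt_deg=10.0):
    """Return a copy of the nominal pose with bounded uniform jitter applied."""
    def jitter(value, limit):
        return value + rng.uniform(-limit, limit)

    return CameraPose(
        x=jitter(nominal.x, max_offset_m),
        y=jitter(nominal.y, max_offset_m),
        z=jitter(nominal.z, max_offset_m),
        roll=jitter(nominal.roll, max_tilt_deg),
        pitch=jitter(nominal.pitch, max_tilt_deg),
        yaw=jitter(nominal.yaw, max_tilt_deg),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # One fresh camera pose per simulated episode.
    for episode in range(3):
        print(episode, randomize_camera(NOMINAL, rng))
```

Training on observations rendered from many such poses is one plausible way a policy could become tolerant of a loose or misaligned camera on the physical robot.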

[00:28:50] Brian Heater:

That is a good point. I was speaking with, I can't remember which physical AI company, but yeah, they brought up a similar point, which is that the minute you put a robot out into the world, much like us, they start to degrade. So it's never gonna be exactly the same system that it was when you released it.

[00:29:07] Ranjay Krishna:

And you can model these degradations in simulation. And what we're trying to do is account for all of these kinds of faults and build them into how we collect data. And because we can scale this up so much faster than in the real world, you can build in these capabilities. We talked about compounding errors a couple of minutes ago, and even there, we can automatically create data where the robot actually fails to do something and then has to correct its behavior. And all of that is very easy to scale up as well. You just inject some errors into how you generate your data.
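Injecting errors into generated data so the robot has to correct itself can be illustrated with a one-dimensional toy. This sketch assumes a scripted "expert" that always steps toward a goal, and a fault model that occasionally knocks the state off course before the expert acts. The recorded state and action pairs then include examples of recovering from a bad state. The functions and parameters below are hypothetical and far simpler than a real trajectory generator.

```python
import random


def expert_step(position, goal, max_step=0.1):
    """Scripted expert: move toward the goal, at most max_step per tick."""
    delta = goal - position
    return max(-max_step, min(max_step, delta))


def generate_trajectory(start, goal, rng, error_prob=0.2, error_size=0.3,
                        max_ticks=200, tol=1e-6):
    """Roll out the expert while randomly perturbing the state.

    Returns (records, final_position), where each record is
    (observed_position, expert_action, was_error_injected).
    """
    position = start
    records = []
    for _ in range(max_ticks):
        if abs(goal - position) <= tol:
            break  # close enough to the goal, stop recording
        injected = rng.random() < error_prob
        if injected:
            # Simulate a fault (slip, bump, bad grasp) with a random offset.
            position += rng.uniform(-error_size, error_size)
        # The expert acts from the possibly perturbed state, so the logged
        # action is a corrective one whenever an error was injected.
        action = expert_step(position, goal)
        records.append((position, action, injected))
        position += action
    return records, position


if __name__ == "__main__":
    records, final_position = generate_trajectory(0.0, 1.0, random.Random(0))
    recoveries = sum(1 for _, _, injected in records if injected)
    print(f"{len(records)} steps recorded, {recoveries} with injected errors")
```

Because the perturbation happens before the expert chooses its action, every injected fault is paired with the action that repairs it, which is the kind of recovery example that is expensive to collect deliberately on physical hardware.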

[00:29:42] Brian Heater:

The other immediate upside that I see of creating really small models and having robots out in the world doing focused tasks is that there's a more immediate deployment there. And it sounds like to a certain extent what we're talking about here is longer term, we're talking about these huge models. Are these models that we're talking about right now things that are useful, that actually can be used and can be deployed on systems in the real world right now?

[00:30:16] Ranjay Krishna:

So you're right. What we're doing at the Allen Institute is very much taking longer term bets. We're trying to take bets that might not pay off for a year or two or even five years, and build models that we're hoping are going to drive the next wave of innovation. And so in that sense, a lot of the things that we're betting on and the infrastructure we're building is for research, and eventually that research will mature enough for it to become deployable in the real world. We do have smaller versions of these models and we can deploy them in existing hardware, but they don't work like you would expect from an industry grade robotics application. And to get there, I think there's still a lot of heavy lifting that needs to go into building useful robot systems. We're really tackling the intelligence problem. We're tackling the problem of figuring out what to do, how to act, how to move, how to grab things, how to move things around. And aside from that, there's all of these other issues around how do you build the right hardware, how do you make sure they're compliant, how do you make them safe for a specific deployment? All of those considerations go beyond just building an intelligent system. They're very much dependent on the application you want to build for and the domains that you want to deploy these robots in. So we're going to need to work together with a lot of domain experts to actually make them deployable.

[00:31:42] Brian Heater:

I assume that we just have to take for granted that there are always going to be edge cases, right? Like you put robots out in the world for long enough and there are just gonna be weird things that happen. You hear these stories about self-driving cars and just bizarre things that nobody can predict happen. Is there a way to prepare for those to a certain extent based on these models that you're currently building?

[00:32:06] Ranjay Krishna:

There are a ton of challenges that the research community has really been focused on for many years, or at least talking about for decades. One of those is this idea of continual learning, or in-context learning. Even with language models, this has become quite good today. You can tell a language model, hey, here are some examples of inputs and outputs, and here's a new input, produce the output. So you can show it examples of bad essays and good essays, and then ask it to improve your current essay and make it better. That kind of behavior is so far away from robotics today. You know, ideally what you'd want is a robot system where if you bring it home, maybe you have a particular object that looks very different, or you wanna hold it very differently, maybe it's something that's unique. You'd want to be able to teach the robot how to do it. And to teach it, you'd probably do the thing you want done yourself and then expect it to learn by watching you. This is something we do very often with each other. We tell each other, hey, this is how you clean this mug, or this is how you clean this statue. If we want robots to be able to learn by watching us, that's going to require us to build entirely new capabilities. They're going to need to map, first of all, their own form factor, their own body, which looks a lot different from a person's body. Their arms are going to move and have very different kinds of joints and flexibility than a person's. So that mapping itself is really hard. Then aside from that, they also now need to put themselves in the perspective of the person doing the task. So when you watch someone do it, you're really getting a third person view of somebody performing an action. What you want, of course, is to map that into a first person view so that you can do it yourself. And that's really difficult. And then aside from that, you also want this continual learning where oftentimes we imagine what it would be like to do something.
So for example, maybe you watched a game of tennis. You could imagine yourself playing tennis. You could imagine how your arms would need to move in order to hit the ball. Now, that kind of imaginative process, that kind of simulation, is something that these robots currently lack. And so this ability to remap different bodies to themselves, the ability to change perspectives, this ability to continually learn, all of those components need to be built. And we haven't even talked about memory and long-term tasks, long horizon tasks. All of these things are still very much open-ended questions.

[00:34:46] Brian Heater:

The sense that I get, and I've gotten this from a lot of people, is that the solution is going to come in many forms when it comes to training, pre-training, post-training, and things like teleoperation, learning by watching video, simulation, and training in the real world. Like these are all going to play a role in making robots better.

[00:35:12] Ranjay Krishna:

One of the things that we agree on in my team is that we're making a big bet on simulation, but by no means do we think simulation alone will be the solution. What we're going to want are algorithms that actually do learn in the real world. Eventually, we want these robots to perform tasks, realize they made a mistake, and then not only have error corrective behavior, but learn from that experience. And so we do want systems that adjust and adapt to the real world, and that requires these models to also have reasoning capabilities. And so another line of work that we've been pursuing is called MolmoAct. It's a series of action models that are really focused on this reasoning capability of figuring out what is the 3D space around me, how should I move in that 3D space, and how should I adapt as that 3D space changes over time.

[00:36:05] Brian Heater:

So we spoke about this a little bit before and obviously you said that you have a great deal of respect for what NVIDIA's doing and you work in collaboration with them. And obviously they're playing this huge role in just pushing robotics forward generally. But what is it that their simulation right now is lacking when it comes to training robots?

[00:36:31] Ranjay Krishna:

Yeah. NVIDIA's doing a wonderful job with simulation engines. Their Isaac Simulator is one that we use as well in-house here at the Allen Institute. The difference, I think, is the diversity of simulation data. They've been working on a set of projects called RoboCasa, where like I was saying, they've built kitchen environments. So these are single rooms with kitchen items where they train robots. The biggest difference is we are building simulation environments that are really large and very diverse. We have large houses with five to ten rooms. We have conference centers with multiple rooms. We have libraries. We're building hospitals. And all of them have a wide diversity of objects. And that diversity is what we're betting on.

[00:37:25] Brian Heater:

So scale really is what it comes down to.

[00:37:27] Ranjay Krishna:

It's scale. It's scale and diversity.

[00:37:30] Brian Heater:

Yeah. It's interesting, you were using linguistics as a kind of a metaphor before, and that jumped out at me because I was looking at your homepage and one of the things you said is, my research bootstraps machine learning using frameworks from behavioral and social sciences. So again, I'm a humanities guy, and I'm curious what it means to sort of look at these other fields and how they ultimately do inform the work that you're currently doing?

[00:38:08] Ranjay Krishna:

There are many different aspects to this. A lot of my work does draw on ideas and frameworks from a ton of different cognitive scientists and behavioral scientists. I can give you a couple of examples. We were looking at Maurice Merleau-Ponty. He's a philosopher from the early 1900s to mid 1900s, and he was one of those few people that really talked about the importance of embodiment. His main argument was that intelligence isn't something that is only in your mind, it's very much something that is part of your body, that your body does a lot of the thinking for you so that your mind doesn't have to. For example, he talks about how if someone throws a ball at you, you're not predicting the trajectory of that ball and where it's going to go and then deciding to move your arm to that location. Your body's already done all of that for you. It knows exactly where to move its arms to be able to catch that ball. If you touch something hot, you're not thinking, oh, I'm touching something hot, I need to move my hand away. Your body's already reacting and changing that. Similarly, he talks about how even the environments that we look at are very much defined not just by symbols that we associate with them, but by what your body could do with them. So for example, you don't look at a chair and say, that's a chair. You think about it as something you can sit on. You don't look at a staircase and say, that's a staircase and these are individual steps. You look at it as something you can step on to climb onto things. And so all of these different ways that your body reacts to things, he calls motor intentionality. And that's a lot of how we're also thinking about building models as well. Today, if you look at how most robotics models are built, they're built with this language component, this LLM, this large language model component in there. And I would be willing to bet that all of that is completely wrong. 
I don't think language should be the medium through which we think about embodiment or through which we build robotics models, because our body doesn't use symbols and language to be able to act in the world. Instead we go directly from sensors to reactions. And so this mapping from computer vision to robotics, that's the right mapping without that language intermediary in the middle. So as things move forward, I think this dependence on language models is one of those things that should go away. And so in my lab at the University of Washington, we are exploring these kinds of models, models that I call visual grounding models. So models that directly go from sensors and ground what they see in some representation so that they can act on it. And it's very much a representation that's defined by actions you can perform on those things and not by the symbols that you associate with them.

[00:41:01] Brian Heater:

Maybe this is even more of a hardware question, but I wonder if that means more decentralized processing for the robots themselves, in actuators and pieces themselves that kind of do react to their environment, but also react as a whole in tandem.

[00:41:23] Ranjay Krishna:

Yeah, I could totally see that as well. Sensors are expensive today when we're building this robot hardware. You've got of course maybe a few force-torque sensors and of course a few cameras, but that's maybe it. Ideally, yes, we would have robot embodiments that have a lot more sensors on them that can react to a lot of different kinds of stimuli. We feel air across our body that helps us adjust to the spaces that we're in. When someone taps us on our shoulder, we know exactly where the person is. All of these things allow us to adapt very quickly to our environment. Our body facilitates so much of our thinking and so I totally agree. It's also a hardware problem. You know, we have really good hardware coming out now that is able to do a lot of exciting things. I'm sure we're gonna have a sensor revolution as well at some point.

[00:42:15] Brian Heater:

Well, Professor Krishna, thank you so much for taking the time today.

[00:42:18] Ranjay Krishna:

Thank you. Thank you, Brian.

[00:42:21] Brian Heater:

Thank you to Ranjay and thanks to the folks at the Allen Institute for setting that up. Thanks to you as always for tuning in. Please like and subscribe on your preferred podcasting platform and maybe some of your non-preferred ones as well. Go check out the newsletter over at Automated.fm. All right, and with that we will see you this time next week for another episode of Automated.

PODCAST HOST

Meet Brian Heater

Brian Heater is A3’s Managing Editor. During his 20+ year career in technology journalism, he has worked as Hardware Editor at TechCrunch, Managing Editor at Tech Times, and Director of Media at Engadget. He is the host of the RiYL podcast and lives in New York’s Hudson Valley with his two rabbits, June and Flash.
