May 13, 2026
Sergey Levine on Why Real-World Data Will Define Physical AI
Physical AI looks closer than ever.
But the hardest part in robotics is not getting a machine to do one impressive task on camera. It is building systems that can improve from real-world experience, handle edge cases, and scale across different robots and environments.
In this episode of Automated, Brian Heater speaks with Sergey Levine of Physical Intelligence about why robotics has reached an inflection point, and why progress now requires more than great models in a lab.
Sergey explains why the next phase of robotics will depend on something much less flashy than a viral demo: collecting the right real-world data, learning from it efficiently, and building systems that improve through deployment.
The conversation explores what makes a robot's experience useful in the first place. Sergey describes a concept borrowed from child psychology called the “zone of proximal development,” where the best learning happens when a system is challenged just beyond what it can already do. For robots, that means creating environments where they can succeed, fail, adapt, and improve.
Brian and Sergey also discuss how the bottleneck in robotics is changing. Basic motor skills are improving fast. The harder problem now is judgment. A robot may be able to clean dishes, but if it drops a clean plate on the floor, it still has to understand that the plate needs to be washed again. That kind of common sense remains one of the biggest unsolved challenges in physical AI.
They also dig into one of the biggest debates in robotics right now: data. Sergey argues that real-world data collection is not the impossible obstacle many researchers once assumed. In fact, he believes the long-term path to better robots is more practical than people think. Deploy systems, collect experience, improve the model, and repeat.
The conversation also covers why Physical Intelligence is focused on a general intelligence layer rather than a single narrow product, why robots should not just be treated as metal versions of people, and what surprised Sergey most about controlling very different robot platforms with the same model.
Finally, Sergey reflects on why Physical Intelligence is structured more like a lab than a traditional startup, why experimentation matters so much in modern AI, and how we may one day look back on this era as the moment AI moved beyond internet data and into the physical world.
Connect with Sergey Levine
https://www.linkedin.com/in/sergey-levine-5a31a24
Learn more about Physical Intelligence
https://www.physicalintelligence.company/
We’d love to hear from you. Have thoughts or guest suggestions? Reach us at [email protected].
You can find the transcript and more episodes of Automated at automated.fm.
Transcript
[00:00:00] Sergey Levine: Robotics at this stage is at kind of an inflection point, where developing more capable and more sophisticated robotic foundation models requires more than just a laboratory science approach. It also requires a more holistic, kind of industrial undertaking. One of the things that I think about a lot is, how can we make sure that as robotics researchers we're tackling the right problems, in anticipation of the dynamic that we'll see at this larger scale? And to me, the kind of work that we're doing at Physical Intelligence in part serves that purpose.
What we would want to see ultimately is robots deployed in many different settings -- settings that are collecting a lot of data. But it's also very important to do it correctly.
[00:00:37] Brian Heater: Is there a way to start aggregating this to really take on that huge data gap that exists right now?
[00:00:43] Sergey Levine: Actually, getting real-world data and then deploying robots is a lot easier. In the research community, people often balk at it. There's a lot of work that tries to avoid the need for real-world data. I don't think that there's anything wrong with using other data sources, but I think that if we do it on top of models that are trained on lots of real-world data, we'll get a lot further.
[00:01:01] Brian Heater: When we reflect back on this moment, how do you think we're gonna define this moment in robotics and physical AI in hindsight?
[00:01:08] Sergey Levine: Do you want an optimistic answer?
[00:01:10] Brian Heater: I want the pessimistic and then the optimistic answer.
[00:01:13] Sergey Levine: Okay. So the optimistic answer is that...
[00:01:26] Brian Heater: Hey folks, welcome to the Automated Podcast. My name is Brian Heater. I am the managing editor at the Association for Advancing Automation. We've got a very special guest for you this week, as Sergey Levine joins us from Physical Intelligence to discuss the startup's approach to training, centered around real-world data collection.
I have actually not gone back and listened to this episode since writing our newsletter feature, but I'm going to go out on a limb and suggest that you not take a shot every time that one of us mentions the data flywheel. Thank you so much to Sergey for joining us. Thanks to you as always for listening to the show. If you've been enjoying it, please like and subscribe. And with that, please enjoy this conversation with Sergey.
[00:02:16] Brian Heater: We talk a lot about what's coming up next in automation on this show, but if you really wanna see the future in motion, you've got to be there in person. Automate 2026 is where the world's leading innovators, builders, and dreamers come together to show what's possible. Robots, AI, machine vision, motion control -- you name it, all automation under one roof. Register for free at automateshow.com to join us in Chicago, June 22nd through the 25th. We'll see you there.
[00:02:45] Brian Heater: So you're one of -- I was trying to do the math on this. There are, what, six co-founders of Physical Intelligence? I feel like I haven't found a definitive resource of the number anywhere.
[00:02:59] Sergey Levine: There's seven of us.
[00:03:00] Brian Heater: There's seven of you. Which is obviously quite a bit. Was it kind of piecemeal? Did all of you come together, did this happen over an extended period of time?
[00:03:13] Sergey Levine: A lot of us had actually worked together for quite a while before that. So on the AI and machine learning side of things, many of us worked together previously at Alphabet, at Google DeepMind, where we developed a lot of the foundations for the kind of robotic models that we're working on now. The two other people that joined after that were Adnan Esmail, who is the head of hardware here, and Lachy Groom, who was previously an investor.
[00:03:38] Brian Heater: Did you come over -- or did that group come over -- straight from Google DeepMind?
[00:03:42] Sergey Levine: Basically, yes.
[00:03:43] Brian Heater: In the work that you were doing -- I mean, obviously they're still doing some very interesting things over there -- was it something that you felt like you needed to break off and do independently?
[00:03:53] Sergey Levine: Robotics at this stage is at kind of an inflection point, where developing more capable and more sophisticated robotic foundation models requires more than just a laboratory science approach. It also requires a more holistic, industrial undertaking.
So, as an analogy: in the world of LLMs, there were fundamental technologies that needed to be developed -- transformers, that sort of thing -- but a lot of what goes into making it work has to do with data collection, data curation, a lot of kind of systems and holistic effort that goes beyond just the core technical research.
And robotics is even more like that. We needed to develop new ideas about how data should be collected, so that it reflects the demands of real-world work. That takes more than just a laboratory kind of effort.
[00:04:46] Brian Heater: Obviously there are a number of different approaches to data collection. Was there something novel, something that set your approach apart from what was already existing at the time?
[00:05:03] Sergey Levine: I wouldn't say there's any one thing. I think it's more about addressing the data problem -- not just from the standpoint of a research-style training-set/test-set kind of paradigm, but more from the standpoint of, what kind of experience does a robot really need to be able to perform a wide range of real-world tasks?
Some of that means being really smart about setting up some tasks, and smart about choosing a sufficient diversity of environments. But some of it also means collecting experience that's very realistic, where you actually put the robot into a real-world environment where there's some work that needs to be done, and we actually do that work, collect the corresponding data, and then get something that's very representative of real-world situations.
[00:05:46] Brian Heater: That's interesting. We actually just happened to get off a call with another San Francisco company, Weave, who actually have some robots out in the real world right now, so they are taking -- I don't know if you would call it a flywheel approach right now -- but to what extent are you actually deploying systems out in real-world environments at the moment?
[00:06:11] Sergey Levine: What I would say is this: so far we have experiments with deployment. We had, for example, our robot assembling boxes at the Dandelion Chocolate Factory. That's very convenient because Dandelion is across the street from our offices, so we could set up a robot there, and it spends all day building these boxes that they use to pack their chocolates.
We have a coffee service at the office. So, right outside where I'm standing right now, there's a station set up where somebody can go on the company Slack, type in, "I want a latte," and the robot will actually go and make it. These are not really high-horsepower commercial efforts. They're really experiments to see what happens when the robot has to solve real-world tasks. And they're also experiments on how data from these kinds of deployments can actually be used to further improve the system.
[00:06:58] Brian Heater: So obviously, a very different approach than we're seeing with a lot of the humanoid companies right now, who at least are having a big partnership with an automotive company or something like that. To a certain extent, I don't know if lower stakes is the right way of putting it, but certainly the pressures are different than if you're dealing directly with, like, a Mercedes, for example.
[00:07:21] Sergey Levine: I think there's merits to many different kinds of deployments like that. One thing that I think is really interesting about some of the tasks that we've studied is that they involve doing work that people want done -- people want a coffee made -- but they also involve a lot of edge cases, a lot of difficult situations that provide the system with opportunities to improve.
We have a few different ways to drive that improvement, including from autonomous experience as well as from human interventions. But the ideal set of jobs for a robotic foundation model that drives improvement are jobs that you can do decently well, but that also offer many opportunities to fix up mistakes, get better, experience some occasional failures that were hard to predict in advance, and then resolve those failures either through reasoning, or through adaptation, or through human intervention. Because then the robot is learning more. It's not just doing a thing repeatedly -- it's actually acquiring new experiences.
[00:08:18] Brian Heater: I guess in the specific instance of a robot that is in your office making coffee right now, obviously the scale isn't really there, right? It's just making a few coffees at a time. How much value can actually be derived from that specific deployment?
[00:08:38] Sergey Levine: It's a prototype for a larger system. What we would want to see ultimately is robots deployed in many different settings -- settings that are collecting a lot of data. But it's also very important to do it correctly.
So we need to make sure that we have the right technology so that all of that experience is actually making the model better. We also have to make sure we understand which kinds of situations are the most conducive to collecting useful experience. There may be some very repetitive tasks where there's very limited value, at least for the model, in doing that task for more than a few hours. There may be other tasks where there's sort of an endless variability, and we wanna be very smart about collecting those kinds of experiences.
[00:09:21] Brian Heater: That's interesting. What constitutes a useful experience?
[00:09:25] Sergey Levine: One bit of intuition about this is maybe inspired by something in child psychology. There's this notion called the zone of proximal development. The zone of proximal development is when you are doing something that you're not completely clueless about, where you have some sense for how to get started, but that provides just enough challenge that there's room for improvement.
So a good early-childhood education system puts children in that zone of proximal development. And what we wanna do is put our robots in that zone of proximal development. Now, that's kind of easier said than done, because obviously the analogy very quickly breaks down, and then we have to think very hard about what that actually means for the model -- which might defy our intuition about people.
So maybe there's some basic motor skills that need to be practiced. Maybe there's some higher-level skills that need to be practiced. There's actually a lot of surprises there. It also depends on how you're supervising the model. Sometimes we supervise the model with verbal corrections, so somebody actually tells it like, "Hey, in this environment, you should pick up the plate and put the plate in the drying rack instead of putting it in the dishwasher." That's not action supervision, that's verbal supervision, which the model can integrate. In other settings, it's a more low-level physical failure, so then someone needs to actually take over control and illustrate the correction. So all of these things, my point is, they all play together, and you have to get all that confluence of factors right.
[00:10:45] Brian Heater: I'd heard you describe this in another interview, and I thought you put it in very plain language, but an interesting way that I hadn't heard it quite described as before. One of the differences between data collection with cars and robotics is that cars are obviously out there in the real world, so they're collecting data already, and you kind of have to get creative with robots when it comes to collecting data.
Are there ways to scale that? Does that involve right now taking some of these kind of baby steps -- a Weave, for example, or doing a few of these pilots with a chocolate factory box company -- or are there gonna have to be other creative approaches, to creating synthetic data, for example?
[00:11:40] Sergey Levine: I think this is a place where there is room for creativity. I think it's also one of those things where, as the system becomes more and more capable, a lot of the stuff becomes easier to do. When we started the company two years ago, the biggest challenge for us was getting basic motor skills that work with any degree of reliability.
Now we're at a point where we can actually get very good basic motor skills for lots of tasks, and the challenge is kind of shifting upwards, where now it's maybe more important to make sure that the robot makes the right decisions in the context of a very long-horizon task. For example, we were evaluating a kitchen-cleaning policy a few days ago, and there, the problem is that, okay, the robot is supposed to wash all the plates, put them in the drying rack -- and it drops one of the plates on the floor.
Now, it dropped it on the floor after having washed it, so then it picks it up and puts it away. Well, what it should do is wash it again because it's on the floor -- so there's kind of a common-sense element. The good thing though about this is that as the bottleneck shifts further upward, in some sense we actually have more tools that we can bring to bear, because those problems also look a lot more like the problems faced by LLM agents and vision-language models and so on -- other areas of AI.
[00:12:48] Brian Heater: How finalized of a product -- and I guess how generalized of a product -- do you feel like you're going to have to have, to really deploy this at scale?
[00:13:01] Sergey Levine: I think that really depends a lot on what kind of application you're thinking about. Our take on this is that the right kind of robotic-intelligence layer can be helpful across a broad swath of applications. It really captures the imagination for people to think about, "Oh, what would it be like to have a robot in my home that does household chores?" And that's a great application. But there are also lots of other applications, and each of them has a different standard, a different slope, a different set of challenges.
Some demand much more speed, throughput, robustness. Others demand much more of this kind of common sense. We wanna improve the model along all those dimensions, but as we do that, certain opportunities will open up for deployment, for these kind of data flywheels. Others will open up later, and as long as we're progressing, and each of these things kind of bootstraps the next one, then I think this stuff is in good shape.
[00:13:55] Brian Heater: It feels like, to a certain extent, the company has been built up around being able to have a long runway. But insofar as you are actually making progress towards a broader goal of generalization -- that you're on the right path, and that you're not really feeling in the meantime any specific pressure to commercialize a product.
[00:14:24] Sergey Levine: Well, what I would say is this: AI technology is in a really good place right now in terms of the internet-based AI. You can use coding agents, you can use LLMs to help answer questions, research your next vacation and stuff like that. But when it comes to stuff in the physical world, there are a lot of major open challenges. The balance that needs to be struck by anybody that wants to really make a fundamental breakthrough in real-world applicability of robots is to balance taking advantage of the opportunities, but at the same time actually making a serious investment into solving the really hard problems in the long term.
And there are really, really hard problems. Physical AI can benefit a lot from LLMs, computer vision, et cetera, but it also requires us to solve fundamentally new challenges that those areas do not have to contend with. And it's very important to set up the kind of frameworks, the kind of evaluation structures, the kind of datasets that would allow rapidly iterating on those really hard problems. These are not the kind of things that are gonna be solved in a month or in two months. It's gonna take a very serious scientific effort, which will be helped a great deal by the advances in all these other areas.
[00:15:37] Brian Heater: I'm sort of coming at this from -- I was at TechCrunch prior to this. I'm coming at this from somebody who has talked to a lot of VCs and a lot of startups, and at a certain point there is that pressure to deliver something. Do you find that for the most part there is a sort of understanding that this is something that is going to take a long time, but that ultimately it's going to be worth it?
[00:16:09] Sergey Levine: Yes. I think the short answer is yes. And I think it's also very important to keep in mind the upside here, which is that it's one thing to build an AI-enabled version of an existing robotic system. For example, warehouse automation, where robots need to pick up objects and things like that. There are some solutions that exist, and those solutions can be made incrementally better by leveraging the latest and greatest advances in computer vision.
But if we can develop truly general robotic foundation models -- the ones where you don't tell it, "fold a T-shirt," you tell it, "Hey, I like to come home at 6:00 PM, please make sure dinner is ready. Do the laundry on Tuesdays, and clean the house on Thursdays" or something. That's the prompt, and then this thing can work for months and months doing this. And as it does it, further improve, learn about your habits, learn about what goes right and what goes wrong when it performs certain tasks. That has a much more profound transformative potential than incrementally improving the individual areas that are already kind of working.
And the transformative potential is not from producing robot butlers. The transformative potential is from enabling ubiquitous physical intelligence, where anybody could experiment with new hardware platforms, new application domains, load in this model, and have it provide a respectable first attempt at their application task, and then get better from there. I think that has enormous potential, far beyond current automation applications.
[00:17:38] Brian Heater: So you're talking about essentially building a platform onto which anyone can develop any kind of physical AI application?
[00:17:49] Sergey Levine: Yeah, and I think that's more important than any single application, because the truth is that right now, in our imagination, everybody has a particular kind of idea of what they would like a robot to do. But I think it's very tempting to just sort of think of robots as a version of a person, but made of metal.
[00:18:09] Brian Heater: I mean, especially because the ones that we're building are a version of a person made of metal right now.
[00:18:14] Sergey Levine: Yeah, and that's what we see in science fiction and so on, and that's cool. There's nothing inherently wrong with that. But I think that physical AI has potential for a lot more than that. You could have robots that are designed for specific tasks, robots that are more general, robots that are anywhere in between. It's like, you know, personal computers, right?
Once there was a general-purpose operating system software, computers can be big, they can be small, they can be in your refrigerator, they can be in your car, they can be on your desk -- and they take the form that is most suited for the particular problem they're addressing. So if physical actuation is that ubiquitous, there could be much more interesting applications than the most immediate ones that come to mind.
[00:18:55] Brian Heater: So to what extent, when you're working on any given application, are you thinking about the actual physical embodiment of the robot?
[00:19:05] Sergey Levine: One of the things that kind of surprised me is that we haven't had to do anything particularly special to handle a variety of different robotic platforms. From a science and engineering standpoint, there's something really interesting about all the different ways you could tell the model what kind of robot it's controlling, and represent the model in some clever way so that it sort of maps onto the robot morphology. But when we got started, we actually found that we didn't really need to do any of that.
Our model is actually very naive when it comes to robot embodiment. It takes the images from the cameras, it outputs a fixed-length list of numbers whose length is the maximum size needed for any robot we've ever controlled. It's like 38, I think. So if the robot requires less than 38 degrees of freedom, then it just pads it with zeros, and that's it. And the rest is all in the model. So the model absorbs data from many different robots and basically figures out, from looking at the image through the camera, what kind of robot it's dealing with and how it should control it.
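The zero-padding scheme Sergey describes here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Physical Intelligence's code: the names `pad_action` and `unpad_action` and the example 7-degree-of-freedom arm are hypothetical, and the width of 38 is simply the figure he quotes from memory ("like 38, I think").

```python
from typing import List, Sequence

# Assumed shared action width, taken from the figure quoted in the interview.
MAX_ACTION_DIM = 38


def pad_action(action: Sequence[float], max_dim: int = MAX_ACTION_DIM) -> List[float]:
    """Zero-pad one robot's native action vector to the shared fixed length."""
    if len(action) > max_dim:
        raise ValueError(f"action has {len(action)} dims, max is {max_dim}")
    return [float(x) for x in action] + [0.0] * (max_dim - len(action))


def unpad_action(padded: Sequence[float], dof: int) -> List[float]:
    """Keep only the first `dof` entries before commanding a specific robot."""
    return list(padded[:dof])


# Hypothetical command for a 7-degree-of-freedom arm.
arm_command = [0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 1.0]
padded = pad_action(arm_command)

print(len(padded))                       # 38
print(unpad_action(padded, 7) == arm_command)  # True
```

Nothing in the padded vector says which robot it belongs to, which matches the point above: in this sketch, as in his description, identifying the embodiment is left to the model and the camera images rather than to the action format.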
[00:20:00] Brian Heater: I'm really curious -- as this team was coming together, it's been very interesting to see... Obviously, there are these moments where, you know, you mentioned this inflection point. There are these moments that we've seen, certainly from the outside, where there's a ChatGPT moment, where it's just clear that, oh, this is something different.
These systems are accomplishing something that certainly we didn't think that they would be able to accomplish in a certain amount of time. And there's a lot of parallel research happening at universities. Obviously, you're collaborating a lot. I know you went to Stanford and you've been at Berkeley. To what extent, when you're really forming and building a team initially, are you looking for those people who are doing that kind of like-minded or parallel research that you've been performing over the years?
[00:20:52] Sergey Levine: Interesting question. There are a few things I could say about this. One thing that's really important in robotics especially is that these are very integrative disciplines. So it's very important for any effort like this to succeed to have folks with a variety of different skill sets -- even more so than in foundation models. For foundation models, you need folks that are good at systems, that are good with ML research, that are good at math.
But for robots, it's also important to have a good hardware team. It's important to have folks that really know how to build the robot stack, the software stack -- people that are very good at thinking about data and tasks -- so that there's a very wide range of different skill sets. If I just kind of looked for people just like me, then I probably wouldn't get very far.
But at the same time, it is very important to make sure that any team like this is very aligned on a mission. And I think one of the things that we have done, I think quite successfully, is gathered a group of folks that very deeply care about the physical AI problem -- the problem of building general-purpose AI systems for physical robots. It's actually very special to be in an environment where people are very focused on that mission.
[00:22:05] Brian Heater: You all obviously saw something at the same time. This isn't the first time a group of people have gotten together and tried this. You all sensed that there was something coming, that something had been unlocked, that this was the time to really try this in earnest.
[00:22:21] Sergey Levine: One story I could tell you about here is about a project that we did about a year before we started the company. This was actually led by my co-founder Quan Vuong. It was a very academic project, but I think to me at least it was very eye-opening about the potential for this stuff.
So we thought, okay, can we experiment with training models on data from many different robots, and do so in a way that isn't just an internal experiment of our own, but something that impacts the larger research community? So what we did is, we contacted a number of different robotics research labs -- 33 other labs -- and we asked them to essentially donate their data to us.
Back then, every robotics lab kind of worked in isolation. They would get their platform, their particular tasks, collect some data for it, experiment with different models, and then iterate on this. So we said, okay, well, since you're not doing anything else with this data anyway, why don't you share it with us, and we'll try out some experiments, and in exchange we'll send you back the model and you can play with it if you want.
I don't think anybody that we contacted really thought very deeply about this, because they're like, okay, you know, it's just a little experiment, let's try it out. We trained the model on all this data, and then we sent it back to a few of the labs that agreed to run some experiments for us. And we just told them, hey, take this model and compare it to whatever you're developing on your own system.
Everybody had something a little different, and it was suited to their particular tasks. One lab was doing some cable routing. Another one was taking out the trash. Another one was putting objects into drawers and things like that -- whatever research tasks they had. And the model trained across all the different datasets from all these different robots ended up being, on average, 50% better in terms of success rate than whatever each individual lab was developing.
That was really surprising to me, especially as a researcher, because I know that if someone has been working on a particular problem with a particular system for a while, they've really tuned the heck out of that thing. So if we just train up a model that we've never run on their robot before, and we just send it to them, and immediately they test it and it's better -- that shows that something is actually really working.
Now, these were academic tasks. If you saw videos of these things, you'd think they're pretty dumb. But the fact that this was being evaluated by researchers that have been working on those particular tasks with their own systems, and they were finding our model to be better, suggested to me that there was that spark of potential -- the possibility of a generalist system being better than specialists that are specialized to those particular tasks. So that was a big part of the impetus for doing this thing.
[00:24:49] Brian Heater: That's really interesting. And again, this is completely my outsider view, but I've been doing this for a while and I talk to a lot of researchers. And there is collaboration that happens. You read a lot of papers, and it is cross-schools. But I don't know -- people get busy, they move on to other projects. It's interesting to hear that for whatever reason, a lot of this data just kind of sits there. Is there a way to start aggregating this to really take on that huge data gap that exists right now?
[00:25:36] Sergey Levine: So in the academic world a few years ago, I think that that would've been a very reasonable thing to do. I think at this point, the kind of industrial-scale data collection efforts that we have, and that a few other folks have, are so large that they would dwarf anything that could be accumulated from just the open-source datasets.
This is not something I expected, but it is actually quite nice to see that at least in industry, people are now actually taking large-scale data collection very seriously. One of the things that I've said for a while -- and maybe I feel a bit vindicated about -- is that real-world data is not actually that costly. It's an industrial effort. You have to set it up, you have to get the robots, people have to actually do it, people have to be trained to do it properly. But it's not somehow profoundly impossible relative to other kinds of industrial efforts. It looks impossible from a very research-centric academic standpoint, because it costs money, it costs resources, and it costs work -- typically the kind of work that professional researchers don't typically enjoy doing.
But it's not harder than building a house or building a bridge, right? It's work that can be done, and if we are serious about taking physical AI to the next level, it makes sense to put in that work and get it right.
[00:26:58] Brian Heater: When you say you weren't expecting that -- you just didn't think that you would be able to scale or to collect data at the rate that you've been able to collect it since the company started?
[00:27:08] Sergey Levine: I meant more for the community as a whole. What I didn't expect is this level of recognition that real-world data collection is actually a very reasonable path towards getting very effective models. Because it's something that in the research community, people often balk at, right?
There's a lot of work that tries to avoid the need for real-world data, and a lot of it is actually very good work, very good scientific work. Utilizing other data sources, utilizing simulation, utilizing videos -- these are good things to do. But I think sometimes people start working on that because they kind of take it as a given that getting real-world data would be intractable. So I don't think that there's anything wrong with using other data sources, but I think that if we do it on top of models that are trained on lots of real-world data, we'll get a lot further.
And that actually really makes sense, because as people, for example, we can watch a cartoon on TV with very abstract graphics, or a pilot can use a flight simulator, and it doesn't have to look perfectly realistic, and they can still get a lot of very useful context out of it. But that's because they're grounding that in their own experience in the real world, where they understand roughly how things work already, so they can internalize all these other sources of experience. And I think robotic foundation models could do that same thing.
[00:28:18] Brian Heater: I guess to a certain extent, it was unpopular because it's almost like a brute-force approach to collection, right? It's just like, let's just do as much of this as possible and collect it. It's not an easy way to do it, right? It is, to a certain extent, kind of -- it's not teleop, but it is one-to-one data collection, right?
[00:28:38] Sergey Levine: I think it's actually easy in the grand scheme of things. Maybe sometimes something that we as researchers get a little bit mixed up about is, easy for us versus easy in the context of human civilization. In the context of the world as a whole, actually getting real-world data and then deploying robots that are gonna collect more experience and get better and better is a lot easier than inventing some other technologies just to avoid having to do that.
And because it's a bootstrap problem, it's easier yet, because once things are deployed, once they're out there in the world -- imagine there's a million robots out there. If they're all collecting experience, the problem is not how to get more data, the problem is what to do with it when you've got it, and how much of it to throw out. Because of that, to me it actually feels very unrewarding to spend huge amounts of technical effort inventing solutions when it's possible to bootstrap things and then be in this easier mode, where things are getting better and better.
[00:29:39] Brian Heater: This is fascinating, and this is an interesting point to be at right now. This is the flywheel question, right? We are at a point where we're starting to slowly deploy robots into the real world. I mentioned Weave. He told me -- I asked him how many robots he deployed. He said, "We've deployed more robots than we have employees." I asked him how many employees he had. He said he had 15. So, extrapolate from there.
You talk about these pilots that a lot of the humanoid companies are doing, and most of them, I would say, are probably single digit. So we're kind of in -- I don't wanna say a holding pattern, but we're in this place where it's like, how do we get to a point where we can deploy that minimum viable product? That minimum number of robots required to really jumpstart scalable data collection?
[00:30:38] Sergey Levine: I think to a large extent, this is a problem of model capability. And that means that you need both the industrial part of it and the research part. I don't think either of them in isolation works, and I think people sometimes tend to also reduce this a little too much. There's kind of one extreme, which is, all you have to do is invent technology; and the other extreme is, all you have to do is use the technology you have now and pump more data into it. I think neither of those is correct.
What we need to do -- this is maybe the boring but pragmatic answer -- is, you know...
[00:31:08] Brian Heater: I love boring and pragmatic. Honestly, this is a conversation that I like to have, because I think that there are too many -- I don't wanna say it's easy, but there are too many of these five-minute-long humanoid robot videos that are giving really unrealistic expectations as far as what a robot can do currently.
[00:31:29] Sergey Levine: Yeah, that's right. So the boring, pragmatic answer is that you have to go in stages. You scale up data, maybe by an order of magnitude, and that opens up new avenues for research. And then we figure out, okay, given this level of scale, what do we need to invent to get to the next level of capability? Then you go to the next level of scale, and then the next level of capability. It takes time.
You have to do this carefully. If you prematurely try to go to the next level of scale, then you don't know what kind of data to collect or how to set up your experiments. You might, for example, spend lots and lots of time doing something that would be obviated with better technology. Maybe there are some kinds of capabilities that are better off being learned through autonomous experience, and you might waste time learning them through teleoperation, or vice versa.
One example, alluding to something I said earlier about how research changes the nature of data that you want: some of our experience with reasoning models. You can have a model that looks at a problem -- maybe at some high-level problem like "clean the kitchen" -- and instead of just directly outputting actions, it says, "Okay, given what I'm seeing, the next step is to pick up a plate and put the plate in the dishwasher." So it's like a thinking mode for ChatGPT or something like that, and then based on that, it outputs the action.
But now what you can do is, you can go in and you can supervise the thoughts instead of supervising the actions. And that's easier supervision to get, because that can be done by somebody speaking into a microphone, or it can be done by a labeler looking at the video after the fact, without the need to teleoperate the robot.
Initially, obviously, the problem was that you need the physical skills, but when our models got good enough, then they could absorb this additional verbal feedback and get better. So now that is an additional avenue for data collection that we know can improve the model, after we've done the research to develop this. And who knows what kind of things might be opened up in the future as we develop new technologies. So these things have to keep pace with each other, to make sure that the data collection matches the needs of the technology.
[00:33:24] Brian Heater: What do you mean exactly when you say supervise the thoughts?
[00:33:27] Sergey Levine: It means that somebody goes in and writes text that indicates what the robot should have thought at that moment in order to produce the correct action.
[00:33:36] Brian Heater: Okay, so you're almost going in and like coding.
[00:33:38] Sergey Levine: A little bit, yeah, but in English. So maybe the robot should have picked up the towel instead of the plate, or it should have picked up the plate over here instead of over there. And you can go in after the fact and tell it, "Yep, this is what your thoughts should have been."
[00:33:50] Brian Heater: And this is really one of the primary places where LLMs can play a pretty key role, it sounds like.
[00:33:56] Sergey Levine: Yeah. And that's, by the way, one open question right now: okay, so you can do this with human labeling, you can probably also automate it to a large extent. If you can automate it at training time, you can probably also automate it at test time. So is it a problem of getting more labels, or is it a problem of improving the base model? That's one of the research questions.
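As an illustration of the "supervise the thoughts" idea discussed above, here is a minimal sketch of how after-the-fact text corrections might be attached to a logged episode. It is not Physical Intelligence's pipeline: the `Step` record, its field names, and both helper functions are hypothetical, and the string actions are stand-ins for real motor commands. It only shows that a labeler's correction changes the text target while the recorded action stays as it was, so no teleoperation is needed to produce the label.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    """One timestep of a logged episode (all field names are hypothetical)."""
    frame_id: int  # index of the camera frame the model saw
    thought: str   # text the model produced before acting
    action: str    # stand-in for the low-level action that followed


def apply_thought_labels(episode: List[Step], corrections: Dict[int, str]) -> List[Step]:
    """Return a copy of the episode with corrected thoughts; actions are untouched."""
    return [
        replace(step, thought=corrections[step.frame_id])
        if step.frame_id in corrections
        else step
        for step in episode
    ]


def thought_training_pairs(episode: List[Step], corrections: Dict[int, str]) -> List[Tuple[int, str]]:
    """Build (frame_id, target_thought) pairs for the steps a labeler corrected."""
    return [
        (step.frame_id, corrections[step.frame_id])
        for step in episode
        if step.frame_id in corrections
    ]


# A labeler watches the video afterwards and decides the first thought was wrong.
episode = [
    Step(0, "pick up the plate", "grasp(plate)"),
    Step(1, "put the plate in the dishwasher", "place(dishwasher)"),
]
corrections = {0: "pick up the towel"}

relabeled = apply_thought_labels(episode, corrections)
pairs = thought_training_pairs(episode, corrections)
```

The correction here is a plain dictionary keyed by frame, which could be filled in by someone speaking into a microphone, by a labeler reviewing video, or by an automated labeler, the three options raised in the conversation.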
[00:34:13] Brian Heater: So this gets back to really what we were talking about at the beginning of the conversation. The approach right now is really just starting small, piece by piece, addressing addressable problems and then continuing to build there. But I guess once you start the process of building, then potentially you can build pretty quickly.
[00:34:39] Sergey Levine: I think that this is a place where it's very important to -- well, this is something you said earlier -- to have a very sober assessment of capabilities. And sometimes those capabilities are lower than people think, sometimes they're higher than people think.
For example, there was this blog post that we wrote back in December, where one of our former colleagues from Everyday Robots, Benjie Holson, kind of posted a little challenge: "These are the tasks that I think would be hard for robots to do." He posted this in the fall -- I think it was in October or so. These were tasks like making a peanut butter sandwich and washing a frying pan. So he was kind of reflecting on what he had experienced at EDR back at Alphabet, and the kind of tasks that he thought would be challenging with the robotic technologies that exist in 2025.
We looked at those tasks, and we actually found that with our current model, without doing any actual research -- just by collecting a little bit of specialized data -- we could actually solve almost all of those tasks. So that was an instance where actually the technology's further along than people thought. But of course, there's other things that are less far along. So we could get our robot to wash the frying pan and make a peanut butter sandwich. But the interesting part to me is that we didn't actually do anything special for that. All the work was actually done by the people collecting the data. The researchers basically tested some models, and that's it.
[00:36:01] Brian Heater: So the disconnect there is, again, this idea of being able to show somebody a video of a robot that can do something really impressive in a three-minute video. But ultimately, what is the disconnect? The disconnect is being able to do that at scale, and being able to do that across a variety of different environments.
[00:36:22] Sergey Levine: And particularly dealing with the edge cases that arise, right? The peanut butter sandwich -- okay, maybe you can make the peanut butter sandwich fine, and nine out of 10 times it's perfectly reasonable. But one out of 10 times you'll drop the sandwich on the floor. And now it's not enough to start over -- you have to clean it up, you have to maybe get the mop out. There's all these other behaviors that stem from the need to remedy long-tail issues, and that's actually very hard. It's basically equivalent to the entirety of the AI problem, because pretty much anything can happen in real-world settings.
So getting that kind of common sense to react intelligently to those situations, getting the robustness, the speed -- you also want the robot to get faster at these things so that it can be useful in practice. This is especially important in industrial settings. All those things are actually quite hard, especially the common sense. So I think there's a lot of research to be done there, and that's a lot of what we're focusing on.
[00:37:18] Brian Heater: One of the things that was really fascinating to me about your company in particular -- and this really does go back to this idea of having seven founders as a piece of it too -- is that it does feel like a research facility. It feels like, I don't wanna say a playground, but it feels like a large laboratory where a lot of research is being done. It is not structured like a traditional startup, as we would think of it. Is that a fair assessment?
[00:37:53] Sergey Levine: That's right. To a large extent, this stems from our belief that the problem really is hard, and that there are a lot of opportunities to get started. A lot of the innovations in AI make it much easier to tackle this problem now than it was five years ago. But there are still fundamental technical challenges yet to be resolved, which will require creativity. They'll require hard work too, and they'll require the right initial conditions, the right data, the right models, all that stuff -- but they'll require creativity also.
And in order to unlock creativity, you really need an environment that promotes that kind of creative thinking and creative problem-solving. So that's why the company is structured to a large extent as a lab, because we think we need that to solve these hard problems.
[00:38:37] Brian Heater: As a creative writing major myself: how does a laboratory setting promote creativity in a way that another setting might not?
[00:38:47] Sergey Levine: One really important thing is to reduce the friction for experimentation. This is, by the way, a big challenge across modern AI, not just in robotics. If someone has an idea, if they're lucky and the idea is really, really small, they can try it quickly. But sometimes you'll have an idea that you can only try at a particular scale, because we know that machine learning works differently at a large scale than it does at a small scale.
So one of the big challenges in producing a functional research environment for modern AI research is to make it as easy as possible to try out these ideas, even when they require doing things at a larger scale. That means, for example, really good infrastructure for evaluations. How do we test to a statistically significant degree whether our models are getting better or worse? Good infrastructure for messing with different parts of the model with different algorithms. How do we test out a new reinforcement learning algorithm when it might require thousands of trials in the real world? Sometimes you can use simulation for that, sometimes you can't. So getting all those structures right is actually really, really important.
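The question of testing "to a statistically significant degree" whether a model got better can be made concrete with a small sketch. The example below assumes each real-world trial is scored as a simple success or failure and compares two models with a standard two-proportion z-test. The function name, the trial counts, and the choice of test are illustrative assumptions, not a description of the evaluation infrastructure mentioned in the interview.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    successes_a: int, trials_a: int, successes_b: int, trials_b: int
) -> Tuple[float, float]:
    """Two-sided z-test for a difference in success rates between model A and model B.

    Returns (z, p_value). Uses the pooled-proportion standard error, which is
    the usual form when the null hypothesis is "both models succeed equally often".
    """
    rate_a = successes_a / trials_a
    rate_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    if std_err == 0:
        # Both models always succeed or always fail: no evidence of a difference.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: the same 70% vs 80% success rates at two evaluation sizes.
z_small, p_small = two_proportion_z_test(70, 100, 80, 100)
z_large, p_large = two_proportion_z_test(700, 1000, 800, 1000)
```

With these made-up numbers, the ten-point gap does not reach the conventional 0.05 threshold at 100 trials per model but does at 1,000, which is one way to see why evaluation at scale needs dedicated infrastructure.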
And then the other part of it is creating an intellectually vibrant atmosphere, and that involves making sure that the right people join the company, that it's a place where people can talk about cool research ideas over lunch, where people want to stay late and hack away at some creative idea that they came up with that morning. So these are the social aspects of it.
[00:40:10] Brian Heater: And fostering, almost, a -- democratic isn't the right word, but getting rid of bureaucracy. There are founders, and people do have titles, but getting rid of some of the bureaucracy. Realizing that people are working at the company because they're very smart people, and listening to people when they do come up with those kinds of creative or out-of-the-box ideas.
[00:40:34] Sergey Levine: Yeah. We don't have titles. Or rather, there are two titles: member of technical staff, and CEO. And there's only one CEO. We're legally required to have one.
We're not unique in that regard, actually. For exactly this reason, many of the Silicon Valley AI startups, especially ones that are very research-focused, do adopt this kind of structure. It's also not unique to Silicon Valley.
[00:40:58] Brian Heater: You see, it's been really hard for me to connect with people on LinkedIn -- and this explains why -- to figure out who's who over there.
[00:41:04] Sergey Levine: I think it's also a recognition of a certain truth. I mentioned that this kind of work requires a heterogeneity of skills, but it also requires everybody to keep a significant chunk of the problem in their mind. Because if you don't look at this thing holistically, then it's very hard to identify the most important challenge to tackle.
So everyone needs to be good at their particular thing, but people also need to understand the big picture and where their particular kind of superpower can fit into addressing that big picture. Again, I don't think that's unique to us. For any effective AI research lab, getting that right is really important.
[00:41:45] Brian Heater: And then it strikes me that keeping one foot in academia is also extremely useful. Almost being forced to monitor the breakthroughs that are happening on that side is also a superpower that you're bringing to the table.
[00:42:04] Sergey Levine: One of the things that I'm often a bit concerned about -- and to be honest, this was part of the reason why I was excited to start this company -- is that in the academic research world, in robotics, I felt like the community might be entering a little bit of a local optimum, where it's hard to work on the most important problems because they're a little bit out of reach, due to non-science factors like availability of data, having the right kind of platforms, all that kind of stuff.
And by analogy: if you think about, let's say, natural language processing research, a lot of the technologies that went into LLMs were developed in research labs and in academia. But getting to the point where everybody could do research on LLMs also required an industrial effort. Without that, there were still researchers working on problems, but many of those problems were not the right problems, at least in hindsight. Obviously at the time they seemed like really good problems, but in hindsight they were not the right problems, because this next level of scale changed the dynamic.
So one of the things that I think about a lot is, how can we make sure that as robotics researchers, we're tackling the right problems in anticipation of the dynamic that we'll see at this larger scale? And to me, the kind of work that we're doing at Physical Intelligence in part serves that purpose. In part it serves the purpose of helping make sure that we understand what's the dynamic for that next level -- not just for ourselves, but also for the research community as a whole.
[00:43:29] Brian Heater: I'm not gonna make you do a prediction, because I hate predictions. I used to do this, and I realized that it's very easy to keep pushing the goalpost. Everything's five to 10 years. It's perpetually five to 10 years.
[00:43:43] Sergey Levine: Because that's about as far as we can reasonably see things.
[00:43:47] Brian Heater: Yeah, especially when it comes to things like AGI. But here's what I will do: when we reflect back on this moment -- obviously, very exciting time, both from a hardware and a software standpoint -- how do you think we're gonna define this moment in robotics and physical AI in hindsight?
[00:44:13] Sergey Levine: Do you want an optimistic answer?
[00:44:16] Brian Heater: I want the pessimistic and then the optimistic answer.
[00:44:20] Sergey Levine: Okay. So the optimistic answer is that I think it's very possible that we might end up looking back on the early to mid-2020s as a bit of a transitional period in AI, where people were still training their AI systems largely on data from the internet.
I suspect that if we have physical systems that are deployed very broadly in the real world -- not a few hundred robots, but millions of robots, maybe every household will have a robot -- then the way that we train AI systems will change fundamentally, because we'll have this source of data that is so reflective of human experience that it'll be much easier to train sophisticated AI agents -- not just for robots, but for everything -- on that kind of data.
And we might look back and say, "Hey, isn't that kind of funny that people used to train their AI systems on data from the internet -- which is that thing that we use to post pictures of cats and swear at people on bulletin boards, right?" So I think that there is something very exciting there, because physical experience has a kind of quality. It allows you to understand causal structures. It allows you to understand intuitive physics. And all these things are actually very fundamental to human thinking. Obviously, our brains exist in the physical world.
Because of that, we might get much more powerful AI systems that are, in some ways, actually much more familiar to us, from using primarily that kind of data. Now, obviously it's a long road to get there, but I think that would be tremendously exciting. And maybe that's actually what we'll think of when we look back on this.
[00:45:53] Brian Heater: Great. Well, Sergey, it's been a pleasure. Thank you so much for taking the time.
[00:45:58] Sergey Levine: Thank you, Brian.
[00:45:59] Brian Heater: Thanks so much for tuning in to another episode of Automated. Thanks to Sergey Levine for joining us. If you've been enjoying the show, the easiest way to support us is to subscribe, to like, to comment. You can find more info on the show, you can find show notes, you can find the weekly newsletter over at automated.fm. And with all of that, we will see you next week for another episode of Automated.
PODCAST HOST
Meet Brian Heater
Brian Heater is A3’s Managing Editor. During his 20+ year career in technology journalism, he has worked as Hardware Editor at TechCrunch, Managing Editor at Tech Times, and Director of Media at Engadget. He is the host of the RiYL podcast and lives in New York’s Hudson Valley with his two rabbits, June and Flash.