Automated Podcast Episode: Andrew Barry

June 10, 2026 | Episode 41

Andrew Barry on Why Dexterity Is the Next Breakthrough in Physical AI

Physical AI is moving quickly.

But Andrew Barry says one of the biggest unlocks in robotics is not just getting robots to move through the world. It is getting them to touch, grasp, adjust, and manipulate the world with real dexterity.

In this episode of Automated, Brian Heater speaks with Andrew Barry, co-founder and CTO of Generalist, about how the company is building general intelligence for the physical world and why dexterous robots may be the starting point for far more capable automation.

Andrew explains why Generalist is focused on the tasks that are both difficult and valuable. Robots have made major progress in mobility, but their ability to manipulate objects is still limited. If robots can solve dexterity, they can become useful in a much wider range of real-world environments.

The conversation explores how Generalist is collecting massive amounts of real-world manipulation data. Andrew describes the handheld data capture devices the company built, why they chose that approach over teleoperation, and how thousands of devices have helped them scale a much richer data set for robot learning.

Brian and Andrew also discuss the commercial side of physical AI. Andrew explains why the company is not just chasing impressive demos, but benchmarking against real tasks people are already paying for today. That distinction matters because a viral robot demo is not the same thing as a deployable robotic system.

They also dig into one of the most surprising parts of modern robot learning: improvisation. Andrew shares the moment when a robot picked up a baggie with the opposite hand from the one it had been trained on, completed the task anyway, and left the team realizing something very different was happening inside the model.

The episode also covers Generalist’s GEN-1 model, the parallels between robotics and the early GPT era, why flexible objects like cables are so difficult to automate, and what data flywheels may actually look like in robotics.

Andrew Barry [00:00:00] The spot we're starting at is dexterous robots. If you can solve dexterity, there's just a huge number of things you can do. I have worked on many, many robots in my career that have been very difficult to commercialize, and so when we started the company, we said we're going to go out and make sure that our goals and our benchmarks are real tasks people are paying money for today.

We have built a handheld device that allows us to do data capture very inexpensively compared to many other techniques, and we have scaled that.

Brian Heater [00:00:31] How cognizant are people of the fact that they're training an automation system that may essentially do that task in the future?

Andrew Barry [00:00:39] We're very, very upfront with people about it.

We had taught the robot to pick this baggie up with the left hand and then shake it. It picked the baggie up with the right hand and shook it, and we're like, "We never taught it to do that," and we all just didn't believe it.

Brian Heater [00:01:08] Hello and welcome to Automated. I'm Brian Heater, the managing editor at the Association for Advancing Automation. I'm excited to bring you this episode from our trip to Boston back in April. The conversation was recorded shortly after Generalist debuted their Gen-1 model. We were extremely impressed with the demos and invited co-founder, CTO, and Boston resident Andy Barry to sit down for a chat.
I got a lot of really great insight into physical AI, and I think you will as well. If you're enjoying the show, don't forget to like and subscribe. Check out the newsletter over at automated.fm.
And with that, here's Andy Barry of Generalist.

Brian Heater [00:01:50] You know, we talk a lot about what's coming up next in automation on this show, but if you really want to see the future in motion, you've got to be there in person.
Automate 2026 is where the world's leading innovators, builders, and dreamers come together to show you what's possible. Robots, AI, machine vision, motion control - you name it, all automation under one roof. And as part of Automate this year, the Humanoid Robot Forum brings together leaders, engineers, and researchers for a two-day deep dive into the real-world development, deployment, and commercialization of humanoid robotics.

Register for free at automateshow.com to join us in Chicago June 22nd through the 25th. We will see you there.

Brian Heater [00:02:17] So this is probably wildly inappropriate, so I'm going to start by paraphrasing your co-founder Pete - in a blog post that went up, I think within the last day or two. He was pointing out the fact, he said, "We've never referred to our models as either VLAs or world models" - and I'm wondering why that's important.

Andrew Barry [00:02:52] Yeah, that's a great question. So, fundamentally, we think of our models as something different, right? And that's because they are trained from scratch. Not quite, right? 99% of the parameters in the model are from scratch.

Brian Heater [00:03:06] Why is there always a 1%?

Andrew Barry [00:03:09] Yeah, that's a great question. I mean, we do the thing that makes the model work as well as we could possibly make it work. And so in that case, a while ago, before we had a huge amount of data, we tried getting rid of the 1% and it caused a performance regression. And we haven't redone that experiment, so I suspect that we could go to 100% at this point. But, you know, we like to be honest.

Brian Heater [00:03:40] Yeah, I mean, going 100% - that would be like a purely branding thing, I suspect, right? And it is. And in robotics, you do say 99% because everything is kind of like 99.999% at this point, right? So actually it's funny because this was something that came up - I'm assuming this is way after our conversation with Russ, but I was talking to him a little bit about Pete's comments as far as starting from scratch and whether or not that was necessary. I wanted to get his take, and I'm curious why that is so important because that sounds like a pretty daunting undertaking.

Andrew Barry [00:04:19] Yeah. You know, we didn't set out and say, "We're going to do a model from scratch" because being from scratch is really important, right? The reason we're doing it is that with such a large data set, you can do it from scratch, right? You don't need the crutch of a VLA or some other set of parameters that are chosen for other reasons, right? And so if you have a large enough data set, you can just - basically, in some ways fill up all of the parameters in the model with exactly the thing you want to do. And there's a lot of evidence from the rest of the field, and we see it too, that that gives you just better performance.

Brian Heater [00:05:00] So the idea then is that you can just sort of - what? Just scoop it all up at once, basically?

Andrew Barry [00:05:07] It's more like, if you have a certain amount of capacity in your model, right? And you're going to fill that capacity with as much goodness as you can possibly fill it with. And so you can say, "Well, let me fill it with Wikipedia first." But what we find is for dexterous robots, Wikipedia is not all that helpful compared to more and more dexterous data. What does the friction look like? What do the objects look like from the system we use? So that's why we do it.

Brian Heater [00:05:38] And what are - insofar as you're willing and able to talk about it - and I know video is really big for pre-training, it seems like everybody has kind of gone in that direction. What are you really using to build these models?

Andrew Barry [00:05:56] Yeah, so we've said now - we have built a handheld device that allows us to do data capture very inexpensively compared to many other techniques, right? And we have scaled that a huge amount. So we have half a million hours of that. We've built thousands and thousands of these devices and deployed them all over the world.

Brian Heater [00:06:18] So it's effectively teleop in a way? I mean, it's real world data collection.

Andrew Barry [00:06:22] It's real world data collection. It is not teleop. These are gloves people wear. And that gives you a huge number of advantages because if you need to do teleop, well, you have to have a robot. And the robot has to work, and you have to have power, and you have to have all these things. But if you have a person do it, then you just get diversity, right? You're going out into the world as opposed to bringing the world to you. And so you're able to get a data set that's much richer.

Brian Heater [00:06:50] So obviously scale in size is a big question. I mean, how were you able to compound that much data that quickly?

Andrew Barry [00:06:59] Well - two years ago when we started the company, we said, "This is what we're going to do."
And at that time, my co-founders Pete and Andy were at DeepMind, right? And we had known that this was a way to collect data since like 2018, right? Pete made a video in Russ's lab doing this, and then Andy had written a number of papers about this kind of technique.

But in 2018 we didn't know what to do with the data, right? It was like, "Okay, great, you get a ton of data, what are we going to do?" Right? Well, you fast-forward to when we started the company in 2024, and it's like, okay, well, obviously LLMs work. We can basically use those techniques, and Andy and Pete were doing exactly that at DeepMind. But then the scale was clearly what was required. So at that time we said, "Okay, we're just going to go after scale." And so we spent two years building an incredible pipeline to get a huge amount of scale, and that's one of the things we've done that has allowed us to build Gen-1.

Brian Heater [00:08:00] Yeah, so what - logistically, pragmatically - what is the pipeline? How are you actually getting that? Like, who's doing it? Is it just a team of three people getting loads of hours of data?

Andrew Barry [00:08:12] Definitely not. Yeah, we tried everything. The first thing you have to do is you have to make a device that you can mass manufacture, and it has to be inexpensive and it has to be incredibly easy to use, right? And then the second thing you do is you have to find people to use it. And we tried everything. We tried mailing it to people and seeing what would happen. We tried bringing people into our offices. We tried working with a bunch of partners who had access to large labor forces. We did all of those things.

Brian Heater [00:08:43] When you were mailing it to people - just to pause for a second - what were you telling them?

Andrew Barry [00:08:49] We tried everything there too. And we told them things like, "Do your laundry." We told them things like, "Do whatever you can think of."

Brian Heater [00:08:59] Well, those are the tasks, but you were pretty upfront. You're like, "Hey, this is the thing that we're doing. We're eventually training robots and we want you to be a part of that."

Andrew Barry [00:09:06] Yeah, there was no cloak and dagger. We were very clear about what it was.

Brian Heater [00:09:11] Yeah. And people - well, maybe it's something that people are hypothetically excited about, but actually when it comes to doing the work, nobody actually really wants to do it.

Andrew Barry [00:09:23] People were mostly confused. They were like, "What do you want?" And we were like, "Just put these gloves on and go do your stuff, and then mail us the data."
And a bunch of people were like, "I don't know what this is. I'm too confused," and said no. And then a bunch of people were like, "Okay, this seems cool." And then they dug a little bit, and they were like, "Oh, wow, you're training robots with this. That seems cool." You know, and then pretty quickly we had robots that actually did some things, and so we could show them those videos and they were like, "Oh, I get it."

Brian Heater [00:09:51] That specific group of people - how did you find them? Because it sounds like they didn't know you directly. They didn't know specifically what you were doing right at the top. How did you find this large group of people to do laundry?

Andrew Barry [00:10:04] A lot more than laundry.
But yeah. And this isn't even just in the United States, right? So we have partners that we work with now that have really helped us. And that has allowed us to basically train our partners, and then they train their labor forces as well.

Brian Heater [00:10:23] So it's basically like it's people who are being compensated for that job anyway, and then they're wearing the... Okay.

Andrew Barry [00:10:31] So yeah. We have people who are being compensated and then are also wearing the device, and then they're additionally compensated for that. And then we also have folks who are compensated just to wear the device and do whatever they want to do.

Brian Heater [00:10:43] I'm sure that you want the professionals - the people who do it for a living - to be the ones who are...

Andrew Barry [00:10:50] There was nobody who really did this for a living before. We actually do a lot of the tasks ourselves. And a lot of the data is not just commercial data. It's in people's houses and stuff - doing whatever they want to do. So it's not necessarily professionals doing a specific job.

Brian Heater [00:11:12] So I think we might have jumped ahead a little bit. You kind of alluded to what sounds like a solution to a certain extent there. How did you go from just sending these things out and nobody wanting them to actually being able to collect that data?

Andrew Barry [00:11:28] I would say we tried everything and threw the spaghetti at the wall and saw what stuck. And so we pretty quickly learned how to explain it to people so that they could understand it and be excited about it, and then we pretty quickly learned how to run the logistics for it. It's a complicated thing. We're moving petabytes of data around, and that's not something you can do on your home internet connection. So there are a huge number of moving pieces there - that's really what we spent the first year of the company figuring out how to do.
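
The point about petabytes and home internet connections comes down to simple transfer-time arithmetic. The sketch below is only an illustration: the link speeds (`home_upload`, `datacenter_link`) are hypothetical values chosen for the example, not figures from the episode, and it assumes the link is fully used at a sustained rate.

```python
def transfer_days(data_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move data_bytes over a link at a sustained rate."""
    seconds = data_bytes * 8 / link_bits_per_second  # bytes -> bits, then divide by rate
    return seconds / 86_400  # 86,400 seconds per day


PETABYTE = 1e15  # one decimal petabyte, in bytes

# Hypothetical sustained upload speeds (assumptions, not from the episode).
home_upload = 20e6       # 20 Mbit/s
datacenter_link = 10e9   # 10 Gbit/s

print(f"home upload:     {transfer_days(PETABYTE, home_upload):,.0f} days")
print(f"datacenter link: {transfer_days(PETABYTE, datacenter_link):.1f} days")
```

Under these assumed speeds, a single petabyte takes on the order of a decade to upload from a home connection and a little over a week on the faster link, which is why the logistics of getting data back matter as much as the capture device.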

Brian Heater [00:12:02] Are people - especially in those cases when it is something that is professional - how cognizant are people of the fact that they're essentially training an automation system that may essentially do that task in the future?

Andrew Barry [00:12:17] Yeah, we're very, very upfront with people about it. And some people are really excited about it, especially in jobs that they're not that excited to do.

And then there are some people who are less excited about it. I thought that would be a much bigger problem than it was. For the most part, folks were pretty happy to try it out and collect data.

Brian Heater [00:12:41] So it's really manipulation - that's the problem right now that you're trying to solve.

Andrew Barry [00:12:46] Yeah. And so, you know, we're a Generalist intelligence company for the physical world - it's right there in the name. And so that's a really big problem, right? So we want to break it down, and the spot we're starting at is dexterous robots. Because if you can solve dexterity, there's just a huge number of things you can do. We have all these robots in the world. Myself, the team, has worked on a huge number of robots. In a lot of cases it's been like, well, we have an incredible mobility platform, but - and the but is always we can't touch the world or we're really limited in our ability to touch the world. And so the robot is just much less valuable than it could be. And so we think if we can crack dexterity - and we're starting to see signs that are very exciting in my mind - then that opens a huge number of things we can do.

Brian Heater [00:13:41] So you're starting with the most difficult problem first.

Andrew Barry [00:13:45] We're starting with the most valuable problem first.

Brian Heater [00:13:47] Which is effectively - I mean, at least what I hear - one of, if not the most difficult, certainly in humanoids.

Andrew Barry [00:13:55] It is very hard. Yeah. But we're not doing it because it's hard, we're doing it because it's valuable.

Brian Heater [00:14:01] One of the things that I often talk about with people - and I guess this is sort of a philosophical thing as far as launching a service - there are those startups that have their go-to-market, right? And then certainly in the physical AI space, and certainly when we're talking about general intelligence and really large models, there's just sort of this accepted idea that it's going to take a while to get there, and it's the problem that you're trying to solve. Like the first problem you're going after is a big problem that's going to take a little while to solve.

Andrew Barry [00:14:35] Yeah. No, it's definitely going to take a little while, but I think what we are seeing is strong parallels to the language space. GPT-2 was very cool and not commercially useful in any way, right? You couldn't really use it to do anything. GPT-3 was like, wow, this is definitely more capable, and we started to see glimpses of commercial viability. There were some cases in summarization and in certain types of ad copywriting where GPT-3 really was good enough. And then GPT-4 comes out and you're like, okay, actually this opens up a whole bunch of tasks - but GPT-4 cannot be a lawyer, right? It still is quite limited. GPT-5 comes out, and so we're starting to see the same exact progression here where it's like Gen-0, incredible ability to do things. Gen-1, you say, okay, there are actually starting to be glimpses of places this could be really commercially viable. 99% is not 99.99%, right? But we're seeing that start to happen. And then the second piece is, Gen-0 and Gen-1 are five months apart. And so the rate of progress - you know, it's like 66% to 99% in five months - we'll see what the line looks like, but...

Brian Heater [00:15:52] So when you say you're seeing signs or getting those glimpses, I'm curious what that means. And I think it'd be helpful to break that down for a general audience - the difference between seeing a demo of... I mean, I know I keep going back to laundry, but clothes folding has been the big thing everybody's been talking about, right?

Andrew Barry [00:16:17] Yeah, we did clothes folding partly because everybody asked us. They were like, "Can you do clothes folding?" And we were like, "Fine."

Brian Heater [00:16:21] Yeah, exactly. But what is the difference between the dozen or so clothes folding demos that we've seen over the last few months and actually seeing a sign for commercial viability?

Andrew Barry [00:16:33] Yeah. I mean, let me put it this way. I have worked on many, many robots in my career that have been very difficult to commercialize, right? And so when we started the company, we said, "We are not going to just build technology and then somehow they will come." We said from the very beginning, "We're going to go out and make sure that our goals and our benchmarks are real tasks people are paying money for today." Because if we can hit our benchmarks, we will have reached commercial viability, right? And so we went and did that. And so we've been working with partners where we've taken exactly their tasks - sometimes even their specific parts - we've brought them in-house, and then we've run our benchmarks on those. And we've trained models on them, and we've used them to guide our research, right? And so when I say glimpses of commercial viability, what I say is some partners are getting very excited about where their tasks are and how close we could be to deploying those.

Brian Heater [00:17:38] And this model builds on dexterity. Do you see this being the foundation for your future general purpose robot model?

Andrew Barry [00:17:48] Yeah. So this is like a big change in my mind, right? I worked on Spot from like 2016, you know, through like 2021. We did door opening, right? You know, we made some great YouTube videos, had a lot of fun doing it.

Brian Heater [00:18:01] Which - now it's like, oh, door opening. But at the time, opening a door with a robot...

Andrew Barry [00:18:05] That was really hard. And the way we built it - and I'm very proud of that behavior. That behavior works really well. And to be clear, I wasn't the only one who worked on it. A bunch of people at Boston Dynamics worked on that.

But each section of that was a hard-coded controller. So in the Russ Tedrake version of the world, you have a funnel in state space, and each controller kind of filled in one of those funnels. But what we see now in these kinds of models is that you get this improvisational intelligence - is what we're calling it - where the funnel of the model is just wildly bigger. Because the model sees a case it has never seen before, it's never been in the training, and it will do something very reasonable.

Brian Heater [00:18:58] Yeah.

Andrew Barry [00:18:58] And so we see that improving reliability, we see it improving speed, and we also just think it's worth pursuing by itself, right? And so that improvisational intelligence is kind of one of the key things we see. And let me tell you about reliability in that case. If you watch a human do a task, their accuracy is not incredible. They do not have one missed pick out of a million. But their ability to complete the task is really, really high because of course, if they mispick, they just pick again. And we see exactly the same kinds of things coming out of these models. Of course, not all the time, but we see glimpses of this - where the model will just do something that is exactly what it should be doing, and we know it's not in the training data.
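
The distinction between per-attempt accuracy and task completion can be made concrete with a little probability. This is a minimal sketch, assuming each pick attempt succeeds independently with the same probability; the 90% per-attempt rate and the function name `completion_probability` are illustrative assumptions, not numbers or code from Generalist.

```python
def completion_probability(per_attempt_success: float, max_attempts: int) -> float:
    """Probability that at least one of max_attempts independent attempts succeeds."""
    # The task fails only if every single attempt fails.
    return 1 - (1 - per_attempt_success) ** max_attempts


# Illustrative per-attempt success rate (an assumption, not a measured figure).
per_attempt = 0.9

for attempts in (1, 2, 3, 5):
    rate = completion_probability(per_attempt, attempts)
    print(f"{attempts} attempt(s): {rate:.5f}")
```

Under these assumptions, a picker that misses one time in ten still completes the task about 999 times out of 1,000 when allowed three tries, which is the sense in which modest accuracy plus recovery can yield high reliability.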

Brian Heater [00:19:45] I was speaking with the CEO of Rhoda AI - and obviously they're doing some really cool stuff in the space with video. And one of the things he said to me was that - and this was one of those surprise things in hindsight - for a lot of tasks, they found that they were actually more efficient when trained on stuff outside of that specific task. On videos that weren't that specific task - as far as learning physics and getting to know the world.

Andrew Barry [00:20:19] Yes. And we see the exact same thing. We kind of showed this in the Gen-0 blog post, which is we have a specific set of tasks, and every time we add more data into the model - and this is data completely unrelated to the task, not even in the same country as the task is being operated - the task gets better. Everybody's waiting to see how far that will go, but there are no signs of that stopping.

Brian Heater [00:20:41] How far that will go as far as diversity of tasks, but also settings as well?

Andrew Barry [00:20:45] Settings, we stopped worrying about. So we did a demo at GTC. We had no data at GTC ever. We never took any data there. We have two offices, so half of our company's in San Mateo, California. Half of us are in Davis Square in Somerville, and the tasks immediately transfer across offices with no additional data. So yeah, as far as unstructured environments and lighting and things like that - we had somebody come in, and we were running a demo for them, and they said, "What happens if you turn the lights off?" And we just turned the lights off, and you can't even tell. Wow.

Brian Heater [00:21:22] That's the big thing. But in the real world - are these mobile tasks that require actual movement beyond that, or?

Andrew Barry [00:21:36] So what we're focused on right now - we certainly are adding mobility - but we started with the simplest possible problem, which is like two arms at a desk.
And so a lot of the tasks we've shown have not been mobile-based tasks, not because our system can't do it, but because we're focused on robotic dexterity. Mobility works, you know. I worked at Boston Dynamics, mobility works great.

Brian Heater [00:22:01] But does mobility plus dexterity - is it that much more complex?

Andrew Barry [00:22:08] Mobile manipulation's a whole field, for sure. What I will tell you is we have put our robots on wheels and run them around with the models running, and the models do not seem to care very much. We do have one video posted of a little semi-humanoid running around doing stuff.

Brian Heater [00:22:25] One of the things you seemed particularly excited about was the cable threading. And I've heard this from several people - we were talking to somebody recently who pointed out that automating on the automotive line, it's always the cable threading that's always the most difficult thing. Why has that been such a difficult challenge to solve?

Andrew Barry [00:22:45] That one's really hard to model. You have these flexible objects that are just tough to model. And so when you're trying to write code for them, it's just really hard to do. And then when you take this technique - it's the flexible materials, right? We never did any of that on Spot when I was at Boston Dynamics. It was just something we didn't know how to do. And there's a whole bunch of techniques you can try, but just pointing a camera at it and collecting a little bit of data, putting it on top of a foundation model, just works a heck of a lot better than anything else I've ever tried.

Brian Heater [00:23:21] And it sounds like you're talking about cross-embodiment as well.

Andrew Barry [00:23:29] 100%, yeah. Cross-embodiment - we've shown iiwa robots, flexible UR, a couple different semi-humanoids. That stuff works.

Brian Heater [00:23:45] You were talking a little bit about the origin of the project, and it sounded like a lot of it goes way back to Pete's research. At what point was it clear for you and the founders that you might actually have a startup in this?

Andrew Barry [00:24:02] The story is that Pete had just had a kid, and I had just had a kid.

Brian Heater [00:24:09] Just people telling me this, they're like, "Oh, I just got married, or just had a kid," which to me seems like the worst possible time to start.

Andrew Barry [00:24:15] Yeah, it is pretty much the worst possible time. And so I was in a molecular biology lab. I went from Boston Dynamics to the Broad Institute. And oh my gosh, I love molecular biology. I can talk your ear off about molecular biology.

But anyway, we just had a phone call and he was like, "We've got to scale this thing."

Brian Heater [00:24:36] This thing that I've been working on for a while now - now is the time?

Andrew Barry [00:24:38] Yeah, but they didn't have the ability there to just scale the data like we wanted to.
He had been working on VLAs, right? Yeah, he was working on PaLM-E and those kinds of things. He had just gotten back from leave. And so we were just like, "Maybe we should do our own thing." I had a kid a month prior, and I was like, "Well - this will take like six months to raise money for, and I'll take all my leave, and it'll be really nice." And we raised money very, very quickly, and then I had to quit my job.

Brian Heater [00:25:18] I'm curious - and I actually think I might have asked Russ the same question - because in this time when there are all these physical AI companies coming out, when you're pitching yourself to a VC, obviously a lot of it is like, "Here are my co-founders and here's our background" - but beyond that, how are you kind of differentiating yourself as a physical AI company?

Andrew Barry [00:25:43] You're talking about now or two years ago?

Brian Heater [00:25:45] Two years ago pitch deck - this is how we're different.

Andrew Barry [00:25:48] Two years ago, we had no strategy and we just said all of the things we could think of that were risks. And so we said this is a hugely deep-tech...

Brian Heater [00:25:59] You led with all the negatives.

Andrew Barry [00:26:00] By accident, yeah. I mean, we'd never done a company before. Who's to say we knew what we were doing?
But I think a lot of it is around the honesty of - and this is why this is really, really hard, and this is why we thought about each one of these problems, and these are why these are the right problems to work on. So why didn't we go work on a mobile base? Because we said dexterity is where all the value is. And why aren't we starting with a humanoid? Because again, dexterity is where the value is, and if you have dexterity, you can put it on a humanoid, right? And so why do we buy other people's arms as opposed to building our own arms? Because we know from teleop alone that the arms are good enough, so that's not where we should focus the company's risk. All of the risk - I don't want to take any risk at all except in the thing that really matters, in dexterity. And so that's what we said at the beginning, and I think the investors recognized that.

Brian Heater [00:27:04] So I guess between you it was an open question of like, are we a robotics company? Are we an AI company?

Andrew Barry [00:27:11] Yeah, we're a general intelligence company. And I think we're starting with robots. I think we will see a lot of other things happen.

Brian Heater But you're not - like, you said we're not starting with building a humanoid. So you're not a hardware company in that way.

Andrew Barry [00:27:24] I think being a hardware company's really hard.

Brian Heater [00:27:26] Pun intended.

Andrew Barry Yeah, and we do build some of our own hardware, and we build it in service of the dexterity, right? So we build our own grippers, and that really comes down to - we couldn't find the grippers that matched the type of data collection we wanted to do in the right way. And so we basically felt forced to build our own grippers. And we've - I love our grippers. We have incredible grippers, and we are also going beyond our grippers. That's really all about making sure the data set and the inference time match. But when you start... What we're starting to see is when you have enough diversity, well, all of a sudden it's pretty easy to add in new types of hands, different types of grippers. All these kinds of things are starting to happen.

Brian Heater [00:28:12] And you're building the grippers specifically for data collection?

Andrew Barry [00:28:16] So we build both, right? We build the data collection device and the robot gripper, so we can make them similar, yeah.

Brian Heater [00:28:22] I guess what I'm getting at is the robot grippers themselves potentially have real-world applications, or will.

Andrew Barry [00:28:32] Yeah. And we see people - in some cases, very serious people - wanting to purchase the grippers, and we've talked to various people about that, yeah.

Brian Heater [00:28:42] Okay. So two years ago, right?

Andrew Barry [00:28:44] Yeah, about two years ago. Maybe a little longer now.

Brian Heater [00:28:45] So things got going really, really quickly. And would you say - I don't know if GTC was quite like coming out of stealth for you, but it was definitely a big moment.

Andrew Barry [00:28:58] It was our first public demo.

Brian Heater [00:28:59] Okay. For sure. Yeah. So - two years - what does that period look like in the interim?

Andrew Barry [00:29:07] The first real year of the company was all about understanding how to do data collection, right? So we knew it was all about data collection at scale, in a really reliable and inexpensive way so you can scale. And so we spent a year basically only focused on data collection. Plus, of course, you do the robot side, but at that time, the only reason to do the robot side was to make sure we were doing the data collection correctly. Because you really didn't want to end up with half a million hours of useless data - that would be a disaster. So we built the robots just to prove to ourselves that the data collection we were doing was high fidelity enough that it could transfer. And once we saw that transfer, we scaled that data collection all the way to half a million hours and beyond now. So that's what we spent the first year of the company on.

Brian Heater [00:29:56] And does - or will - video have any role in pre-training for you?

Andrew Barry [00:30:01] Yeah, of course. We have this very unique data set around the UMI-style data. And then of course, you can add to that. If you have all of YouTube, great - you can do exactly what Rhoda does, but then you can also fine-tune the model on an enormous dataset that's very similar to what the robot's going to see. You're going to get really good performance. And the same thing comes in with language - if you want to add language to it, great, now you have a very steerable system, and we've done a little bit of that. But it's really about the base of the system - the fundamental friction of the world, touching objects, understanding objects. That's the fundamental base of the system, and then you can add all these things on top. As opposed to Wikipedia being the base of the system - well, you can read Wikipedia all you want, you're never going to be a good skier. And so that's kind of how we think about it.

Brian Heater [00:30:58] It's physics of materials.

Andrew Barry [00:31:02] Exactly. You know, objects, how they move.

Brian Heater [00:31:05] And the thing that I keep talking to people about lately is this idea of the flywheel. Do you feel like we're at a point where, or we're getting close to a point where there's going to be enough of those signs where it will be legitimately useful where it can scale to a point where... Or do you feel like you have started a data flywheel?

Andrew Barry [00:31:34] We have started it, and we are learning a huge amount about it. I don't think the data flywheel will look like what people thought it would look like two years ago. If you look at where robots are deployed today, they do not have 10-gigabit internet connections to S3 out of those warehouses. You can barely get internet out of those warehouses. And then the other piece is - are the customers actually interested in helping you with your data flywheel? In general, no. Right? Like, what's the value to them? Are they going to do the hard work of training your robot for you?
And so we are thinking very hard about how we design the product so that we are giving the value back to the customer in a way that makes them really excited to help us with our data flywheel. I think we'll have a lot more to say about this soon.

Brian Heater [00:32:30] Interesting. So - gamifying isn't the right word, but - adding value into the process of gathering data.

Andrew Barry [00:32:38] Yeah, and it's all about making their system better. That's what the customer cares about. I'm a customer, I want my system to work. I want every single time I give you data, I want to see that result in my improvement. And so if you can connect that feedback loop, well, now you've totally flipped the script because now they're very motivated to give you the data. They're really excited about it. They're getting value out of it. You're going to get a lot more data, you're going to be a lot happier, and you're going to get better quality data.

Brian Heater [00:33:06] One of the things I really like talking to folks about - especially those dealing with really gigantic models - is the things that surprise them along the way. In the last two years, has there been anything... It sounds like some of these signs, maybe they've picked up with more accuracy or more quickly than you anticipated.

Andrew Barry [00:33:33] There are many surprises. I'll give you the first one that was just a jaw-on-the-floor moment for the whole company, which is we were doing these tests very early on, things were working, and we had taught the robot to pick this baggie up with the left hand and then shake it. And we were running the task, and on one of the roll-outs, it picked the baggie up with the right hand and shook it, and it did the task. And we all just stopped, and we were like, "We never taught it to do that." And it was one of the first models that had a pre-trained base, right? And at that time, the data set was small enough that Andy Zeng got sick the next week, and he spent his entire flu watching every single minute of the data collection for this task because he was sure that somebody had done it with the right hand. And he looked at every single minute of the data. Nobody did it. And so that was kind of the first, like, oh, wow moment where it's like there is something here that is just very different than what we have done before.
And then since then, in fact we released - in GEN-1 - and I was actually holding the camera at this moment. We taught it to pick up this washer and put it on a shelf, and then the other hand comes in and picks it up and puts it in a slot. And we were trying to get some nice shots of smacking the robot with a hockey stick - which, of course, I love thinking about. We're smacking it away, and the washer gets all messed up, and it's holding it in one hand, and the other hand just comes in in midair and reorients the washer in the first hand, and then it just puts it in the slot. And it's not in the data set. In this case, I can't prove it to you because we scaled our data past Andy's ability to get sick and watch literally every minute of it.

Brian Heater [00:35:24] So you're not intentionally trying to infect Andy in order to...

Andrew Barry [00:35:28] No, we try to avoid that, yes.

Brian Heater [00:35:30] Another thing that's really fascinating to me too is, at a certain point it does kind of get away from you a little bit. That you can't sit there and crawl through and read all the data. So what do you do? Because you don't really have all the learnings, and you don't quite understand exactly how things are working under the hood.

Andrew Barry [00:35:50] This is part of the reason it's so exciting to work on this - which is that we are training the robot to do things. But that doesn't mean it always does the task successfully.
So what we've seen is cases where we trained the robot to always miss first and then pick up the object. And I can point them out on little moments of the videos.

Brian Heater [00:36:13] To be clear, you were training it to miss first?

Andrew Barry [00:36:14] No, no, no. By accident, right? It's just if you're not careful with the data set, and the user misses, and then you regrasp - which you do all the time - the robot will learn that.
And there's no sense of, you know, there's nothing that tells it, "Don't do that."

Brian Heater: In a way, it's becoming more human.

Andrew Barry: In some way, yeah - more human-like. And this is something we're learning a lot about and learning how to steer. Because we want to be able to say, "Actually, don't do that, but do do this," and we don't want to delete the whole data set. So you end up doing all sorts of things there. The other piece is you can take these models and just fix little parts of them. And it's not - you know, I'm a coder - it's not coding, it's programming via data collection. So in this one case, we're trying to stuff this hose into the foam pad, and we just didn't quite get it right when we did the data collection, and so we really needed the robot to do one extra action. And I spent about two hours just doing that action over and over again, and then trained a model overnight, and the next day the robot just did it. So we're learning how to adjust these little parameters and fix little moments in it.

Brian Heater [00:37:42] So I did want to talk a little bit about you and back up, because it is an interesting progression in terms of how you got to Generalist. So - going Boston Dynamics to Generalist, that makes sense to me. You were working on Spot, you were working on the arm. But this period in between - the Broad Institute.

Andrew Barry [00:38:07] Yeah.

Brian Heater [00:38:08] So it's biomedical and genomic research. How and why did you end up there?

Andrew Barry [00:38:15] I love molecular biology - that's the answer. So I was at Boston Dynamics and my fiancée at the time, now wife - she is a biologist. And so the way it happened is she told me, "You know you can buy a DNA sequencer for 1,000 bucks now." And this was like 2017, and I was like, "No way." And then I look it up. Sure enough, this little thing called a MinION. Incredible technology. It's a nanopore technology. Really cool company. And so I decided to buy her one for Christmas. And I just put in my order, add to cart, and then they called me, and they were like, "Who are you?" And I was like, "Uh-oh." And so I fessed up to my wife what her Christmas present was, and she gave me a note card for what to say to the sales rep.

Brian Heater [00:39:16] They were vetting you?

Andrew Barry [00:39:17] Yeah. And it turned out later they had people who thought it was like 23andMe and didn't understand that this is a piece of lab equipment.

Brian Heater [00:39:25] That's an expensive 23andMe.

Andrew Barry [00:39:27] Yeah. So I read my note card, and the sales rep totally didn't buy it, and then I was eventually like, "Okay, look, my wife's a biologist. I'm buying it for Christmas." And the sales rep was like, "Oh, that's totally cool. Sure, we'll ship it to you." And then I was like, "Do you do gift wrapping?" And they were like, "No."

So we got this thing, and we had an amazing time. We did a little PCR, and we're just living in a little apartment in Cambridge, and we sequenced the bacteria off our tongues. And I just thought it was the coolest thing.

Brian Heater [00:40:00] That's so romantic.

Andrew Barry [00:40:01] The bacteria off our tongues is less so. You know, it smells like bad breath.

Brian Heater [00:40:04] Yeah, it's like Lady and the Tramp.

Andrew Barry [00:40:06] We have very normal mouth flora.

Brian Heater [00:40:10] Oh, mazel tov.

Andrew Barry [00:40:13] And anyway, so I started doing this, and I just had a great time doing it. And then I just got more and more interested in it, and we got to the point where Boston Dynamics was changing as a company, and I said, "Okay, I'm going to go do something new."

Brian Heater [00:40:26] They were commercializing.

Andrew Barry [00:40:27] Yeah, exactly, and they were growing a lot, and at that time I was like, "I had an amazing run here. I'm going to do something new, and I want to do more ML too." Especially at that time, they weren't doing a huge amount of ML. That has, of course, changed.

Brian Heater [00:40:40] It is interesting to have left a robotics company to do more ML - in hindsight.

Andrew Barry [00:40:45] Yeah. But at that time, we were doing a little bit, but I really wanted to do a lot.
They were doing - and I remember having conversations specifically with Rob and Mark at the time about this kind of app store idea that they had - of having people program different systems versus having it be a purely ML play.

Brian Heater [00:41:08] Yeah, that didn't work out so well.

Andrew Barry [00:41:10] Yeah. And they acknowledged that. They were upfront about it.
And so then the next part of the story is I started asking my wife - I said, "If you can explain this lab's paper to me, I will do the dishes." And so then I just cold applied to a whole bunch of labs, and I knew some people, and I started meeting people and got a job at the Broad and had a fantastic time. They even gave me a pipette and a lab coat. They taught me some of that stuff, and I learned a ton of ML - did my own transformers kind of all the way from the ground up. Really, really interesting field.

Brian Heater [00:41:48] Well, that was kind of my next question - what role was ML serving in that world?

Andrew Barry [00:41:57] So that lab - what they do is gene delivery. In kind of quick terms, they have a protein soccer ball, and inside that ball is whatever DNA you want. And so for people who have a problem with their DNA, they need to deliver that to the correct cells. So basically those balls have - it's targeted delivery. It's effectively Velcro, and you've got to get the right brand of Velcro on the ball and the right brand of Velcro in the target cell. And so the question is, how do you pick the Velcro on the ball? And instead of Velcro, it's actually a question of which sequence to use. Well, that lab had figured out how to do 100,000 different options at once, and now you have a bunch of runs of the 100,000 different options, and you say, "Well, we should do some pattern recognition." And then when you go do pattern recognition, you say, "Boy, this starts to look like ML." And then you say, "Wait a second. There are all these foundation models in biology. Can we use those?" And so that's exactly the kinds of things we were doing.

Brian Heater [00:42:57] I'm curious - obviously it's not a straight path, as I said. It would make sense to go straight from Boston Dynamics. So, having taken that kind of diversion, I'm curious whether any of those learnings were really useful for what you're doing now.

Andrew Barry [00:43:12] Oh, incredibly useful. Because it's all the same, right? It's all giant transformer models. And we did that there in that context, and the models look the same, just like they do for language. The data's very different, but the models are the same.

Brian Heater [00:43:30] Great. Well, I think we covered a lot of ground. I appreciate you taking the time.

Andrew Barry [00:43:33] Awesome. Thank you. This was fun.

Brian Heater [00:43:36] Thanks to Andy and Generalist. It was a great conversation, and a really fascinating insight into the latest physical AI breakthroughs. Thanks to you, as ever, for watching. If you've been enjoying the show, please like and subscribe to it, and our newsletter of the same name over at automated.fm, and we will see you next week for another episode of Automated.
