November 12, 2025 | Episode 11

Ken Goldberg: Solving Robotics’ Hardest Problems: Dexterity, Data, and Design

Ken Goldberg is a great resource to have on speed dial when complex robotics questions emerge. As a longtime UC Berkeley professor and cofounder of Ambi Robotics, he's familiar both with the promise the category holds and with the vast amount of work and data it will take to close the gap between that promise and reality. From Ambi Robotics' data flywheel and the "robot data gap" to why personal use of humanoids may remain decades away, Goldberg and Brian Heater explore the hard truths behind dexterity, simulation, and learning at scale. Lock in for a synopsis of where robotics stands today and where it's heading next.


Ken Goldberg (00:00)

The famous creativity test is, you know, how many uses can you find for an object, right? And the example I gave it was: how many uses can you find for a guitar pick? And it generated a hundred uses, like, instantly. One of them was a miniature sail for a toy boat. And when I saw that, I was like, that's...

Amazing. Like, that's such a good example. I would never have thought of it. And I was like, that's great, that's creative. Okay. I cannot deny that.

Brian Heater (00:40)

Hey everyone, welcome back to another episode of Automated. I'm Brian Heater, the managing editor of A3, and we have a very engaging interview coming up with Ken Goldberg of UC Berkeley and Ambi Robotics. Ken's somebody I've spoken with a number of times over the years, and it's always a good time. In fact, we chatted the other week for the Automated newsletter about Moravec's paradox and why humanoid robots can, say, do backflips but can't button a shirt.

I know you're going to enjoy this one; it's chock full of really interesting information. And if you enjoy the show, please like and subscribe, and we will catch you on the other side. So, as advised, I watched the GTC keynote you gave last night. We'll have a lot of things to discuss around it, but the first thing that jumped out at me was that you opened with the video of Rosie the robot. The reason I bring that up specifically is that I was looking into this: The Jetsons, in its initial run, ran three seasons, '62 through '63. Why, like 50 years later, is that still our primary reference for home robots?

Ken Goldberg (01:56) Great question. I don't know. You know, it's interesting, because when I was watching it, it must have been a few years after that, but it was certainly in regular reruns. Given that date, it's really interesting, because that was right around the time that, you know, the U.S., in the post-Sputnik period, was ramping up its whole rocket and Apollo program. So there was a lot of interest in technology, in particular rockets. So that makes sense. But robots are also part of it, right? All of that is part of an era, the '50s, when there was a lot of excitement and optimism about new technologies. And that rolls into the '60s, when it's really like a national craze where everybody's getting into it. There were all kinds of programs that got kids excited about technology, and all these kids from that period went into engineering, including me. But I think there was a real sense that this was a national effort.

We wanted to build things, and there was a race on to beat the Soviets. There was a lot of excitement. Computers were somewhat new, but very exciting at that time. Nobody was really talking about AI per se, but robots have been around for a long time; the word robot dates back to the 1920s, as you know.

Brian Heater (03:08) So that explains the potential origins of Rosie and the cultural context, but I'm wondering if the fact that we're still using it as our reference all these years later speaks to what's happened in the intervening years: almost a lack of innovation, or a lack of public-facing innovation, on that front.

Ken Goldberg (03:30) I think part of it is that there have been a lot of things about robots. Robots have been in science fiction and all kinds of movies and TV shows, Westworld, all kinds of them, and of course all the '60s shows too, like Lost in Space, right? There were all kinds of robots there. What I think was really interesting about Rosie is that, if you remember, the robot is kind of always breaking down. And I really like that they portrayed that.

It's a running joke in the show that the robot doesn't quite work and kind of malfunctions. And if you also watch Iron Man, there's a robot in Iron Man that's very similar, that's kind of always messing up. And it's true in Lost in Space, right? There's a robot that doesn't always work. So I think that was a very astute insight the writers had: these robots held great promise,

but they weren't necessarily going to work flawlessly.

Brian Heater (04:32) Also good for comedic effect. But what's funny is, as you're saying that, I remember it now. I grew up watching The Jetsons in reruns, and I'd forgotten it. I don't know if this is just how we view pop culture or history, but we use Rosie as a reference point, like, this is what we want to get to. This is the ideal. This is what everybody's hoping for in the home.

Ken Goldberg (04:59) Right. Well, I think the dream of the robot butler or maid is something many people want: something that will, you know, tidy up and basically take care of you around the house. Remember, in The Jetsons, Rosie also played with Elroy and the kids, right? She wasn't just a worker. So I think there's this idea that this could be, you know, sort of a companion as well.

And I think that is very much still alive, especially for people who are getting older, and I can relate to this increasingly now. Many senior citizens are going to retire, and that number is increasing; we know this from the demographics. It's very hard to get good help, people who can help you at home if you need a lot of care, healthcare, when you're older. We've gone through this with our parents, right?

You can hire somebody, but first of all it's often very expensive, and second of all, if they're good, they get hired away, so it's hard to keep them. Also, there's a danger that if they're not so good, you know, they might get grumpy or they might be neglectful. And I can say my own feeling when I project ahead and imagine myself somehow alone. I don't know what happens to Tiffany, my wife, but let's say I'm alone and I have to be taken care of.

I would much rather have a robot than a person I may not trust, or who may not want to be there with me. A robot would be great, especially, I always think, if it could do a lot of certain things that I can't do, but also maybe keep me company, in the sense that it would have access to all my past photos and emails and

memories, and it could sort of help me keep track of things. And now, with language models, you know, we could have conversations. So that would be kind of preferable.

Brian Heater (07:03) Maybe the first or second time I visited you at your lab, you showed me some of the research you were doing on pick and place with robot arms. And a lot of it was based around food. You had all these cans of groceries and everything set up around the office. And it strikes me that that's in part because of how much more difficult the home is, both in terms of the setting and in terms of the things the robots are actually picking up.

Ken Goldberg (07:31) People think that picking things up is trivial, right? It's easy for us. But as Hans Moravec pointed out 35 years ago, what's easy for us is hard for robots, and vice versa. Grasping is still hard for robots. We're getting closer, but being able to do it really reliably for all kinds of objects is still a challenge. And in the home you have many things that are unstructured, right? If they're not in packages,

then you have all kinds of things that have to be put together. Clothing is a good example, right? You have clothes all over the floor with laundry. When you're cleaning up after a meal, you have all kinds of half-opened packages and things you have to pick up. For example, take crackers: they're in a package that's usually transparent, you open it up, and what's left is a transparent wrapper.

That's very hard for a robot to pick up, partly because it's very hard to perceive. If it's transparent, the typical methods, depth sensors or cameras, can often overlook the object. Humans are very good at compensating, and we see it instantly, right? It's trivially obvious to us, but it gives robots a sense of hesitation. Also, we see that the thing is an empty wrapper, so it's okay to squeeze it, but a robot would look at it as some

odd shape, and it's not sure what to do with it. So a lot of the standard approaches will fail on just an empty wrapper. Another example I like to use is eyeglasses, right? We know how to pick them up; it's no problem. But for a robot it's very complicated, because they're delicate. You don't want to put your grippers on the glass part, right? So there are a lot of nuances to grasping. That's why we study them. In fact, at the lab, we're often very interested in adversarial objects, objects that are very difficult to grasp, because we want to study those in particular.

Brian Heater (09:31) I wonder what people's threshold for potential mistakes is. Obviously it comes down to what it's accidentally picking up. If it damages a soup can, no big deal. But if it breaks your glasses, that's another question entirely. As we get to the point where we're talking about humanoids in the home, for example, and I know this is still a ways off, does it have to be right 100% of the time in order for people to actually adopt it?

Ken Goldberg (10:00) Good question. Okay, great question. Let's think about that. It really depends on what you're talking about, because lately I've been thinking about the evolution of robots going from moving things, which is warehouses and logistics, to making things, which is all the manufacturing, to maintaining things, which is the third level and which brings us back to home care.

Let's start with moving things. If you're in a warehouse and you drop something, it's no big deal. And that happens all the time, right? It's just normal that at the end of the day you look for all the boxes that got dropped, and they get picked up, often by humans, and dealt with. So that's not an issue; it's really very minor. You can have several percent failure on grasping in a warehouse, and that's not a problem. A factory is different,

because in a factory, especially if you're producing something valuable, a drop could really be a loss. Like a circuit board or a wafer, right? A silicon wafer could be worth tens of thousands of dollars if you drop it. Now let's look at the home. When I say maintain, I like it because it's an alliteration, but also because it really is about a different level of interaction, where you're maintaining things, you're actually taking care. This applies to many environments: service, where you're fixing cars or debugging circuit boards, or even gardening, which is a category of maintenance. And home care is the big one, right? That's the one that's going to be increasingly huge, a giant problem hanging over society. Now there, okay, if you're just cleaning up the dining room or the living room,

you're just picking things up, and if you drop something, no big deal; you just reach down and pick it up again. We do that all the time. But it matters if it's a tray of hot soup you're bringing over to me, and that soup tilts and lands all over me. That's bad. Similarly, there are all kinds of examples where you can...

Let's say it's a thermometer, and you're putting the thermometer in my mouth and you break it, and all of a sudden there's a bunch of, you know, mercury. I don't know. You can imagine.

Brian Heater (12:28) I would add to that, yeah, because you brought up what I think people are calling age tech now. Age tech, which until recently people called elder care, has been a big thing in Japan for decades, because their population is aging even more rapidly than ours. But you're also talking about bringing a robot into the home to care for the most vulnerable individuals.

Ken Goldberg (12:35) Age tech.

Yes, absolutely, that's right. Or maybe you'd want it to take care of your kids. That would be another one, right? They're pretty vulnerable. You can imagine. Yeah. Well, imagine feeding the baby, right? You don't really want a robot doing that. The errors can be very consequential. But you're right, vulnerable people. And that's someone, maybe, as you get older, who has Alzheimer's or can't see anymore, who is

Brian Heater (13:03) These are Elroy's.

Ken Goldberg (13:22) you know, really vulnerable and can't even communicate that something's wrong. Imagine someone's at home alone with the robot, and the robot is malfunctioning, but the person can't even tell. I mean, that's a nightmare, right? It's hurting the human in some way, and the human is so advanced in age that they can't even speak up or stop it. So we really want to be thinking about that. There is an element of vulnerability, and safety is such a huge factor in robotics. People don't really talk about it, but in industry it's huge. OSHA in the United States has all these rules: if you want to put a robot in place and it's fast, that means it's got big motors, so it's going to be dangerous, and therefore it has to be isolated, with light curtains and bars around it so you can't get hit by accident. But that adds a huge layer of complexity and cost to setting up a robot. Now, the hope is that the new generation of collaborative robots is much safer, right? But they move much slower, so in many cases they're not as appealing to industry.

Brian Heater (14:39) I don't know if you're somebody who's comfortable forecasting; I find it's about 50-50 in the robotics world. But GTC was a huge part of this, especially with 1X being there: we're now very seriously at a place where people are talking about humanoids in the home. I also suspect that, as kind of a robot evangelist, it's your role to temper expectations about what is and isn't a reality and what is and isn't doable right now. But broadly speaking,

do you have a sense of how long it might take to get there?

Ken Goldberg (15:14) Okay, so first of all, I want to be careful, because I'm a robot enthusiast, but I wouldn't say I'm an evangelist, right? I'm very nervous about that particular word, because I feel there's a big danger right now. We already have humanoids that look very human; we've actually had those since Disney in the '50s. But we now also have robots that can move, that can walk and even sit, for example, and do backflips extremely well, and that is an enormous amount of progress in the last 10 years, right, from robot dogs to now humanoids. A lot of this is because of the ability to learn in simulation and then transfer that to the real world. Sim-to-real works very nicely for two categories. One is flying things, like drones: you can simulate and then learn in simulation. It also works really well for locomotion, for legs.

So even over rough terrain, you can simulate and then do amazing maneuvers: walking upstairs and uphill, backflips. That all seems to be very well matched to simulation. What simulation doesn't help you with, at least so far, is manipulation. It's very hard to simulate manipulation, and we can go down that rabbit hole, Brian, but there's a whole bunch of reasons why that is still

an open challenge in the field: how to model and simulate the complex mechanics and interactions that happen when you do something even as simple as rotating this pencil. Right, pen. So that's where I think we have a big gap, and I'll be happy to talk more if you want. I can give you my reasons for that, but there's a story I can describe where, you know, we've made progress.

But the progress, if you compare it to large language models, is minuscule. Language models have a huge, vast collection of data. Do you want me to explain this story?

Brian Heater (17:24) It was something I wanted to get at, because it comes up in your GTC talk and clearly it's something you have to talk about a lot. Why don't you talk a little bit about the move from two dimensions to three dimensions and why that's so difficult? Yeah.

Ken Goldberg (17:41) Yes.

Okay, well, let's start there. I want to lay this out: the argument is that data solved vision, data solved language, therefore data will solve robotics.

Brian Heater (17:56) In terms of these large models that we have, like DexNet, which is a good example: huge databases of images, huge databases of text, and that's being crawled to create this output.

Ken Goldberg (18:11) Right, although let's bracket off DexNet for a second, because that's actually a special case. Let's say data solved vision, in the sense that that was the big breakthrough in 2012 with ImageNet. It was by mining a vast collection of data, photos and images with labels (and thanks to Fei-Fei Li for that), that Geoff Hinton and his colleagues were able to train a convolutional neural network to generalize.

We had a critical mass of data, and then we could start generalizing to new images and finding objects in them with high reliability. That was a big breakthrough. People say it solved vision; to some degree, it made huge, massive progress in vision, and that has fueled a lot of the excitement about AI in general. The next step happens, surprisingly, a few years later, when there's a breakthrough in language, largely due to Google's innovation of the transformer architecture, which extends deep learning to sequences. That handles sequences of words, strings of words, to the point where it can do translation, but also generation. So you can ask a question, and these large systems, ChatGPT being the poster child, start answering in amazing ways, surprising to everyone. And

they have essentially, to some extent, passed the Turing test, right? We all know this; we've played with these systems, and it's astounding, and they're getting better and better. They're amazingly good on so many fronts, just in being able to synthesize vast amounts of information and come up with really interesting answers. And by the way, they're definitely not just regurgitating. I think people originally thought of them as sort of stochastic parrots, but they're definitely beyond that now.

These systems are coming up with original ideas. And if they're guided properly with the right prompts, they're able to do really amazing things. So data solved language, right? Then the question is: with all this progress, it seems obvious that robotics is next. Now, to your point about the three dimensions. You can think of language as a one-dimensional signal. It's just a sequence of words.

I'm oversimplifying, because it's complex, but it's also discrete: you have a discrete set of words. Right, right, right. But vision is two-dimensional. You have two-dimensional images that you're analyzing. Those are extremely complex and high-dimensional. I mean, they're two-dimensional, but there's still a huge state space of potential images out there. But when you get into robotics, you actually have even more complication,

Brian Heater (20:38) Yeah.

Ken Goldberg (21:02) because you have images, essentially a stream of images, video, and what you're trying to generate is not just the next word but a command for the robot, which may be a whole bunch of joint motions, and that is even higher-dimensional. Robots move in what you think of as three-dimensional space, but if you also include rotations, that's six dimensions. And then you have multiple arms and fingers and legs, right? There's a lot of complexity going on there. So transforming an image into a motion is far more tricky than transforming a sentence into the next word. My point is that that's one of the big problems: you have a much more complex challenge to start with. But the bigger problem is that you don't have data. In the first two examples,
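To make the dimensionality point concrete, here is a minimal Python sketch. It assumes each arm is commanded with a 6-D end-effector pose (three translations plus roll, pitch, and yaw, which is only one of several rotation parameterizations) plus a hypothetical number of gripper or finger joint commands. `Pose6D` and `action_dim` are illustrative names, not part of any robot API; a language model, by contrast, emits one discrete token per step.

```python
from dataclasses import dataclass, fields


@dataclass
class Pose6D:
    """Rigid-body pose in 3-D space: 3 translational + 3 rotational components."""
    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float


# Number of continuous values needed to specify one pose (6)
POSE_DIM = len(fields(Pose6D))


def action_dim(num_arms: int, gripper_dofs_per_arm: int) -> int:
    """Size of one action vector under a hypothetical layout:
    each arm gets a 6-D target pose plus its gripper/finger joint commands."""
    return num_arms * (POSE_DIM + gripper_dofs_per_arm)


print(action_dim(num_arms=1, gripper_dofs_per_arm=1))   # 7: one arm, simple gripper
print(action_dim(num_arms=2, gripper_dofs_per_arm=1))   # 14: two arms
print(action_dim(num_arms=2, gripper_dofs_per_arm=16))  # 44: two dexterous hands
```

Each step of a robot policy therefore outputs a vector of continuous values whose length grows with every arm and finger, rather than a single choice from a fixed vocabulary.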

vision has data: lots and lots of images have been collected over the internet, so there's a vast repository. The same goes for language: all the words that have been written and scanned, right, are vast. Now, recently we did a little study, inspired by a guy named Michael Black who's at Physical Intelligence, the new robot company. He made an analogy: let's think of everything in terms of hours, hours of time, and that's how we'll measure data size. So with the robot, how many hours have we teleoperated it to get it to do things? We can just measure how many hours that was, right? He said that the effort that had been going on at Google and a number of universities, which I was a part of (Berkeley was a participant), had collected some 3,000 hours of data. And then he announced that Physical Intelligence, his company, had completed 10,000 hours of data collection. They do that with many people working different shifts, teleoperating the robot, but 10,000 hours, if you do the math, is about one year. They did it in less than a year, but they collected a year's worth of data. Now compare that to the large language models. He said, okay, how long would it take you to read a book, right? That would be maybe two hours. So convert that amount of words into hours. If you run that all the way out to Qwen, the largest language model out there, a VLM, vision language model, it turns out to be about 1.2 billion hours of data. That is 120,000 years of data.

In other words, that's how long it would take a person to sit down and read all the text that's been used to make those systems. So now you think, okay, with robot systems so far we've got one year, but getting to the equivalent of what these large language models are using might take 100,000 years. That's what I call the robot data gap, okay? It's enormous when you really look at it.
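The arithmetic behind the robot data gap can be checked with a short Python sketch using the hour figures quoted in the conversation. Goldberg rounds 10,000 hours to one year, which is how 1.2 billion hours becomes 120,000 years; converting with 8,760 hours per calendar year gives roughly 137,000 years instead. Either way, the ratio to the 10,000-hour teleoperation dataset is 120,000x, the same order of magnitude as the "100,000 years" he cites. The constant and function names are illustrative.

```python
# Hours in a non-leap calendar year (24 h x 365 days)
HOURS_PER_YEAR = 24 * 365

# Figures as quoted in the conversation, in hours of data
TELEOP_MULTI_LAB_HOURS = 3_000         # multi-lab teleoperation effort
TELEOP_PI_HOURS = 10_000               # Physical Intelligence teleoperation
VLM_READING_HOURS = 1_200_000_000      # reading-time equivalent of a large VLM's text


def hours_to_years(hours: float) -> float:
    """Convert continuous hours of data into calendar years."""
    return hours / HOURS_PER_YEAR


def gap_factor(target_hours: float, have_hours: float) -> float:
    """How many times more data the target represents than what we have."""
    return target_hours / have_hours


if __name__ == "__main__":
    print(f"{hours_to_years(TELEOP_MULTI_LAB_HOURS):.2f} years")          # 0.34 years
    print(f"{hours_to_years(TELEOP_PI_HOURS):.2f} years")                 # 1.14 years
    print(f"{hours_to_years(VLM_READING_HOURS):,.0f} years")              # 136,986 years
    print(f"{gap_factor(VLM_READING_HOURS, TELEOP_PI_HOURS):,.0f}x gap")  # 120,000x gap
```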

And it's not that we can mine it the same way large language models can just pull more data off the internet. It doesn't exist; we have to create it. And this is the big challenge robots are facing, right? That's why I know we will get to humanoids. I am certain it's going to happen. But the question is when. Anyone who thinks it's going to happen in the next year or even two years

is extremely optimistic. I would be very willing to bet we're not going to see that in two years. I'm not sure we'll see it in five. I'm not even sure we'll see it in 10. It's going to take time. We will get there; I'm very confident of that. I'm not doubting that we will get there, let me be clear on that. I'm not some sort of, you know, Luddite who believes this will never happen. We're going to get there, but it's not around the corner.

Brian Heater (25:33) It strikes me, as you're describing these troves of information, that in a certain sense the creation of the databases hasn't changed very much since, you know, Fei-Fei Li started doing it with images. It seems very much a brute-force approach to learning. I suspect that the evolution of robots and the evolution of AI is also going to involve new ways of training and new ways of learning.

Ken Goldberg (26:03) That's right, and you're right. What I just described is a somewhat naive way of looking at things, right? There are ways we're going to shrink that down. It's not going to take 100,000 years, okay? I don't believe that either; I just wanted to put it in perspective. But there are a number of ways we can increase the data. Actually, there are four. The first one is simulation. As we talked about, simulation works really well for flying and for walking and for acrobatics, right? For moving limbs around and basically moving over terrain without falling. That's been amazing. It doesn't seem to work well for manipulation. The second one is YouTube, basically: we have a big repository of videos of people moving things around with their hands. If you mine that, you can come up with millions of hours of video of people manipulating things. That's a very valuable source, but the problem is these are two-dimensional videos, and it's very hard to infer the three-dimensional activity that's going on. How to take those two-dimensional videos and transform them into three-dimensional models of what's really happening is a very active area of research in computer vision. But then you also have to model all the little complex forces and deformations and collisions and everything else, and that turns out to be extremely hard. People are working on all of these, but that one is also not around the corner. The third one is what we just described: teleoperating the robot, having a person operate it basically like a puppet, controlling it and having it do things like chop vegetables and fold laundry and make coffee. That's what Physical Intelligence and others are doing, and that's where this one year of data comes in.
Now, there's a fourth one that hasn't been widely recognized but that I'm actually very excited about, and that is real production data. That's where you have robots that are out in the field, let's say in a warehouse, collecting data as they work. That's what Ambi Robotics is doing. For the last seven years, we've been building robots that are capable of doing real work, namely sorting packages. And we had to bootstrap that.

A lot of this is due to Jeff Mahler, a PhD student at Berkeley; it was his dissertation. He used the first generation of convolutional neural nets combined with a big dataset we collected that was analogous to ImageNet; we called it DexNet. That allowed us to bootstrap a model that was actually fairly good at grasping, and it was somewhat of a breakthrough at the time. That's what you referred to earlier.

What it was able to do was get above a 90% success rate picking up packages. That was enough to make it economically viable: it was cost-effective for companies to buy these machines. So Ambi Robotics was formed in 2018 and then spent time commercializing and building these systems, worrying about all the things I mentioned earlier, like safety and reliability and interfaces, and making it something really useful, in this case for e-commerce, which was growing in parallel. That resulted in many robots in industry actually performing sorting. At this point, Ambi has sorted 100 million packages. And what's interesting is that we collected data on every single one of them. That was originally intended to help us with maintenance, to maintain all these machines, but we realized it was also a goldmine of data we could use to train the next generation of machines. We recently ran the numbers on that data collection, and it turns out to be 22 years. It was collected over four years, but with lots of robots working many shifts. So that's 22 years of data in the bank, an extremely valuable resource. We've taken just 1% of it and used it to train a large model, a transformer model, our most recent generation, and it is outperforming our past method, which was trained with Sim2Real, in terms of picking accuracy, raising those numbers, and that is a big deal. So we're going to be doing more processing of this data, using it to generate bigger and better models, and then putting those into practice. And then, you see, we call it a flywheel. Once you get a system that's good enough to go into production, people adopt it and start using it, and as it's collecting data, that's real data in real environments, right? It's very valuable.
It's the best kind of data, because you're dealing with real packages, not artificial ones. You're collecting it over time, then re-processing and analyzing it, and using it to train even better models, so these systems get better, and that iterates. Ambi has also recently used that to create a new product line: in addition to AmbiSort, which is what we focused on for the first four years, we now have AmbiStack. We can talk more about that later. But coming back to the data story, there are these four approaches: simulation, YouTube, teleoperation, and production. Production is one that is extremely viable, and Ambi is really demonstrating that it works.
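A similar back-of-the-envelope conversion applies to the production data Goldberg describes. This Python sketch assumes the quoted "22 years" means continuous robot-operating time, measured the same way as the teleoperation hours discussed earlier; that is an assumption, since the conversation does not define it. It converts the figure to hours, computes the 1% slice used for training, and computes how many robots would, on average, have to run around the clock over the four-year collection period to accumulate it. Names are illustrative.

```python
# Hours in a non-leap calendar year (24 h x 365 days)
HOURS_PER_YEAR = 24 * 365

# Figures as quoted in the conversation
AMBI_DATA_YEARS = 22      # accumulated production data, in "years"
CALENDAR_YEARS = 4        # period over which it was collected
TRAINING_FRACTION = 0.01  # share used to train the transformer model


def years_to_hours(years: float) -> float:
    """Convert years of continuous operation into hours."""
    return years * HOURS_PER_YEAR


ambi_hours = years_to_hours(AMBI_DATA_YEARS)
training_hours = ambi_hours * TRAINING_FRACTION
# Average number of robots running 24/7 needed to accumulate this much data
continuous_robot_equivalent = AMBI_DATA_YEARS / CALENDAR_YEARS

print(f"{ambi_hours:,.0f} h in total")                              # 192,720 h in total
print(f"{training_hours:,.0f} h in the training slice")             # 1,927 h in the training slice
print(f"{continuous_robot_equivalent:.1f} robots running nonstop")  # 5.5 robots running nonstop
```

Under that assumption, the full archive is roughly 19 times the 10,000-hour teleoperation dataset, and even the 1% training slice is of the same order as the 3,000-hour multi-lab figure.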

Brian Heater (31:47) Maybe it's not talked about as much because you actually have to get the robots out into the world. That first generation has to come along to start collecting data. But years ago, people would say this thing about Google or Facebook: Google's not an email company, it's a data company. Facebook isn't a social media company, it's a data company. Ultimately, is Ambi a data company?

Ken Goldberg (32:12) Great question. I like that a lot. I would say yes and no, in the sense that it's both, right? Google is certainly a search company, an email company, a documents company, as well as a data company, because all the data they're collecting helps them do all those things better. That's true for Meta as well, although we might debate in what sense more compelling social media is better. It's better for them; it's better for their advertisers and their stock value, although I'm not sure about lately. But I think the idea of companies looking at data as an incredibly valuable resource is somewhat new. I mean, there was always data: you could mine customer data and patterns to better serve your customers, right? To develop products tailored to them, or to identify where their needs were and almost anticipate what customers want. Amazon is a perfect example, right? Amazon is constantly modeling and using the data. Every order you make, everything you look at on Amazon's website, is combined to build a better and better model of what you're going to want next, and to tailor their distribution system, what they put in their warehouses, to what people are going to order, right? That shrinks the time between when you click and when it arrives on your front porch. So the idea of data is extremely powerful and increasingly interesting.

And as I mentioned with Ambi, you know, the company spends a fair amount of time developing the hardware systems: designing them, getting the right motors, putting them together, and working on all the motion planning. There are a lot of other aspects that go into a robot system; it's not just data. I mean, everything down to suction cups. What's the choice of suction cups? The design of the suction cups? What are the right cameras to use? There's a host of nuances, even lighting.

You know, how are those wires routed? Millions of little problems like that are constantly having to be solved. So it's data, but data is now increasingly giving you a big advantage in terms of being able to process future environments and future inputs. So maybe this is a good time to talk about the new AmbiStack system. What Ambi was founded on was picking. That's what we've been working on, and it's actually a problem I've been studying for, I hate to say it, 40 years, since I was an undergrad. We still spend a lot of time on the picking problem. But there's a dual to that, which is the placing problem. Once you pick something up, how do you put it down? In the past, we've just dropped it into a bin. That's one thing. But what if you want to pack it onto a pallet that you're then going to ship somewhere else? That packing problem, or stacking problem, is actually very nuanced. You see it when you go to the grocery store and want to pack your shopping bag, right? Some people are very good at it. You know, somebody who can just fill that bag up perfectly. And most people, like me, are not; it comes out half empty and you can't fit the last thing in. So knowing how to do that well is an incredible skill.

And it turns out it's a variant of playing Tetris. Tetris is hard even in two dimensions, right? You're just trying to fit these little tiles together as tightly as you can. But it turns out that's actually NP-hard; it's equivalent to one of the hardest classes of problems in computer science, right? There's no known efficient solution. In 3D, it's even harder. So doing this job, the placing problem, the packing or stacking problem, is a very interesting open problem right now.

So Ambi started to address that. What we've done is use the data we've collected, which is images of objects in warehouses, and multiple images, by the way, which allows us to build three-dimensional models of where things are in space. That's been the big benefit of this gold mine. But it's also helping us estimate how to stack things, and that's where we built a new system. There's an entirely new market out there, which is just: how do you build these pallets? Almost every industry needs to build pallets, because they need to ship things and store things, and that's often done manually right now. It's just humans: things come down a conveyor belt, and a human stacks them up and tries to fill that pallet as tightly as they can. So we have a new system, AmbiStack, and that's what it does. It has a conveyor belt and a gantry-crane-like robot (we just announced this in January), and it attempts to build these stacks as efficiently as possible. It turns out there's another branch of artificial intelligence, called reinforcement learning, that is a perfect match for that problem. Reinforcement learning has to be done in simulation, because you want to be able to do many, many trials. This is, by the way, how AlphaGo, which beat the world champion at Go, worked: it essentially trained itself using reinforcement learning, generating games much faster than humans can play, millions of games per day. We can do something analogous: we simulate packages of different shapes coming down a conveyor belt, and the system tries to pack them as tightly as it can. Over time, it tries different strategies and learns how to do this better.
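To make the Tetris analogy concrete, here is a minimal sketch of a simple greedy baseline for a two-dimensional version of the stacking problem: each box, in arrival order, goes to the lowest and then leftmost spot where it is fully supported from below and has nothing overhead. The grid model, the `pack_bottom_left` name, and the support and reachability rules are assumptions for illustration, not Ambi's method. Because exact packing is NP-hard, fast heuristics like this trade optimality for speed, which is the gap a learned policy tries to close.

```python
from typing import List, Tuple

Box = Tuple[int, int]                  # (width, height) in grid cells
Placement = Tuple[int, int, int, int]  # (width, height, x, y); y = 0 is the bottom row


def pack_bottom_left(width: int, height: int, boxes: List[Box]) -> Tuple[List[Placement], float]:
    """Greedy bottom-left placement on a width x height grid (a toy 2D 'pallet').

    Boxes are handled in arrival order; a box that fits nowhere is skipped.
    Returns the placements and the fraction of the grid left empty ("air").
    """
    grid = [[False] * width for _ in range(height)]  # grid[y][x] is True if occupied

    def fits(w: int, h: int, x: int, y: int) -> bool:
        free = all(not grid[y + dy][x + dx] for dy in range(h) for dx in range(w))
        # Assumed physical rules: full support from below and nothing overhead.
        supported = y == 0 or all(grid[y - 1][x + dx] for dx in range(w))
        reachable = all(not grid[yy][x + dx] for yy in range(y + h, height) for dx in range(w))
        return free and supported and reachable

    placements: List[Placement] = []
    for w, h in boxes:
        # Scan rows bottom-up, then columns left-to-right; take the first valid spot.
        spot = next(
            ((x, y) for y in range(height - h + 1) for x in range(width - w + 1) if fits(w, h, x, y)),
            None,
        )
        if spot is None:
            continue
        x, y = spot
        for dy in range(h):
            for dx in range(w):
                grid[y + dy][x + dx] = True
        placements.append((w, h, x, y))

    filled = sum(row.count(True) for row in grid)
    return placements, 1 - filled / (width * height)
```

For example, `pack_bottom_left(4, 3, [(2, 1), (2, 1), (4, 1), (3, 1), (1, 2)])` places the first four boxes and rejects the tall last one, leaving 1/12 of the grid as air.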

Now, if you give it a totally random stream of objects, there's a limit. It really can't do that well, because it doesn't know what's coming; it's basically making guesses. But if there's some pattern to the packages, and there almost always is (in other words, the packages aren't truly random; they come from some set of shapes, and there's often a stream of similar shapes coming down together), the system learns to take advantage of that and uses it to build these very dense stacks. So our system has been trained using reinforcement learning, and AmbiStack reduces the air, the empty space, by 30% compared with a competing commercial system that's out there. That has been really exciting, because it shows that AI can also tackle this very hard problem. When I say solve, I'm not claiming that it's doing it perfectly, because that would be extremely difficult. But I'm saying that it's getting closer. So it's an approximation algorithm, but it has learned to do this better, we think, than humans can. So that's this new direction: we're good at picking, and now we can combine that with packing. That's what we're very excited about, this new generation of products, AmbiStack, which we're just rolling out this spring.
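Ken's point about patterned streams can be illustrated with a deliberately tiny toy, sketched here under heavy assumptions: two lanes of capacity 4 stand in for spots on a pallet, items arrive in a fixed pattern, and tabular Q-learning (a textbook reinforcement-learning method, not necessarily what AmbiStack uses) is trained entirely in simulation. All names and numbers are hypothetical. A first-fit rule that ignores what is coming wastes space on this stream, while the learned policy exploits the pattern.

```python
import random
from collections import defaultdict

# Hypothetical toy: two lanes, each with capacity 4, and a patterned item stream.
CAPACITY = 4
SEQUENCE = [1, 1, 3, 3]  # assumed pattern: two small items, then two large ones


def step(state, action):
    """Put the current item in lane `action`; return (next_state, reward, done)."""
    t, caps = state
    size = SEQUENCE[t]
    caps = list(caps)
    if caps[action] >= size:
        caps[action] -= size
        reward = 0.0
    else:
        reward = -float(size)  # item does not fit: count its volume as wasted
    return (t + 1, tuple(caps)), reward, t + 1 == len(SEQUENCE)


def train(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration, entirely in simulation."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        state, done = (0, (CAPACITY, CAPACITY)), False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max((0, 1), key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(q[(next_state, a)] for a in (0, 1))
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


def greedy(q):
    """Policy that always picks the lane with the highest learned value."""
    return lambda state: max((0, 1), key=lambda a: q[(state, a)])


def first_fit(state):
    """Baseline: use lane 0 if the item fits there, otherwise lane 1."""
    t, caps = state
    return 0 if caps[0] >= SEQUENCE[t] else 1


def rollout(policy):
    """Run one episode with `policy` and return the total wasted volume."""
    state, done, waste = (0, (CAPACITY, CAPACITY)), False, 0.0
    while not done:
        state, reward, done = step(state, policy(state))
        waste -= reward
    return waste


print(rollout(first_fit))        # 3.0: both small items go to lane 0, so one large item is left over
print(rollout(greedy(train())))  # 0.0: the learned policy splits the small items across lanes
```

The design choice worth noting is that the state includes the step index, so the agent can only exploit the pattern because the pattern repeats; on a truly random stream there would be nothing to learn beyond a good generic rule.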

Brian Heater (39:41) Part of the reason I asked the data question is that, as you said, you're also invested in the creation of the hardware. I think it's clear to everyone why it makes sense to tailor hardware to software and vice versa. Ultimately, though, it seems to me that the end goal is to have some kind of system that is, in some way, hardware agnostic. Is that something you feel like we are moving toward?

Ken Goldberg (40:12) Okay, good. This is great, Brian, because you're bringing it back to where we're going with all this, right? One thing I didn't explain carefully is that I see us getting to the humanoid dream, the generalist robot, but the way to get there, in my view, is one step at a time. That is, we try to solve these individual problems very well: you get a robot to do something well enough that it's commercially of interest, and then you can start selling those robots to do that task. That allows you to collect lots more data, and then you can start branching out to adjacent tasks, build those robots, start selling those, and they start collecting more data, right? Overall, you're amassing this very large and growing dataset that allows you to expand to different types of jobs. Again, I want to be careful, coming back to the stealing-jobs thing, because in my view we're not stealing jobs with these systems. They're often moving the jobs into things that are actually more productive, transforming the workers into supervisors. And they actually really like these jobs; they're getting raises because they're more productive. But coming back to the idea of stepping stones: I think we're going to get to the humanoid by learning these specific tasks and building up competency and skill in these robots. Then, over time, they'll be able to start doing other things. As I mentioned earlier, we started with moving things, which is sorting and packing, but now think about making things. That's where you're using robots to put things in machines, take them out, assemble objects, put things together. That's making things, and that's the next level. I think all the data we're collecting is going to lead to that.
And then the next step: once you're making things, and you have these skills in assembly and complex manipulation of objects, that's going to lead to maintaining things, where you reach sufficient skill that you can start going into homes, hospitals, healthcare facilities, and retirement homes and actually start doing productive work there.

You know, it's a process, rather than jumping directly into homes, which, as we said, is very difficult. I believe there's a path: we go through warehouses to factories, hospitals, and then homes. That path seems much more viable to me. And it's a matter of collecting data along the way to build yourself up to the full generality we all want.

Brian Heater (42:56) So prior to this interview, you sent me a number of the papers your group has been working on, and one that really jumped out at me, and that I think is relevant in a lot of ways to this conversation, was the sculpture one: generative AI and representational sculpture. So, you know, at the very top level, you feed in "giraffe," and it takes a bunch of blocks and builds an approximation of a giraffe. So that fits into the making part of the...

Ken Goldberg (43:24) Very good. Okay. Let me do a quick shout-out to my group, because I have such a great group of students in the lab. The site is autolab.berkeley.edu. If you want to see us or learn what we're up to, just go there. This is an amazing group of students. I just adore them because they work so hard, and they're absolutely brilliant. It's a mix of postdocs, PhD students, master's students, and undergrads, all working together, shoulder to shoulder, on all kinds of interesting problems in the lab even as we speak. The project you just mentioned was something we did over the summer, where we started asking: could a large language model, again, a VLM, a vision-language model, something like ChatGPT, actually assist a robot, and could they work together to build something? We really weren't sure that could happen, but we started by having it stack some blocks and put blocks into some arrangements. Then we started asking: could the blocks look like certain shapes? The result is called Blox-Net, which we'll be presenting at a conference in May. You can find it online, but here's what it does: you just say one word and show it a group of blocks, and it will do its best to assemble those blocks into that shape. As you mentioned, a giraffe is a great example. You just give it the word "giraffe" and a picture of blocks.

Then it basically asks itself: okay, what are the significant features of a giraffe? It has a long neck, legs, a tail. Then it asks, of these blocks, which would go well with these different features? Then it goes through a stepwise process of figuring out how to generate a three-dimensional model using those blocks. It generates 10 examples and then compares them itself to decide which one is best, and it's able to do that. Then it asks: is this physically stable? So it goes over to a simulator that checks whether it will fall down, and if it would, it modifies the design a little so that it is stable. The next step is that it automatically commands a robot to pick up those blocks and put the thing together. Once it does that (it's on a little table), it tilts the table, the blocks fall back into the bin, and it does it again. It does this 10 times, and each time it's trying to fine-tune the design so that at the end it has a design that's very reliable. This has worked far better than we ever thought possible. I have to admit it doesn't work all the time, right? There are some things it can't generate a design, or a reasonable design, for, for various reasons. But for a lot of things, like the giraffe, a bridge, or a shelf, it works remarkably well. It's astounding to me that you can actually use these systems this way. Again, this is different from the robot models we've been talking about. This is just using VLMs, ChatGPT-style models that are already out there off the shelf, but combining them with some clever prompting, a simulator, and a robot, and that combination is able to actually build something interesting.
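The generate-and-test loop Ken describes can be outlined as a short sketch. Every callable here (`propose`, `score`, `is_stable`, `repair`, `build_ok`, `refine`) is a hypothetical stand-in for a VLM prompt, a physics simulator, or a robot build-and-reset cycle, and the `Design` type is an assumed representation; this is not the project's actual code or API, only the control flow under those assumptions.

```python
from typing import Callable, List, Optional, Tuple

Design = List[Tuple[int, int, int, int]]  # hypothetical: (block_id, x, y, z) placements


def design_and_build(
    prompt: str,
    propose: Callable[[str, int], Design],     # stand-in for one VLM design proposal
    score: Callable[[str, Design], float],     # stand-in for the VLM ranking its own designs
    is_stable: Callable[[Design], bool],       # stand-in for a physics-simulator check
    repair: Callable[[Design], Design],        # small tweak toward stability
    build_ok: Callable[[Design], bool],        # stand-in for one robot build + table-tilt reset
    refine: Callable[[Design, bool], Design],  # adjust the design after each physical trial
    n_candidates: int = 10,
    n_trials: int = 10,
    max_repairs: int = 5,
) -> Optional[Tuple[Design, int]]:
    """Return (final design, number of successful builds), or None if no stable design is found."""
    # 1. Generate several candidate designs and keep the one ranked highest.
    candidates = [propose(prompt, i) for i in range(n_candidates)]
    best = max(candidates, key=lambda d: score(prompt, d))

    # 2. Check stability in simulation, nudging the design until it stands or giving up.
    repairs = 0
    while not is_stable(best):
        if repairs == max_repairs:
            return None
        best = repair(best)
        repairs += 1

    # 3. Build it repeatedly with the robot, refining the design after every trial.
    successes = 0
    for _ in range(n_trials):
        ok = build_ok(best)
        successes += int(ok)
        best = refine(best, ok)
    return best, successes
```

Passing the model, simulator, and robot in as plain functions keeps the loop testable with stubs; in a real system, `refine` might, for instance, re-prompt the model with which blocks slipped, but that detail is left abstract here.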

Brian Heater (46:45) It's always fascinating to me as an outsider to hear that you're positively surprised by the outcome of your own work. We talk about this quite a bit with LLMs, the degree to which they're a black box, right? We're getting a better understanding of what's going on with them, but still, even the people who make them aren't entirely sure. How big a part of your job as a roboticist is that mystery and that surprise?

Ken Goldberg (47:18) Huge, huge, Brian. Here's an analogy: let's say you're cooking a new recipe, right? You put it together, you do all these things, and at the end you serve it or taste it, and you're surprised if it's good. It's kind of like that, you know? Yes, you put it all together, but that doesn't mean you know how it's going to turn out, and that's really where research is so fascinating, and technology in general. With technology, we know all the pieces of the computer, but we're surprised by how the system, whether it's software or hardware or a robot, will actually perform in practice. The combination is very different from the sum of its parts. That's one of the things I live for, because I love those surprises when a system works better than expected. Of course, we often get the other kind of surprise, where it doesn't work, and then we're like, what went wrong here? But that's what research is: you're constantly being surprised because you're trying new things, things where you don't know what the outcome is going to be. Otherwise it really wouldn't be research. So that element of surprise is very much a central aspect, and if you're really paying attention, that's what you're seeking out. I sometimes think of it like fishing: you're kind of feeling out where to put your effort, where to drop your line so that you might pull something up, and just like in fishing, you're surprised when you pull something up, right? It's always a surprise, and you never know what it's going to be.

Brian Heater (48:51) I always like to end my conversations on big, broad, unanswerable, impossible questions. We've spoken before, so you're probably aware of this, but you're an artist, and a lot of your work explores the intersection between art and technology. I was reading an interview you did recently where you were talking about image generators, right? And you had previously thought that robots aren't capable of creativity, and maybe that idea has since evolved for you. What does it mean when we talk about robots being creative?

Ken Goldberg (49:38) I think of research as a very creative process, very analogous in my view to making artwork. They're very different languages, but the products, an original work of art and an original piece of research, are very analogous, because they both have to be new, novel, and, coming back to our earlier point, essentially surprising to the viewer or the reviewer.

If it doesn't have a little bit of surprise, it's probably not going to be published or exhibited. So both of those, again, come back to this fishing process, right? You're always trying to intuit where the interesting things are beneath the surface. Now, to your point about creativity in machines, it's very interesting, because I didn't see any evidence of that for many years, and it seemed to me that machines really couldn't do that. You needed humans to be creative, to identify what was new and interesting. But I am changing my view, and I have admitted that I was wrong, in terms of images certainly, right? We now have systems where you can describe an image and they will generate an image that often surprises you. I did not predict that. That's been amazing. And the latest versions of these VLMs that have just come out are really amazingly good at that skill. So that's one. I think there have been other things. The famous creativity test is, you know, how many uses can you find for an object, right? The example I gave it was: how many uses can you find for a guitar pick? And it generated a hundred uses instantly. One of them was a miniature sail for a toy boat.

And when I saw that, I was like, that's amazing. That's such a good example; I would never have thought of it. And I was like, that's great, that's creative. Okay, I cannot deny that. But now, you know, it's doing lots of things, and I think that ability to be creative is really exciting. That's a frontier, and it's being used. We're using it in the lab to generate ideas, but also, as you said, to generate three-dimensional objects. Hopefully it's going to generate new chip designs and new proteins. I'm super excited about what it's going to come up with.

Brian Heater (52:15) To a certain extent, would it be fair to say that what creativity means in machines is the ability to surprise?

Ken Goldberg (52:24) I think that's a good way to put it; that's perfect. Yeah, I think that's right: the ability to surprise. Now here's one I haven't seen yet: I haven't seen an AI system come up with a great joke. I'm waiting for that. I think it will happen, but it seems really hard. If you ask ChatGPT and other tools for good jokes, they're not really there yet.

But boy, it would be great if they were. We could just have an endless stream of really hilarious jokes. Wouldn't that be great?

Brian Heater (52:57) There you go. Great, as always, chatting with Ken. Thanks so much to him for taking the time to do that, and thanks to you, as always, for sticking around. If you learned something new, please like and subscribe, and don't forget to check out the Automated newsletter. You can find that and more over at automated.fm. Thank you so much for joining, and we will catch you next week for another episode of Automated.

ABOUT

Your weekly guide to the people, ideas, and technologies shaping the future of automation.

Automated is a weekly media platform exploring the people, technologies, and systems shaping modern automation. Each podcast episode anchors the conversation, followed by in-depth editorial analysis, a curated newsletter, and short-form highlights that extend the discussion beyond the mic.

Together, it's a recurring briefing on robotics, AI, and the real-world deployment of intelligent systems.

Podcast

A long-form weekly interview with the founders, researchers, and executives driving the next wave of industrial automation. New episodes every Monday.

Newsletter

A weekly digest delivering insight and perspective on the biggest news in robotics and AI.

News

In-depth articles and analysis published throughout the week, covering funding, research, and robotics and AI news.

Videos

Short video clips pulled from each episode, featuring the sharpest moments and most quotable exchanges, ready to watch in under two minutes.
