Automated

With Brian Heater

December 17, 2025

Waymo’s Vincent Vanhoucke on Embodied AI and Robotics

Robotics has a habit of compressing timelines in theory and stretching them out in practice. Few people are better positioned to talk about that gap than Vincent Vanhoucke, whose work spans the early days of deep learning at Google and the ongoing challenge of scaling autonomous systems at Waymo.

Vincent Vanhoucke, distinguished engineer at Waymo and former leader at Google Robotics and Google Brain, joins Automated for a wide-ranging conversation on embodied AI, robotics, and autonomous driving.

From the early days of deep learning to today’s foundation models, Vincent breaks down why scaling is harder than innovation, how robotics and AI finally converged, and what humanoids can learn from the long road to self-driving cars.

Listen to Vincent’s podcast: AI in Motion

You can find more episodes of Automated at automate.org/podcast.

Transcript

Vincent Vanhoucke (00:00.608)

It turned out that it had been the first time they were in a Waymo and they hadn't told me that it was their first time riding. So the first 10 minutes you can see them darting their eyes around and trying to get situated and experiencing the joys of autonomous driving for the first time with a little bit of nervousness initially. And then they had the same experience as everybody who's ridden in a Waymo, which is that after 10 minutes you basically forget that you're in an autonomous car and it's a very comfortable ride and you don't really think about it anymore.

Brian Heater (00:44.91)

Happy New Year and welcome to another episode of Automated. I am Brian Heater, the managing editor of A3, and I am very excited to bring you this week's conversation with Vincent Vanhoucke. Vincent has a fascinating history in the space, including a long stint at Google Robotics. These days, he is a distinguished engineer at Waymo as well as the host of his own interview podcast, AI in Motion, which is not unlike Automated, were it recorded in a moving vehicle. Check that out on the podcasting platform of your choice. And while you're there, why not like and subscribe to this program? Easy peasy. Thank you, Vincent, for taking the time to chat and I will see you on the other side. Slightly different circumstances for you. Is it difficult getting prepared for an interview when you're stationary?

Vincent Vanhoucke

Yeah, it's been fun to do this series of podcasts in the backseat of a Waymo in San Francisco, being mobile and also having a great conversation at the same time. The experience of riding around the city while being able to have a quiet and interesting conversation is one of the value propositions of having an autonomous driving car. Being able to actually do work in the backseat. It's becoming my favorite commute: taking a Waymo to work and getting a head start on the work in the morning.

Brian Heater

Yeah, I ask partially because I know a lot of very frustrated, sleep-deprived new parents. They'll often take the child and kind of drive around for a few hours to lull them to sleep. So I assume that moving from the studio into an autonomous vehicle probably has some sort of impact on maybe the quality or the content of the conversation.

Vincent Vanhoucke (02:53.652)

It's been surprisingly relaxing to have the city around us, but also being able to have a conversation at the same time. It's an environment that is surprisingly comfortable. You don't think of driving and having a conversation at the same time as something that you should be doing routinely. But once you have an autonomous driver in the front seat, it's a very different experience.

Brian Heater

I think that's right. And something I figured out a long time ago that I need to get back to is that one of the keys to having a really good and really candid conversation is to make the other person kind of forget that they're having a conversation. And when you're just driving around in a car and you've just got your lav mic on, then they probably, you know, just feel like they're talking AI with a colleague.

Vincent Vanhoucke

I had one guest, I won't say who, but it turned out that it had been the first time they were in a Waymo. And they hadn't told me that it was their first time riding. So the first 10 minutes, you can see them darting their eyes around and trying to get situated and experiencing the joys of autonomous driving for the first time with a little bit of nervousness initially. And then they had the same experience as everybody who's ridden in a Waymo, which is that after 10 minutes, you basically forget that you're in an autonomous car and it's a very comfortable ride and you don't really think about it anymore. But in the first 10 minutes I could tell that there was some novelty there, and I had no idea.

Brian Heater (04:38.502)

One of the things I'm really interested in is this new, I don't know if it's a new stage for Google or a new approach, because in March, Astro Teller launched The Moonshot Podcast. And now you're hosting AI in Motion, which is, you know, what we're discussing, happening in the back of a Waymo. Is this like a new thing for Google? Is this maybe a bit of increased transparency, or an effort to speak more directly with consumers?

Vincent Vanhoucke

We've always tried to speak very directly to our consumers, to the people who are our riders, and to the general public. This series is a little bit more tailored towards the AI community, people who are interested in embodied AI in general. There is a lot of talk about AI in general right now, but not so much discussion around this next wave of AI, which connects AI to the physical world. So we thought it would be a good opportunity to start having that conversation, because Waymo is, to some degree, the very first company to bring embodied AI to the real world in a way that connects to the general public. It's very different from industrial robotics or logistics applications of robotics. It's something that lives in the human world and is tailored to really impact individuals in the communities where they live.

Brian Heater (06:26.602)

It's interesting because obviously Astro, do you just call him Astro? Just Astro. Astro's podcast is obviously tailored to a general audience. You know, watching yours, certainly you should have some familiarity with AI. And I know that, in terms of what I do, something that's made me… successful is probably too strong a word for my career, but something that's been good for my career, I'll say, is my ability to speak to a more general audience. In terms of you sitting in the back of a car and talking to some very smart people about some very high-concept ideas, are you attempting to distill it for a more general audience?

Vincent Vanhoucke

It's always a tension. In the back of my mind I always want to talk shop and, you know, use all the secret acronyms that they will understand and I will understand, and all the cognitive shortcuts that we are used to using when we talk with people in the field. It takes a very deliberate effort to unpack all that and really try to communicate at a level where not only do they understand what I'm talking about, but the public that we are trying to reach also understands and has a shared basis to follow where the conversation is going. So it's new to me. I always say that in my career I've always tried to maximize my imposter syndrome: basically live at the edge of what I'm comfortable with, as long as I'm surrounded with people who know the area much better than me and can help me and can improve my ability to operate. So this is definitely something at the edge of my comfort zone, if you will, interviewing other people and trying to communicate things. I'm very used to speaking at, you know, conferences and in settings that are very much tailored to a technical audience, and reaching out to a more general public is very exciting. It's a bit of a new experience for me.

Brian Heater

One of the, I guess, something that you and I definitely have in common is that, I would say, and let me know if you don't agree with this, we've both taken what I would classify as a circuitous route into robotics.

Vincent Vanhoucke

Yes, very much so. I started from AI. In fact, I did my PhD in speech recognition originally and moved on to work on more general machine learning, including computer vision, and took that route to going into robotics much later, trying to bring kind of a fresh perspective on robotics through the lens of AI.

Brian Heater:

Is this a place that you, I mean, not necessarily autonomous vehicles, but is robotics a place that you anticipated you would end up eventually?

Vincent Vanhoucke (09:59.19)

Not really. It was very serendipitous in the sense that we had been working on AI and deep learning for a number of years. And a lot of the work there initially was very revolutionary. There was a lot of very, you know, zero-to-one innovation. And at some point it sort of evolved, particularly in computer vision, towards something a bit more incremental, something where you were basically iterating on different architectures and improving the efficiency of the models, but essentially solving the same problems. I kind of took at some point the perspective of: assume computer vision is going to be solved on some time horizon. Now that we are on the right track, we have the right basic framework to think about computer vision. Assume that other smart people will be able to take this all the way to human-level or superhuman performance. What are the consequences of that? And when we went down that thought process, two things kind of came out. We thought that healthcare was going to be an area where radiology and imaging were going to be revolutionized. And a number of my colleagues went down that route from the perspective of computer vision. And robotics was the other one, where we thought that if we have a perfect perception system, robotics will look completely different. So at the time, I started on the basis of that assumption: how do we make that happen? How do we leverage that new way of thinking about perception and apply it to robotics? At the time, we had the opportunity to work with Google X. A number of my colleagues were there, and they had a bunch of arms lying around that had been the result of an acquisition. Robot arms, yes, robot arms. And they were basically there gathering dust.

Brian Heater (12:16.174)

Robot arms to be clear.

Vincent Vanhoucke (12:24.344)

Yes. We decided, what if we use those arms and build a data collection system for essentially picking objects, training an AI model to learn how to pick objects, and basically apply all the recipes that worked well in computer vision, not to recognize pixels on an image, but to recognize what is the correct action to be taking given a specific robotic task, which in that case was about picking. And that was a bit of a skunkworks project. And the result of that was very, very surprising in the sense that we were able to quickly learn, with a reasonable amount of data, picking strategies that generalized very well, that supported basically picking any object that we presented in front of the robots. And at the time, this idea of being able to generalize in robotics was not really a dominant concept. A lot of the strategies for manipulation, for picking, were very bespoke: you assumed that you knew what object you were going to be manipulating and then optimized a robotic policy for that object. So suddenly we had that opportunity to think about it in a generalized framework. And we ended up deciding to really pivot our computer vision efforts to robotics on that premise, trying to see where we could take it and how far we could go with this mix of AI and robotics.

Brian Heater

Yeah, it's a new way of thinking about systems. Obviously, for decades, and even still, robots are largely built for single-purpose activities and for doing them over and over again. And I wonder, now that we're talking about general purpose through these AI systems, that's maybe shifted perception, and certainly the public's perception.

Brian Heater (14:38.924)

Is there an unrealistic expectation of what systems are actually able to do now and how far we are from like true general purpose robotics?

Vincent Vanhoucke

Yeah, there is this tension between generalists versus specialists and how much progress we can make on very specialist tasks using very generalist systems. In fact, I recently interviewed Sergey Levine on the podcast, who was one of my co-conspirators in this journey of bringing computer vision and robotics together, and asked him: do you think it's really a tension, that there are the specialists on one end of the spectrum and the generalists on the other end of the spectrum? His perspective was that no, we will end up in a place where the generalist systems are also the best specialists. That transition happened in AI in the sense that it used to be that you built machine learning models for every specific task independently, right? For machine translation, for computer vision, for natural language processing. And suddenly we're in a place now with the large language models and large multimodal models where the generalists basically beat the specialists at every task. You can turn a generalist into a better specialist with very little fine-tuning or even just by prompting the model. So that transition has happened in a lot of areas of AI. And the question today is, are we going to see this transition happen in embodied AI in the same way? And what will it take for that transition to happen? A lot of people are making the bet that it's really just a matter of scale. So we need more data and we need more scalable ways of collecting data. We'll see if that's true. There are also a lot of bets on simulation. Can we do a lot of simulation and generate a lot of data in that way? I think the jury is still relatively out on what will win out and whether we're going to be in a place where specialist systems are going to be much better than generalists for a long time. But that question is a very important scientific question to be answering for the future of robotics.

Brian Heater (17:17)

Yeah, it's really interesting because I feel like there's almost this tension even, I want to say, at Google. Obviously Intrinsic is kind of its own thing now, maybe under the larger Alphabet umbrella. But to a certain extent, they are really kind of still working on these specialized systems. Google's obviously placed its bet on general systems to such a point that they're like, oh, Gemini. Gemini works for LLMs, let's just port that over, we'll keep the name and we'll bring that into the physical AI world. And it sounds like you're leaning in that latter direction, that at some point in the future, even if I've got an extremely simple robot arm, I'm still going to be in some way using one of these really large, robust models to have it continue to do its job.

Vincent Vanhoucke (18:10)

Back in the robotics world at Google, we collaborated a lot with Intrinsic. In fact, we were just a few desks away from each other. And there was a lot of collaboration going on, trying to see how much bootstrapping from a generalist model can improve bespoke tasks. They were working on very, very hard tasks. A lot of the focus was on very precise dexterous manipulation at a time, that's a few years back, when dexterity was not the forte of those generalized models. They were very clumsy in the sense that they would do well at very high-level grasping tasks, for example. But as soon as you were trying to do very fine manipulation, there was just no good way of doing that. That is evolving rapidly. And so if you look at the latest kind of policies that are being learned, whether it's with Gemini for robotics or companies like Physical Intelligence or Dyna or others that are bringing more dexterity to this large model space, there is a lot of innovation happening in that space right now that I think is opening up the path to having your cake and eating it too, or eating your cake and having it too.

Brian Heater (19:48)

So a lot of this ultimately, I think, comes down to timeline, right? So the question is, are we making a product that we want to ship right now, or are we making something that's going to work at some time in the future? And one of the benefits, I suppose, that you get from working under Google, and this is a benefit that I would say Waymo has had in the long run, is the ability to really develop these products over time. That's what Gemini is going to require, right? This development for a long period of time to get to that state where a real generalized system is able to adapt to any kind of embodiment.

Vincent Vanhoucke (20:21.422)

The question is, what are you trying to maximize? Are you trying to maximize utility today, or are you trying to maximize your rate of learning? Right now, the most learning we are getting is in the space of very large AI models, because it's very greenfield. There is a lot to explore in that space. And the capabilities that we're seeing emerge from that line of research are really novel. It's a lot about generalization. It's a lot about being able to do things cross-embodiment. It's about new workflows that enable you to learn policies directly from data, not from essentially engineering and elbow grease. In terms of where the research should be, there are definitely a lot of things happening in that space, in contrast to more bespoke approaches that are basically not evolving nearly as fast from a capability standpoint. If you're trying to get to a product today, it's a different story, right? So I think there is a lot more uncertainty on how quickly those systems will be ready for prime time. In a sense, there's this interesting parallel to the AV world, right, to the autonomous driving world, where essentially when Waymo started back in 2009, there was the promise of self-driving as a product on the horizon, but the technology wasn't there. And in fact, it took several generations of technology before Waymo got to a place where the commercial product was viable. And being able to sustain an investment through this cycle of technology evolution is what enabled Waymo to be where it is today. So the question is, are those AI-driven robotics approaches going to have to go through the same cycles of technology development?

Vincent Vanhoucke (22:47.564)

I think the shortcut that they have is that we're in the right sort of technology generation for things to work. Like, Waymo had to go through the ConvNet revolution and then had to go through the Transformer revolution and learn from all those advances. I don't think there is the need for that many revolutions for robotics and AI to work well together. But we'll see. I might be too optimistic about this. It's also possible, it took Waymo basically more than 10 years, 15 years to get to the point where we are today. It's very possible that embodied AI for manipulation in general will also take many years to emerge as a commercial product that's viable.

Brian Heater

So obviously at that point in the early days, you were an outsider specifically to Waymo, to self-driving cars. I mean, is it fair to say that, in hindsight, it has taken a fair bit longer than you anticipated to get to a place where we'll have full autonomy on the roads?

Vincent Vanhoucke

So I started collaborating with Waymo actually a while back, back in 2013. I was on the Google side and working on computer vision then, and we were part of the Google Brain team. We had just gone through the big deep learning revolution with Geoffrey Hinton and Ilya Sutskever and Alex Krizhevsky, and we were working to bring the early versions of AlexNet into Waymo to do very simple perception tasks at the time. So I got to see a little bit of the early days of what it took to basically reinvent the stack, or part of the stack, at the time, and how really being nimble about the technology developments and following what the technology.

Vincent Vanhoucke (25:06.51)

What evolutions of technology were available enabled the company to pivot very quickly towards whatever the state of the art was quickly becoming. It took a long time, but the driver that punctuated that timeline was really that the general technology was evolving at the same time. Again, there were the pre-AlexNet days, the pre-deep learning days, and then there were the convolutional nets days and then the transformer days, and now we're going more into the foundation model days. All of these generations of technology had to happen for the self-driving system to be at the level that it is today. AI and robotics may not have to go through the same number of upheavals to be able to become productive. We'll see. I think it's an interesting question of how many generations of technology we need to achieve the same outcomes in general navigation and manipulation.

Brian Heater:

I obviously come at a lot of this from the consumer side of things, having covered consumer electronics for many years. And in that world, they've got this concept of Sherlocking, right, which is specific to Apple and an Apple app, where Apple developed a feature and baked it into the operating system in a way that effectively killed a third party that was doing a similar thing. And I think there's an analogy here, and this really speaks to what you're talking about when you talk about being nimble. And this is especially the case with AI, because as AI develops, we're seeing all of these really, you know, interesting and hopeful startups kind of be wiped away because of just a new feature added to AI. As a startup, and as a larger startup, how can you stay nimble and how can you continue to pivot when things are changing so quickly?

Vincent Vanhoucke:

What I've seen happening more is that there is a ton of experimentation in the startup world right now. Yes, there is a lot of churn as a result. A lot of people are trying to figure out, hey, this is a new technology. This is a new way of thinking. Even the business model has to change, because the technology has different implications for where the value add is coming from and how the consumer interfaces with the technology. So there is a lot of creative destruction that's happening in the general AI space. I think that's a very healthy state for things to be in right now, given how new the AI space is, or renewed, if you will, through the modern breed of large language models and large multimodal models. So I think that's going to continue to some degree, but it's also an age of innovation.

Brian Heater (28:31.586)

So you referred to some of your early work, or I guess early experimentation, in and around robotics at Google as being skunkworks, right? Which, you know, obviously kind of puts it in line with the work that X was doing. At that point, was Google considering robotics a moonshot in the same way that it had considered self-driving cars a moonshot early on?

Vincent Vanhoucke:

There was a big effort at Alphabet at large before my time, before I got involved in robotics, to really give an earnest shot at building something significant in the robotics space. All of that was part of Google X. There were a bunch of acquisitions. Boston Dynamics, Schaft and a bunch of others got acquired around that time. That effort basically was a little bit ahead of its time, right? I think the goal there that Larry and Sergey had was to bring this embodied AI about at a time when there were not a lot of connections between robotics and machine learning, AI, large data systems, the kind of things that had served Google well in other areas. And they tried to bring that about at the time. And it was probably too early. I don't know. I wasn't involved in it. But from how things went, this was probably premature at the time. That said, really good things came out of that melting pot at the time. Like you mentioned Intrinsic earlier. That's one of the companies that emerged from that initial thrust and that has been very successful since. At the time, the piece that was missing was that I think a lot of robotics was really focused on hardware. The hardware innovation was ahead of the software, in a sense, was ahead of the intelligence. We're in a much better place today, where it's a lot more of a question of how do we make the hardware and the software work together well. But at the time, a lot of the intelligence and the software and the ability to control the robots for the kind of tasks that were envisioned, tasks where the environment wasn't controlled, where you had to work in environments where humans were involved and that were very, very unstructured, that ability wasn't there. It's still not there today, but at least the path is a lot clearer towards that.

Brian Heater:

It's interesting and fairly enlightening to hear you describe it in that way because, again, I was covering it from the outside at the time and something that struck us all was how, I guess I would say, disparate, how diverse the kinds of companies that Google was acquiring were at the time. Like in addition to some of the ones you mentioned, I think they acquired Bot & Dolly, which was like a special effects company. They did special effects for Gravity. And the big question was, how are they taking all of these companies that work in very different spaces and mashing them together to create a final product? I think it makes more sense now in hindsight if you're describing a goal that was perhaps more abstract than any of us on the outside were thinking of at the time, which was to build a language or to build a physical AI system for robotics.

Vincent Vanhoucke (32:51.982)

Yeah, I think that was the governing idea behind this. And you're right that it ended up being a number of very disparate companies. And it turned out that putting a bunch of hardware companies into a room and saying, please do AI, was a lot harder. It was worth a shot. Yeah, it was worth a shot. Well, it's been an interesting sort of, you know, I'm very glad that some of those companies were able to go through that upheaval and actually survive. Like, Boston Dynamics is a good example of that, where they're very successful today. The experiment was, to some degree, a very expensive experiment, but one that ended up forging a very good idea of what it took to really bring about the embodied intelligence that was envisioned at the time. We learned from this mostly the lesson that we need to bring a software engineering perspective to that equation. We need to bring a machine learning and AI perspective to that equation. At the time, there was not a lot of common ground between the ML community and the robotics community. In fact, I started an academic conference called the Conference on Robot Learning, specifically to bring the robotics and machine learning communities into the same room. It's been going on for nine years and it's been very successful at really getting the communities to talk and forging a common language that they could bring back to the broader AI community, and at building benchmarks and ways of evaluating performance, ways of measuring progress. I'm very happy that this is now a thing, that this AI plus robotics community is a thriving community.

Brian Heater:

Yeah, anybody who watches this show with any regularity knows that one of my special interests is Willow Garage. And that's because it was also ahead of its time. Also, I think it's fair to say it didn't achieve its end goal, but still managed to leave a really large mark on robotics, right, and still launched a thousand ships. And I think that there is a very real parallel that you can draw to this Google Robotics project. Obviously, some unfortunate consequences. Not every single company made it. But I think probably one of the upsides of it is that when you stick a bunch of roboticists in the same room and ask them to work on a project, you've got a lot of really smart people cross-pollinating and working with each other. And that's why some of this work came out of that.

Vincent Vanhoucke (36:07.66)

This is also one thing that I'm very proud of from my time at Google Robotics: if you look at a lot of the companies that are innovating today, a lot of the people that are leading the AI efforts in those companies are people who at one time were part of Google Robotics. So it's Physical Intelligence, it's Figure, it's 1X. It's a number of those companies. Even at Tesla, a number of folks who have gone through that new way of thinking about robotics that was emerging at the time have gone on to work on kind of the next frontier of robotics companies.

Brian Heater

I'm looking back at, I remember covering Everyday Robots, and I don't know how familiar people are with them, but they're kind of in that in-between space, right? They came after that initial push and ahead of where we are now, and that team effectively migrated into DeepMind Robotics, it's pretty fair to say. But I'm looking back on the work that they were doing at the time and it's pretty clear they were having the same conversations about physical AI that we're having right now.

Vincent Vanhoucke

Yeah, yeah, they were very much ahead of the time and they were a very key partner for us at Brain Robotics to really enable that junction. They are very talented people, both on the AI side and on the hardware side, who understood that something had to change and evolve if we wanted to build robots that were driven by AI approaches.

Vincent Vanhoucke (38:06.67)

It went against a lot of the predominant orthodoxy. A lot of the robotic systems in the past were very much designed around, I want to say, the ROS framework. This idea that everything is extremely modular and the modules only talk to each other through message passing. That kind of went against the idea of being able to optimize things end to end. And so we had to really rethink the stack so that we could do end-to-end optimization and connect the perception all the way to the controls without having a very complicated software stack to stitch things in between. And understanding that there was a big shift, going all the way down to the operating system essentially, that needed to happen, and having that flexibility to think about the problem in a different way, was clearly a huge enabler for the kind of research that we were doing. They also had a very strong product vision of: let's build a very inexpensive robot that is mobile, that has a manipulator, and that has enough compute on the robot that we can do very intelligent things. At the time, there were not really any off-the-shelf robots that had this combination of mobile manipulation and compute, and that really turbocharged, in effect, the research.

Brian Heater:

Is there anything now that fills that void? I know that Nvidia's got some of these kind of platform robots that they're using right now, but I'm trying to think of something that's like a direct analogy.

Vincent Vanhoucke (40:02.84)

There are a lot of startups that are building mobile manipulator robots today, whether it's humanoids or non-humanoids. There's a lot of wheeled robots. Yeah, yeah. And now you can actually buy a lot of those robots off the shelf. The price has gone down dramatically in the last five years. And so there is a lot more diversity of platforms. We've also kind of established more and more that you could build a foundation that is essentially embodiment agnostic and then fine-tune it to all of those different platforms. So there is a lot more belief in the ability to build a very generalist robotic brain, essentially, and be able to retarget it to a specific platform relatively easily with some amount of data. But the ecosystem has really exploded, essentially.

Brian Heater

2011 in real terms feels like yesterday; in robotics terms it feels like a million years ago.

Brian Heater:

You were working, I think, at Google Brain at the time. What was deep learning in 2011?

Vincent Vanhoucke:

In 2011, I was working mostly on speech recognition. And we had a very different set of tools than deep learning powering all the speech recognition algorithms. And then I started collaborating with a gentleman called Geoffrey Hinton, who is now called sort of the godfather of AI, or one of the godfathers of AI. And at the time, one of his students came to work with us and promptly, over the course of a summer, completely blew our performance out of the water on all the benchmarks we had on speech recognition. And we ended up deploying what I think was the very first large-scale deep learning system anywhere with Google Voice Search.

Vincent Vanhoucke (42:30.112)

And that was kind of the embryo. Google Brain existed at the time as a project at Google X. Quickly, people sort of realized as a result of these results that there were a lot of very direct applications of deep learning that could really benefit Google. And so we brought Google Brain back into Google. And that's when I joined the effort. There was really a belief or an idea then that this new way of doing very data-driven learning in a way that was very generic, that could cut across all the different products at Google, could really revolutionize the way we think about how do we do translation, how do we do search, how do we do ads, how do we do image recognition and things like this, and bring all of that onto a common substrate, so that if Google could invest in that substrate, suddenly it could really make an impact on all those products with very good efficiency scales. So the next few years were basically spent looking at how do we essentially disrupt Google from within by bringing deep learning to all the different areas of natural language processing, computer vision, machine translation and everything. And a lot of that work sort of basically laid the foundation for this large-scale AI that we see today. At the time, it was very much a big leap of faith. There was this idea, and a few people like Geoffrey Hinton or Andrew Ng or Jeff Dean were really the visionaries that saw this as a possibility and decided to sort of bring the investments to be commensurate with that potential opportunity.

Brian Heater (44:37.57)

Looking at your CV with the 10,000 foot view just doesn't make a lot of sense until you realize that that's kind of the inflection point right there. That's how you go from Riya to Waymo. That's how you go from facial recognition to a shopping site and then eventually end up at self-driving cars.

Vincent Vanhoucke:

Yeah, before I joined Google, I was in this small startup that was basically trying to do computer vision before any of it really worked. Some of it worked. You could do face detection quite reliably at the time, but anything beyond that was very, very challenging. But we were trying to bring computer vision to the general public, very much like Google Photos has brought a lot of computer vision capabilities to your photo collection today. We were trying to do that pre-deep learning. And so we had some very targeted successes, but it was, again, a little bit ahead of its time, but still in the vein of, what can we do with the modern, or what was the modern, AI technology at the time? And how do we bring that to users in a way that they can connect with it?

Brian Heater (46:30.562)

At this point, you're working on something that I guess it's fair to say is much closer to, you know, I mean, technically it's out there. Obviously, again, in San Francisco and now an increasing number of cities, you can catch a Waymo. You had that big highway news, which I think was maybe even undercovered given how substantial that news was. How does your job shift when that kind of change is taking place at a company?

Vincent Vanhoucke:

A lot of the work today is about enabling scale. You were talking about launching our service on freeways. It has some similarities to driving on surface streets, but it also has material differences. Cars go faster. When something goes wrong, you cannot just stop; you have to sort of react to it in a way that doesn't disrupt the traffic. There are a lot of differences, and the geometry of freeways changes all the time. So you have to understand how dynamic those things are. A lot of those challenges sort of build on top of the existing challenges. To some degree, the scaling problem always brings about very different things to work on. I often say if you drive a million miles a week, you see the things that happen once in a million every week. If you drive 10 million miles a week, you see the things that happen once in 10 million every week.

Brian Heater:

Cases are no longer that far on the edge.

Vincent Vanhoucke (47:57.782)

No, and it turns out something that happens once in 10 million miles is very different from something that happens once in 1 million miles. They all have their own very different peculiarities. So scaling and generalizing sort of go hand in hand. And what AI really brings to the table is this ability to generalize well and to bring about this level of scalability.

Brian Heater:

One of the sort of big and ever-present lessons that I feel like I've learned indirectly coming over to A3 is that people seem to consistently underestimate how difficult it is to scale, both in terms of data, which we're talking about now, but also, and this is kind of going to be a huge thing with humanoid robots, deployment and manufacturing. People just tend to underestimate what it takes to scale.

Vincent Vanhoucke:

The humanoid space is interesting in that respect, right? If you squint at it, it does feel a little bit like humanoid robotics is where Waymo was maybe in 2013, right? In the sense that there are a lot of demos, there are a lot of expectations, a lot of investments going towards that space. The question is, are we going to see a replay of the autonomous driving story play out in the humanoid robotics space? If that's the case, then they have a long road ahead. And the question is, are the fundamentals of that industry similar enough that you can imagine replaying the same story, or are the fundamentals very different? The product story is slightly different. There is a question of, are humanoids the right endpoint? Is it something that the world needs to be successful in robotics? Or are we going to be building more bespoke robots, more targeted robots for a long time, and that's what will eventually win out? Are we in the right generation of technology to enable humanoids to work?

Vincent Vanhoucke (50:21.708)

Do they have the same kind of characteristics? In the autonomous driving space, the dominant factor is really safety. You have to be able to drive safely. You have cars on the roads around people, and the safety equation dominates the design space. I think there is, in many circles in humanoid robotics, a belief that maybe safety is less of an issue in the development of humanoid robots. I vehemently disagree with that. I think the safety question in humanoid robotics has to be central. It's a different question than what we see in autonomous driving, but it's still a very central problem, and in some ways even a harder problem, that needs to be solved and that will greatly influence the design space of humanoid robots.

Brian Heater:

Well, Vincent, it's been an absolute pleasure. Thank you so much for taking the time.

Vincent Vanhoucke:

Thanks for having me.

Brian Heater (51:36.664)

Thank you so much to Vincent and Waymo for that great conversation. Thanks to you as always for helping us pump up our watch hours by sticking around until the bitter end of the show. Please like and subscribe while you're at it. And don't forget to check out the Automated weekly newsletter. And with that, we will see you next week for another episode of Automated.

Unlock Full Access to Automated and Explore Everything Automation.

Subscribe today and leave a review on YouTube, Apple Podcasts, and Spotify.

PODCAST HOST

Meet Brian Heater

Brian Heater is A3’s Managing Editor. During his 20+ year career in technology journalism, he has worked as Hardware Editor at TechCrunch, Managing Editor at Tech Times, and Director of Media at Engadget. He is the host of the RiYL podcast and lives in New York’s Hudson Valley with his two rabbits, June and Flash.

Subscribe to the Automated Newsletter

The future of automation delivered to your inbox every Thursday. Interviews with the top minds in robotics and AI, the week’s biggest news, the latest job openings, and more.

We’d love to hear from you! Have thoughts or guest suggestions? Reach us at [email protected]
