Industry Insights

Brown University Researchers Teach Robots to Fetch Like Dogs

By Rebecca Szkutak, Contributing Writer, A3

05/04/2026

When Ivy He joined Brown University’s Human to Robot Laboratory in 2023, her first project was helping a fellow PhD student figure out how dogs respond to human gestures. He, who is focused on social robots, began to wonder if these same kinds of gestures could also be used to train and direct robots.

He shared her idea with Jason Liu, a postdoctoral associate who also conducts research under Professor Stefanie Tellex. Liu had recently wrapped up a project that used natural language to instruct robots on how to navigate their environments, much the way people chat with ChatGPT.

In early 2024, the researchers combined their methods to see if they could get Boston Dynamics' Spot robots to retrieve objects in a manner similar to dogs playing fetch.

“In a natural way of communication, we use gestures, pointing to places, and we also talk along with it,” He said. “We wanted to study how robots interpret humans’ language and gesture instruction together to do object search in the room.”

First, the team had to translate human gestures into a format the robot could use. The pair designed a model that represented each gesture as a 3D cone, oriented in the same direction the person was pointing.
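
The article does not spell out how the cone is used. As a minimal sketch, assuming the cone's apex sits at the person's hand and its axis follows the pointing direction, a candidate object could be scored by how far it lies off that axis. The function names `angle_between` and `pointing_cone_score`, the 30-degree half-angle, and the linear falloff are all hypothetical choices for illustration, not details from the Brown study.

```python
import math

# Hypothetical sketch, not the Brown team's code: score how well a candidate
# object lines up with a pointing gesture modeled as a 3D cone.

def angle_between(u, v):
    """Angle in radians between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("vectors must be non-zero")
    # Clamp to [-1, 1] so floating-point drift cannot break acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)

def pointing_cone_score(apex, direction, target, half_angle_deg=30.0):
    """Return a score in [0, 1] for how well `target` fits the gesture.

    apex: 3D point where the cone starts (assumed: the person's hand)
    direction: 3D vector along which the person points (the cone's axis)
    target: 3D position of a candidate object
    half_angle_deg: hypothetical cone half-angle in degrees
    """
    to_target = tuple(t - a for t, a in zip(target, apex))
    if all(c == 0 for c in to_target):
        return 0.0  # target sits exactly at the apex; no direction to compare
    theta = angle_between(direction, to_target)
    half_angle = math.radians(half_angle_deg)
    if theta >= half_angle:
        return 0.0  # outside the cone
    # Linear falloff: 1.0 on the axis, 0.0 at the cone's edge.
    return 1.0 - theta / half_angle

print(pointing_cone_score((0, 0, 1), (1, 0, 0), (3, 0, 1)))  # prints 1.0 (on the axis)
print(pointing_cone_score((0, 0, 1), (1, 0, 0), (0, 3, 1)))  # prints 0.0 (90 degrees off)
```

With the hand at (0, 0, 1) pointing along +x, an object straight ahead scores 1.0 and one off to the side scores 0.0. A softer falloff, such as a Gaussian over the angle, would avoid the hard cutoff at the cone's edge.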

To incorporate the language component, they layered on existing visual AI models that respond to natural-language prompts. The result was a partially observable Markov decision process (POMDP), a mathematical framework for making decisions under uncertainty, which helps the robot dogs figure out what they are supposed to retrieve.
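
The article gives no implementation details for the POMDP. The sketch below shows only one piece of such a system: the Bayesian belief update an agent uses to track a hidden state. It assumes the hidden state is which candidate object the person means, that the target does not move, and that the language and gesture cues are conditionally independent given the target. All object names and likelihood values are invented, and the greedy `most_likely` choice stands in for a real POMDP planner.

```python
# Hypothetical sketch of a discrete Bayes-filter belief update, the core of how
# a POMDP agent tracks an uncertain hidden state. Not the Brown team's code.

def normalize(belief):
    """Rescale so the probabilities sum to 1."""
    total = sum(belief.values())
    if total == 0:
        raise ValueError("every hypothesis has zero probability")
    return {s: p / total for s, p in belief.items()}

def update_belief(belief, likelihood):
    """Posterior(s) is proportional to prior(s) * P(observation | s).

    The target is assumed static, so there is no transition step.
    `likelihood` must give P(observation | s) for every state s.
    """
    return normalize({s: p * likelihood[s] for s, p in belief.items()})

def most_likely(belief):
    """Greedy choice of where to look next (a simplification; a POMDP
    planner would instead maximize expected long-term reward)."""
    return max(belief, key=belief.get)

# Hypothetical candidates with a uniform prior.
states = ["mug_on_table", "mug_on_shelf", "book_on_shelf"]
belief = {s: 1 / len(states) for s in states}

# Language cue "get the mug": invented scores for how well each candidate
# matches the phrase, as a vision-language model might provide.
language = {"mug_on_table": 0.9, "mug_on_shelf": 0.9, "book_on_shelf": 0.1}
# Pointing cue toward the shelf: invented scores, higher inside the cone.
gesture = {"mug_on_table": 0.2, "mug_on_shelf": 0.8, "book_on_shelf": 0.8}

belief = update_belief(belief, language)
belief = update_belief(belief, gesture)
print(most_likely(belief), round(belief["mug_on_shelf"], 3))  # prints: mug_on_shelf 0.735

# The robot inspects the shelf mug but its detector does not confirm it.
# With an assumed 10% miss rate, that outcome is unlikely if the target is there.
not_confirmed = {"mug_on_table": 1.0, "mug_on_shelf": 0.1, "book_on_shelf": 1.0}
belief = update_belief(belief, not_confirmed)
print(most_likely(belief), round(belief["mug_on_table"], 3))  # prints: mug_on_table 0.542
```

Neither cue alone singles out one object, but together they concentrate belief on the mug on the shelf, and a failed check there shifts most of the belief to the mug on the table.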

When He and Liu wrapped up their study in 2025, their hypothesis held up: combining language and gestures gave the robots an 89% success rate at finding objects in complex environments.

This method isn’t perfect yet. He and Liu found that the more complicated an environment was, the worse the robots performed. Also, camera placement limits how and where humans can give these types of instructions.

“Some of the limitations we pointed out are very exciting,” Liu said. “They point to very good future research directions, and then maybe we even bridge the gap between research in the laboratory into real-world applications.”

He has since shifted her focus to animal-inspired robot-human interaction, and Liu thinks layering more modalities onto these commands could improve accuracy. He and Liu now plan to keep building out their multimodal communication strategy by incorporating human gaze.

“I can tell the robot to go get that object, and then with the gesture pointing towards that direction, it doesn’t have to be that precise, and the robot is still able to have a relatively robust capability to get the object,” He said. “I think this is a step toward having robots deployed in more social environments and even toward more home environments.”