Industry Insights
Generative AI Can Help Robots 'See' Through Objects

Researchers from MIT are working to help robots “see” through obstacles. The team has developed a machine vision method based on generative AI (GenAI) that allows for more accurate shape reconstructions and will enable robots to better grasp and manipulate hidden objects.
The wireless machine vision technology uses millimeter wave (mmWave) sensing, which relies on the same type of signals used in Wi-Fi, to reconstruct objects hidden from view. On its own, mmWave sensing typically generates only a partial reconstruction of an object, with just the top surface visible. GenAI improves the sensing of hidden objects by "filling in" these data gaps.
GenAI algorithms normally require large datasets to train the model, but none exist for mmWave technology. Instead, the researchers created their own synthetic dataset by adapting the images in large computer vision models to mimic the properties of mmWave reflections. The complete system, called Wave-Former, reconstructed 70 everyday objects hidden behind cardboard, wood, drywall, plastic, and fabric.
The researchers also expanded the machine vision capabilities by using “ghost signals.” These are reflected copies of the original signal that change location as a human moves around the room, because the signals reflect off the human, onto a wall or object, and then back to the sensor. Instead of discarding these signals as noise, the researchers used them as extra information about the room layout. The full reconstruction system, called RISE, generated reconstructions that were twice as precise as existing techniques, and could give robots, such as those in warehouses, much greater ability to grasp and move hidden objects.
A3 interviewed MIT researchers Laura Dodds, Maisy Lam, and Fadel Adib to discuss their latest technological developments in more detail.
Why did you undertake this research? Where did the idea come from?
We have always been interested in using wireless signals to sense the world in new ways. We have been working in this area for several years with very promising results, ranging from tracking humans through walls to partial object reconstruction. However, compared to cameras, wireless signals have a much lower resolution and can “see” a much smaller portion of the world, which limits our ability to produce complete, high-resolution reconstructions. Seeing how recent advances in GenAI have revolutionized computer vision, we started thinking that if we could incorporate the power of GenAI into wireless sensing, we might be able to unlock a new form of computer vision.
How difficult is it for robots to see hidden objects without AI, using the wireless mmWave sensing technology alone?
We have been developing techniques to use wireless signals to “see” objects without AI for several years. In doing so, we have developed new reconstruction algorithms which have shown significant leaps in how well we can reconstruct objects. However, even with our new reconstruction technique, an inherent limitation of wireless sensing (without AI) is that it can only reconstruct parts of an object which reflect signals back to the sensor. At these frequencies, almost every surface acts like a mirror, so many parts of the object will reflect signals away from the sensor and will be invisible to our reconstruction. By combining our reconstruction methods with the latest advances in GenAI, we can now fill in the missing parts of the object and infer complete 3D shapes using wireless. Our previous method, mmNorm, could accurately reconstruct the top surface of an object, but does not see the sides or bottom, while our new method, Wave-Former, can infer the complete shape. Quantitatively, we have gone from only reconstructing 54% of the object with our previous method to 72% by leveraging generative AI.
Aside from using AI to train and better understand object data from the synthetic dataset, is there a kind of self-learning, or human-led re-training, of the algorithms after they have been used in actual sensing operations?
One of the major challenges we faced when designing and training these systems was a severe lack of real-world mmWave datasets. Unlike computer vision or language, which often have millions (or more) of data points to train on, we only had a few hundred. So, to build successful GenAI models, we needed to find a way to train with the data available to us. To do so, we used the domain knowledge we have built over the past several years to take existing large-scale (non-wireless) datasets of object shapes or scene layouts and predict what our wireless sensors would or would not be able to see, such as parts of an object or parts of a scene. Then, using these predictions, we trained wireless GenAI models without any real-world wireless data, and we showed that these models still perform accurately when tested on real-world data. What's exciting about this is that in the future, as larger real-world wireless datasets start to become available, we imagine that these same models can be further trained on real-world data and perform even better than they do now.
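To make the idea of predicting what a wireless sensor "would or would not be able to see" concrete, here is a minimal Python sketch. It is not the researchers' code. It assumes a simplified mirror-like model in which a surface point returns signal only when its outward normal points almost directly at the sensor, and it ignores occlusion. The function names, the 20-degree tolerance, and the toy cube are all hypothetical choices made for illustration.

```python
import math

def predict_visible(points, normals, sensor, max_angle_deg=20.0):
    """Keep points whose unit normal points nearly straight at the sensor.

    Assumed simplification: a mirror-like surface sends energy back to the
    sensor only when its normal is within max_angle_deg of the direction
    to the sensor. Occlusion is not modeled.
    """
    cos_limit = math.cos(math.radians(max_angle_deg))
    visible = []
    for p, n in zip(points, normals):
        to_sensor = [s - c for s, c in zip(sensor, p)]
        dist = math.sqrt(sum(v * v for v in to_sensor))
        if dist == 0:
            continue  # point coincides with the sensor; skip it
        cos_angle = sum(a * b for a, b in zip(n, to_sensor)) / dist
        if cos_angle >= cos_limit:
            visible.append(p)
    return visible

def make_training_pair(points, normals, sensor, max_angle_deg=20.0):
    """Return (partial, complete): simulated sensor view and the full shape."""
    partial = predict_visible(points, normals, sensor, max_angle_deg)
    return partial, list(points)

# Toy shape: the six face centers of a unit cube with outward unit normals.
cube_points = [(0, 0, 0.5), (0, 0, -0.5), (0.5, 0, 0),
               (-0.5, 0, 0), (0, 0.5, 0), (0, -0.5, 0)]
cube_normals = [(0, 0, 1), (0, 0, -1), (1, 0, 0),
                (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
sensor = (0.0, 0.0, 2.0)  # sensor placed directly above the cube

partial, complete = make_training_pair(cube_points, cube_normals, sensor)
coverage = len(partial) / len(complete)
print(partial, coverage)
```

In this toy setup only the top face center survives the visibility test, which mirrors the top-surface-only reconstructions described earlier. Pairs like `(partial, complete)` are the kind of input and target a shape-completion model could be trained on.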
What are 'ghost signals'?
When a human walks through the environment, our sensor receives not only a direct reflection from the human but also so-called “ghost signals”: copies of the human that appear in different locations. This happens because walls and furniture act like mirrors for wireless signals, so a ghost signal is like the reflection of the human in a mirror. These ghost signals have traditionally been treated as unwanted interference to be suppressed or eliminated. Yet they carry rich information about the scene, encoding additional geometric cues that are not available from a single sensor’s direct reflections alone. In traditional imaging, a single static sensor can only reconstruct the portions of a scene from which it receives direct reflections, leaving the rest unobserved. Ghost signals, by contrast, move as the human moves, and their motion depends on the layout of the scene. By leveraging these “ghost” signals rather than discarding them, we can extract information about these previously unobserved regions of the scene, drastically enhancing scene understanding.
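The mirror analogy can be sketched with a little geometry. The Python example below is an illustration only, not the RISE algorithm. It assumes an idealized flat wall and assumes the ghost appears exactly at the mirror image of the human across that wall; real multipath geometry is more involved. The function names and the 2D room coordinates are hypothetical.

```python
import math

def mirror_point(p, wall_point, wall_normal):
    """Reflect point p across a flat wall (wall_normal must be unit length)."""
    d = sum((a - b) * n for a, b, n in zip(p, wall_point, wall_normal))
    return tuple(a - 2 * d * n for a, n in zip(p, wall_normal))

def wall_from_ghost(human, ghost):
    """Recover the wall as (point_on_wall, unit_normal).

    Under the ideal-mirror assumption, the wall is the perpendicular
    bisector of the segment joining the human and the ghost.
    """
    diff = [g - h for g, h in zip(ghost, human)]
    length = math.sqrt(sum(v * v for v in diff))
    if length == 0:
        raise ValueError("human and ghost coincide; wall is undetermined")
    normal = tuple(v / length for v in diff)
    midpoint = tuple((g + h) / 2 for g, h in zip(ghost, human))
    return midpoint, normal

# Hypothetical 2D room: a wall along the line x = 3, human walking toward it.
wall_point, wall_normal = (3.0, 0.0), (1.0, 0.0)
human_track = [(1.0, 0.0), (1.5, 1.0), (2.0, 2.0)]

# Forward model: where the ghost appears at each step of the walk.
ghost_track = [mirror_point(h, wall_point, wall_normal) for h in human_track]

# Inverse step: each (human, ghost) pair yields a point on the wall.
wall_estimates = [wall_from_ghost(h, g) for h, g in zip(human_track, ghost_track)]
for point, normal in wall_estimates:
    print(point, normal)
```

Every recovered point lies on the line x = 3, so as the human moves, the ghost's motion traces out the wall. This is the sense in which ghost motion "depends on the layout of the scene."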
What manner of robots would benefit from this AI-enhanced approach?
We envision that this new form of computer vision can revolutionize robotics across a wide range of applications such as shipping and logistics, warehousing, manufacturing, and even retail. Robots operating in dynamic, unstructured environments stand to benefit the most, such as warehouse automation systems, retail inventory robots, and collaborative industrial machines. With enhanced perception that enables them to “see” hidden objects, these systems can better understand their surroundings and execute increasingly complex tasks, including packing, quality control, and beyond.
What's next for the research?
We see significant real-world potential for this work. Our research group has prior experience translating research into market-ready solutions, and we are actively exploring commercialization pathways to bring it to market. In smart homes, this technology could introduce spatial awareness and contextual understanding, enabling environments that respond more intelligently to occupants. In enterprise settings, it could optimize utilization and planning to improve efficiency and reduce operational costs. In retail and logistics, real-time awareness of scenes could enhance inventory management and streamline workflows.
A key advantage is that this approach can leverage existing infrastructure, such as Wi-Fi routers, which are already installed in most environments, making deployment both scalable and cost-effective. Importantly, this technology also enables privacy-preserving sensing. Many current solutions for scene understanding rely heavily on cameras, which raise privacy concerns. By using wireless signals instead, we can achieve similar levels of environmental awareness without capturing visual data, offering a more privacy-conscious alternative.
Beyond commercialization, we are also interested in taking the leap from GenAI focused on single, specific tasks to large wireless foundation models that can extract valuable insight from wireless signals to enable a large range of tasks. Similar to how foundation models, such as ChatGPT, have led to massive leaps in the capabilities of AI for language or computer vision, we believe a wireless foundation model can revolutionize wireless sensing and unlock new capabilities in using wireless signals to sense the world.





