Vision & Imaging Blog
Machines Creating Predictive Videos from Photographs
What if all it took to generate a high quality video was a series of photographs?
What if that video could effectively predict the future based on the still shots?
While it has long been easy to break an existing video down into its component frames, the reverse has not been true. Extrapolating even a short video from individual frames is a hard processing problem, one that requires AI to “fill in the blanks.”
However, researchers at MIT are now developing advanced machine intelligence approaches that may make it possible.
Predictive Video – Can Machines Understand Cause and Effect?
When shown a photo, humans can intuitively determine what will happen next based on the motion they see. For example, someone on a skateboard will probably continue moving in the same direction at roughly the same speed.
This intuition depends on an enormous amount of contextual information that humans take for granted because they encounter thousands of examples of it in everyday life. People may not know the mathematical underpinnings of gravity or inertia, but they recognize both when they see them!
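The skateboard intuition above amounts to assuming constant velocity. As a minimal sketch (not the MIT approach), the snippet below assumes we already have two observed positions of the same object, which a single still photo would not provide; the function name `extrapolate_positions` and the pixel coordinates are hypothetical and chosen only for illustration.

```python
def extrapolate_positions(p_prev, p_curr, steps):
    """Predict future (x, y) positions assuming constant velocity.

    p_prev and p_curr are the object's positions in two consecutive
    observations (an assumption of this sketch).
    """
    vx = p_curr[0] - p_prev[0]  # horizontal displacement per step
    vy = p_curr[1] - p_prev[1]  # vertical displacement per step
    return [(p_curr[0] + vx * k, p_curr[1] + vy * k)
            for k in range(1, steps + 1)]


# A skateboarder seen at x=10, then at x=13, at the same height both times.
future = extrapolate_positions((10, 5), (13, 5), steps=3)
print(future)  # [(16, 5), (19, 5), (22, 5)]
```

A rule this simple breaks as soon as the rider turns, brakes, or hits an obstacle, which is exactly the contextual knowledge humans apply without thinking.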
The MIT project, Generating Videos with Scene Dynamics, applies neural networks to this problem, and its early results are promising. The researchers have begun to strengthen artificial intelligence in two key areas:
- Generating short videos that adequately resemble existing footage;
- Predicting future “frames” in terms of how pixels might change.
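To make "how pixels might change" concrete, here is a naive baseline, offered only as an illustrative sketch and not as the researchers' neural-network method. It assumes grayscale frames stored as nested lists of 0 to 255 intensities and extrapolates each pixel linearly from the two most recent frames; the function name `predict_next_frame` and the sample values are hypothetical.

```python
def predict_next_frame(prev_frame, curr_frame):
    """Linearly extrapolate each pixel: next = curr + (curr - prev).

    Results are clamped to the valid 0..255 intensity range.
    """
    return [
        [max(0, min(255, 2 * c - p)) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Two tiny 2x2 grayscale frames (made-up values).
prev_frame = [[0, 50], [100, 200]]
curr_frame = [[10, 50], [90, 250]]
print(predict_next_frame(prev_frame, curr_frame))  # [[20, 50], [80, 255]]
```

Because every pixel is treated independently, a baseline like this has no notion of objects or motion, which is part of why learned models are attractive for the task.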
Some Limitations Still to Be Overcome
After being trained on more than 2 million short videos, the experimental AI is able to generate short clips similar to the footage it encounters. It’s important to realize, however, that this doesn’t reflect a deep “understanding” of events in the footage.
The current system can construct only about a second of low-resolution footage from a given input. The output often includes significant distortion, and a human onlooker can easily tell the original and generated video apart.
For now, research is focused on providing systems with the capabilities to generate “plausible” futures rather than the correct ones. Experimental machine vision would have to be combined with a deep understanding of physics, processed in real time, to make “correct” predictions.
Directions for the Future: Transportation, Media, and More
With time, however, more advanced predictive capabilities could be integrated into a wide range of applications. Perhaps most interesting, the same concepts could be applied to refine the systems autonomous vehicles use to identify and avoid moving obstacles. There is also a wide variety of possibilities in virtual reality, entertainment, and general media production.
For now, however, work is continuing apace at MIT. With untold terabytes of raw video out on the Web that can be used for experimental purposes, AI could be developing predictive visual intelligence sooner than anyone might think.