Artificial intelligence is not yet fully capable of understanding the physical world, and this remains the technology’s primary challenge, according to Stanford University computer science professor Fei-Fei Li.
“Leading AI technologies like large language models (LLMs) have changed how we access and work with abstract knowledge. However, they remain masters of words alone: eloquent but inexperienced, knowledgeable but ungrounded,” she argues.
According to the scientist, the emergence of “spatial intelligence” will transform how people “create and interact with real and virtual worlds, revolutionising literature, art, robotics, science, and more.”
Developing such technology requires training models not only on “language” but also on the physical properties of the world.
Li asserts that artificial intelligence is rapidly approaching the limits of text-based learning and that its further progress will depend on “world models,” a new type of generative AI that must tackle a fundamentally different set of tasks than LLMs do.
In a post on X dated November 10, 2025, Li wrote: “AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it? Today, I want to share with you my thoughts on…”
“Such systems must generate spatially coherent worlds that adhere to physical laws, process multimodal inputs—from images to actions—and predict the evolution of these worlds,” Li explained.
According to the professor’s vision, spatial intelligence represents “a frontier beyond language—the ability to create interconnections.”
The Concept of “World Models”
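The concept dates back to the early 1940s and the cognitive science research of Scottish philosopher and psychologist Kenneth Craik, who suggested that the mind carries a “small-scale model” of external reality that it uses to anticipate events.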
The idea returned to modern AI research in 2018 with “World Models,” a paper by David Ha and Jürgen Schmidhuber proposing that a neural network could learn a compact internal model of its environment and use it as a simulator for planning and control.
Applying this idea to the real world, however, requires complex systems that can retain spatial memory and model scenes in more than two dimensions.
In September, Li’s company, World Labs, released a beta version of Marble, an early “world model” that generates interactive three-dimensional environments from text or image prompts.
Users could explore the generated environments without time limits or scene loading, and the environment stayed coherent and persistent as they moved through it.
“The next frontier in AI development will be spatial intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation,” said Li, describing Marble as merely the first step.
Earlier, in October, Nvidia introduced a system for connecting quantum computers to its AI chips. According to the company, the technology will significantly accelerate data processing and open new research opportunities in medicine and materials science.
