Nanjing University's Yu Yang explains in depth: What is a "world model"?
Amid the media frenzy over Sora, OpenAI's introductory material calls Sora a "world simulator." The term "world model" is back in view, yet few articles explain what a world model actually is.
Here we review what a world model is and discuss whether Sora is a world simulator.
What is a world model?
When the words "world" and "environment" come up in AI, it is usually to distinguish them from the agent.
The fields where most agents are studied are reinforcement learning and robotics.
So world models and world modeling appear earliest and most often in robotics papers.
The most influential use of the term today is probably the article titled "World Models" that Jurgen (Schmidhuber) posted on arXiv in 2018. It was eventually published at NeurIPS'18 under the title "Recurrent World Models Facilitate Policy Evolution."
The paper does not define what world models are. Instead it draws an analogy to the mental model of the human brain in cognitive science, citing literature from 1971.
The mental model is the human brain’s mirror image of the surrounding world
Wikipedia's entry on the mental model states clearly that it may take part in cognition, reasoning, and decision-making. A mental model has two main parts: mental representations and mental simulation. In Wikipedia's words, a mental model is:
"an internal representation of external reality, hypothesized to play a major role in cognition, reasoning and decision-making. The term was coined by Kenneth Craik in 1943 who suggested that the mind constructs "small-scale models" of reality that it uses to anticipate events."
It's still a bit confusing, but the structure diagram in the paper clearly explains what a world model is.
In the figure, the vertical V->z path is the low-dimensional representation of the observation, implemented with a VAE. The horizontal M->h->M->h path is the sequence representation that predicts the next time step, implemented with an RNN. Together the two parts make up the world model.
In other words, a world model consists mainly of a state representation and a transition model, which correspond to mental representations and mental simulation respectively.
When you see the picture above, you may think, aren’t all sequence predictions world models?
In fact, anyone familiar with reinforcement learning can see at a glance that the structure in this picture is wrong (incomplete); the real structure is the picture below. The RNN's input is not only z but also the action. This is not ordinary sequence prediction. Does adding an action make much difference? Yes: once actions are an input, the data distribution can change freely, which brings huge challenges.
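To make the two parts concrete, here is a minimal Python sketch of the action-conditioned structure described above. It is not the paper's implementation: `encode` and `ToyTransition` are hypothetical stand-ins for the VAE (V) and the RNN (M), and they use hand-picked arithmetic instead of learned weights. The only point is the data flow: the transition step takes the latent z, the hidden state h, and the action.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


def encode(observation: Vector) -> Vector:
    """Toy stand-in for V: compress an observation into a 2-d latent z
    by averaging its first and second halves (a real V would be a VAE)."""
    half = len(observation) // 2
    first, second = observation[:half], observation[half:]
    return [sum(first) / len(first), sum(second) / len(second)]


@dataclass
class ToyTransition:
    """Toy stand-in for M: a recurrent update that also sees the action."""
    decay: float = 0.5  # hypothetical mixing constant, not from the paper

    def step(self, h: Vector, z: Vector, action: float) -> Tuple[Vector, Vector]:
        # The new hidden state mixes the old memory with the current latent.
        h_next = [self.decay * hi + (1.0 - self.decay) * zi for hi, zi in zip(h, z)]
        # The predicted next latent depends on the action, which is what
        # separates a world model from plain sequence prediction.
        z_pred = [hi + action for hi in h_next]
        return h_next, z_pred


def rollout(model: ToyTransition, z0: Vector, actions: List[float]) -> List[Vector]:
    """Imagine a trajectory: feed each predicted latent back in as the next input."""
    h = [0.0] * len(z0)
    z = z0
    trajectory = []
    for action in actions:
        h, z = model.step(h, z, action)
        trajectory.append(z)
    return trajectory


z0 = encode([1.0, 1.0, 3.0, 3.0])              # [1.0, 3.0]
print(rollout(ToyTransition(), z0, [1.0]))     # [[1.5, 2.5]]
print(rollout(ToyTransition(), z0, [0.0]))     # [[0.5, 1.5]]
```

The same starting latent leads to different predicted futures under different actions, so the model has to be right for every action the agent might choose, not only for the ones that appear in the data.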
Jurgen's paper belongs to the field of reinforcement learning.
So, isn't there already a lot of model-based RL in reinforcement learning? What is the difference between that "model" and a world model? The answer is that there is no difference; they are the same thing. Jurgen opens with a paragraph on this.
The gist is: no matter how much model-based RL work exists, I am the RNN pioneer, using an RNN to build the model is my invention, and I am going to do it.
An early version of Jurgen's paper also noted that much model-based RL work learns a model but does not train the RL policy entirely inside that model.
Not training RL entirely inside the model is not really a difference between kinds of model. It reflects a long-standing frustration of model-based RL: the model is not accurate enough, and a policy trained entirely inside the model performs very poorly. This problem has only been solved in recent years.
Sutton, astute as ever, recognized the problem of inaccurate models long ago. His 1990 paper "Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming," which proposed the Dyna framework (published at ICML, in the first year it changed from a workshop to a conference), calls this model an action model, emphasizing that it predicts the results of executing actions.
In Dyna, RL learns from real data (line 3 of the algorithm) while also learning from the model (line 5), so that an inaccurate model does not lead to a poor policy.
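The interplay between real experience and model experience can be sketched in a few lines of Python. The block below is a tabular Dyna-Q-style loop, a common textbook instantiation of the Dyna idea rather than the exact algorithm listing referred to above. The five-state chain environment, the hyperparameters, and all function names are assumptions made for illustration.

```python
import random

N_STATES = 5       # hypothetical chain 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def env_step(state, action):
    """The 'real' environment: a deterministic chain, reward 1 at the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def greedy(q, state, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def dyna_q(episodes=200, planning_steps=10, alpha=0.5, gamma=0.9,
           epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    model = {}  # learned action model: (state, action) -> (next_state, reward)
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, s, rng)
            s2, r = env_step(s, a)
            # Learn from real data.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            # Update the action model with what was just observed.
            model[(s, a)] = (s2, r)
            # Learn from the model: replay simulated transitions.
            for _ in range(planning_steps):
                (ps, pa), (ps2, pr) = rng.choice(list(model.items()))
                q[ps][pa] += alpha * (pr + gamma * max(q[ps2]) - q[ps][pa])
            s = s2
            if s == N_STATES - 1:
                break
    return q


q_table = dyna_q()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)  # greedy action in each non-terminal state
```

Because the model is only ever queried for state-action pairs that were actually visited, its errors stay anchored to real data. That is the safeguard described above, and it is also why this style of learning stops short of training entirely inside the model.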
As you can see, the world model is very important for decision-making. If you can obtain an accurate world model, you can find the optimal decision for the real world by trial and error inside the model.
This is the core function of a world model: counterfactual reasoning. That is, even for decisions that never appear in the data, the world model can infer what their results would be.
Readers who know causal inference will be familiar with the term. In the popular science book The Book of Why, Turing Award winner Judea Pearl draws a ladder of causation. The lowest rung is "association," which is what most prediction models today mainly do. The middle rung is "intervention," and exploration in reinforcement learning is a typical intervention. The top rung is "counterfactuals," answering what-if questions through imagination. The picture Judea draws for counterfactual reasoning is of what scientists imagine in their heads, which closely resembles the schematic Jurgen uses in his paper.
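A toy Python example may help show what "decisions not seen in the data" means. The sketch assumes a made-up one-dimensional system whose next state is the current state plus the action, and a logged trajectory in which the agent only ever alternated between +1 and -1. A linear model fitted to that log by least squares can still be asked what would have happened under three consecutive +1 actions. All names and data here are illustrative assumptions.

```python
def fit_linear_model(transitions):
    """Least-squares fit of x_next ~ w_x * x + w_a * a (no intercept),
    solved in closed form from the 2x2 normal equations."""
    sxx = sum(x * x for x, a, y in transitions)
    sxa = sum(x * a for x, a, y in transitions)
    saa = sum(a * a for x, a, y in transitions)
    sxy = sum(x * y for x, a, y in transitions)
    say = sum(a * y for x, a, y in transitions)
    det = sxx * saa - sxa * sxa
    if det == 0:
        raise ValueError("logged data does not identify the model")
    w_x = (sxy * saa - sxa * say) / det
    w_a = (sxx * say - sxa * sxy) / det
    return w_x, w_a


def imagine(weights, x0, actions):
    """Roll the learned model forward under a hypothetical action sequence."""
    w_x, w_a = weights
    x, path = x0, []
    for a in actions:
        x = w_x * x + w_a * a
        path.append(x)
    return path


# Logged (state, action, next_state) triples: the agent only alternated +1, -1.
logged = [(0, 1, 1), (1, -1, 0), (0, 1, 1), (1, -1, 0)]
weights = fit_linear_model(logged)
print(weights)                              # (1.0, 1.0)

# Counterfactual query: what if the agent had pushed +1 three times in a row?
print(imagine(weights, 0.0, [1, 1, 1]))     # [1.0, 2.0, 3.0]
```

The extrapolation works here only because the true dynamics are exactly linear and the fitted model has the right form. For realistic systems, getting correct answers for action sequences far from the logged data is precisely the hard part.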
Left: Schematic diagram of the world model in Jurgen’s paper. Right: The ladder of cause and effect in Judea’s book.
We can conclude that AI researchers pursue world models in order to go beyond the data, reason counterfactually, and answer what-if questions. Humans have this ability naturally, but current AI is still very poor at it. A breakthrough here would greatly improve AI decision-making and enable applications such as fully autonomous driving.
Is Sora a world simulator?
The word "simulator" appears more often in engineering, and a simulator serves the same purpose as a world model: it lets you try things whose trial and error would be too costly or too risky in the real world. OpenAI seems to want to coin a new phrase, but the meaning remains the same.
The video generated by Sora can only be guided by vague prompt words, making it difficult to control accurately. Therefore, it is more of a video tool and is difficult to use as a counterfactual reasoning tool to accurately answer what if questions.
It is even difficult to evaluate how strong Sora’s generation ability is, because it is completely unclear how different the demo video is from the training data.
What is even more disappointing is that these demos show Sora has not accurately learned the laws of physics. Others have already pointed out violations of physical laws in the videos Sora generates [OpenAI releases text-to-video model Sora; AI can understand the physical world in motion. Is this a world model? What does it mean?].
My guess is that the demos OpenAI released rest on a very large amount of training data, possibly including CG-generated data. Even so, physical laws that can be described by equations with only a few variables have still not been learned.
OpenAI believes that Sora proves a route to simulators of the physical world, but it seems that simply stacking data is not the path to more advanced intelligent technology.