Luma Chief Scientist Jiaming Song on the History of Image and Video Models and the Future of Multimodal Models

Jul 18, 2024 am 09:42 AM

This episode of the AI + a16z podcast features Luma Chief Scientist Jiaming Song in conversation with a16z General Partner Anjney Midha about Jiaming’s career in video models, culminating in the recent release of Luma’s Dream Machine 3D video model, which shows an ability to reason about the world across multiple dimensions. Jiaming discusses the evolution of image and video models, his vision for the future of multimodal models, and why he believes Dream Machine exhibits emergent reasoning capabilities. According to Jiaming, the model was trained on a volume of high-quality video data that, measured in language-model terms, would amount to hundreds of trillions of tokens.

Here’s a snippet from their discussion, in which Jiaming explains the “bitter lesson” as it applies to training generative models and, in the process, sums up a key reason why training on context-rich video data lets Dream Machine do what it does:

“For many of the problems related to artificial intelligence, it is often more productive in the long run to use simpler methods but more compute, [rather] than trying to develop priors and then trying to leverage the priors so that you can use less compute.

“Cases in this question first happened in language, where people were initially working on language understanding, trying to use grammar or semantic parsing, these kinds of techniques. But eventually these tasks began to be replaced by large language models. And a similar case is happening in the vision domain, as well . . . and now people have been using deep learning features for almost all the tasks. This is a clear demonstration of how using more compute and having less priors is good.

“But how does it work with language? Language by itself is also a human construct. Of course, it is a very good and highly compressed kind of knowledge, but it’s definitely a lot less data than what humans take in day to day from the real world . . .

“[And] it is a vastly smaller data set size than visual signals. And we are already almost exhausting the . . . high-quality language sources that we have in the world. The speed at which humans can produce language is definitely not enough to keep up with the demands of the scaling laws. So even if we have a world where we can scale up the compute infrastructure for that, we don’t really have the infrastructure to scale up the data efforts . . .

“Even though people would argue that the emergence of large language models is already evidence of the scaling law . . . against the rule-based methods in language understanding, we are arguing that language by itself is also a prior in the face of more of the richer data signal that is happening in the physical world.”
