Gary Marcus: Text-to-image systems cannot understand the world and are far from AGI

Apr 09, 2023 am 09:31 AM

This article is reproduced from Lei Feng.com. If you need to reprint, please go to the official website of Lei Feng.com to apply for authorization.

Since the advent of DALL-E 2, many people have believed that AI capable of drawing realistic images is a big step towards artificial general intelligence (AGI). When DALL-E 2 was released, OpenAI CEO Sam Altman declared that "AGI is going to be wild", and the media have likewise exaggerated the significance of these systems for progress toward general intelligence.

But is that really so? Gary Marcus, a well-known AI scholar who regularly pours cold water on AI hype, has expressed his "reservations."

Recently, he argued that when evaluating progress toward AGI, the key is whether systems like DALL-E, Imagen, Midjourney and Stable Diffusion truly understand the world and can reason and make decisions based on that knowledge.

When judging the significance of these systems to AI (including narrow and broad AI), we can ask the following three questions:

Can image synthesis systems generate high-quality images?

Can they relate language input to the images they produce?

Do they understand the world behind the images they present?

1 AI does not understand the relationship between language and images

On the first question, the answer is yes. The only caveat is that trained human artists can do a better job when using AI to generate images.

On the second question, the answer is less certain. These systems perform well on some language inputs. For example, DALL-E 2 handles "astronaut on a horse" well.

But on other language inputs, these systems perform poorly and are easily fooled. For example, Marcus pointed out on Twitter some time ago that they struggle to generate an accurate image for "a horse riding an astronaut".

Deep learning advocates pushed back hard. AI researcher Joscha Bach suggested that "Imagen may just use the wrong training set", while machine learning professor Luca Ambrogioni argued that the failure shows that "Imagen already has a certain degree of common sense" and therefore refuses to generate something ridiculous.

Google scientist Behnam Neyshabur also suggested that, if "asked in the right way", Imagen can draw "a horse riding an astronaut".

However, Marcus believes the issue is not whether the system can be made to generate a given image. Clever people can always find a way to coax a specific image out of a system. The real issue is that these systems have no deep understanding of the connection between language and images.

2 Don’t know what a bicycle wheel is? How can it be called AGI?

The system's understanding of language is only one aspect. Marcus pointed out that judging the contribution of systems such as DALL-E to AGI ultimately depends on the third question. If all a system can do is convert many sentences into images in a haphazard but stunning way, it may revolutionize human art, but it is not comparable to AGI and does not represent AGI at all.

What makes Marcus despair about these systems' ability to understand the world are recent examples, such as the "coffee cup with many holes" image that graphic designer Irina Blok generated with Imagen.

Most people who look at that image will see that it defies common sense: coffee could not fail to leak through the holes. Similar examples include:

"Bicycle with square wheels"

"Toilet paper covered with cactus spines"

It is easy to say "yes" but hard to say "no": who can know what a thing that does not exist should look like? This is the difficulty in asking AI to draw the impossible.

But perhaps the system just "wanted" to draw a surreal image. DeepMind research professor Michael Bronstein said he did not consider it a bad result, and that he might have drawn it the same way himself.

So how can this question finally be settled? Gary Marcus found new inspiration in a recent conversation with philosopher Dave Chalmers.

To probe the systems' understanding of parts, wholes, and functions, Marcus proposed tasks on which it is clearer whether the output is correct, using the text prompts "Sketch a bicycle and label the parts that roll on the ground" and "Sketch a ladder and label one of the parts you stand on".

What makes this test special is that it does not give direct prompts such as "Draw a bicycle and mark the wheels" or "Draw a ladder and mark the rungs". Instead, the AI has to infer the corresponding object from descriptions such as "the parts that roll on the ground" and "the parts you stand on", which tests its ability to understand the world.

But Marcus's test results show that Craiyon (formerly known as DALL-E mini) is terrible at this kind of task. It does not understand what bicycle wheels and ladder rungs are.


So is this a problem unique to DALL-E mini?

Marcus found that it is not. Stable Diffusion, currently the most popular text-to-image system, produced the same kind of result.

For example, he asked Stable Diffusion to "Sketch a person and make the parts that hold things purple".

Evidently, Stable Diffusion does not understand what human hands are.

Of the next nine attempts, only one succeeded (the image in the upper right corner), and even that one was not very accurate.

The next test was the prompt "Draw a white bicycle and make the part pushed by the foot orange".

So it does not understand what a bicycle pedal is either.

Nor did it perform well on the prompt "Sketch a bicycle and label the parts that roll on the ground".

Marcus also tried a prompt containing a negation: "Draw a white bicycle without wheels".

The result indicates that the system does not understand negation.

Even a prompt as simple as "Draw a white bicycle with green wheels", which involves only the relationship between a part and the whole and has no complex syntax or function, still produces flawed results.

So, Marcus asks, can a system that does not understand what wheels are or what they are used for be considered major progress in artificial intelligence?

Today, Gary Marcus also ran a poll on this issue. He asked, "How much do systems such as DALL-E and Stable Diffusion know about the world they depict?"

Of the respondents, 86.1% think these systems do not understand the world well, and only 13.9% think they understand the world to a high degree.

In response, Emad Mostaque, CEO of Stability AI, said that he had voted for "not much" and acknowledged that these systems "are just one small piece of the puzzle."

Alexey Guzey of the scientific organization New Science made a similar discovery. He asked DALL-E to draw a bicycle, but the result was just a jumble of bicycle parts piled together.

He therefore believes that no current model truly understands what a bicycle is or how it works, and that it is ridiculous to claim that current generative ML models can nearly rival or replace humans.

What do you think?
