


One question distinguishes humans from AI! A "bare-bones" Turing test that stumps all large models
An "ultimate bare-bones version" of the Turing test stumps all major large language models.
Humans can pass the test effortlessly.
Capital Letter Test
The researchers used a very simple method.
They mix the real question into a jumble of words written in capital letters and submit it to the large language model.
The large language models cannot reliably identify the real question being asked.
Humans can easily discard the capitalized words, pick out the real question hidden among them, answer it, and pass the test.
The question in the picture itself is very simple: is water wet or dry?
Humans just answer wet and that’s it.
But ChatGPT cannot filter out the interference of those capital letters to answer the question.
As a result, many of the meaningless words end up mixed into its answer, which becomes long and pointless.
In addition to ChatGPT, the researchers ran similar tests on GPT-3, Meta's LLaMA, and several open-source fine-tuned models, and they all failed the "capital letter test."
The principle behind the test is actually simple: AI algorithms typically process text data in a case-insensitive manner.
So, when a capital letter is accidentally placed in a sentence, it can cause confusion.
AI doesn't know whether to treat it as a proper noun, an error, or simply ignore it.
In addition to the capital letter test mentioned above, researchers are trying to find a way to more effectively distinguish between humans and chatbots in an online environment.
Paper:
The researchers designed their tests around the weaknesses of large language models.
To keep large language models from passing, they found the models' most vulnerable points and hit them hard.
The following test methods are the result.
Wherever the large models are bad at answering, the tests target them relentlessly.
Counting
The first is counting, since large models are known to be poor at counting.
Sure enough, the model got the count wrong for all three letters.
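As a minimal sketch of how a counting question of this kind could be generated together with its ground-truth answer: the function name make_counting_question, its parameters, and the question wording are assumptions made for illustration and are not taken from the researchers' code.

```python
import random


def make_counting_question(length=30, alphabet="abc", seed=None):
    """Build a random string and ask how often one letter occurs in it.

    Hypothetical helper: the wording and parameters are illustrative only.
    """
    rng = random.Random(seed)
    text = "".join(rng.choice(alphabet) for _ in range(length))
    target = rng.choice(alphabet)
    question = f'Please count the number of "{target}" in "{text}".'
    answer = text.count(target)  # ground truth is a plain character count
    return question, answer, text, target
```

Checking a reply then reduces to comparing the number it contains with the stored answer.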
Text replacement
Next is text replacement: several letters are substituted for one another, and the large model is asked to spell out the resulting word.
The AI struggled for a long time, but its output was still wrong.
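The substitution step can be sketched as follows. The rule and the example word are made up for illustration, and the helper name substitute is an assumption rather than anything from the researchers' code.

```python
def substitute(word, mapping):
    """Replace each letter according to mapping; unmapped letters stay as-is."""
    return "".join(mapping.get(ch, ch) for ch in word)


# Hypothetical rule: p -> m, e -> a, a -> n, c -> d, h -> y
rule = {"p": "m", "e": "a", "a": "n", "c": "d", "h": "y"}
print(substitute("peach", rule))  # prints "mandy"
```

Looking up each character once applies all replacements simultaneously, so a letter produced by one rule is never replaced again by another.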
Position replacement
This is not one of ChatGPT's strengths either.
The chatbot cannot complete a letter-filtering task that elementary school students can do accurately.
Question: Please enter the 4th letter after the second "S". The correct answer is "c".
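A minimal sketch of how the expected answer to such a position question could be computed. The sample string below is invented (the original string appears only in the article's picture), and the function name char_after_occurrence is hypothetical.

```python
def char_after_occurrence(text, marker, occurrence, offset):
    """Return the character `offset` positions after the `occurrence`-th `marker`.

    Returns None if the marker does not occur often enough or the offset
    runs past the end of the text. Assumes occurrence >= 1 and offset >= 1.
    """
    index = -1
    for _ in range(occurrence):
        index = text.find(marker, index + 1)
        if index == -1:
            return None
    target = index + offset
    return text[target] if target < len(text) else None


# Made-up string: the second "S" is at index 4, so the 4th letter after it is at index 8.
print(char_after_occurrence("xSabSqwecz", "S", 2, 4))  # prints "c"
```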
Random editing
Humans can complete this with almost no effort, yet the AI still cannot pass.
Noise implant
This is the "capital letter test" mentioned at the beginning.
By adding all kinds of noise (such as irrelevant words in capital letters) to the question, the test prevents the chatbot from accurately identifying the question, so it fails.
For humans, the difficulty of spotting the real question among these jumbled capital letters is hardly worth mentioning.
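The idea can be sketched in a few lines: one function mixes uppercase noise words into a question, and another recovers the question the way a human reader would, by ignoring everything written in capitals. The function names and the noise words are assumptions for illustration, and the sketch assumes the real question itself contains no all-uppercase word (a word such as "I" would be dropped too).

```python
import random


def inject_noise(question, noise_words, seed=None):
    """Insert an uppercase noise word before every word of the question."""
    rng = random.Random(seed)
    mixed = []
    for word in question.split():
        mixed.append(rng.choice(noise_words).upper())
        mixed.append(word)
    return " ".join(mixed)


def strip_noise(text):
    """Drop every token that is written entirely in capital letters."""
    return " ".join(tok for tok in text.split() if not tok.isupper())


noisy = inject_noise("is water wet or dry?", ["cury", "forte", "blimp"], seed=0)
print(strip_noise(noisy))  # prints "is water wet or dry?"
```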
Symbol text
This is another task with almost no challenge for humans.
But for a chatbot, understanding this symbolic text without a lot of specialized training should be very difficult.
Beyond this series of "impossible tasks" designed specifically for large language models, the researchers also designed two tasks that are relatively simple for large language models but difficult for humans, so that humans can be identified as well.
Memory and calculation
Thanks to their training, large language models perform relatively well in these two areas.
Humans, who are not allowed to use any auxiliary tools, basically cannot give valid answers to questions that require large amounts of memorization or 4-digit calculations.
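As a rough sketch of the calculation task, a question of this kind could be generated as below. The article only mentions "4-digit calculations", so the choice of multiplication, the question wording, and the function name make_calculation_question are all assumptions.

```python
import random


def make_calculation_question(seed=None):
    """Ask for the product of two random 4-digit integers (hypothetical format)."""
    rng = random.Random(seed)
    a = rng.randint(1000, 9999)  # randint includes both endpoints
    b = rng.randint(1000, 9999)
    return f"What is {a} * {b}?", a * b
```

A program answers such a question instantly, while a human without a calculator is unlikely to answer correctly within a short time limit.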
Humans vs. large language models
The researchers ran this human-versus-machine test on GPT-3, ChatGPT, and three open-source large models: LLaMA, Alpaca, and Vicuna.
The results clearly show that the large models failed to pass as human.
The research team has open-sourced the questions at https://github.com/hongwang600/FLAIR
The best-performing model, ChatGPT, only has a pass rate of less than 25% in the position replacement test.
The other large language models perform very poorly on these tests designed specifically for them and have no chance of passing.
For humans, however, the tests are very simple, with a pass rate of almost 100%.
On the tasks that humans are not good at, the humans fail almost completely, while the AI handles them with ease.
It seems that the researchers are indeed very careful about the test design.
"Don't let any AI go, but don't wrong any human being"
This distinction is very good!
References: https://www.php.cn/link/5e632913bf096e49880cf8b92d53c9ad
