Conjectures on Eight Technical Questions about ChatGPT
Watching the spectacular arrival of ChatGPT, I feel a mix of emotions: joy, surprise, and alarm. The joy and surprise come from not expecting to witness a major breakthrough in natural language processing (NLP) so soon, or to experience the boundless appeal of a general-purpose technology so directly. The alarm comes from the fact that ChatGPT can complete most NLP tasks with high quality, which makes me gradually realize that many NLP research directions now face serious challenges.
Overall, the most amazing thing about ChatGPT is its versatility. Whereas GPT-3 needs very elaborate prompts to perform various NLP tasks, and even then not always well, ChatGPT makes prompts almost invisible to the user.
As a dialogue system, ChatGPT lets users pose questions naturally to accomplish all kinds of tasks, from understanding to generation. Its performance is close to the current state of the art in the open domain; on many tasks it surpasses models designed individually for those tasks, and it also excels at code programming.
Specifically, its natural language understanding ability, and especially its ability to grasp user intent, is outstanding. Whether the task is question answering, chatting, classification, summarization, or translation, its reply may not be entirely correct, but it almost always understands what the user wants, far beyond expectations.
Its generation ability is even stronger than its understanding ability: for all kinds of questions it can produce long texts that are reasonably logical and diverse. Overall, ChatGPT is astonishing; it is an early step toward AGI and will become even more powerful once certain technical bottlenecks are overcome.
There are already many collections of ChatGPT demo cases, so here I mainly summarize my thoughts on ChatGPT's technical questions, distilled from more than two months of intermittent interaction with it. Since we cannot know the specific techniques and implementation details behind ChatGPT, these are almost entirely subjective conjectures; there are surely many mistakes, and discussion is welcome.
1. Why is ChatGPT so versatile?
Anyone who has used ChatGPT will find that it is not a human-computer dialogue system in the traditional sense, but rather a general-purpose language processing platform that uses natural language as its interface.
GPT-3 in 2020 already showed a prototype of general capability, but it required carefully designed prompts to trigger the corresponding functions, whereas ChatGPT can accurately identify the user's intent from very natural questions and carry out all kinds of functions. Traditional approaches first recognize the user's intent and then call a separate processing module for each intent: for example, after detecting a summarization or translation intent from the user's input, a text summarization or machine translation model is invoked.
Open-domain intent recognition in such traditional pipelines is not very accurate, and the separate functional modules work in isolation and cannot share information, which makes it hard to build a powerful general-purpose NLP platform. ChatGPT breaks away from this fragmented paradigm: it no longer distinguishes between different functions but treats them uniformly as specific needs arising in the course of a conversation. So why is ChatGPT so versatile? I have been pondering this question, and since there is no way to verify it experimentally, I can only guess.
According to Google's instruction-tuning work FLAN, once the model reaches a certain size (e.g. 68B parameters) and the number of instruction task types reaches a certain threshold (e.g. 40), the model exhibits an emergent ability to recognize new intents. OpenAI collects dialogue data covering many task types from users of its open API around the world, classifies and annotates it by intent, and then performs instruction tuning on the 175B-parameter GPT-3.5, from which a general intent recognition capability naturally emerges.
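As a rough illustration of what instruction tuning looks like in practice (not OpenAI's actual pipeline; the tasks, templates, and examples below are made up), the key idea is that many different task types are flattened into one next-token prediction objective:

```python
# A minimal sketch of multi-task instruction-tuning data, assuming a simple
# "instruction + input -> output" format; tasks and examples are hypothetical.

instruction_data = [
    {"task": "summarization",
     "instruction": "Summarize the following article in one sentence.",
     "input": "ChatGPT was released by OpenAI in late 2022 and ...",
     "output": "OpenAI released ChatGPT in late 2022."},
    {"task": "translation",
     "instruction": "Translate the following sentence into French.",
     "input": "The weather is nice today.",
     "output": "Il fait beau aujourd'hui."},
    {"task": "classification",
     "instruction": "Is the sentiment of this review positive or negative?",
     "input": "The movie was a waste of time.",
     "output": "negative"},
]

def to_training_text(example: dict) -> str:
    """Flatten one instruction example into a single training string,
    so every task reduces to the same next-token prediction objective."""
    return (f"Instruction: {example['instruction']}\n"
            f"Input: {example['input']}\n"
            f"Output: {example['output']}")

for ex in instruction_data:
    print(to_training_text(ex), end="\n\n")
```

With enough such task types mixed together, the conjecture is that intent recognition stops being a separate module and simply emerges from the tuned model.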
2. Why does fine-tuning for dialogue not suffer from catastrophic forgetting?
Catastrophic forgetting has always been a challenge in deep learning: after a model is trained on one task, it often loses performance on other tasks. For example, if a base model with 3 billion parameters is first fine-tuned on question-answering data and then fine-tuned on multi-turn dialogue data, its question-answering ability drops noticeably. ChatGPT does not seem to have this problem. It applies two rounds of fine-tuning to the base model GPT-3.5: the first on manually annotated dialogue data, the second via reinforcement learning from human feedback. The amount of fine-tuning data is small, and the human-feedback scoring and ranking data is smaller still, yet after fine-tuning the model retains strong general capabilities instead of overfitting completely to the dialogue task.
This is a very interesting phenomenon, and one we are not in a position to verify. There are probably two reasons. On the one hand, the dialogue fine-tuning data used by ChatGPT may in fact cover a very comprehensive range of NLP tasks; as the classification of API user queries in InstructGPT shows, many of them are not simple chat but include classification, question answering, summarization, translation, code generation, and so on, so ChatGPT is effectively fine-tuned on many tasks at once. On the other hand, when the base model is large enough, fine-tuning on a small amount of data does not disturb it much: the parameters may only move within a very small neighborhood of the base model's parameter space, so the general capabilities of the base model are not significantly affected.
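If one had access to an open base model, the "small neighborhood" conjecture could in principle be probed by measuring how far fine-tuning moves the parameters. The sketch below (PyTorch, with a toy placeholder model and a dummy fine-tuning loop, not ChatGPT's actual procedure) shows the kind of measurement meant:

```python
# A minimal sketch (not ChatGPT's actual procedure) of measuring how far
# fine-tuning moves a model in parameter space.
import copy
import torch
import torch.nn as nn

# Placeholder "base model"; in practice this would be a large pretrained LM.
base_model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
tuned_model = copy.deepcopy(base_model)

# Placeholder fine-tuning: a few gradient steps on random data.
optimizer = torch.optim.SGD(tuned_model.parameters(), lr=1e-3)
for _ in range(10):
    x = torch.randn(8, 512)
    loss = tuned_model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Relative L2 distance between the two parameter vectors: if it is tiny,
# fine-tuning stayed within a small neighborhood of the base model.
diff_sq, base_sq = 0.0, 0.0
for p_base, p_tuned in zip(base_model.parameters(), tuned_model.parameters()):
    diff_sq += (p_tuned - p_base).pow(2).sum().item()
    base_sq += p_base.pow(2).sum().item()
print(f"relative parameter change: {(diff_sq / base_sq) ** 0.5:.6f}")
```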
3. How does ChatGPT achieve consistent multi-turn dialogue over long contexts?
When using ChatGPT, you will notice a very surprising ability: even after more than ten turns of interaction, it still remembers information from the first turn, and it can accurately resolve fine-grained linguistic phenomena such as ellipsis and coreference according to the user's intent. These may not look like problems to us humans, but throughout the history of NLP research, ellipsis and coreference have been stubbornly difficult challenges. In addition, traditional dialogue systems struggle to keep the topic consistent once the number of turns grows large.
ChatGPT, however, hardly suffers from this problem: it seems able to maintain the consistency and focus of the conversation topic even over many turns. This ability presumably comes from three sources. First, high-quality multi-turn dialogue data is the foundation and the key. Just as Google did for LaMDA, OpenAI used manual annotation to construct a large amount of high-quality multi-turn dialogue data, and fine-tuning on it elicits the model's multi-turn conversational ability.
Second, reinforcement learning from human feedback makes the model's replies more human-like, which also indirectly strengthens its consistency across turns. Finally, the model explicitly processes a context of 8,192 tokens, enough to hold almost an entire day of an ordinary person's conversation; a single dialogue session rarely exceeds this length, so the whole conversation history is effectively remembered, which greatly helps it sustain many consecutive turns.
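As a minimal sketch of how such a fixed context window might be handled on the application side (an assumption about the surrounding system, not a description of OpenAI's implementation), the history can simply be truncated from the oldest turn once it no longer fits; the word-based token counter below is a crude stand-in for a real tokenizer:

```python
# A minimal sketch (assumed, not OpenAI's implementation) of fitting a
# conversation history into a fixed context window.
MAX_CONTEXT_TOKENS = 8192

def count_tokens(text: str) -> int:
    """Crude token estimate; a real system would use the model's tokenizer."""
    return len(text.split())

def build_context(history: list[str], new_message: str) -> list[str]:
    """Return the most recent turns that fit inside the context window."""
    turns = history + [new_message]
    total = sum(count_tokens(t) for t in turns)
    # Drop the oldest turns until the remaining ones fit.
    while turns and total > MAX_CONTEXT_TOKENS:
        total -= count_tokens(turns.pop(0))
    return turns

history = ["User: Hello!", "Assistant: Hi, how can I help?"]
print(build_context(history, "User: Please summarize our chat so far."))
```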
4. How is ChatGPT’s interactive correction capability developed?
The ability to correct itself interactively is an advanced sign of intelligence: what is routine for us is a pain point for machines. During a conversation, when someone points out a problem, we immediately recognize it and promptly, accurately correct the relevant information. For a machine, every step of this, realizing that there is a problem, identifying its scope, and correcting the corresponding information, is far from easy. Before ChatGPT appeared, we had not seen a general-purpose model with strong interactive correction ability.
After interacting with ChatGPT, you will find that whether the user revises an earlier statement or points out problems in ChatGPT's reply, ChatGPT can capture the intent to modify, accurately identify the parts that need revising, and finally make the correction.
So far, no model-related factor has been found to be directly responsible for this interactive correction ability, and I do not believe ChatGPT can learn in real time: on the one hand, after the conversation is restarted, ChatGPT may still make the same mistake; on the other hand, the optimization of the base model always extracts frequent patterns from high-frequency data, and a single conversation cannot possibly update the base model.
I believe it is more a matter of how the base language model handles the in-context conversation history. The contributing factors, none of them certain, may include:
- OpenAI's manually constructed dialogue data contains some interactive correction cases, and the model acquires this ability after fine-tuning on them;
- Reinforcement learning from human feedback makes the model's output better aligned with human preferences, so that in correction-oriented exchanges it follows the human's corrective intent more closely;
- It is possible that once the model reaches a certain scale (e.g. 60B parameters), it learns from the interactive correction cases already present in the original training data, and the ability to correct interactively emerges naturally.
5. How is ChatGPT’s logical reasoning ability learned?
When we ask ChatGPT questions that involve logical reasoning, it does not give the answer directly; instead it lays out detailed reasoning steps before presenting the result. Although many cases, such as the classic chickens-and-rabbits-in-a-cage puzzle, show that ChatGPT has not learned the essence of reasoning but only its surface logic, the reasoning steps and framework it presents are basically correct.
That a language model can learn basic patterns of logical reasoning already far exceeds expectations, and tracing the origin of this reasoning ability is a very interesting question. Comparative studies have found that when the model is large enough and program code is mixed with text in the training data, the complete logical chains in program code transfer and generalize into the large language model, giving it a certain degree of reasoning ability.
How this reasoning ability is acquired seems a bit magical, yet it is also understandable: perhaps code comments act as a bridge that carries reasoning ability over from logical code to the language model. Multilingual ability is probably similar. Most of ChatGPT's training data is in English, and Chinese makes up only a small fraction, yet we find that although ChatGPT's Chinese ability is weaker than its English ability, it is still very strong. The Chinese-English parallel data in the training corpus may be the bridge that transfers its English abilities to Chinese.
6. Does ChatGPT use different decoding strategies for different downstream tasks?
Among ChatGPT's many impressive behaviors, one is that it can generate several different replies to the same question, which makes it look very clever.
For example, if we are not satisfied with ChatGPT's answer, we can click the "Regenerate" button and it will immediately produce another reply; if we are still not satisfied, we can ask it to regenerate again. This is no mystery in NLP: for a language model it is a basic capability, namely sampling-based decoding.
A text fragment can be continued by many different words, and the language model computes a probability for each candidate. If the decoding strategy always picks the word with the highest probability, the output is deterministic and no diversity is possible. If instead we sample according to the output distribution over the vocabulary, for example "strategy" with probability 0.5 and "algorithm" with probability 0.3, then sampling decoding outputs "strategy" 50% of the time and "algorithm" 30% of the time, which guarantees diverse outputs. Because sampling follows the probability distribution, higher-probability results are still chosen more often, so even the varied outputs all look fairly reasonable. Comparing different kinds of tasks, however, we find that the diversity of ChatGPT's replies varies greatly across downstream tasks.
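A minimal sketch of the difference between greedy decoding and sampling decoding for a single next-token step (the tiny vocabulary and the probabilities are made up for illustration):

```python
# A minimal sketch contrasting greedy decoding with sampling decoding for
# one next-token choice; the vocabulary and probabilities are illustrative.
import numpy as np

vocab = ["strategy", "algorithm", "method"]
probs = np.array([0.5, 0.3, 0.2])  # model's predicted next-token distribution

def greedy_decode(probs: np.ndarray) -> str:
    """Always return the highest-probability token: deterministic output."""
    return vocab[int(np.argmax(probs))]

def sample_decode(probs: np.ndarray, rng: np.random.Generator) -> str:
    """Sample a token according to the distribution: diverse output."""
    return vocab[rng.choice(len(vocab), p=probs)]

rng = np.random.default_rng(0)
print("greedy :", [greedy_decode(probs) for _ in range(5)])
print("sampled:", [sample_decode(probs, rng) for _ in range(5)])
```

Greedy decoding returns the same token every time, while sampling yields a mixture weighted by the probabilities, which is exactly what the "Regenerate" button exposes.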
For "How" and "Why" type tasks, a regenerated reply differs markedly from the previous one in both wording and specific content. For "What" type tasks such as machine translation and math word problems, the differences between replies are very subtle, and sometimes there is almost no change at all. If both are produced by sampling from a probability distribution, why are the differences so small in the latter case?
One conjecture is an ideal situation in which, for "What" type tasks, the distribution learned by the large model is very sharp, for example "strategy" with probability 0.8 and "algorithm" with probability 0.1, so sampling returns the same result most of the time (80% of samples would be "strategy" in the earlier example); whereas for "How" and "Why" type tasks the learned distribution is relatively smooth, for example "strategy" with probability 0.4 and "algorithm" with probability 0.3, so different samplings yield different results.
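This conjecture can be illustrated with a toy simulation: sample repeatedly from a sharp and a smooth distribution and compare how concentrated the samples are. The distributions below are invented, not ChatGPT's actual outputs:

```python
# A toy sketch of the "sharp vs. smooth distribution" conjecture: the sharper
# the distribution, the less diverse repeated sampling becomes.
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)
tokens = ["strategy", "algorithm", "method", "approach"]

sharp = np.array([0.8, 0.1, 0.05, 0.05])   # assumed "What"-type distribution
smooth = np.array([0.4, 0.3, 0.2, 0.1])    # assumed "How"/"Why"-type distribution

for name, dist in [("sharp", sharp), ("smooth", smooth)]:
    samples = [tokens[rng.choice(len(tokens), p=dist)] for _ in range(1000)]
    counts = Counter(samples)
    print(f"{name:6s} -> most common: {counts.most_common(1)[0]}, "
          f"distinct tokens: {len(counts)}")
```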
If ChatGPT really learns such ideal task-dependent distributions, that would be genuinely impressive, because a single sampling-based decoding strategy could then serve all tasks. Normally, for tasks whose answers are largely or fully determined, such as machine translation, mathematical calculation, and factual question answering, greedy decoding is used, outputting the highest-probability word at each step. To produce diverse outputs with the same meaning, beam-search-based decoding is usually preferred, and sampling-based decoding is rarely used.
Judging from interactions with ChatGPT, it seems to use sampling-based decoding for every task, which is brute-force aesthetics indeed.
7. Can ChatGPT solve the problem of factual reliability?
The lack of factual reliability in its answers is currently ChatGPT's biggest challenge. Especially on questions involving facts and knowledge, ChatGPT sometimes makes things up and generates false information. Even when asked to provide sources or references, it will often fabricate a non-existent URL or a paper that was never published.
That said, ChatGPT usually leaves users with a good impression: it seems to know a great many facts. In reality, ChatGPT is a large language model, a large language model is essentially a deep neural network, and a deep neural network is essentially a statistical model that learns patterns from high-frequency data. Much common knowledge appears frequently in the training data, its contextual patterns are relatively fixed, and the predicted word distributions are sharp with low entropy, so during decoding the model easily recalls and outputs the correct facts or knowledge.
However, many events and pieces of knowledge appear rarely even in a very large training corpus, and the model cannot learn the relevant patterns: the contextual patterns are loose, the predicted word distributions are smooth with high entropy, and during inference the model tends to produce uncertain, essentially random outputs.
This is an inherent problem of all generative models, ChatGPT included. As long as the GPT-style architecture is retained and the base model is unchanged, the factual reliability of ChatGPT's replies is theoretically hard to fix. Combining it with a search engine is currently a very pragmatic solution: the search engine retrieves reliable sources of factual information, and ChatGPT summarizes and synthesizes them.
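A minimal sketch of this "retrieve, then summarize" pattern (an assumed architecture, not any vendor's actual product; the functions search and llm_generate are hypothetical placeholders):

```python
# A minimal sketch (assumed architecture) of the "search engine retrieves,
# language model summarizes" pattern; `search` and `llm_generate` are
# hypothetical placeholders.

def search(query: str) -> list[dict]:
    """Placeholder: call a search engine and return documents with URLs."""
    return [{"url": "https://example.org/doc1", "text": "Retrieved passage one."},
            {"url": "https://example.org/doc2", "text": "Retrieved passage two."}]

def llm_generate(prompt: str) -> str:
    """Placeholder: call a large language model and return its reply."""
    return "A summary grounded in the retrieved passages, with citations."

def answer_with_retrieval(question: str) -> str:
    docs = search(question)
    # Put the retrieved evidence into the prompt so the model can ground its
    # answer in (and cite) real sources instead of inventing them.
    evidence = "\n".join(f"[{i+1}] {d['url']}\n{d['text']}"
                         for i, d in enumerate(docs))
    prompt = (f"Answer the question using only the sources below, citing them "
              f"by number.\n\nSources:\n{evidence}\n\n"
              f"Question: {question}\nAnswer:")
    return llm_generate(prompt)

print(answer_with_retrieval("When was ChatGPT released?"))
```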
For ChatGPT itself to solve the problem of factual reliability, its refusal ability probably needs to improve further, so that it filters out questions it is clearly unable to answer, and a fact verification module is also needed to check the correctness of its replies. Hopefully the next generation of GPT can make a breakthrough on this issue.
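One simple way to picture such a refusal mechanism is to threshold the model's own confidence, for example the average log-probability of the tokens in a draft answer. This is a hypothetical sketch of the idea, not how ChatGPT actually works, and the threshold is arbitrary:

```python
# A hypothetical sketch of a confidence-based refusal filter: if the average
# per-token log-probability of the draft answer is too low, refuse to answer.
import math

def average_logprob(token_probs: list[float]) -> float:
    """Mean log-probability of the tokens the model actually generated."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)

def answer_or_refuse(draft_answer: str, token_probs: list[float],
                     threshold: float = -1.5) -> str:
    if average_logprob(token_probs) < threshold:
        return "I am not confident enough to answer this reliably."
    return draft_answer

# A confident case vs. an uncertain case (the probabilities are made up).
print(answer_or_refuse("Paris is the capital of France.", [0.9, 0.8, 0.95, 0.9]))
print(answer_or_refuse("The paper was published in 2031.", [0.2, 0.15, 0.3, 0.1]))
```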
8. Can ChatGPT learn real-time information?
ChatGPT's interactive correction ability makes it look as though it can learn autonomously in real time.
As discussed above, ChatGPT can immediately revise its replies according to the user's corrective intent or the corrections they supply, which looks like real-time learning. In fact it is not. Real learning means the acquired knowledge is general and can be reused at other times and in other settings, and ChatGPT does not demonstrate this: it can only make corrections based on user feedback within the current conversation. When we restart a conversation and test the same question, it still makes the same or similar mistakes.
A natural question is why ChatGPT does not store the corrected information in the model. There are two issues here. First, the information users provide is not necessarily correct; sometimes they deliberately steer ChatGPT into unreasonable answers, and it goes along only because reinforcement learning from human feedback has deepened its reliance on users, so within a single conversation it leans heavily on user feedback. Second, even when the user's feedback is correct, it may occur only infrequently, and the base model cannot update its parameters on low-frequency data; otherwise it would overfit to long-tail data and lose its generality.
Real-time learning is therefore very difficult for ChatGPT. A simple, intuitive remedy is to fine-tune ChatGPT on new data at regular intervals, or to use a trigger mechanism that updates the model's parameters only when many users submit the same or similar feedback, thereby strengthening the model's ability to learn dynamically; a toy sketch of such a trigger follows.
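The sketch below illustrates one possible form of that trigger mechanism: count identical corrections across users and queue an example for the next fine-tuning run only once enough independent reports agree. The threshold, the key normalization, and the update queue are all assumptions for illustration:

```python
# A toy sketch of a feedback-triggered update mechanism: only corrections
# reported independently by enough users are queued for the next fine-tune.
from collections import defaultdict

TRIGGER_THRESHOLD = 3  # assumed: independent reports needed before we trust it

feedback_counts: dict[tuple[str, str], int] = defaultdict(int)
update_queue: list[tuple[str, str]] = []

def report_correction(question: str, corrected_answer: str) -> None:
    """Record one user's correction; queue it once enough users agree."""
    key = (question.strip().lower(), corrected_answer.strip())
    feedback_counts[key] += 1
    if feedback_counts[key] == TRIGGER_THRESHOLD:
        update_queue.append(key)  # picked up by the next periodic fine-tune

for _ in range(3):
    report_correction("Who wrote Hamlet?", "William Shakespeare")
print("queued for fine-tuning:", update_queue)
```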
The author of this article, Zhang Jiajun, is a researcher at the Institute of Automation, Chinese Academy of Sciences. Original link:
https://zhuanlan.zhihu.com/p/606478660