


AI can prove 82% of the problems in a mathematical database: a new SOTA, and it is still based on Transformer
It has to be said that scientists have lately been keen on giving AI math lessons.
Now the Facebook team has joined in as well, proposing a new model that fully automates theorem proving and significantly outperforms the previous SOTA.
As mathematical theorems become more complex, proving them by human effort alone will only get harder.
Therefore, using computers to prove mathematical theorems has become a research focus.
OpenAI previously proposed GPT-f, a model specializing in this direction, which can prove 56.5% of the problems in Metamath.
The method proposed this time raises that number to 82.6%.
At the same time, the researchers say the method takes less time and cuts compute consumption to one-tenth of that of GPT-f.
Could it be that this time AI will win its battle with mathematics?
Still Transformer
The method proposed in the paper is an online training procedure based on Transformer.
It can be roughly divided into three steps:
First, pre-training on the mathematical proof libraries;
Second, fine-tuning the policy model on a supervised dataset;
Third, online training of the policy model and the judgment model.
Specifically, it uses a search algorithm to let the model learn from existing mathematical proof libraries and then generalize to prove more problems.
The proof libraries come from three environments: Metamath, Lean, and a proof environment the team developed itself.
To put it simply, these proof libraries convert ordinary mathematical language into a form similar to a programming language.
Metamath’s main library is set.mm, which contains about 38,000 proofs based on ZFC set theory.
Lean is a proof assistant from Microsoft Research, better known for its association with IMO competition problems. The Lean library aims to formalize all of undergraduate mathematics so that these theorems can be proved in Lean.
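To illustrate what "a form similar to a programming language" means, here is a minimal sketch of a formal statement and its proof. It is written in Lean 4 syntax as an assumption (the article does not say which Lean version the study uses), and the theorem name is a hypothetical label chosen for this example.

```lean
-- Statement: addition of natural numbers is commutative.
-- The proof term simply cites the existing core lemma Nat.add_comm.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The statement plays the role of a type signature and the proof plays the role of a program that the checker verifies, which is why a prover can treat proving as generating code-like text.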
The main goal of this research is to build a prover that can automatically generate a sequence of suitable tactics to prove a problem.
To this end, the researchers proposed HyperTree Proof Search (HTPS), an MCTS-based proof search algorithm that operates on a non-balanced hypergraph.
MCTS stands for Monte Carlo Tree Search, which is often used to solve game tree problems and is well known thanks to AlphaGo.
It works by randomly sampling the search space to find promising actions and then expanding the search tree based on those actions.
The idea adopted in this study is similar to this.
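As a reminder of how plain MCTS picks promising actions, the selection and backpropagation steps can be sketched as follows. This is a generic UCB1-style illustration, not the scoring rule used in the study; the `Node` class, the function names, and the exploration constant `c` are all assumptions made for this example.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Node:
    """One node of a plain MCTS search tree (hypothetical minimal structure)."""
    parent: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    visits: int = 0
    value_sum: float = 0.0


def uct_score(parent: Node, child: Node, c: float = 1.4) -> float:
    """UCB1 score: average value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited actions are always tried first
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def select_action(node: Node) -> str:
    """Return the action whose child currently has the highest score."""
    return max(node.children, key=lambda a: uct_score(node, node.children[a]))


def backpropagate(node: Optional[Node], reward: float) -> None:
    """Add a simulation result to every node from a leaf up to the root."""
    while node is not None:
        node.visits += 1
        node.value_sum += reward
        node = node.parent
```

Repeating selection, expansion, simulation, and backpropagation gradually concentrates visits on the actions with the best average results.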
The proof search starts from the goal g, searches downward by applying tactics, and gradually grows into a hypergraph.
When an empty set of subgoals appears under a branch, it means that a proof of that branch has been found.
Finally, during backpropagation, the node values and total visit counts of the hypertree are updated.
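The role of the empty set can be made concrete with a toy sketch. It assumes a simplified representation that is not taken from the study: each goal maps to the tactics tried on it, and each tactic maps to the list of subgoals it leaves behind. The goal and tactic names, the `HyperGraph` alias, and `is_proved` are hypothetical.

```python
from typing import Dict, FrozenSet, List

# goal -> {tactic -> subgoals left after applying the tactic}
# An empty subgoal list means the tactic closes the goal outright.
HyperGraph = Dict[str, Dict[str, List[str]]]


def is_proved(goal: str, graph: HyperGraph,
              visiting: FrozenSet[str] = frozenset()) -> bool:
    """A goal is proved if some tactic leaves only proved subgoals."""
    if goal in visiting:
        return False  # a goal cannot be used to prove itself
    visiting = visiting | {goal}
    return any(
        all(is_proved(sub, graph, visiting) for sub in subgoals)
        for subgoals in graph.get(goal, {}).values()
    )


graph: HyperGraph = {
    "g": {"t1": ["g1", "g2"], "t2": ["g3"]},
    "g1": {"t3": []},   # t3 closes g1
    "g2": {"t4": []},   # t4 closes g2
    "g3": {},           # expanded, but no tactic worked
}
print(is_proved("g", graph))  # True, via t1
```

Unlike an ordinary tree, one tactic can produce several subgoals that must all be proved, which is why the structure is a hypergraph rather than a tree of single moves.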
In this step, the researchers rely on a policy model and a judgment model.
Tactics are sampled from the policy model, while the judgment model evaluates how likely it is that a proof can be found for the current goal.
The entire search algorithm is guided by these two models.
Both are Transformer models, and they share weights.
Next comes the online training stage.
In this process, the controller sends statements to asynchronous HTPS provers and collects training and proof data.
The provers then send training samples to the distributed trainers and periodically synchronize their copies of the model.
Experimental results
In the experiments, the researchers compared HTPS with GPT-f.
The latter is a mathematical theorem reasoning model previously proposed by OpenAI, also based on Transformer.
The results show that after online training the model can prove 82.6% of the problems in Metamath, far exceeding GPT-f's previous record of 56.5%.
In the Lean library, this model can prove 43% of the theorems, which is 38% higher than SOTA. The following are the IMO test questions proved by this model.
But it's not perfect yet.
For example, in the following question, it did not solve the question in the simplest way. The researchers said this was because of errors in the annotations.
One More Thing
When it comes to using computers to prove mathematical problems, the proof of the four-color theorem is one of the best-known examples.
The four-color theorem is one of the three major problems in modern mathematics. It states that any map can be colored with only four colors so that countries sharing a border get different colors.
Because proving this theorem requires a great deal of computation, no one was able to prove it fully in the 100 years after it was proposed.
It was not until 1976, after 1,200 hours and 10 billion judgments on two computers at the University of Illinois, that it was finally shown that any map needs only four colors. This caused a sensation throughout the mathematical community.
In addition, as mathematical problems become more complex, it becomes harder to check by human effort whether a theorem is correct.
Recently, the AI community has increasingly turned its attention to mathematical problems.
In 2020, OpenAI launched GPT-f, a mathematical theorem reasoning model that can be used for automated theorem proving.
This method can complete 56.5% of the proofs in the test set, exceeding the then-SOTA model MetaGen-IL by more than 30%.
In the same year, Microsoft also released Lean, which can be used to work on IMO problems, meaning that AI can tackle problems it has never seen before.
Last year, after OpenAI added a verifier to GPT-3, its results on math problems were significantly better than with the previous fine-tuning method, reaching 90% of the level of primary school students.
In January this year, a joint study from MIT, Harvard, Columbia University, and the University of Waterloo showed that their proposed model can solve university-level mathematics problems.
In short, scientists are working hard to turn AI from a student lopsided toward one subject into one that is good at both the arts and the sciences.