Table of Contents
Milestones in Higher Education
Workflow
Home Technology peripherals AI MIT releases enhanced version of 'Advanced Mathematics' solver: accuracy rate reaches 81% in 7 courses

MIT releases enhanced version of 'Advanced Mathematics' solver: accuracy rate reaches 81% in 7 courses

Apr 12, 2023 pm 04:04 PM
openai mit Pre-trained model

Not only solves elementary school math word problems, AI has also begun to conquer advanced math!

Recently, MIT researchers announced that based on the OpenAI Codex pre-training model, they successfully achieved an 81% accuracy rate on undergraduate-level mathematics problems through few-shot learning!

MIT releases enhanced version of Advanced Mathematics solver: accuracy rate reaches 81% in 7 courses

  • Paper link: https://arxiv.org/abs/2112.15594
  • Code link: https://github.com/idrori /mathq

Let’s take a look at some small questions first to see the answers, such as calculating the volume generated by rotating the graph of a single variable function around the axis, calculating the Lorenz attractor and projection, calculating and depicting singular value decomposition (SVD) geometric shape, not only can the answer be correct, but also the corresponding explanation can be given!

MIT releases enhanced version of Advanced Mathematics solver: accuracy rate reaches 81% in 7 courses

It’s really unbelievable. Looking back on the past, passing high numbers was always passed by. Now AI can score 81 points in one shot. I unilaterally declare that AI has surpassed Human beings.

What’s even more awesome is that in addition to solving problems that are difficult to solve with ordinary machine learning models, this research also shows that this technology can be promoted on a large scale and can solve problems in its courses and similar courses.

This is also the first time in history that a single machine learning model can solve such a large-scale mathematical problem, and can also explain, draw and even generate new questions!

In fact, this paper was released as early as the beginning of the year. After half a year of revision, the length has been increased from 114 pages to 181 pages. More mathematical problems can be solved. The appendices are numbered directly from A-Z. Laman.

MIT releases enhanced version of Advanced Mathematics solver: accuracy rate reaches 81% in 7 courses

There are four main author units of the article, namely MIT, Columbia University, Harvard University and University of Waterloo.

The first author, Iddo Drori, is a lecturer in the AI ​​Department of the Department of Electrical Engineering and Computer Science at MIT and an adjunct associate professor at Columbia University's School of Engineering and Applied Sciences. Won the CCAI NeurIPS 2021 Best Paper Award.

MIT releases enhanced version of Advanced Mathematics solver: accuracy rate reaches 81% in 7 courses

His main research directions are machine learning for education, which is trying to get machines to solve, explain and generate college-level mathematics and STEM courses; machine learning for climate science, which is based on data Thousands of years of data predicting extreme climate change and monitoring climate, integrating multidisciplinary work to predict changes in ocean biogeochemistry in the Atlantic Ocean over the years; machine learning algorithms for autonomous driving, and more.

He is also the author of The Science of Deep Learning published by Cambridge University Press.

Milestones in Higher Education

Before this paper, most researchers believed that neural networks could not handle high-number problems and could only solve some simple mathematical problems.

Even if the Transformer model surpasses human performance in various NLP tasks, it is still not good at solving mathematical problems. The main reason is because various large models such as GPT-3 only work on text data. Perform pre-training on.

Later, some researchers discovered that the language model can still be guided to reason and answer some simple mathematical questions through step-by-step analysis (chain of thoughts), but advanced mathematics problems are not so easy to solve.

MIT releases enhanced version of Advanced Mathematics solver: accuracy rate reaches 81% in 7 courses

#When the target is a high-number problem, you must first collect a wave of training data.

The author randomly selected 25 problems from each of seven courses at MIT, including:

  • 18.01 Single-variable Calculus
  • 18.02 Multi-variable Calculus Integral
  • 18.03 Differential Equations
  • 18.05 Introduction to Probability and Statistics
  • 18.06 Linear Algebra
  • 6.042 Computer Science Mathematics
  • Columbia University COMS3251 Computational Linear Algebra

For the MATH dataset, the researchers randomly selected 15 questions from the six topics of the dataset (Algebra, Counting and Probability, Intermediate Algebra, Number Theory, Pre-Algebra, and Pre-Algebra) .

In order to verify that the results generated by the model are not overfitting to the training data, the researchers chose the COMS3251 course that has not been published on the Internet to verify the generated results.

MIT releases enhanced version of Advanced Mathematics solver: accuracy rate reaches 81% in 7 courses

Workflow

The model takes a course question as input, then performs automatic augmentation with context on it, results in a synthesized program, and finally outputs the answer and generated explanation.

For different questions, the output results may be different. For example, the answer to 18.01 is an equation, the answer to 18.02 is a Boolean value, the answers to 18.03 and 18.06 are a graph or vector, and the answer to 18.05 is a numerical value.

MIT releases enhanced version of Advanced Mathematics solver: accuracy rate reaches 81% in 7 courses

#When you get a question, the first step is to let the model find the relevant context of the question. The researchers mainly focused on the Python program generated by Codex, so they added the text "write a program" before the question and placed the text within three quotation marks of the Python program, pretending to be a docstring in the program.

After generating the program, a Codex prompt is needed to specify which libraries to import. The author chose to add the "use sympy" string before the question as context, specifying that the program synthesized to solve the problem should use this package.

By counting the Python programming packages used by each course, you can see that all courses use NumPy and Sympy. Matplotlib is only used in courses with problems that require plotting. About half of the courses use math, random, and SciPy. During actual operation, the researchers only specified SymPy or drawing-related packages to import, and other imported packages were automatically synthesized.

MIT releases enhanced version of Advanced Mathematics solver: accuracy rate reaches 81% in 7 courses

In the Zero-shot learning method, 71% of the problems can be automatically solved by only using automatic enhancement on the original problem.

If a problem is not solved, researchers try to use few-shot learning to solve such problems.

First use OpenAI's text-similarity-babbag-001 embedding engine to obtain the 2048-dimensional embedding of all problems, and then use cosine similarity calculations for all vectors to find the unsolved problems that are most similar to the solved problems question. Finally, the most similar problem and its corresponding code are used as few-shot examples of the new problem.

If the generated code does not output the correct answer, add another solved question-code pair, each time using the next similar solved question.

In practice, it can be found that using up to 5 examples for few-shot learning has the best effect. The total number of problems that can be automatically solved increases from 71% of zero-shot learning to 81% of few-shot learning. .

To solve the remaining 19% of the problems, human editors are required to intervene.

The researchers first collected all the questions and found that most of them were vague (vague) or contained redundant information, such as references to movie characters or current events, etc. The questions needed to be sorted out to extract the essence of the questions.

Question sorting mainly involves removing redundant information, breaking down long sentence structures into smaller components, and converting prompts into programming format.

Another situation that requires manual intervention is that the answer to a question requires multiple steps of drawing to explain, that is, the Codex needs to be interactively prompted until the desired visualization effect is achieved.

MIT releases enhanced version of Advanced Mathematics solver: accuracy rate reaches 81% in 7 courses

In addition to generating answers, the model should also be able to explain the reasons for the answers. The researchers guide this through the prompt words "Here is what the above code is doing: 1." The model generates results that are explained step by step.

After being able to answer the questions, the next step is to use Codex to generate new questions for each course.

The researchers created a numbered list of questions written by students in each class. This list was cut off after a random number of questions, and the results were used to prompt Codex to generate the next question.

This process is repeated until enough new questions have been created for each course.

To evaluate the generated questions, the researchers surveyed MIT students who had taken these courses or their equivalents to compare the quality and difficulty of the machine-generated questions to the original courses.

MIT releases enhanced version of Advanced Mathematics solver: accuracy rate reaches 81% in 7 courses

From the results of the student survey we can see:

  • The quality of machine scoring is already comparable to that of human questions;
  • In terms of difficulty, human questions are more suitable as course questions, while the results generated by machines are slightly more difficult. Some;
  • More than half of the course questions can be seen by students as being generated by models, and the closest to humans is the 18.01 course

Reference Information:

https://www.reddit.com/r/artificial/comments/v8liqh/researchers_built_a_neural_network_that_not_only/​

The above is the detailed content of MIT releases enhanced version of 'Advanced Mathematics' solver: accuracy rate reaches 81% in 7 courses. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

A new programming paradigm, when Spring Boot meets OpenAI A new programming paradigm, when Spring Boot meets OpenAI Feb 01, 2024 pm 09:18 PM

In 2023, AI technology has become a hot topic and has a huge impact on various industries, especially in the programming field. People are increasingly aware of the importance of AI technology, and the Spring community is no exception. With the continuous advancement of GenAI (General Artificial Intelligence) technology, it has become crucial and urgent to simplify the creation of applications with AI functions. Against this background, "SpringAI" emerged, aiming to simplify the process of developing AI functional applications, making it simple and intuitive and avoiding unnecessary complexity. Through "SpringAI", developers can more easily build applications with AI functions, making them easier to use and operate.

Choosing the embedding model that best fits your data: A comparison test of OpenAI and open source multi-language embeddings Choosing the embedding model that best fits your data: A comparison test of OpenAI and open source multi-language embeddings Feb 26, 2024 pm 06:10 PM

OpenAI recently announced the launch of their latest generation embedding model embeddingv3, which they claim is the most performant embedding model with higher multi-language performance. This batch of models is divided into two types: the smaller text-embeddings-3-small and the more powerful and larger text-embeddings-3-large. Little information is disclosed about how these models are designed and trained, and the models are only accessible through paid APIs. So there have been many open source embedding models. But how do these open source models compare with the OpenAI closed source model? This article will empirically compare the performance of these new models with open source models. We plan to create a data

Posthumous work of the OpenAI Super Alignment Team: Two large models play a game, and the output becomes more understandable Posthumous work of the OpenAI Super Alignment Team: Two large models play a game, and the output becomes more understandable Jul 19, 2024 am 01:29 AM

If the answer given by the AI ​​model is incomprehensible at all, would you dare to use it? As machine learning systems are used in more important areas, it becomes increasingly important to demonstrate why we can trust their output, and when not to trust them. One possible way to gain trust in the output of a complex system is to require the system to produce an interpretation of its output that is readable to a human or another trusted system, that is, fully understandable to the point that any possible errors can be found. For example, to build trust in the judicial system, we require courts to provide clear and readable written opinions that explain and support their decisions. For large language models, we can also adopt a similar approach. However, when taking this approach, ensure that the language model generates

Rust-based Zed editor has been open sourced, with built-in support for OpenAI and GitHub Copilot Rust-based Zed editor has been open sourced, with built-in support for OpenAI and GitHub Copilot Feb 01, 2024 pm 02:51 PM

Author丨Compiled by TimAnderson丨Produced by Noah|51CTO Technology Stack (WeChat ID: blog51cto) The Zed editor project is still in the pre-release stage and has been open sourced under AGPL, GPL and Apache licenses. The editor features high performance and multiple AI-assisted options, but is currently only available on the Mac platform. Nathan Sobo explained in a post that in the Zed project's code base on GitHub, the editor part is licensed under the GPL, the server-side components are licensed under the AGPL, and the GPUI (GPU Accelerated User) The interface) part adopts the Apache2.0 license. GPUI is a product developed by the Zed team

Don't wait for OpenAI, wait for Open-Sora to be fully open source Don't wait for OpenAI, wait for Open-Sora to be fully open source Mar 18, 2024 pm 08:40 PM

Not long ago, OpenAISora quickly became popular with its amazing video generation effects. It stood out among the crowd of literary video models and became the focus of global attention. Following the launch of the Sora training inference reproduction process with a 46% cost reduction 2 weeks ago, the Colossal-AI team has fully open sourced the world's first Sora-like architecture video generation model "Open-Sora1.0", covering the entire training process, including data processing, all training details and model weights, and join hands with global AI enthusiasts to promote a new era of video creation. For a sneak peek, let’s take a look at a video of a bustling city generated by the “Open-Sora1.0” model released by the Colossal-AI team. Open-Sora1.0

The local running performance of the Embedding service exceeds that of OpenAI Text-Embedding-Ada-002, which is so convenient! The local running performance of the Embedding service exceeds that of OpenAI Text-Embedding-Ada-002, which is so convenient! Apr 15, 2024 am 09:01 AM

Ollama is a super practical tool that allows you to easily run open source models such as Llama2, Mistral, and Gemma locally. In this article, I will introduce how to use Ollama to vectorize text. If you have not installed Ollama locally, you can read this article. In this article we will use the nomic-embed-text[2] model. It is a text encoder that outperforms OpenAI text-embedding-ada-002 and text-embedding-3-small on short context and long context tasks. Start the nomic-embed-text service when you have successfully installed o

Microsoft, OpenAI plan to invest $100 million in humanoid robots! Netizens are calling Musk Microsoft, OpenAI plan to invest $100 million in humanoid robots! Netizens are calling Musk Feb 01, 2024 am 11:18 AM

Microsoft and OpenAI were revealed to be investing large sums of money into a humanoid robot startup at the beginning of the year. Among them, Microsoft plans to invest US$95 million, and OpenAI will invest US$5 million. According to Bloomberg, the company is expected to raise a total of US$500 million in this round, and its pre-money valuation may reach US$1.9 billion. What attracts them? Let’s take a look at this company’s robotics achievements first. This robot is all silver and black, and its appearance resembles the image of a robot in a Hollywood science fiction blockbuster: Now, he is putting a coffee capsule into the coffee machine: If it is not placed correctly, it will adjust itself without any human remote control: However, After a while, a cup of coffee can be taken away and enjoyed: Do you have any family members who have recognized it? Yes, this robot was created some time ago.

Sudden! OpenAI fires Ilya ally for suspected information leakage Sudden! OpenAI fires Ilya ally for suspected information leakage Apr 15, 2024 am 09:01 AM

Sudden! OpenAI fired people, the reason: suspected information leakage. One is Leopold Aschenbrenner, an ally of the missing chief scientist Ilya and a core member of the Superalignment team. The other person is not simple either. He is Pavel Izmailov, a researcher on the LLM inference team, who also worked on the super alignment team. It's unclear exactly what information the two men leaked. After the news was exposed, many netizens expressed "quite shocked": I saw Aschenbrenner's post not long ago and felt that he was on the rise in his career. I didn't expect such a change. Some netizens in the picture think: OpenAI lost Aschenbrenner, I

See all articles