Table of Contents
Background
New generation data set
Conclusion
Home Technology peripherals AI When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

Jun 06, 2023 pm 05:14 PM
ai

Background

Large Model (LLM) provides a new direction for the development of general artificial intelligence (AGI). It uses massive public data, such as the Internet, books and other corpora to conduct large-scale self-learning. Through supervised training, powerful language understanding, language generation, reasoning and other abilities have been obtained. However, large models still face some challenges in utilizing private domain data. Private domain data refers to data owned by specific companies or individuals and usually contains domain-specific knowledge. Combining large models with private domain knowledge will Provide great value.

Private domain knowledge can be divided into unstructured and structured data in terms of data form. Unstructured data, such as documents, are usually enhanced through retrieval, and tools such as langchain can be used to quickly implement a question and answer system. Structured data, such as databases (DB), require large models to interact with the database, query and analyze to obtain useful information. A series of products and applications have recently been derived around large models and databases, such as using LLM to build intelligent databases, perform BI analysis, and complete automatic table construction. Among them, text-to-SQL technology, which interacts with the database in a natural language, has always been a highly anticipated direction.

In academia, past text-to-SQL benchmarks only focused on small-scale databases. The most advanced LLM can already achieve an execution accuracy of 85.3%, but does this mean that LLM Already available as a natural language interface to the database?

New generation data set

Recently, Alibaba, together with the University of Hong Kong and other institutions, launched a new benchmark BIRD (Can LLM Already Serve as A Database) for large-scale real databases Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs), including 95 large-scale databases and high-quality Text-SQL pairs, with a data storage capacity of up to 33.4 GB. The previous best model only achieved 40.08% evaluation on BIRD, which is still far from the human result of 92.96%, proving that challenges still exist. In addition to evaluating the correctness of SQL, the author also added an evaluation of SQL execution efficiency, hoping that the model can not only write correct SQL, but also write efficient SQL.

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

Paper: https://arxiv.org/abs/2305.03111

Homepage: https://bird-bench.github.io

##Code: https://github. com/AlibabaResearch/DAMO-ConvAI/tree/main/bird

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

##Currently, BIRD’s data and code , and the list have all been open sourced, and have been downloaded more than 10,000 times around the world. BIRD has aroused widespread attention and discussion on Twitter since its launch.

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

# Comments from overseas users are also very exciting:

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

An LLM project not to be missed

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

Very useful checkpoint, hotbed for improvement

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

AI can help you, but it cannot replace you

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

##My job is safe for now...

Method Overview

New Challenges

This research is mainly oriented to Text-to-SQL evaluation of real databases, popular test benchmarks in the past, Spider and WikiSQL, for example, only focus on database schema with a small amount of database content, resulting in a gap between academic research and practical applications. BIRD focuses on three new challenges: massive and real database content, external knowledge reasoning between natural language questions and database content, and the efficiency of SQL when processing large databases.

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

First of all, the database contains massive and noisy data values. In the example on the left, the average salary needs to be calculated by converting the string in the database into a floating point value (Float) and then performing the aggregation calculation (Aggregation);

Secondly, external knowledge inference is necessary. In the middle example, in order to accurately return answers to the user, the model must first know that the account type eligible for the loan must be "OWNER", which represents a huge The mysteries hidden behind database content sometimes require external knowledge and reasoning to reveal;

Finally, query execution efficiency needs to be considered. In the example on the right, using more efficient SQL queries can significantly improve speed, which is of great value to the industry because users not only expect to write correct SQL, but also expect efficient SQL execution, especially in large databases In the case of;

Data annotation

BIRD decouples question generation and SQL annotation during the annotation process . At the same time, experts are added to write database description files to help problem and SQL annotation personnel better understand the database.

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

1. Database collection: The author collected and processed 80 databases from open source data platforms such as Kaggle and CTU Prague Relational Learning Repository. Fifteen databases were manually created as black-box tests by collecting real table data, building ER diagrams, and setting database constraints to avoid the current database being learned by the current large model. BIRD's database contains patterns and values ​​in multiple fields, 37 fields, covering blockchain, sports, medical care, games, etc.

2. Problem collection: First, the author hires experts to write a description file for the database. The description file includes a complete description of the column name, database value, and external parameters used to understand the value. knowledge etc. Then 11 native speakers from the United States, United Kingdom, Canada, Singapore and other countries were recruited to generate questions for BIRD. Every speaker has at least a bachelor's degree or above.

3. SQL generation: A global annotation team composed of data engineers and database course students was recruited to generate SQL for BIRD. Given a database and a reference database description file, the annotator needs to generate SQL to correctly answer the question. The Double-Blind annotation method is adopted, requiring two annotators to annotate the same question. Double-blind annotation can minimize errors caused by a single annotator.

4. Quality inspection: Quality inspection is divided into two parts: effectiveness and consistency of result execution. Validity not only requires the correctness of execution, but also requires that the execution result cannot be null (NULL). Experts will gradually modify the problem conditions until the SQL execution results are valid.

5. Difficulty division: The difficulty index of text-to-SQL can provide researchers with a reference for optimizing algorithms. The difficulty of Text-to-SQL depends not only on the complexity of the SQL, but also on factors such as the difficulty of the question, the ease of understanding with additional knowledge, and the complexity of the database. The authors therefore asked SQL annotators to rate the difficulty during the annotation process and divided the difficulty into three categories: easy, moderate, and challenging.

Data statistics

1. Question type statistics: Questions are divided into two categories, basic question types (Fundamental Type) and Reasoning Type. Basic question types include those covered in traditional Text-to-SQL datasets, while inference question types include questions that require external knowledge to understand the value:

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

##2. Database distribution: The author uses a sunburst diagram to show the relationship between the database domain and its data size. A larger radius means more text-SQL is based on that database, and vice versa. The darker the color, the larger the database size. For example, donor is the largest database in the benchmark, occupying 4.5GB of space.

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

3.SQL distribution: the number of tokens passed by the author through SQL, the number of keywords, the number of n-gram types, JOIN The number and other 4 dimensions prove that BIRD's SQL is by far the most diverse and complex.

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

Evaluation Indicators

1. Execution Accuracy: Compare the difference between the SQL execution results predicted by the model and the actual annotated SQL execution results;

2. Effective efficiency score: Taking into account both the accuracy and efficiency of SQL, compare the model predictions The relative difference between the SQL execution speed and the real annotated SQL execution speed takes the running time as the main indicator of efficiency.

Experimental analysis

The author selected the training-type T5 model and the large-scale model that performed well in previous benchmark tests Language models (LLM) as baseline models: Codex (code-davinci-002) and ChatGPT (gpt-3.5-turbo). In order to better understand whether multi-step reasoning can stimulate the reasoning capabilities of large language models in real database environments, their Chain-of-Thought version is also provided. And test the baseline model in two settings: one is full schema information input, and the other is human understanding of the database values ​​involved in the problem, summarized into natural language description (knowledge evidence) to assist the model in understanding the database.

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

The author gives some conclusions:

1. Additional knowledge Gain: Increasing the knowledge evidence (knowledge evidence) on the understanding of database values ​​has a significant improvement. This proves that in real database scenarios, relying solely on semantic parsing capabilities is not enough. Understanding database values ​​will help users find more accurately Answer.

2. The thinking chain is not necessarily completely beneficial: when the model does not have a given database value description and zero-shot, the model's own COT inference can be generated more accurately Answer. However, when given additional knowledge (knowledge evidence), LLM was asked to perform COT and found that the effect was not significant or even declined. Therefore in this scenario, LLM may generate knowledge conflicts. How to resolve this conflict so that the model can both accept external knowledge and benefit from its own powerful multi-step reasoning will be a key research direction in the future.

3. The gap with humans: BIRD also provides human indicators. The author uses an exam to test the performance of the annotator when facing the test set for the first time, and uses it as the basis for human indicators. Experiments have found that the current best LLM is still far behind humans, proving that challenges still exist. The authors performed a detailed error analysis and provided some potential directions for future research.

When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​

Conclusion

The application of LLM in the database field will provide users with smarter and more convenient Database interactive experience. The emergence of BIRD will promote the intelligent development of interaction between natural language and real databases, provide room for progress in text-to-SQL technology for real database scenarios, and help researchers develop more advanced and practical database applications.

The above is the detailed content of When LLM meets Database: Alibaba DAMO Academy and HKU launch a new Text-to-SQL benchmark​. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

The top ten free platform recommendations for real-time data on currency circle markets are released The top ten free platform recommendations for real-time data on currency circle markets are released Apr 22, 2025 am 08:12 AM

Cryptocurrency data platforms suitable for beginners include CoinMarketCap and non-small trumpet. 1. CoinMarketCap provides global real-time price, market value, and trading volume rankings for novice and basic analysis needs. 2. The non-small quotation provides a Chinese-friendly interface, suitable for Chinese users to quickly screen low-risk potential projects.

okx online okx exchange official website online okx online okx exchange official website online Apr 22, 2025 am 06:45 AM

The detailed introduction of OKX Exchange is as follows: 1) Development history: Founded in 2017 and renamed OKX in 2022; 2) Headquartered in Seychelles; 3) Business scope covers a variety of trading products and supports more than 350 cryptocurrencies; 4) Users are spread across more than 200 countries, with tens of millions of users; 5) Multiple security measures are adopted to protect user assets; 6) Transaction fees are based on the market maker model, and the fee rate decreases with the increase in trading volume; 7) It has won many honors, such as "Cryptocurrency Exchange of the Year".

A list of special services for major virtual currency trading platforms A list of special services for major virtual currency trading platforms Apr 22, 2025 am 08:09 AM

Institutional investors should choose compliant platforms such as Coinbase Pro and Genesis Trading, focusing on cold storage ratios and audit transparency; retail investors should choose large platforms such as Binance and Huobi, focusing on user experience and security; users in compliance-sensitive areas can conduct fiat currency trading through Circle Trade and Huobi Global, and mainland Chinese users need to go through compliant over-the-counter channels.

Top 10 latest releases of virtual currency trading platforms for bulk transactions Top 10 latest releases of virtual currency trading platforms for bulk transactions Apr 22, 2025 am 08:18 AM

The following factors should be considered when choosing a bulk trading platform: 1. Liquidity: Priority is given to platforms with an average daily trading volume of more than US$5 billion. 2. Compliance: Check whether the platform holds licenses such as FinCEN in the United States, MiCA in the European Union. 3. Security: Cold wallet storage ratio and insurance mechanism are key indicators. 4. Service capability: Whether to provide exclusive account managers and customized transaction tools.

A list of top ten virtual currency trading platforms that support multiple currencies A list of top ten virtual currency trading platforms that support multiple currencies Apr 22, 2025 am 08:15 AM

Priority is given to compliant platforms such as OKX and Coinbase, enabling multi-factor verification, and asset self-custody can reduce dependencies: 1. Select an exchange with a regulated license; 2. Turn on the whitelist of 2FA and withdrawals; 3. Use a hardware wallet or a platform that supports self-custody.

Recommended top 10 for easy access to digital currency trading apps (latest ranking in 25) Recommended top 10 for easy access to digital currency trading apps (latest ranking in 25) Apr 22, 2025 am 07:45 AM

The core advantage of gate.io (global version) is that the interface is minimalist, supports Chinese, and the fiat currency trading process is intuitive; Binance (simplified version) has the highest global trading volume, and the simple version model only retains spot trading; OKX (Hong Kong version) has the simple version of the interface is simple, supports Cantonese/Mandarin, and has a low threshold for derivative trading; Huobi Global Station (Hong Kong version) has the core advantage of being an old exchange, launches a meta-universe trading terminal; KuCoin (Chinese Community Edition) has the core advantage of supporting 800 currencies, and the interface adopts WeChat interaction; Kraken (Hong Kong version) has the core advantage of being an old American exchange, holding a Hong Kong SVF license, and the interface is simple; HashKey Exchange (Hong Kong licensed) has the core advantage of being a well-known licensed exchange in Hong Kong, supporting France

Tips and recommendations for the top ten market websites in the currency circle 2025 Tips and recommendations for the top ten market websites in the currency circle 2025 Apr 22, 2025 am 08:03 AM

Domestic user adaptation solutions include compliance channels and localization tools. 1. Compliance channels: Franchise currency exchange through OTC platforms such as Circle Trade, domestically, they need to go through Hong Kong or overseas platforms. 2. Localization tools: Use the currency circle network to obtain Chinese information, and Huobi Global Station provides a meta-universe trading terminal.

Summary of the top ten Apple version download portals for digital currency exchange apps Summary of the top ten Apple version download portals for digital currency exchange apps Apr 22, 2025 am 09:27 AM

Provides a variety of complex trading tools and market analysis. It covers more than 100 countries, has an average daily derivative trading volume of over US$30 billion, supports more than 300 trading pairs and 200 times leverage, has strong technical strength, a huge global user base, provides professional trading platforms, secure storage solutions and rich trading pairs.

See all articles