
How to be compatible with MySQL + ES + MongoDB to achieve deep paging of hundreds of millions of data?

Jul 27, 2020, 05:24 PM
mysql


## Interview Questions & Real Experience

Interview question: how do you implement deep paging when the data volume is large?

You may run into this question in interviews or while preparing for them. Most answers boil down to sharding databases and tables and building indexes. That is the standard correct answer, but reality is always hard, so the interviewer will usually press on: now that the schedule is tight and the team is short-handed, how do we actually pull off deep paging?

At this point, candidates without hands-on experience are basically stumped. So, hear me out.

## Painful Lessons

First of all, one thing must be made clear: deep paging can be done, but deep random page jumps absolutely must be banned.

Here is a picture first:

[Image: a pager whose page count runs into the hundreds of thousands]

Take a guess: if I click page 142360, will the service blow up?

Databases like MySQL and MongoDB are fine: they are purpose-built databases, and when they handle it poorly the result is at worst a slow query. But once ES is involved the nature of the problem changes: we have to loop over the search_after API to pull the data, which brings memory usage into play, and if the code is not written carefully it can lead straight to an out-of-memory crash.

## Why random deep page jumps cannot be allowed

Let us briefly go over, from a technical point of view, why random deep page jumps cannot be allowed, or in other words why deep paging is not recommended.

### MySQL

The basic principle of paging:

```sql
SELECT * FROM test ORDER BY id DESC LIMIT 10000, 20;
```

`LIMIT 10000, 20` means scanning 10,020 rows that satisfy the conditions, discarding the first 10,000, and returning the last 20. With `LIMIT 1000000, 100`, 1,000,100 rows have to be scanned. In a highly concurrent application, where every query scans over a million rows, it would be strange if the service did not blow up.
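To make that cost concrete, here is a small illustrative Python sketch (a stand-in, not MySQL internals) of what `LIMIT offset, size` forces the server to do: walk `offset + size` index entries just to hand back `size` rows.

```python
def offset_paginate(rows, offset, size):
    """Mimic LIMIT offset, size: walk offset + size rows, discard the first offset."""
    scanned = 0
    page = []
    for row in rows:  # rows arrive in sorted order, like an index scan
        scanned += 1
        if scanned > offset:
            page.append(row)
        if len(page) == size:
            break
    return page, scanned

# 2 million ids stand in for a big table
page, scanned = offset_paginate(range(1, 2_000_001), 1_000_000, 100)
print(scanned)  # 1000100 rows touched for just 100 results
```

The scanned count grows linearly with the page number, which is exactly why deep offsets melt the CPU under concurrency.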

### MongoDB

The basic principle of paging:

```javascript
db.t_data.find().limit(5).skip(5);
```

Likewise, as the page number grows, so does the number of documents that `skip` passes over. The skip is carried out through the cursor's iterator, so the CPU cost becomes very noticeable; once page numbers get very large and requests frequent, the service will inevitably blow up.

### ElasticSearch

From a business point of view, ElasticSearch is not a typical database but a search engine: if the desired data does not show up under the current filter conditions, paging ever deeper will not find it either. Even setting that aside, if we insist on using ES as a database for queries, paging will certainly run into the `max_result_window` limit. And sure enough, the official default tells you the maximum offset is ten thousand.

Query process:

  • To fetch page 501 at 10 items per page, the client sends the request to one node

  • That node broadcasts the query to every shard, and each shard looks up its own top 5,010 hits

  • The results are returned to the coordinating node, which merges them and keeps the overall top 5,010

  • The requested page is returned to the client

From this it is clear why the offset has to be limited. Besides, even a scrolling approach such as the search_after API has to scroll past thousands of items on every step of a deep jump, possibly millions or tens of millions of documents in total, all for the final 20 rows. You can imagine the efficiency.
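The search_after idea can be sketched in a few lines of Python over an in-memory sorted list (purely illustrative; a real ES client passes the last hit's sort values in the `search_after` parameter): each page is a fresh "strictly greater than the last sort value" query rather than an offset.

```python
import bisect

def search_after(sorted_ids, after, size):
    """Return the next `size` ids strictly greater than `after`;
    after=None means start from the beginning, like the first request."""
    start = 0 if after is None else bisect.bisect_right(sorted_ids, after)
    return sorted_ids[start:start + size]

ids = list(range(1, 101))
page1 = search_after(ids, None, 20)        # ids 1..20
page2 = search_after(ids, page1[-1], 20)   # ids 21..40, no offset involved
```

Each step is cheap on its own, but jumping to page 50,000 still means taking 50,000 such steps, which is why search_after suits scrolling rather than random jumps.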

## Align with the product team again

As the saying goes: problems that technology cannot solve should be solved by the business!

During my internship I took the product team's word that deep paging with page jumps was truly necessary. Now it is time to set things right, and the business side has to change as follows:

  • Add default filter conditions wherever possible, such as a time range, so as to reduce the amount of data displayed

  • Change the page-jump presentation to scrolling display, or to page jumps within a small range

Scrolling display, for reference:

[Image: scrolling-load UI example]

Small-range page jump, for reference:

[Image: small-range pager UI example]

## General solution

The quick fixes achievable in a short time frame mainly come down to the following:

  • Required: sort fields and filter conditions must have indexes

  • Core: use the known data of a small-range page jump, or the known data of scroll loading, to reduce the offset

  • Extra: for cases that are hard to handle, you can also fetch extra data and trim it down; the performance impact will not be significant
### MySQL

Original paging SQL:

```sql
# Page one
SELECT * FROM `year_score` where `year` = 2017 ORDER BY id limit 0, 20;
# Page N
SELECT * FROM `year_score` where `year` = 2017 ORDER BY id limit (N - 1) * 20, 20;
```

Using context (the last id the client has already seen), it is rewritten as:

```sql
# XXXX stands for an id we already know
SELECT * FROM `year_score` where `year` = 2017 and id > XXXX ORDER BY id limit 20;
```

As mentioned in 没内鬼，来点干货！SQL优化和诊断 (an earlier article on SQL optimization and diagnostics), LIMIT stops the scan as soon as it is satisfied, so the total number of rows scanned under this scheme drops sharply and efficiency shoots up to the max!
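The rewritten query can be exercised end to end with Python's built-in sqlite3 standing in for MySQL (table and column names follow the example above); the id of the last row on the current page becomes the cursor for the next page:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE year_score (id INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany("INSERT INTO year_score VALUES (?, 2017)",
                 [(i,) for i in range(1, 1001)])

def next_page(conn, last_id, size=20):
    # Keyset version: seek past the last seen id instead of using OFFSET
    return conn.execute(
        "SELECT id FROM year_score WHERE year = 2017 AND id > ? "
        "ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()

page1 = next_page(conn, 0)             # first page: ids 1..20
page2 = next_page(conn, page1[-1][0])  # next page: ids 21..40
```

Because the seek lands directly on the index entry after `last_id`, the cost per page stays flat no matter how deep the user scrolls.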

### ES

The scheme is the same as for MySQL. At this point we can use the from/size (FROM-TO) API as freely as we like, without worrying about the maximum-offset limit.

### MongoDB

The scheme is basically the same; the essential code is as follows:

[Image: MongoDB range-query code sample]

Related performance test:

[Image: performance test results]

## If you really must support deep random page jumps

What if you failed to out-argue the product manager? No worries, there is still a sliver of hope.

The SQL-optimization article mentioned above also covers a trick for handling MySQL deep paging; the code is as follows:

```sql
# Anti-pattern (takes 129.570 s)
select * from task_result LIMIT 20000000, 10;
# Recommended (takes 5.114 s)
SELECT a.* FROM task_result a, (select id from task_result LIMIT 20000000, 10) b where a.id = b.id;
# Note:
# task_result is a production table with 34 million rows in total; id is the primary key; the offset reaches 20 million
```

The core logic of this scheme relies on the clustered index: the subquery quickly obtains the primary-key ids at the target offset without any row lookups (it is covered by the index), and only then are the actual rows fetched through the clustered index. At that point only 10 rows are involved, so it is very efficient.
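The same deferred-join trick can be demonstrated with Python's sqlite3 standing in for MySQL (smaller numbers than the production example, purely illustrative): the subquery pages through the narrow primary-key index first, and only the final 10 ids are joined back to the wide rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_result (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO task_result VALUES (?, ?)",
                 [(i, "x" * 100) for i in range(1, 100_001)])

# Deferred join: find the 10 ids at the deep offset via the index alone,
# then fetch the wide rows only for those ids.
rows = conn.execute("""
    SELECT a.id, a.payload
    FROM task_result a
    JOIN (SELECT id FROM task_result ORDER BY id LIMIT 10 OFFSET 50000) b
      ON a.id = b.id
    ORDER BY a.id
""").fetchall()
```

The deep offset still has to be walked, but only over the slim index, not over full rows, which is where the order-of-magnitude speedup in the timings above comes from.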

Hence, when working with MySQL, ES, or MongoDB, we can apply the same method:

  • Restrict the fetched fields: page deep using only the filter conditions, retrieving nothing but primary-key ids

  • Then query the rows you actually need directly by those primary-key ids

Flaw: when the offset is extremely large it still takes a while, like the roughly 5 s in the example above.


Source: https://juejin.im/post/5f0de4d06fb9a07e8a19a641


