


How to be compatible with MySQL + ES + MongoDB to achieve deep paging of hundreds of millions of data?
##Interview Questions & Real Experience
Interview question: How to achieve deep paging when the amount of data is large?You may encounter the above questions during interviews or when preparing for interviews. Most of the answers are basically to divide databases and tables to build indexes. This is a very standard correct answer, but Reality is always very hard, so the interviewer will usually ask you, now that the construction period is insufficient and the personnel are insufficient, how can we achieve deep paging? At this time, students who have no practical experience are basically numb. So, please listen to me.
Painful Lessons
First of all, it must be clear: depth paging can be done, but depth is random Page jumps absolutely need to be banned. Previous picture:Why random depth page jumps cannot be allowed
Let’s briefly talk about why random depth page jumps cannot be allowed from a technical point of view, or that Why is deep paging not recommended?MySQL
The basic principle of paging:SELECT * FROM test ORDER BY id DESC LIMIT 10000, 20;
MongoDB
The basic principle of paging:db.t_data.find().limit(5).skip(5);
ElasticSearch
From a business perspective, ElasticSearch is not a typical database. It is a search engine. If the desired data is not found under the filter conditions , we will not find the data we want if we continue deep paging. To take a step back, if we use ES as a database for query, we will definitely encounter the limit of max_result_window when paging. Did you see it? Officials tell you the maximum The offset limit is ten thousand. Query process:- If you query page 501, with 10 items per page, the client sends a request to a certain node
- This node broadcasts data to each shard, and each shard queries the first 5010 pieces of data.
- The query results are returned to the node, and then the data is integrated and the first 5010 pieces of data are retrieved.
- Return to the client
Align with the product again
As the saying goes, problems that cannot be solved by technology should be solved by business! During my internship, I believed in the evil of the product, and it was necessary to implement deep paging and page jumps. Now we must correct the chaos, and the following changes must be made in the business: Add default filtering conditions as much as possible, such as : Time period, the purpose is to reduce the amount of data displayedModify the display method of page jumps, change it to scrolling display, or jump pages in a small rangeScrolling display reference picture:##General solutionThe quick solution in a short period of time mainly includes the following points:
- Required: For sorting fields and filter conditions, the index must be set
- Core: Use known data of small range page numbers, or known data of rolling loading, to reduce the offset
- Extra: If you encounter a situation that is difficult to handle, You can also obtain excess data and intercept it to a certain extent, and the performance impact will not be significant
Original paging SQL:
# 第一页 SELECT * FROM `year_score` where `year` = 2017 ORDER BY id limit 0, 20; # 第N页 SELECT * FROM `year_score` where `year` = 2017 ORDER BY id limit (N - 1) * 20, 20;
Through context, rewritten as:
# XXXX 代表已知的数据 SELECT * FROM `year_score` where `year` = 2017 and id > XXXX ORDER BY id limit 20;
在 没内鬼,来点干货!SQL优化和诊断 一文中提到过,LIMIT会在满足条件下停止查询,因此该方案的扫描总量会急剧减少,效率提升Max!
ES
方案和MySQL相同,此时我们就可以随用所欲的使用 FROM-TO Api,而且不用考虑最大限制的问题。
MongoDB
方案基本类似,基本代码如下:
相关性能测试:
如果非要深度随机跳页
如果你没有杠过产品经理,又该怎么办呢,没关系,还有一丝丝的机会。
在 SQL优化 一文中还提到过MySQL深度分页的处理技巧,代码如下:
# 反例(耗时129.570s) select * from task_result LIMIT 20000000, 10; # 正例(耗时5.114s) SELECT a.* FROM task_result a, (select id from task_result LIMIT 20000000, 10) b where a.id = b.id; # 说明 # task_result表为生产环境的一个表,总数据量为3400万,id为主键,偏移量达到2000万
该方案的核心逻辑即基于聚簇索引,在不通过回表的情况下,快速拿到指定偏移量数据的主键ID,然后利用聚簇索引进行回表查询,此时总量仅为10条,效率很高。
因此我们在处理MySQL,ES,MongoDB时,也可以采用一样的办法:
限制获取的字段,只通过筛选条件,深度分页获取主键ID
通过主键ID定向查询需要的数据
瑕疵:当偏移量非常大时,耗时较长,如文中的 5s
推荐教程:《MySQL教程》
文章来源:https://juejin.im/post/5f0de4d06fb9a07e8a19a641
The above is the detailed content of How to be compatible with MySQL + ES + MongoDB to achieve deep paging of hundreds of millions of data?. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics











The main role of MySQL in web applications is to store and manage data. 1.MySQL efficiently processes user information, product catalogs, transaction records and other data. 2. Through SQL query, developers can extract information from the database to generate dynamic content. 3.MySQL works based on the client-server model to ensure acceptable query speed.

Laravel is a PHP framework for easy building of web applications. It provides a range of powerful features including: Installation: Install the Laravel CLI globally with Composer and create applications in the project directory. Routing: Define the relationship between the URL and the handler in routes/web.php. View: Create a view in resources/views to render the application's interface. Database Integration: Provides out-of-the-box integration with databases such as MySQL and uses migration to create and modify tables. Model and Controller: The model represents the database entity and the controller processes HTTP requests.

The process of starting MySQL in Docker consists of the following steps: Pull the MySQL image to create and start the container, set the root user password, and map the port verification connection Create the database and the user grants all permissions to the database

MySQL and phpMyAdmin are powerful database management tools. 1) MySQL is used to create databases and tables, and to execute DML and SQL queries. 2) phpMyAdmin provides an intuitive interface for database management, table structure management, data operations and user permission management.

I encountered a tricky problem when developing a small application: the need to quickly integrate a lightweight database operation library. After trying multiple libraries, I found that they either have too much functionality or are not very compatible. Eventually, I found minii/db, a simplified version based on Yii2 that solved my problem perfectly.

Compared with other programming languages, MySQL is mainly used to store and manage data, while other languages such as Python, Java, and C are used for logical processing and application development. MySQL is known for its high performance, scalability and cross-platform support, suitable for data management needs, while other languages have advantages in their respective fields such as data analytics, enterprise applications, and system programming.

Article summary: This article provides detailed step-by-step instructions to guide readers on how to easily install the Laravel framework. Laravel is a powerful PHP framework that speeds up the development process of web applications. This tutorial covers the installation process from system requirements to configuring databases and setting up routing. By following these steps, readers can quickly and efficiently lay a solid foundation for their Laravel project.

The basic operations of MySQL include creating databases, tables, and using SQL to perform CRUD operations on data. 1. Create a database: CREATEDATABASEmy_first_db; 2. Create a table: CREATETABLEbooks(idINTAUTO_INCREMENTPRIMARYKEY, titleVARCHAR(100)NOTNULL, authorVARCHAR(100)NOTNULL, published_yearINT); 3. Insert data: INSERTINTObooks(title, author, published_year)VA
