What are the disadvantages of mysql clustered index?

Sep 19, 2017 09:35 AM

A clustered index is not a separate type of index but a way of storing data (a storage layout rather than a data structure). The details depend on the implementation, but InnoDB's clustered index actually stores the B-tree index and the data rows together in the same structure.

When a table has a clustered index, its data rows are stored in the leaf pages of that index. "Clustered" means that rows with adjacent key values are stored close together. Because rows cannot be stored in two different places at once, a table can have only one clustered index. Since indexes are implemented by the storage engine, not every storage engine supports clustered indexes. The rest of this article focuses on InnoDB, but the principles apply to any engine that supports clustered indexes.

The leaf pages contain complete rows, while the non-leaf (node) pages contain only the indexed column values. Those values are copied from the index column and serve to direct a search down to the correct leaf page.
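To make the leaf/non-leaf split concrete, here is a toy two-level clustered index in Python. It is a sketch under simplifying assumptions: one internal node, tiny pages of three rows, and plain lists standing in for pages. The class name and page size are hypothetical; real InnoDB pages are 16 KB, carry headers and record directories, and a real tree usually has more levels.

```python
from bisect import bisect_right

# Toy two-level clustered index (hypothetical model, not InnoDB's page format).


class ClusteredIndex:
    def __init__(self, rows, rows_per_page=3):
        # Leaf pages hold complete rows, kept sorted by primary key.
        rows = sorted(rows, key=lambda r: r["id"])
        self.leaves = [rows[i:i + rows_per_page]
                       for i in range(0, len(rows), rows_per_page)]
        # The internal node holds only key values: the first key of each leaf.
        self.separators = [page[0]["id"] for page in self.leaves]

    def lookup(self, pk):
        # Descend from the internal node to the leaf whose key range covers pk.
        i = bisect_right(self.separators, pk) - 1
        if i < 0:
            return None
        for row in self.leaves[i]:
            if row["id"] == pk:
                return row  # the leaf already contains the full row
        return None


rows = [{"id": n, "name": f"user{n}"} for n in (7, 3, 11, 1, 5, 9, 13)]
idx = ClusteredIndex(rows)
print(idx.separators)  # [1, 7, 13]
print(idx.lookup(9))   # {'id': 9, 'name': 'user9'}
```

Note that the lookup never leaves the index: once the right leaf is found, the row itself is there, which is exactly what "the index is the data" means for a clustered index.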

InnoDB clusters data by primary key. If no primary key is defined, InnoDB uses the first UNIQUE index whose columns are all NOT NULL instead. If there is no such index, InnoDB implicitly defines a hidden 6-byte row ID and uses it as the clustered index. InnoDB clusters records only within a page, so pages holding adjacent key values may be far apart on disk.
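The rule InnoDB uses to pick its clustering key can be written as a short function. This is a simplified model: the `Table` and `Index` types below are hypothetical stand-ins for a table definition, with indexes listed in definition order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Index:
    name: str
    columns: List[str]
    unique: bool = False


@dataclass
class Table:
    columns_not_null: set                        # names of NOT NULL columns
    primary_key: Optional[List[str]] = None
    indexes: List[Index] = field(default_factory=list)  # definition order


def clustered_key(table: Table) -> str:
    """Return which key would cluster the table under the rule described above."""
    if table.primary_key:
        return "PRIMARY(" + ", ".join(table.primary_key) + ")"
    for idx in table.indexes:
        # First UNIQUE index whose every column is declared NOT NULL.
        if idx.unique and all(c in table.columns_not_null for c in idx.columns):
            return idx.name
    # No suitable index: fall back to a hidden 6-byte row ID.
    return "hidden 6-byte row ID"


t = Table(columns_not_null={"email"},
          indexes=[Index("uk_phone", ["phone"], unique=True),
                   Index("uk_email", ["email"], unique=True)])
print(clustered_key(t))  # uk_email (uk_phone is skipped: phone is nullable)
```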

Note: Clustering by primary key can help performance, but it can also cause serious performance problems. Think carefully about clustering, especially when converting a table's storage engine from InnoDB to another engine.

Clustered data has some important advantages:

A: Related data can be kept together. For example, when implementing a mailbox, you can cluster the data by user ID, so that all of one user's messages can be fetched by reading only a few data pages from disk. Without a clustered index, each message might require its own disk I/O.

B: Data access is faster. A clustered index keeps the index and the data in the same B-tree, so retrieving a row from a clustered index is usually faster than the equivalent lookup through a non-clustered index.

C: Queries that use a covering index scan can use the primary key values stored in the leaf nodes directly.
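The mailbox example in point A can be illustrated by counting pages. The sketch below is hypothetical: 40 messages from 5 users arriving interleaved, with 4 rows per page. It compares storing them in arrival order with clustering them on `(user_id, msg_id)`, and counts how many pages must be read to fetch one user's mail.

```python
ROWS_PER_PAGE = 4  # toy page size, chosen for illustration

# Hypothetical workload: (msg_id, user_id), messages from 5 users interleaved.
messages = [(msg_id, msg_id % 5) for msg_id in range(1, 41)]


def pages_touched(rows, user_id):
    """Split rows into fixed-size pages; count pages holding user_id's mail."""
    pages = [rows[i:i + ROWS_PER_PAGE] for i in range(0, len(rows), ROWS_PER_PAGE)]
    return sum(any(uid == user_id for _, uid in page) for page in pages)


by_arrival = messages                                      # no clustering by user
by_user = sorted(messages, key=lambda m: (m[1], m[0]))     # clustered on (user_id, msg_id)

print(pages_touched(by_arrival, 3))  # 8
print(pages_touched(by_user, 3))     # 2
```

User 3's eight messages are spread over eight pages in arrival order but occupy two adjacent pages when clustered by user.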

Disadvantages of clustered indexes:

A: Clustering gives the biggest gains to I/O-bound workloads. If all the data fits in memory, the order of access matters much less, and clustering offers little benefit.

B: Insert speed depends heavily on insertion order. Inserting rows in primary key order is the fastest way to load data into an InnoDB table. If you load data out of primary key order, it is best to reorganize the table with OPTIMIZE TABLE after the load completes.

C: Updating clustered index columns is expensive, because it forces InnoDB to move each updated row to a new location.

D: Tables built on a clustered index may suffer page splits when a new row is inserted, or when a row's primary key is updated and the row must be moved. A page split happens when a row's primary key value requires it to be placed on a page that is already full: the storage engine splits that page into two to make room for the row. Page splits cause the table to use more disk space.

E: Clustered indexes can make full table scans slower, especially when rows are sparsely packed or stored non-contiguously because of page splits.

F: Secondary indexes can be larger than you might expect, because their leaf nodes contain the primary key columns of the rows they reference.

G: Secondary index access requires two index lookups instead of one.

This is because a secondary index leaf node stores not a pointer to the row's physical location but the row's primary key value. To find a row through a secondary index, the storage engine first searches the secondary index down to a leaf node to read the corresponding primary key value, then uses that value to look up the row in the clustered index. The work is repeated: two B-tree lookups instead of one. In InnoDB, the adaptive hash index can reduce this repeated work.
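The double lookup can be sketched with two sorted lists, each searched by binary search as a stand-in for one B-tree traversal. The table contents and function names are hypothetical; the point is only that the secondary index yields a primary key, and a second search of the clustered index yields the row.

```python
from bisect import bisect_left

# Clustered index: full rows sorted by primary key (hypothetical data).
clustered = [(1, "alice@example.com", "Alice"),
             (2, "carol@example.com", "Carol"),
             (3, "bob@example.com", "Bob")]
# Secondary index on email: leaf entries are (email, primary key), not row pointers.
secondary = sorted((email, pk) for pk, email, _ in clustered)

stats = {"lookups": 0}


def search(sorted_list, key):
    """One simulated B-tree traversal: binary search on the first tuple element."""
    stats["lookups"] += 1
    keys = [entry[0] for entry in sorted_list]  # O(n) copy is fine for a toy
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return sorted_list[i]
    return None


def find_by_email(email):
    entry = search(secondary, email)       # 1st lookup: secondary index -> PK
    if entry is None:
        return None
    return search(clustered, entry[1])     # 2nd lookup: clustered index -> row


row = find_by_email("bob@example.com")
print(row, stats["lookups"])  # (3, 'bob@example.com', 'Bob') 2
```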

Comparing how InnoDB and MyISAM lay out data physically:

MyISAM:

MyISAM stores rows on disk in the order they were inserted. In MyISAM there is no structural difference between the primary key index and a secondary index; the primary key is simply a unique, NOT NULL index named PRIMARY.
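A toy version of that layout, assuming dictionaries in place of MyISAM's B-tree index files: rows are appended to a heap in insertion order, and the primary and secondary indexes have the same shape, each mapping a key to the row's position in the heap. All names here are hypothetical.

```python
heap = []      # row storage, append-only in insertion order
primary = {}   # PRIMARY: id -> row offset in heap
by_name = {}   # secondary index: name -> row offset (same structure as PRIMARY)


def insert(row_id, name):
    heap.append((row_id, name))
    offset = len(heap) - 1
    # Both indexes store a physical location, not a key value.
    primary[row_id] = offset
    by_name[name] = offset


for row_id, name in [(30, "c"), (10, "a"), (20, "b")]:
    insert(row_id, name)

print(heap)  # [(30, 'c'), (10, 'a'), (20, 'b')]  insertion order, not key order
print(heap[primary[10]], heap[by_name["b"]])  # (10, 'a') (20, 'b')
```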

InnoDB:

Because InnoDB supports clustered indexes, it stores the same data very differently. The InnoDB clustered index contains the data for the entire table, not just the index: in InnoDB the clustered index is the table, so there is no separate row storage as in MyISAM. Each leaf node of the clustered index contains the primary key value, the transaction ID and rollback pointer used for transactions and MVCC, and the values of all remaining columns. If the primary key is defined on a column prefix, InnoDB also stores the full column value along with the remaining columns.

Another difference from MyISAM is that InnoDB secondary indexes are very different from the clustered index. The leaf nodes of an InnoDB secondary index store the primary key value rather than a row pointer, and use that value as the "pointer" to the row. This strategy reduces secondary index maintenance when rows move or data pages split. Using the primary key value as the pointer makes secondary indexes larger, but the benefit is that InnoDB does not have to update this pointer in the secondary index when it moves a row.
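The trade-off can be shown with a toy page split. The sketch below builds two hypothetical secondary indexes on `name`: one storing physical `(page, slot)` locations and one storing primary key values. After the split moves half the rows to a new page, the location-based entries for moved rows are stale, while the primary-key entries still resolve. `location_of` is a linear scan standing in for a clustered index lookup.

```python
# Physical storage: page number -> list of (pk, name) rows.
pages = {0: [(1, "ann"), (2, "bob"), (3, "cy"), (4, "dee")]}


def location_of(pk):
    """Stand-in for a clustered index lookup: find (page, slot) of a PK."""
    for page_no, rows in pages.items():
        for slot, (row_pk, _) in enumerate(rows):
            if row_pk == pk:
                return (page_no, slot)
    return None


# Secondary indexes on name, built before the split.
sec_physical = {name: location_of(pk) for pk, name in pages[0]}  # stores locations
sec_by_pk = {name: pk for pk, name in pages[0]}                   # stores PK values

# Page split: move the upper half of page 0 to a new page 1.
pages[1] = pages[0][2:]
pages[0] = pages[0][:2]

stale = [n for n, loc in sec_physical.items() if loc != location_of(sec_by_pk[n])]
print(stale)  # ['cy', 'dee']  these location entries would have to be rewritten
print(all(location_of(pk) is not None for pk in sec_by_pk.values()))  # True
```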

Insert rows into InnoDB tables in primary key order. If you use InnoDB and have no data that naturally needs clustering, define a surrogate key as the primary key; its values should have nothing to do with the application. The simplest way is an AUTO_INCREMENT column, which ensures that rows are inserted in order and also improves the performance of joins on the primary key.

Avoid using a UUID as the clustering key. Performance will be poor, because UUIDs make inserts into the clustered index completely random, so the data has no useful clustering. Inserting rows with a UUID primary key not only takes longer but also produces a larger index. Part of this is because the primary key field is longer; the rest is due to page splits and the resulting fragmentation.

By contrast, when primary key values are sequential, InnoDB stores each record right after the previous one. When a page reaches its maximum fill factor (InnoDB's default is 15/16 of the page size, leaving some room for later modifications), the next record is written to a new page. Once data is loaded in this sequential fashion, the primary key pages end up nearly full of records in order, which is the desired result (secondary index pages, however, may differ).
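The 15/16 fill factor allows a quick estimate of how many clustered index pages a sequential load produces. This sketch assumes the default 16 KB page size and a fixed row size, and it ignores page headers and per-record overhead, so treat the result as a rough lower bound rather than an exact page count.

```python
PAGE_SIZE = 16 * 1024   # default InnoDB page size (innodb_page_size)
FILL_FACTOR = 15 / 16   # sequential inserts leave about 1/16 of each page free


def pages_for_sequential_load(num_rows, row_bytes):
    """Rough leaf page count for a sequential load (ignores page/record overhead)."""
    usable = int(PAGE_SIZE * FILL_FACTOR)   # 15360 bytes per page
    rows_per_page = usable // row_bytes
    return -(-num_rows // rows_per_page)    # ceiling division


print(pages_for_sequential_load(1_000_000, 100))  # 6536
```

With 100-byte rows, 153 rows fit per page, so one million rows need about 6,536 leaf pages.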

With a UUID primary key, the key of a newly inserted row is not necessarily greater than the previous one, so InnoDB cannot simply append each new row at the end of the index. It must find the appropriate position for the new row, usually somewhere in the middle of the existing data, and allocate space there. This adds a lot of extra work and leads to a suboptimal data layout. Drawbacks of using a UUID as the primary key include:

A: The target page for a write may already have been flushed to disk and evicted from the cache, or may never have been loaded into it. InnoDB must then find the page and read it from disk into memory before inserting, which causes a lot of random I/O.

B: Because writes arrive out of order, InnoDB has to split pages frequently to make room for new rows. Page splits move a lot of data, and a single insert must modify at least three pages instead of one.

C: Because of frequent page splits, pages become sparsely and irregularly filled, so the final data is fragmented.
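The difference between ordered and random inserts can be reproduced with a toy leaf level: a list of sorted pages with a fixed capacity. Appending past the current maximum on the last page starts a fresh page without a split (mirroring the sequential case described above); any other insert into a full page splits it in half. The capacity, key counts, and seed are arbitrary illustration values, random integers stand in for UUIDs, and unlike InnoDB this toy fills pages completely instead of reserving 1/16.

```python
import random
from bisect import bisect_right, insort

CAPACITY = 16  # records per leaf page (toy value, not InnoDB's real limit)


def load(keys):
    """Insert keys into a toy list of sorted leaf pages; return (splits, fill)."""
    pages = [[]]
    splits = 0
    for k in keys:
        # Locate the page whose key range should hold k.
        firsts = [p[0] for p in pages[1:]]
        i = bisect_right(firsts, k)
        page = pages[i]
        if len(page) < CAPACITY:
            insort(page, k)
        elif i == len(pages) - 1 and k > page[-1]:
            # Appending past the current maximum: start a new page, no split.
            pages.append([k])
        else:
            # Split the full page in half, then insert into the correct half.
            splits += 1
            half = CAPACITY // 2
            right = page[half:]
            del page[half:]
            pages.insert(i + 1, right)
            insort(page if k < right[0] else right, k)
    fill = len(keys) / (len(pages) * CAPACITY)
    return splits, fill


seq_splits, seq_fill = load(list(range(2000)))
rand_splits, rand_fill = load(random.Random(42).sample(range(1_000_000), 2000))
print(seq_splits, seq_fill)    # 0 1.0
print(rand_splits, rand_fill)  # many splits, fill well below 1.0
```

The sequential load never splits and packs every page, while the random load splits repeatedly and leaves many half-empty pages, which is the fragmentation described in point C.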

After loading random values like these into a clustered index, you may need to run OPTIMIZE TABLE to rebuild the table and improve page fill. With InnoDB, insert data in primary key order whenever possible, and use monotonically increasing clustering key values for new rows whenever you can.

Note: When can a sequential primary key make things worse?

For high-concurrency workloads, inserting in primary key order in InnoDB can cause noticeable contention. The upper end of the primary key becomes a hotspot because every insert happens there, so concurrent inserts may contend for gap locks. Another hotspot can be the AUTO_INCREMENT locking mechanism. If you run into this problem, you may need to redesign the table or the application, or change the innodb_autoinc_lock_mode setting.
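A small count shows why the upper end of the key range becomes a hotspot. In this hypothetical setup there are ten existing leaf pages covering 1,000 key values each. Every new AUTO_INCREMENT-style key lands on the same rightmost page, while non-sequential keys spread over many pages. This only counts target pages; it does not model gap locks or the behavior of innodb_autoinc_lock_mode.

```python
import random
from bisect import bisect_right
from collections import Counter

# Hypothetical: 10 existing leaf pages, each covering 1,000 key values.
page_starts = [i * 1000 for i in range(10)]  # first key of each page


def target_page(key):
    """Index of the leaf page an insert with this key would land on."""
    return max(bisect_right(page_starts, key) - 1, 0)


next_id = 10_000  # next AUTO_INCREMENT value, above every existing key
sequential = [next_id + n for n in range(100)]
# Non-sequential keys falling inside the existing range (illustration only).
spread = random.Random(7).sample(range(10_000), 100)

print(Counter(map(target_page, sequential)))  # Counter({9: 100})  one hot page
print(len(set(map(target_page, spread))))     # several distinct pages
```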
