Home Database Mysql Tutorial How to index string fields in MySQL

How to index string fields in MySQL

May 28, 2023 pm 02:38 PM
mysql string

Assume that you currently maintain a system that supports email login. The user table is defined like this:

create table SUser(
 ID bigint unsigned primary key,
 email varchar(64), 
 ... 
 )engine=innodb;
Copy after login

Since you need to use email to log in, statements similar to this must appear in the business code:

select f1, f2 from SUser where email='xxx';
Copy after login

If there is no index on the email field, then this statement can only perform a full table scan.

1) Can I build an index on the email address field?

  • MySQL supports prefix index, you can define a part of the string as the index

2) If you create an index What happens if the statement does not specify the prefix length?

  • The index will contain the entire string

3) Can you give me an example?

alter table SUser add index index1(email);
 或
 alter table SUser add index index2(email(6));
Copy after login
  • index1 In the index, it contains the entire string of each record

  • index2 In the index, for each record Just take the first 6 bytes

#4) What are the differences between these two different definitions in data structure and storage?

How to index string fields in MySQL

How to index string fields in MySQL

It is obvious that the email(6) index structure will occupy less space

5) Email(6) Are there any disadvantages to this index structure?

  • may increase the number of additional record scans

6) The following statement is defined in these two indexes How are the following implemented?

select id,name,email from SUser where email='zhangssxyz@xxx.com';
Copy after login
Copy after login

index1 (that is, the index structure of the entire email string), execution sequence

  • Find the index value that satisfies the index value from the index1 index tree and is ’zhangssxyz@xxx For this record of .com’, get the value of ID2;

  • Go back to the table and find the row whose primary key value is ID2, judge that the value of email is correct, and add this row of records Result set;

  • Continue to the next record in the index index tree, and find that the condition of email='zhangssxyz@xxx.com’ is no longer met, and the loop ends.

In this process, you only need to retrieve data from the primary key index once, so the system thinks that only one row has been scanned.

index2 (i.e. email(6) index structure), execution sequence

  • Find the record that satisfies the index value of "zhangs" from the index2 index tree, and find the first One is ID1;

  • Go to the primary key and find the row whose primary key value is ID1. It is judged that the value of email is not "zhangssxyz@xxx.com", and this row of records is discarded;

  • Get the next record at the location just found on index2, and find that it is still "zhangs". Take out ID2, then get the entire row on the ID index and judge, this time the value By the way, add this row of records to the result set;

  • Repeat the previous step until the value obtained on idxe2 is not "zhangs", and the loop ends.

In this process, the primary key index needs to be retrieved 4 times, that is, 4 rows are scanned.

7) What conclusions can be drawn from the above comparison?

  • #Using a prefix index may cause the query statement to read data more times.

8) Are prefix indexes really useless?

  • If the index2 we define is not email(6) but email(7), then there is only one record that satisfies the prefix ’zhangss’, and ID2 can be directly found. , scanning only one line and it ends.

#9) So what are the precautions for using prefix index?

  • Length selection is reasonable

10) When creating a prefix index for a string, how do I know what I should do? What length of prefix index should be used?

  • # Count how many different values ​​there are on the index to determine how long a prefix should be used.

11) How to count how many different values ​​there are on the index?

select count(distinct email) as L from SUser;
Copy after login

12) What should we do next after getting how many different values ​​corresponding to the index?

  • Select prefixes of different lengths in turn to see this value

    select 
       count(distinct left(email,4))as L4,
       count(distinct left(email,5))as L5,
       count(distinct left(email,6))as L6,
       count(distinct left(email,7))as L7,
     from SUser;
    Copy after login
  • Then, among L4~L7, find the first A value not less than L * 95% means that more than 95% of the data can be found through this index.

13) What is the impact of prefix index on covering index?

The following SQL statement:

select id,email from SUser where email='zhangssxyz@xxx.com';
Copy after login

Compared with the SQL statement

select id,name,email from SUser where email='zhangssxyz@xxx.com';
Copy after login
Copy after login

in the previous example, the first statement only requires the return of the id and email fields.

  • If you use index1 (that is, the index structure of the entire email string), you can get the ID by checking the email. Then there is no need to return the table. This is a covering index.

  • 用 index2(即 email(6) 索引结构)的话,就不得不回到 ID 索引再去判断 email 字段的值。

14)那我把index2 的定义修改为 email(18) 的前缀索引不就行了?

  • 这个18是你自己定义的,系统不知道18这个长度是否已经大于我的email长度,所以它还是会回表去查一下验证。

总而言之:使用前缀索引就用不上覆盖索引对查询性能的优化了

针对类似于邮箱这样的字段,使用前缀索引可能会产生不错的效果。但是,遇到身份证这种前缀的区分度不够好的情况时,我们要怎么办呢?

  • 索引选取的要更长一些。

    • 但是所以越长的话,占的磁盘空间更大,相同的一页能放下的索引值就变少了,反而会影响查询效率。

16)如果我们能够确定业务需求里面只有按照身份证进行等值查询的需求,还有没有别的处理方法呢?

  • 既然正过来相同的多,那我就把它倒过来存。查询时候这样查

    select field_list from t where id_card = reverse('input_id_card_string');
    Copy after login

    使用 的时候用count(distinct) 方法去做个验证

  • 使用 hash 字段。在表上再创建一个整数字段,来保存身份证的校验码,同时在这个字段上创建索引。

    alter table t add id_card_crc int unsigned, add index(id_card_crc);
    Copy after login

    新记录插入时必须使用 crc32() 函数生成校验码,并填入新字段中。由于校验码可能存在冲突,也就是说两个不同的身份证号通过 crc32() 函数得到的结果可能是相同的,所以你的查询语句 where 部分要判断 id_card 的值是否精确相同。

    select field_list from t where id_card_crc=crc32('input_id_card_string') and id_card='input_id_card_string'
    Copy after login

    这样,索引的长度变成了 4 个字节(int类型),比原来小了很多

    17)使用倒序存储和使用 hash 字段这两种方法有什么异同点?

    • 相同点:都不支持范围查询

      • 倒序存储的字段上创建的索引是按照倒序字符串的方式排序的,已经没有办法利用索引方式查出身份证号码在[ID_X, ID_Y]的所有市民了。同样地,hash 字段的方式也只能支持等值查询。

    • 区别

      • 从占用的额外空间来看,倒序存储方式在主键索引上,不会消耗额外的存储空间,而 hash 字段方法需要增加一个字段。当然,倒序存储方式使用 4 个字节的前缀长度应该是不够的,如果再长一点,这个消耗跟额外这个 hash 字段也差不多抵消了。

      • 在 CPU 消耗方面,倒序方式每次写和读的时候,都需要额外调用一次 reverse 函数,而 hash 字段的方式需要额外调用一次 crc32() 函数。以仅考虑这两个函数的计算复杂度为前提,reverse 函数对 CPU 资源的额外消耗将较少。

      • 就查询性能而言,采用哈希字段方式的查询更具可靠性。虽然crc32算法不可避免地存在冲突的风险,但这种风险极其微小,因此我们可以认为查询时平均扫描行数接近于1。使用倒序存储方式仍然需要使用前缀索引来进行扫描,因此会增加扫描的行数。

    案例:如果你在维护一个学校的学生信息数据库,学生登录名的统一格式是”学号 @gmail.com", 而学号的规则是:十五位的数字,其中前三位是所在城市编号、第四到第六位是学校编号、第七位到第十位是入学年份、最后五位是顺序编号。

    学生必须输入正确的登录名和密码,方可继续使用系统。如果只考虑登录验证这个行为,你会如何为登录名设计索引?

    • 如果一个学校每年预计2万新生,50年才100万记录,如果直接使用全字段索引,可以节省多少存储空间?。除非遇到超大规模数据,否则不需要使用后两种方法,从而避免了开发转换和限制风险

    • 在实际操作中,只需对所有字段进行索引,一个学校的数据库数据量和查询负担不会变得很大。 如果单从优化数据表的角度: \1. 后缀@gmail可以单独一个字段来存,或者用业务代码来保证, \2. 城市编号和学校编号估计也不会变,也可以用业务代码来配置 \3. 然后直接存年份和顺序编号就行了,这个字段可以全字段索引

    The above is the detailed content of How to index string fields in MySQL. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Nordhold: Fusion System, Explained
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1669
14
PHP Tutorial
1273
29
C# Tutorial
1256
24
Laravel Introduction Example Laravel Introduction Example Apr 18, 2025 pm 12:45 PM

Laravel is a PHP framework for easy building of web applications. It provides a range of powerful features including: Installation: Install the Laravel CLI globally with Composer and create applications in the project directory. Routing: Define the relationship between the URL and the handler in routes/web.php. View: Create a view in resources/views to render the application's interface. Database Integration: Provides out-of-the-box integration with databases such as MySQL and uses migration to create and modify tables. Model and Controller: The model represents the database entity and the controller processes HTTP requests.

MySQL and phpMyAdmin: Core Features and Functions MySQL and phpMyAdmin: Core Features and Functions Apr 22, 2025 am 12:12 AM

MySQL and phpMyAdmin are powerful database management tools. 1) MySQL is used to create databases and tables, and to execute DML and SQL queries. 2) phpMyAdmin provides an intuitive interface for database management, table structure management, data operations and user permission management.

MySQL vs. Other Programming Languages: A Comparison MySQL vs. Other Programming Languages: A Comparison Apr 19, 2025 am 12:22 AM

Compared with other programming languages, MySQL is mainly used to store and manage data, while other languages ​​such as Python, Java, and C are used for logical processing and application development. MySQL is known for its high performance, scalability and cross-platform support, suitable for data management needs, while other languages ​​have advantages in their respective fields such as data analytics, enterprise applications, and system programming.

Laravel framework installation method Laravel framework installation method Apr 18, 2025 pm 12:54 PM

Article summary: This article provides detailed step-by-step instructions to guide readers on how to easily install the Laravel framework. Laravel is a powerful PHP framework that speeds up the development process of web applications. This tutorial covers the installation process from system requirements to configuring databases and setting up routing. By following these steps, readers can quickly and efficiently lay a solid foundation for their Laravel project.

Explain the purpose of foreign keys in MySQL. Explain the purpose of foreign keys in MySQL. Apr 25, 2025 am 12:17 AM

In MySQL, the function of foreign keys is to establish the relationship between tables and ensure the consistency and integrity of the data. Foreign keys maintain the effectiveness of data through reference integrity checks and cascading operations. Pay attention to performance optimization and avoid common errors when using them.

Compare and contrast MySQL and MariaDB. Compare and contrast MySQL and MariaDB. Apr 26, 2025 am 12:08 AM

The main difference between MySQL and MariaDB is performance, functionality and license: 1. MySQL is developed by Oracle, and MariaDB is its fork. 2. MariaDB may perform better in high load environments. 3.MariaDB provides more storage engines and functions. 4.MySQL adopts a dual license, and MariaDB is completely open source. The existing infrastructure, performance requirements, functional requirements and license costs should be taken into account when choosing.

SQL vs. MySQL: Clarifying the Relationship Between the Two SQL vs. MySQL: Clarifying the Relationship Between the Two Apr 24, 2025 am 12:02 AM

SQL is a standard language for managing relational databases, while MySQL is a database management system that uses SQL. SQL defines ways to interact with a database, including CRUD operations, while MySQL implements the SQL standard and provides additional features such as stored procedures and triggers.

What software is better for yi framework? Recommended software for yi framework What software is better for yi framework? Recommended software for yi framework Apr 18, 2025 pm 11:03 PM

Abstract of the first paragraph of the article: When choosing software to develop Yi framework applications, multiple factors need to be considered. While native mobile application development tools such as XCode and Android Studio can provide strong control and flexibility, cross-platform frameworks such as React Native and Flutter are becoming increasingly popular with the benefits of being able to deploy to multiple platforms at once. For developers new to mobile development, low-code or no-code platforms such as AppSheet and Glide can quickly and easily build applications. Additionally, cloud service providers such as AWS Amplify and Firebase provide comprehensive tools

See all articles