MySQL基于时间字段进行分区的方案总结_MySQL
MySQL支持的分区类型一共有四种:RANGE,LIST,HASH,KEY。其中,RANGE又可分为原生RANGE和RANGE COLUMNS,LIST分为原生LIST和LIST COLUMNS,HASH分为原生HASH和LINEAR HASH,KEY包含原生KEY和LINEAR HASH。关于这些分区之间的差别,改日另写文章进行阐述。
最近,碰到一个需求,要对表的时间字段(类型:datetime)基于天进行分区。于是遍历MySQL官方文档分区章节,总结如下:
实现方式
主要是以下几种:
1. 基于RANGE
2. 基于RANGE COLUMNS
3. 基于HASH
测试数据
为了测试以上三种方案,特构造了100万的测试数据,放在test表中,test表只有两列:id和hiredate,其中hiredate只包含10天的数据,从2015-12-01到2015-12-10。具体信息如下:
mysql> show create table test\G *************************** 1. row *************************** Table: test Create Table: CREATE TABLE `test` ( `id` int(11) DEFAULT NULL, `hiredate` datetime DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql> select min(hiredate),max(hiredate) from test; +---------------------+---------------------+ | min(hiredate) | max(hiredate) | +---------------------+---------------------+ | 2015-12-01 00:00:00 | 2015-12-10 23:59:56 | +---------------------+---------------------+ 1 row in set (0.44 sec) mysql> select date(hiredate),count(*) from test group by date(hiredate); +----------------+----------+ | date(hiredate) | count(*) | +----------------+----------+ | 2015-12-01 | 99963 | | 2015-12-02 | 100032 | | 2015-12-03 | 100150 | | 2015-12-04 | 99989 | | 2015-12-05 | 99908 | | 2015-12-06 | 99897 | | 2015-12-07 | 100137 | | 2015-12-08 | 100171 | | 2015-12-09 | 99851 | | 2015-12-10 | 99902 | +----------------+----------+ 10 rows in set (0.98 sec)
测试的维度
测试的维度主要从两个方面进行,
一、分区剪裁
针对特定的查询,是否能进行分区剪裁(即只查询相关的分区,而不是所有分区)
二、查询时间
鉴于该批测试数据是静止的(即没有并发进行的insert,update和delete操作),数据量也不太大,从这个维度来考量貌似意义也不是很大。
因此,重点测试第一个维度。
基于RANGE的分区方案
在这里,选用了TO_DAYS函数
CREATE TABLE range_datetime( id INT, hiredate DATETIME ) PARTITION BY RANGE (TO_DAYS(hiredate) ) ( PARTITION p1 VALUES LESS THAN ( TO_DAYS('20151202') ), PARTITION p2 VALUES LESS THAN ( TO_DAYS('20151203') ), PARTITION p3 VALUES LESS THAN ( TO_DAYS('20151204') ), PARTITION p4 VALUES LESS THAN ( TO_DAYS('20151205') ), PARTITION p5 VALUES LESS THAN ( TO_DAYS('20151206') ), PARTITION p6 VALUES LESS THAN ( TO_DAYS('20151207') ), PARTITION p7 VALUES LESS THAN ( TO_DAYS('20151208') ), PARTITION p8 VALUES LESS THAN ( TO_DAYS('20151209') ), PARTITION p9 VALUES LESS THAN ( TO_DAYS('20151210') ), PARTITION p10 VALUES LESS THAN ( TO_DAYS('20151211') ) );
插入数据并查看特定查询的执行计划
mysql> insert into range_datetime select * from test; Query OK, 1000000 rows affected (8.15 sec) Records: 1000000 Duplicates: 0 Warnings: 0 mysql> explain partitions select * from range_datetime where hiredate >= '20151207124503' and hiredate<='20151210111230'; +----+-------------+----------------+--------------+------+---------------+------+---------+------+--------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------------+--------------+------+---------------+------+---------+------+--------+-------------+ | 1 | SIMPLE | range_datetime | p7,p8,p9,p10 | ALL | NULL | NULL | NULL | NULL | 400061 | Using where | +----+-------------+----------------+--------------+------+---------------+------+---------+------+--------+-------------+ 1 row in set (0.03 sec)
注意执行计划中的partitions的内容,只查询了p7,p8,p9,p10三个分区,由此来看,使用to_days函数确实可以实现分区裁剪。
基于RANGE COLUMNS的分区方案
RANGE COLUMNS可以直接基于列,而无需像上述RANGE那种,分区的对象只能为整数。
创表语句如下:
CREATE TABLE range_columns ( id INT, hiredate DATETIME ) PARTITION BY RANGE COLUMNS(hiredate) ( PARTITION p1 VALUES LESS THAN ( '20151202' ), PARTITION p2 VALUES LESS THAN ( '20151203' ), PARTITION p3 VALUES LESS THAN ( '20151204' ), PARTITION p4 VALUES LESS THAN ( '20151205' ), PARTITION p5 VALUES LESS THAN ( '20151206' ), PARTITION p6 VALUES LESS THAN ( '20151207' ), PARTITION p7 VALUES LESS THAN ( '20151208' ), PARTITION p8 VALUES LESS THAN ( '20151209' ), PARTITION p9 VALUES LESS THAN ( '20151210' ), PARTITION p10 VALUES LESS THAN ('20151211' ) );
插入数据并查看上述查询的执行计划
mysql> insert into range_columns select * from test; Query OK, 1000000 rows affected (9.20 sec) Records: 1000000 Duplicates: 0 Warnings: 0 mysql> explain partitions select * from range_columns where hiredate >= '20151207124503' and hiredate<='20151210111230'; +----+-------------+---------------+--------------+------+---------------+------+---------+------+--------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+--------------+------+---------------+------+---------+------+--------+-------------+ | 1 | SIMPLE | range_columns | p7,p8,p9,p10 | ALL | NULL | NULL | NULL | NULL | 400210 | Using where | +----+-------------+---------------+--------------+------+---------------+------+---------+------+--------+-------------+ 1 row in set (0.11 sec)
同样,使用该分区方案也实现了分区剪裁。
基于HASH的分区方案
因HASH分区对象同样只能为整数,所以我们无法像上述RANGE COLUMNS那种直接引用列,在这里,同样用了TO_DAYS函数进行转换。
创表语句如下:
CREATE TABLE hash_datetime ( id INT, hiredate DATETIME ) PARTITION BY HASH( TO_DAYS(hiredate) ) PARTITIONS 10;
插入数据并查看上述查询的执行计划
mysql> insert into hash_datetime select * from test; Query OK, 1000000 rows affected (9.43 sec) Records: 1000000 Duplicates: 0 Warnings: 0 mysql> explain partitions select * from hash_datetime where hiredate >= '20151207124503' and hiredate<='20151210111230'; +----+-------------+---------------+-------------------------------+------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+-------------------------------+------+---------------+------+---------+------+---------+-------------+ | 1 | SIMPLE | hash_datetime | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9 | ALL | NULL | NULL | NULL | NULL | 1000500 | Using where | +----+-------------+---------------+-------------------------------+------+---------------+------+---------+------+---------+-------------+ 1 row in set (0.00 sec)
不难看出,使用hash分区并不能有效的实现分区裁剪,至少在本例,基于天的需求中如此。
以上三种方案都是基于datetime的,那么,对于timestamp类型,又该如何选择呢?
事实上,MySQL提供了一种基于UNIX_TIMESTAMP函数的RANGE分区方案,而且,只能使用UNIX_TIMESTAMP函数,如果使用其它函数,譬如to_days,会报如下错误:“ERROR 1486 (HY000): Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed”。
而且官方文档中也提到“Any other expressions involving TIMESTAMP values are not permitted. (See Bug #42849.)”。
下面来测试一下基于UNIX_TIMESTAMP函数的RANGE分区方案,看其能否实现分区裁剪。
针对TIMESTAMP的分区方案
创表语句如下:
CREATE TABLE range_timestamp ( id INT, hiredate TIMESTAMP ) PARTITION BY RANGE ( UNIX_TIMESTAMP(hiredate) ) ( PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2015-12-02 00:00:00') ), PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2015-12-03 00:00:00') ), PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2015-12-04 00:00:00') ), PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2015-12-05 00:00:00') ), PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2015-12-06 00:00:00') ), PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2015-12-07 00:00:00') ), PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2015-12-08 00:00:00') ), PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2015-12-09 00:00:00') ), PARTITION p9 VALUES LESS THAN ( UNIX_TIMESTAMP('2015-12-10 00:00:00') ), PARTITION p10 VALUES LESS THAN (UNIX_TIMESTAMP('2015-12-11 00:00:00') ) );
插入数据并查看上述查询的执行计划
mysql> insert into range_timestamp select * from test; Query OK, 1000000 rows affected (13.25 sec) Records: 1000000 Duplicates: 0 Warnings: 0 mysql> explain partitions select * from range_timestamp where hiredate >= '20151207124503' and hiredate<='20151210111230'; +----+-------------+-----------------+--------------+------+---------------+------+---------+------+--------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+--------------+------+---------------+------+---------+------+--------+-------------+ | 1 | SIMPLE | range_timestamp | p7,p8,p9,p10 | ALL | NULL | NULL | NULL | NULL | 400448 | Using where | +----+-------------+-----------------+--------------+------+---------------+------+---------+------+--------+-------------+ 1 row in set (0.00 sec)
同样也能实现分区裁剪。
总结:
1. 经过对比,个人倾向于第二种方案,即基于RANGE COLUMNS的分区实现。
2. 在5.7版本之前,对于DATA和DATETIME类型的列,如果要实现分区裁剪,只能使用YEAR() 和TO_DAYS()函数,在5.7版本中,又新增了TO_SECONDS()函数。
3. 其实LIST也能实现基于天的分区方案,但在这个需求上,相比于RANGE,还是显得很鸡肋。
4. TIMESTAMP类型的列,只能基于UNIX_TIMESTAMP函数进行分区,切记!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics











On the Douyin platform, many users are eager to obtain level certification, and the level 10 light sign shows the user's influence and recognition on Douyin. This article will delve into the price of Douyin’s level 10 light boards and the time it takes to reach this level to help users better understand the process. 1. How much does a level 10 Douyin light sign cost? The price of Douyin's 10-level light signs will vary depending on market fluctuations and supply and demand. The general price ranges from a few thousand yuan to ten thousand yuan. This price mainly includes the cost of the light sign itself and possible service fees. Users can purchase level 10 light signs through Douyin’s official channels or third-party service agencies, but they should pay attention to legal channels when purchasing to avoid false or fraudulent transactions. 2. How many days does it take to create a level 10 fan sign? Reach level 10 light sign

Artificial intelligence is a computing system that attempts to imitate human intelligence, including some human functions that are intuitively related to intelligence, such as learning, problem solving, and rational thinking and action. Broadly interpreted, the term AI covers many closely related fields such as machine learning. Systems that make heavy use of AI are having significant social impacts in areas such as healthcare, transportation, finance, social networks, e-commerce, and education. This growing social impact has also brought with it a series of risks and concerns, including errors in artificial intelligence software, cyberattacks and artificial intelligence system security. Therefore, the issue of verification of AI systems, and the broader topic of trustworthy AI, has begun to attract attention from the research community. “Verifiable AI” has been confirmed

Players can experience the main plot of the game and collect game achievements when playing in Elden's Circle. Many players don't know how long it takes to clear Elden's Circle. The player's clearance process is 30 hours. How long does it take to clear the Elden Ring? Answer: 30 hours. 1. Although this 30-hour clearance time does not refer to a master-like speed pass, it also omits a lot of processes. 2. If you want to get a better game experience or experience the complete plot, then you will definitely need to spend more time on the duration. 3. If players collect them all, it will take about 100-120 hours. 4. If you only take the main line to brush BOSS, it will take about 50-60 hours. 5. If you want to experience it all: 150 hours of base time.

Linux can reset the system time. The reset method is: 1. Use the date command to check the time; 2. Use the "yum install ntp" command to install ntp; 3. Use the "ntpdate -u ntp.api.bz" command to implement network time Just sync.

How to use PHP to remove hours, minutes and seconds from time: 1. Create a PHP sample file; 2. Use the strtotime function to convert the date and time into a timestamp; 3. Use the date function to format the date or time to remove the hours, minutes and seconds.

Xiaohongshu, a platform full of life and knowledge sharing, allows more and more creators to express their opinions freely. In order to get more attention and likes on Xiaohongshu, in addition to the quality of content, the time of publishing works is also crucial. So, how to set the time for Xiaohongshu to publish works? 1. How to set the time for publishing works on Xiaohongshu? 1. Understand the active time of users. First, it is necessary to clarify the active time of Xiaohongshu users. Generally speaking, 8 pm to 10 pm and weekend afternoons are the times when user activity is high. However, this time period will also vary depending on factors such as audience group and geography. Therefore, in order to better grasp the active period of users, it is recommended to conduct a more detailed analysis of the behavioral habits of different groups. By understanding users’ lives

In recent years, Go language has become the choice of more and more developers. However, compared to other programming languages, the compilation speed of Go language is not fast enough. Many developers will encounter this problem when compiling Go programs: Why does my Go program take longer to compile? This article will explore this issue from several aspects. The compiler architecture of Go language The compiler architecture of Go language adopts a three-stage design, which are front-end, middle layer and back-end. The front-end is responsible for translating the source code into intermediate code in Go language, and the middle layer will

Detailed explanation of Linux file time viewing techniques In Linux systems, file time information is very important for file management and tracking changes. The Linux system records file change information through three main time attributes, namely access time (atime), modification time (mtime) and change time (ctime). This article details how to view and manage this file time information, and provides specific code examples. 1. Check the file time information by using the ls command with the parameter -l to list the files.
