Home Database Mysql Tutorial 关于mongodb创建索引的一些经验总结

关于mongodb创建索引的一些经验总结

Jun 07, 2016 pm 03:58 PM
mongodb about create index Experience summary

想来接触mongodb已经快一年了,对于它的索引知识也积攒了不少经验,趁着这个月黑风高的夜晚,就把mongodb的索引总结一番吧。 一,索引介绍 mongodb具有两类索引,分别为单键索引和复合索引。 1.单键索引是最简单的一种索引,创建单键索引的开销要比复合索引

想来接触mongodb已经快一年了,对于它的索引知识也积攒了不少经验,趁着这个月黑风高的夜晚,就把mongodb的索引总结一番吧。

一,索引介绍


mongodb具有两类索引,分别为单键索引和复合索引。

1.单键索引是最简单的一种索引,创建单键索引的开销要比复合索引小很多。单键索引主要用于针对单值查询的条件。

2.复合索引是将文档中的几个键联合起来创建的一种索引,创建这种索引需要更多的空间与性能开销。分别体现在:

1).在给大量数据创建复合索引时,会阻塞数据库的查询,更不用说修改和插入操作了;

2).插入一条数据时,要花费更多的时间来给复合索引加数据;

3).创建的复合索引所站得空间大小根据数据的类型以及键的数量而有所不同。比如,如果你用五个NumberInt的键创建的复合索引的空间大小,并不会比两个NumberInt和一个String类型创建的复合索引占用更多的空间。索引在设计数据类型时,尽量将数据类型设置为NumberInt类型,以及尽量少使用string类型的数据做索引;

二,创建索引


创建索引的语句很简单。

1.单键索引的创建:db.test.ensureIndex({name:1},{name:'index_name'})

2.复合索引的创建:db.test.ensureIndex({name:1,age:1,sex:1},{name:'index_nas'})

三,索引优化


索引的优化是一个重头戏,需要详细的来解释。我得测试数据插入了100万条。字段分别为name,sex,type,time,id

1.我们来看一个简单的查询:db.test.find({name:'name_1'}) 相信大家对这个查询已经很熟悉了,然后我们来看看这个语句的索引执行计划:

{
	"cursor" : "BasicCursor",   查询语句所用到的索引,而BasicCursor代表没有索引
	"isMultiKey" : false,     是否为复合索引
	"n" : 1,       查询到的结果数
	"nscannedObjects" : 1000000,    扫描的文档数量
	"nscanned" : 1000000,     扫面的索引数量
	"nscannedObjectsAllPlans" : 1000000,   //影响的所有的被扫描文档的总数量
	"nscannedAllPlans" : 1000000,      //所有被扫描的索引的总数量
	"scanAndOrder" : false,  是否排序
	"indexOnly" : false,
	"nYields" : 2,
	"nChunkSkips" : 0,
	"millis" : 342,   花费的时间
	"indexBounds" : {
		
	},
	"server" : "node1:27017"
}
Copy after login

从这个执行计划中可以看出,该条查询语句查询一条数据需要扫描整个表,这肯定扯淡了嘛,那这时候就该给这个字段创建索引了,创建一个单键索引

db.test.ensureIndex({name:1},{name:'index_name'})

创建完索引之后,再来查看看这条查询语句的执行计划:

{
	"cursor" : "BtreeCursor index_name",
	"isMultiKey" : false,
	"n" : 1,
	"nscannedObjects" : 1,
	"nscanned" : 1,
	"nscannedObjectsAllPlans" : 1,
	"nscannedAllPlans" : 1,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 0,
	"indexBounds" : {
		"name" : [
			[
				"name_1",
				"name_1"
			]
		]
	},
	"server" : "node1:27017"
}
Copy after login

简直是逆天啊,nscanned和nscannedObjects居然从100万下降到1条,也就是查询数据时,只扫描了一条就已经找到,而且花费的时间是0秒,没有创建索引时,居然是342毫秒,绝对索引威武啊。

2.这时候我想通过type和sex来组合查询某一条件的数据: db.test.find({type:1,sex:0}) 看看这句的执行计划:

{
	"cursor" : "BasicCursor",
	"isMultiKey" : false,
	"n" : 55555,
	"nscannedObjects" : 1000000,
	"nscanned" : 1000000,
	"nscannedObjectsAllPlans" : 1000000,
	"nscannedAllPlans" : 1000000,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 529,
	"indexBounds" : {
		
	},
	"server" : "node1:27017"
}
Copy after login

从这个计划中可以看出,为了查找几万条数据,它也扫描了整个表,很显然,该创建索引了:

db.test.ensureIndex({type:1,sex:1},{name:'index_ts'})

创建完索引之后,再来执行查询语句,看看执行计划:

db.test.find({type:1,sex:0}).explain()
{
	"cursor" : "BtreeCursor index_ts",
	"isMultiKey" : false,
	"n" : 55555,
	"nscannedObjects" : 55555,
	"nscanned" : 55555,
	"nscannedObjectsAllPlans" : 55555,
	"nscannedAllPlans" : 55555,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 112,
	"indexBounds" : {
		"type" : [
			[
				1,
				1
			]
		],
		"sex" : [
			[
				0,
				0
			]
		]
	},
	"server" : "node1:27017"
}
Copy after login

很显然,绝对是一个最佳索引,因为n=nscannedObjects=nscanned了,而且查询时间从529毫秒下降到112毫秒了,这也是一个质的飞跃,可以明显的看到,它使用了刚刚创建的index_ts索引。

现在我又有一个需求了,我想通过时间再来排序,好的,我们执行查询语句: db.test.find({type:1,sex:0}).sort({time:-1}) 我们来看看这个查询语句的执行计划:

{
	"cursor" : "BtreeCursor index_ts",
	"isMultiKey" : false,
	"n" : 55555,
	"nscannedObjects" : 1000000,
	"nscanned" : 1000000,
	"nscannedObjectsAllPlans" : 1000000,
	"nscannedAllPlans" : 1000000,
	"scanAndOrder" : true,
	"indexOnly" : false,
	"nYields" : 1,
	"nChunkSkips" : 0,
	"millis" : 695,
	"indexBounds" : {
		"type" : [
			[
				1,
				1
			]
		],
		"sex" : [
			[
				0,
				0
			]
		]
	},
	"server" : "node1:27017"
}
Copy after login

看到没,这个查询语句跟上一个创建索引之后的查询出来的结果相差还是很大的,scanAndOrder和millis,时间花费了将近700毫秒,而且在查询完毕之后还要排序,这也太不近人情了,就加了一个排序操作,怎么会让它从白天鹅变成丑小鸭了呢?啊,关键参数就是scanAndOrder,意思就是在内存中把结果排序了嘛,那好啊,既然你如此薄情,那我就建个复合索引来对抗: db.test.ensureIndex({type:1,sex:1,time:-1},{name:'index_tst'})

{
	"cursor" : "BtreeCursor index_tst",
	"isMultiKey" : false,
	"n" : 55555,
	"nscannedObjects" : 55555,
	"nscanned" : 55555,
	"nscannedObjectsAllPlans" : 55555,
	"nscannedAllPlans" : 55555,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 126,
	"indexBounds" : {
		"type" : [
			[
				1,
				1
			]
		],
		"sex" : [
			[
				0,
				0
			]
		],
		"time" : [
			[
				{
					"$maxElement" : 1
				},
				{
					"$minElement" : 1
				}
			]
		]
	},
	"server" : "node1:27017"
}
Copy after login

看到了吗?各种参数又回到最佳状态了。这时候可能有人会问了,为什么要把time放到索引的最后而不是其它位置呢?其实这在创建索引时是有要求的,即:

  1. 将等值索引放在最前面

  2. 尽量将排序字段放在范围字段的前面

  3. $nin和$ne跟索引没有关系

    接下来我们再给查询语句加条件: db.test.find({type:1,sex:0,id:{$gt:1,$lt:500000}}) 执行计划如下:

    {
    	"cursor" : "BasicCursor",
    	"isMultiKey" : false,
    	"n" : 55555,
    	"nscannedObjects" : 1000000,
    	"nscanned" : 1000000,
    	"nscannedObjectsAllPlans" : 1000000,
    	"nscannedAllPlans" : 1000000,
    	"scanAndOrder" : false,
    	"indexOnly" : false,
    	"nYields" : 2,
    	"nChunkSkips" : 0,
    	"millis" : 553,
    	"indexBounds" : {
    		
    	},
    	"server" : "node1:27017"
    }
    Copy after login

    可以看到,只返回两万多条数据,但是却扫描了整个表,这肯定是很蛋疼的事情嘛,索引走起:

    db.test.ensureIndex({type:1,sex:1,id:1},{name:'index_tis'})

    {
    	"cursor" : "BtreeCursor index_tis",
    	"isMultiKey" : false,
    	"n" : 55555,
    	"nscannedObjects" : 55555,
    	"nscanned" : 55555,
    	"nscannedObjectsAllPlans" : 55555,
    	"nscannedAllPlans" : 55555,
    	"scanAndOrder" : false,
    	"indexOnly" : false,
    	"nYields" : 1,
    	"nChunkSkips" : 0,
    	"millis" : 137,
    	"indexBounds" : {
    		"type" : [
    			[
    				1,
    				1
    			]
    		],
    		"sex" : [
    			[
    				0,
    				0
    			]
    		],
    		"id" : [
    			[
    				1,
    				1000000
    			]
    		]
    	},
    	"server" : "node1:27017"
    }
    Copy after login

    很显然,这是个非常不错的组合索引,那为何不把id放在其它地方,偏偏放在最后面呢?因为在mongodb中,索引是从左到右执行的,因此显然要从左到右一次过滤最大数量的数据显然type和sex的组合过滤数据量要比id高更多,因为id的忙查率要远高于这两个组合。

    接着再把按time排序加上,查询:db.test.find({type:1,sex:1,id:{$gt:0,$lt:1000000}}).sort({time:-1}).explain()

    {
    	"cursor" : "BasicCursor",
    	"isMultiKey" : false,
    	"n" : 55556,
    	"nscannedObjects" : 1000000,
    	"nscanned" : 1000000,
    	"nscannedObjectsAllPlans" : 1000000,
    	"nscannedAllPlans" : 1000000,
    	"scanAndOrder" : true,
    	"indexOnly" : false,
    	"nYields" : 1,
    	"nChunkSkips" : 0,
    	"millis" : 725,
    	"indexBounds" : {
    		
    	},
    	"server" : "node1:27017"
    }
    Copy after login

    可以看到,这个查询语句也是极其慢的,而且还要再内存中排序,所以肯定要创建索引了:

    db.test.ensureIndex({type:1,sex:1,id:1,time:-1},{name:'index_tist'}) 我们先这样创建索引,看看执行计划:

    {
    	"cursor" : "BtreeCursor index_tist",
    	"isMultiKey" : false,
    	"n" : 55556,
    	"nscannedObjects" : 55556,
    	"nscanned" : 55556,
    	"nscannedObjectsAllPlans" : 55657,
    	"nscannedAllPlans" : 55657,
    	"scanAndOrder" : true,
    	"indexOnly" : false,
    	"nYields" : 0,
    	"nChunkSkips" : 0,
    	"millis" : 404,
    	"indexBounds" : {
    		"type" : [
    			[
    				1,
    				1
    			]
    		],
    		"sex" : [
    			[
    				1,
    				1
    			]
    		],
    		"id" : [
    			[
    				0,
    				1000000
    			]
    		],
    		"time" : [
    			[
    				{
    					"$maxElement" : 1
    				},
    				{
    					"$minElement" : 1
    				}
    			]
    		]
    	},
    	"server" : "node1:27017"
    }
    Copy after login

    看到了没有,虽然查询时间缩短了,但是这个查询结果还是会排序结果,好,我们再把索引改改:

    db.test.ensureIndex({type:1,sex:1,time:-1,id:1},{name:'index_tist'})

    {
    	"cursor" : "BtreeCursor index_tist",
    	"isMultiKey" : false,
    	"n" : 55556,
    	"nscannedObjects" : 55556,
    	"nscanned" : 55556,
    	"nscannedObjectsAllPlans" : 55657,
    	"nscannedAllPlans" : 55657,
    	"scanAndOrder" : false,
    	"indexOnly" : false,
    	"nYields" : 0,
    	"nChunkSkips" : 0,
    	"millis" : 168,
    	"indexBounds" : {
    		"type" : [
    			[
    				1,
    				1
    			]
    		],
    		"sex" : [
    			[
    				1,
    				1
    			]
    		],
    		"time" : [
    			[
    				{
    					"$maxElement" : 1
    				},
    				{
    					"$minElement" : 1
    				}
    			]
    		],
    		"id" : [
    			[
    				0,
    				1000000
    			]
    		]
    	},
    	"server" : "node1:27017"
    }
    Copy after login

    再来看看,快到什么程度了,这个查询的速度和参数条件已经比上一个索引的快了很多,那为什么会出现这种情况呢?为什么time在id的前后会有不同的表现?这是因为通过type和sex字段过滤完之后,已经在内存中有了数据,而这些数据下一步需要怎么办?是先通过id来筛选,还是按照排序筛选呢?这里有一个知识点,在把id放在time前面时,程序首先会取复合id值,然后再把复合的数据排序,但是如果id放在排序的后面,那么程序将直接通过顺序扫描索引树的方式取出复合id范围的数据。

    四,总结


    1.mongodb创建索引难点在于排序和范围查询的字段位置选择

    2.mongodb的复合索引的索引截取查询是顺序的,即如果(a:1,b:1,c:1},则可以是查询{a:1},{a:1,b:1},{a:1,b:1,c:1}中得任何一种都会使用该索引,其它查询情况将不会用到该索引;

    3.尽量创建更少的索引以提高数据库性能

    4.以上的索引优化只是生产环境的一部分,具体情况可能还要看自己的业务来定

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Nordhold: Fusion System, Explained
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1669
14
PHP Tutorial
1273
29
C# Tutorial
1256
24
Use Composer to solve the dilemma of recommendation systems: andres-montanez/recommendations-bundle Use Composer to solve the dilemma of recommendation systems: andres-montanez/recommendations-bundle Apr 18, 2025 am 11:48 AM

When developing an e-commerce website, I encountered a difficult problem: how to provide users with personalized product recommendations. Initially, I tried some simple recommendation algorithms, but the results were not ideal, and user satisfaction was also affected. In order to improve the accuracy and efficiency of the recommendation system, I decided to adopt a more professional solution. Finally, I installed andres-montanez/recommendations-bundle through Composer, which not only solved my problem, but also greatly improved the performance of the recommendation system. You can learn composer through the following address:

Navicat's method to view MongoDB database password Navicat's method to view MongoDB database password Apr 08, 2025 pm 09:39 PM

It is impossible to view MongoDB password directly through Navicat because it is stored as hash values. How to retrieve lost passwords: 1. Reset passwords; 2. Check configuration files (may contain hash values); 3. Check codes (may hardcode passwords).

How to choose a database for GitLab on CentOS How to choose a database for GitLab on CentOS Apr 14, 2025 pm 04:48 PM

GitLab Database Deployment Guide on CentOS System Selecting the right database is a key step in successfully deploying GitLab. GitLab is compatible with a variety of databases, including MySQL, PostgreSQL, and MongoDB. This article will explain in detail how to select and configure these databases. Database selection recommendation MySQL: a widely used relational database management system (RDBMS), with stable performance and suitable for most GitLab deployment scenarios. PostgreSQL: Powerful open source RDBMS, supports complex queries and advanced features, suitable for handling large data sets. MongoDB: Popular NoSQL database, good at handling sea

What is the CentOS MongoDB backup strategy? What is the CentOS MongoDB backup strategy? Apr 14, 2025 pm 04:51 PM

Detailed explanation of MongoDB efficient backup strategy under CentOS system This article will introduce in detail the various strategies for implementing MongoDB backup on CentOS system to ensure data security and business continuity. We will cover manual backups, timed backups, automated script backups, and backup methods in Docker container environments, and provide best practices for backup file management. Manual backup: Use the mongodump command to perform manual full backup, for example: mongodump-hlocalhost:27017-u username-p password-d database name-o/backup directory This command will export the data and metadata of the specified database to the specified backup directory.

MongoDB and relational database: a comprehensive comparison MongoDB and relational database: a comprehensive comparison Apr 08, 2025 pm 06:30 PM

MongoDB and relational database: In-depth comparison This article will explore in-depth the differences between NoSQL database MongoDB and traditional relational databases (such as MySQL and SQLServer). Relational databases use table structures of rows and columns to organize data, while MongoDB uses flexible document-oriented models to better suit the needs of modern applications. Mainly differentiates data structures: Relational databases use predefined schema tables to store data, and relationships between tables are established through primary keys and foreign keys; MongoDB uses JSON-like BSON documents to store them in a collection, and each document structure can be independently changed to achieve pattern-free design. Architectural design: Relational databases need to pre-defined fixed schema; MongoDB supports

How to set up users in mongodb How to set up users in mongodb Apr 12, 2025 am 08:51 AM

To set up a MongoDB user, follow these steps: 1. Connect to the server and create an administrator user. 2. Create a database to grant users access. 3. Use the createUser command to create a user and specify their role and database access rights. 4. Use the getUsers command to check the created user. 5. Optionally set other permissions or grant users permissions to a specific collection.

How to encrypt data in Debian MongoDB How to encrypt data in Debian MongoDB Apr 12, 2025 pm 08:03 PM

Encrypting MongoDB database on a Debian system requires following the following steps: Step 1: Install MongoDB First, make sure your Debian system has MongoDB installed. If not, please refer to the official MongoDB document for installation: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-debian/Step 2: Generate the encryption key file Create a file containing the encryption key and set the correct permissions: ddif=/dev/urandomof=/etc/mongodb-keyfilebs=512

What are the tools to connect to mongodb What are the tools to connect to mongodb Apr 12, 2025 am 06:51 AM

The main tools for connecting to MongoDB are: 1. MongoDB Shell, suitable for quickly viewing data and performing simple operations; 2. Programming language drivers (such as PyMongo, MongoDB Java Driver, MongoDB Node.js Driver), suitable for application development, but you need to master the usage methods; 3. GUI tools (such as Robo 3T, Compass) provide a graphical interface for beginners and quick data viewing. When selecting tools, you need to consider application scenarios and technology stacks, and pay attention to connection string configuration, permission management and performance optimization, such as using connection pools and indexes.

See all articles