首页 数据库 mysql教程 Efficient Indexing in MongoDB 2.6

Efficient Indexing in MongoDB 2.6

Jun 07, 2016 pm 04:35 PM
mongodb

By Osmar Olivo, Product Manager at MongoDB One of the most powerful features of MongoDB is its rich indexing functionality. Users can specify secondary indexes on any field, compound indexes, geospatial, text, sparse, TTL, and others. Havi

By Osmar Olivo, Product Manager at MongoDB

One of the most powerful features of MongoDB is its rich indexing functionality. Users can specify secondary indexes on any field, compound indexes, geospatial, text, sparse, TTL, and others. Having extensive indexing functionality makes it easier for developers to build apps that provide rich functionality and low latency.

MongoDB 2.6 introduces a new query planner, including the ability to perform index intersection. Prior to 2.6 the query planner could only make use of a single index for most queries. That meant that if you wanted to query on multiple fields together, you needed to create a compound index. It also meant that if there were several different combinations of fields you wanted to query on, you might need several different compound indexes.

Each index adds overhead to your deployment - indexes consume space, on disk and in RAM, and indexes are maintained during updates, which adds disk IO. In other words, indexes improve the efficiency of many operations, but they also come at a cost. For many applications, index intersection will allow users to reduce the number of indexes they need while still providing rich features and low latency.

In the following sections we will take a deep dive into index intersection and how it can be applied to applications.

An Example - The Phone Book

Let’s take the example of a phone book with the following schema.

{
    FirstName
    LastName
    Phone_Number
    Address
}
登录后复制

If I were to search for “Smith, John” how would I index the following query to be as efficient as possible?

db.phonebook.find({ FirstName : “John”, LastName : “Smith” })
登录后复制

I could use an individual index on FirstName and search for all of the “Johns”.

This would look something like ensureIndex( { FirstName : 1 } )

We run this query and we get back 200,000 John Smiths. Looking at the explain() output below however, we see that we scanned 1,000,000 “Johns” in the process of finding 200,000 “John Smiths”.

> db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor FirstName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 1000000,
    "nscanned" : 1000000,
    "nscannedObjectsAllPlans" : 1000101,
    "nscannedAllPlans" : 1000101,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 2,
    "nChunkSkips" : 0,
    "millis" : 2043,
    "indexBounds" : {
        "FirstName" : [
            [
                "John",
                "John"
            ]
        ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}
登录后复制

How about creating an individual index on LastName?

This would look something like ensureIndex( { LastName : 1 } )

Running this query we get back 200,000 “John Smiths” but our explain output says that we now scanned 400,000 “Smiths”. How can we make this better?

db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor LastName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 400000,
    "nscanned" : 400000,
    "nscannedObjectsAllPlans" : 400101,
    "nscannedAllPlans" : 400101,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 1,
    "nChunkSkips" : 0,
    "millis" : 852,
    "indexBounds" : {
        "LastName" : [
            [
                "Smith",
                "Smith"
            ]
        ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}
登录后复制

So we know that there are 1,000,000 “John” entries, 400,000 “Smith” entries, and 200,000 “John Smith” entries in our phonebook. Is there a way that we can scan just the 200,000 we need?

In the case of a phone book this is somewhat simple; since we know that we want it to be sorted by Lastname, Firstname we can create a compound index on them, like the below.

ensureIndex( {  LastName : true, FirstName : 1  } ) 
db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor LastName_1_FirstName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 200000,
    "nscanned" : 200000,
    "nscannedObjectsAllPlans" : 200000,
    "nscannedAllPlans" : 200000,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 370,
    "indexBounds" : {
        "LastName" : [
            [
                "Smith",
                "Smith"
            ]
        ],
        "FirstName" : [
            [
                "John",
                "John"
            ]
        ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}
登录后复制

Looking at the explain on this, we see that the index only scanned the 200,000 documents that matched, so we got a perfect hit.

Beyond Compound Indexes

The compound index is a great solution in the case of a phonebook in which we always know how we are going to be querying our data. Now what if we have an application in which users can arbitrarily query for different fields together? We can’t possibly create a compound index for every possible combination because of the overhead imposed by indexes, as we discussed above, and because MongoDB limits you to 64 indexes per collection. Index intersection can really help.

Imagine the case of a medical application which doctors use to filter through patients. At a high level, omitting several details, a basic schema may look something like the below.

{
      Fname
      LName
      SSN
      Age
      Blood_Type
      Conditions : [] 
      Medications : [ ]
      ...
      ...
}
登录后复制

Some sample searches that a doctor/nurse may run on this system would look something like the below.

Find me a Patient with Blood_Type = O under the age of 50

db.patients.find( {   Blood_Type : “O”,  Age : {   $lt : 50  }     } )
登录后复制
登录后复制

Find me all patients over the age of 60 on Medication X

db.patients.find( { Medications : “X” , Age : { $gt : 60} })
登录后复制

Find me all Diabetic patients on medication Y

db.patients.find( { Conditions : “Diabetes”, Medications : “Y” } )
登录后复制

With all of the unstructured data in modern applications, along with the desire to be able to search for things as needed in an ad-hoc way, it can become very difficult to predict usage patterns. Since we can’t possibly create compound indexes for every combination of fields, because we don’t necessarily know what those will be ahead of time, we can try indexing individual fields to try to salvage some performance. But as shown above in our phone book application, this can lead to performance issues in which we pull documents into memory that are not matches.

To avoid the paging of unnecessary data, the new index intersection feature in 2.6 increases the overall efficiency of these types of ad-hoc queries by processing the indexes involved individually and then intersecting the result set to find the matching documents. This means you only pull the final matching documents into memory and everything else is processed using the indexes. This processing will utilize more CPU, but should greatly reduce the amount of IO done for queries where all of the data is not in memory as well as allow you to utilize your memory more efficiently.

For example, looking at the earlier example:

db.patients.find( {   Blood_Type : “O”,  Age : {   $lt : 50  }     } )
登录后复制
登录后复制

It is inefficient to find all patients with BloodType: O (which could be millions) and then pull into memory each document to find the ones with age

Instead, the query planner finds all patients with bloodType: O using the index on BloodType, and all patients with age

Index intersection allows for much more efficient use of existing RAM so less total memory will usually be required to fit the working set then previously. Also, if you had several compound indices that were made up of different combinations of fields, then you can reduce the total number of indexes on the system. This means storing less indices in memory as well as achieving better insert/update performance since fewer indices must be updated.

As of version 2.6.0, you cannot intersect with geo or text indices and you can intersect at most 2 separate indices with each other. These limitations are likely to change in a future release.

Optimizing Multi-key Indexes It is also possible to intersect an index with itself in the case of multi-key indexes. Consider the below query:

Find me all patients with Diabetes & High Blood Pressure

db.patients.find( {  Conditions : { $all : [ “Diabetes”, “High Blood Pressure” ]  }    }  )
登录后复制

In this case we will find the result set of all Patients with Diabetes, and the result set of all patients with High blood pressure, and intersect the two to get all patients with both. Again, this requires less memory and disk speed for better overall performance. As of the 2.6.0 release, an index can intersect with itself up to 10 times.

Do We Still Need Compound Indexes?

To be clear, compound indexing will ALWAYS be more performant IF you know what you are going to be querying on and can create one ahead of time. Furthermore, if your working set is entirely in memory, then you will not reap any of the benefits of Index Intersection as it is primarily based on reducing IO. But in a more ad-hoc case where one cannot predict the shape of the queries and the working set is much larger than available memory, index intersection will automatically take over and choose the most performant path.

  • Download MongoDB 2.6 Today
  • Learn about all of the key new features in MongoDB 2.6 by downloading the whitepaper
本站声明
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn

热AI工具

Undresser.AI Undress

Undresser.AI Undress

人工智能驱动的应用程序,用于创建逼真的裸体照片

AI Clothes Remover

AI Clothes Remover

用于从照片中去除衣服的在线人工智能工具。

Undress AI Tool

Undress AI Tool

免费脱衣服图片

Clothoff.io

Clothoff.io

AI脱衣机

Video Face Swap

Video Face Swap

使用我们完全免费的人工智能换脸工具轻松在任何视频中换脸!

热工具

记事本++7.3.1

记事本++7.3.1

好用且免费的代码编辑器

SublimeText3汉化版

SublimeText3汉化版

中文版,非常好用

禅工作室 13.0.1

禅工作室 13.0.1

功能强大的PHP集成开发环境

Dreamweaver CS6

Dreamweaver CS6

视觉化网页开发工具

SublimeText3 Mac版

SublimeText3 Mac版

神级代码编辑软件(SublimeText3)

热门话题

Java教程
1664
14
CakePHP 教程
1422
52
Laravel 教程
1316
25
PHP教程
1267
29
C# 教程
1239
24
使用 Composer 解决推荐系统的困境:andres-montanez/recommendations-bundle 的实践 使用 Composer 解决推荐系统的困境:andres-montanez/recommendations-bundle 的实践 Apr 18, 2025 am 11:48 AM

在开发一个电商网站时,我遇到了一个棘手的问题:如何为用户提供个性化的商品推荐。最初,我尝试了一些简单的推荐算法,但效果并不理想,用户的满意度也因此受到影响。为了提升推荐系统的精度和效率,我决定采用更专业的解决方案。最终,我通过Composer安装了andres-montanez/recommendations-bundle,这不仅解决了我的问题,还大大提升了推荐系统的性能。可以通过一下地址学习composer:学习地址

如何在Debian上配置MongoDB自动扩容 如何在Debian上配置MongoDB自动扩容 Apr 02, 2025 am 07:36 AM

本文介绍如何在Debian系统上配置MongoDB实现自动扩容,主要步骤包括MongoDB副本集的设置和磁盘空间监控。一、MongoDB安装首先,确保已在Debian系统上安装MongoDB。使用以下命令安装:sudoaptupdatesudoaptinstall-ymongodb-org二、配置MongoDB副本集MongoDB副本集确保高可用性和数据冗余,是实现自动扩容的基础。启动MongoDB服务:sudosystemctlstartmongodsudosys

MongoDB在Debian上的高可用性如何保障 MongoDB在Debian上的高可用性如何保障 Apr 02, 2025 am 07:21 AM

本文介绍如何在Debian系统上构建高可用性的MongoDB数据库。我们将探讨多种方法,确保数据安全和服务持续运行。关键策略:副本集(ReplicaSet):利用副本集实现数据冗余和自动故障转移。当主节点出现故障时,副本集会自动选举新的主节点,保证服务的持续可用性。数据备份与恢复:定期使用mongodump命令进行数据库备份,并制定有效的恢复策略,以应对数据丢失风险。监控与报警:部署监控工具(如Prometheus、Grafana)实时监控MongoDB的运行状态,并

Navicat查看MongoDB数据库密码的方法 Navicat查看MongoDB数据库密码的方法 Apr 08, 2025 pm 09:39 PM

直接通过 Navicat 查看 MongoDB 密码是不可能的,因为它以哈希值形式存储。取回丢失密码的方法:1. 重置密码;2. 检查配置文件(可能包含哈希值);3. 检查代码(可能硬编码密码)。

CentOS MongoDB备份策略是什么 CentOS MongoDB备份策略是什么 Apr 14, 2025 pm 04:51 PM

CentOS系统下MongoDB高效备份策略详解本文将详细介绍在CentOS系统上实施MongoDB备份的多种策略,以确保数据安全和业务连续性。我们将涵盖手动备份、定时备份、自动化脚本备份以及Docker容器环境下的备份方法,并提供备份文件管理的最佳实践。手动备份:利用mongodump命令进行手动全量备份,例如:mongodump-hlocalhost:27017-u用户名-p密码-d数据库名称-o/备份目录此命令会将指定数据库的数据及元数据导出到指定的备份目录。

CentOS上GitLab的数据库如何选择 CentOS上GitLab的数据库如何选择 Apr 14, 2025 pm 04:48 PM

CentOS系统上GitLab数据库部署指南选择合适的数据库是成功部署GitLab的关键步骤。GitLab兼容多种数据库,包括MySQL、PostgreSQL和MongoDB。本文将详细介绍如何选择并配置这些数据库。数据库选择建议MySQL:一款广泛应用的关系型数据库管理系统(RDBMS),性能稳定,适用于大多数GitLab部署场景。PostgreSQL:功能强大的开源RDBMS,支持复杂查询和高级特性,适合处理大型数据集。MongoDB:流行的NoSQL数据库,擅长处理海

Debian MongoDB如何进行数据加密 Debian MongoDB如何进行数据加密 Apr 12, 2025 pm 08:03 PM

在Debian系统上为MongoDB数据库加密,需要遵循以下步骤:第一步:安装MongoDB首先,确保您的Debian系统已安装MongoDB。如果没有,请参考MongoDB官方文档进行安装:https://docs.mongodb.com/manual/tutorial/install-mongodb-on-debian/第二步:生成加密密钥文件创建一个包含加密密钥的文件,并设置正确的权限:ddif=/dev/urandomof=/etc/mongodb-keyfilebs=512

mongodb怎么设置用户 mongodb怎么设置用户 Apr 12, 2025 am 08:51 AM

要设置 MongoDB 用户,请按照以下步骤操作:1. 连接到服务器并创建管理员用户。2. 创建要授予用户访问权限的数据库。3. 使用 createUser 命令创建用户并指定其角色和数据库访问权限。4. 使用 getUsers 命令检查创建的用户。5. 可选地设置其他权限或授予用户对特定集合的权限。

See all articles