Home Database Mysql Tutorial MongoDB 聚合

MongoDB 聚合

Jun 07, 2016 pm 05:45 PM
mongodb polymerization

MongoDB除了基本的查询功能,还提供了很多强大的聚合工具,其中简单的可计算集合中的文档个数, 复杂的可利用MapReduce做复杂数据分析. 1.count count返回集合中的文档数量 db.refactor.count() 不管集合有多大,都能很快的返回文档数量. 可以传递查询,MongoDB会

MongoDB除了基本的查询功能,还提供了很多强大的聚合工具,其中简单的可计算集合中的文档个数,

复杂的可利用MapReduce做复杂数据分析.

 

1.count

count返回集合中的文档数量

db.refactor.count()

不管集合有多大,都能很快的返回文档数量.

可以传递查询,MongoDB会计算查询结果的数量

db.refactor.count({"username":"refactor"})

但是增加查询条件会使count变慢.

 

2.distinct

distinct用来找出给定键的所有不同值.使用时必须指定集合和键.

如:

db.runCommand({"distinct":"refactor","key":"username"})

 

 3.group

group先选定分组所依据的键,MongoDB将会将集合依据选定键值的不同分成若干组.然后可以通过聚合每一组内的文档,

产生一个结果文档.

如:

db.runCommand(
{
  "group":
  {
    "ns":"refactor",
    "key":{"username":true},
    "initial":{"count":0},
    "$reduce":function(doc,prev)
    {
      prev.count++;
    },
    "condition":{"age":{"$gt":40}}
  }
}
)

   "ns":"refactor",

指定要进行分组的集合
    "key":{"username":true},

指定文档分组的依据,这里是username键,所有username键的值相等的被划分到一组,true为返回键username的值
    "initial":{"count":0},

每一组reduce函数调用的初始个数.每一组的所有成员都会使用这个累加器.
    "$reduce":function(doc,prev){...}

每个文档都对应的调用一次.系统会传递两个参数:当前文档和累加器文档.

"condition":{"age":{"$gt":40}}

这个age的值大于40的条件

 

4.使用完成器

完成器用于精简从数据库传到用户的数据.group命令的输出一定要能放在单个数据库相应中.

"finalize"附带一个函数,在数组结果传递到客户端之前被调用一次.

db.runCommand(
  {
    "group":
    {
      "ns":"refactor",
      "key":{"username":true},
      "initial":{"count":0},
      "$reduce":function(doc,prev)
      {
        prev.count++;
      },
      "finalize":function(doc)
      {
        doc.num=doc.count;
        delete doc.count;
      }
    }
  }
)

finalize能修改传递的参数也能返回新值.

 

5.将数组作为键使用

有些时候分组所依据的条件很复杂,不仅是一个键.比如要使用group计算每个类别有多篇博客文章.由于有很多作者,

给文章分类时可能不规律的使用了大小写.所以,如果要是按类别名来分组,最后"MongoDB"和"mongodb"就是不同的组.

为了消除这种大小写的影响,就要定义一个函数来确定文档所依据的键.

定义分组要用到$keyf

db.runCommand(
 {
  "group":
   {
    "ns":"refactor",
    "$keyf":function(doc){return {"username":doc.username.toLowerCase()}},
    "initial":{"count":0},
    "$reduce":function(doc,prev)
       {
        prev.count++;
       }
   }
 }
)

 

6.MapReduce

count,distinct,group能做的事情MapReduce都能做.它是一个可以轻松并行化到多个服务器的聚合方法.它会

拆分问题,再将各个部分发送到不同机器上,让每台机器完成一部分.当所有机器都完成时候,再把结果汇集起来形成

最终完整的结果.

MapReduce需要几个步骤:

1.映射,将操作映射到集合中的每个文档.这个操作要么什么都不做,要么 产生一个键和n个值.

2.洗牌,按照键分组,并将产生的键值组成列表放到对应键中.

3.化简,把列表中的值 化简 成一个单值,这个值被返回.

4.重新洗牌,直到每个键的列表只有一个值为止,这个值就是最终结果.

MapReduce的速度比group慢,group也很慢.在应用程序中,最好不要用MapReduce,可以在后台运行MapReduce

创建一个保存结果的集合,可以对这个集合进行实时查询.

 

找出集合中的所有键

MongoDB没有模式,所以并不知晓每个文档有多少个键.通常找到集合的所有键的做好方式是用MapReduce.

在映射阶段,想得到文档中的每个键.map函数使用emit 返回要处理的值.emit会给MapReduce一个键和一个值.

这里用emit将文档某个键的记数(count)返回({count:1}).我们为每个键单独记数,所以为文档中的每一个键调用一次emit,

this是当前文档的引用:

map=function(){
  for(var key in this)
  {
    emit(key,{count:1})
  }
};

这样返回了许许多多的{count:1}文档,每一个都与集合中的一个键相关.这种有一个或多个{count:1}文档组成的数组,

会传递给reduce函数.reduce函数有两个参数,一个是key,也就是emit返回的第一个值,另一个参数是数组,由一个或者多个

对应键的{count:1}文档组成.

reduce=function(key,emits){
  total=0;
  for(var i in emits){
    total+=emits[i].count;
  }
  return {count:total};
}

reduce要能被反复被调用,不论是映射环节还是前一个化简环节.reduce返回的文档必须能作为reduce的

第二个参数的一个元素.如x键映射到了3个文档{"count":1,id:1},{"count":1,id:2},{"count":1,id:3}

其中id键用于区别.MongoDB可能这样调用reduce:

>r1=reduce("x",[{"count":1,id:1},{"count":1,id:2}])

{count:2}

>r2=reduce("x",[{"count":1,id:3}])

{count:1}

>reduce("x",[r1,r2])

{count:3}

reduce应该能处理emit文档和其他reduce结果的各种集合.

如:

mr=db.runCommand(
  {
  "mapreduce":"refactor",
  "map":map,
  "reduce":reduce,
  "out":{inline:1}
  }
)

或:

db.refactor.mapReduce(map,reduce,{out:{inline:1}})

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Nordhold: Fusion System, Explained
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1671
14
PHP Tutorial
1276
29
C# Tutorial
1256
24
Use Composer to solve the dilemma of recommendation systems: andres-montanez/recommendations-bundle Use Composer to solve the dilemma of recommendation systems: andres-montanez/recommendations-bundle Apr 18, 2025 am 11:48 AM

When developing an e-commerce website, I encountered a difficult problem: how to provide users with personalized product recommendations. Initially, I tried some simple recommendation algorithms, but the results were not ideal, and user satisfaction was also affected. In order to improve the accuracy and efficiency of the recommendation system, I decided to adopt a more professional solution. Finally, I installed andres-montanez/recommendations-bundle through Composer, which not only solved my problem, but also greatly improved the performance of the recommendation system. You can learn composer through the following address:

Navicat's method to view MongoDB database password Navicat's method to view MongoDB database password Apr 08, 2025 pm 09:39 PM

It is impossible to view MongoDB password directly through Navicat because it is stored as hash values. How to retrieve lost passwords: 1. Reset passwords; 2. Check configuration files (may contain hash values); 3. Check codes (may hardcode passwords).

How to choose a database for GitLab on CentOS How to choose a database for GitLab on CentOS Apr 14, 2025 pm 04:48 PM

GitLab Database Deployment Guide on CentOS System Selecting the right database is a key step in successfully deploying GitLab. GitLab is compatible with a variety of databases, including MySQL, PostgreSQL, and MongoDB. This article will explain in detail how to select and configure these databases. Database selection recommendation MySQL: a widely used relational database management system (RDBMS), with stable performance and suitable for most GitLab deployment scenarios. PostgreSQL: Powerful open source RDBMS, supports complex queries and advanced features, suitable for handling large data sets. MongoDB: Popular NoSQL database, good at handling sea

What is the CentOS MongoDB backup strategy? What is the CentOS MongoDB backup strategy? Apr 14, 2025 pm 04:51 PM

Detailed explanation of MongoDB efficient backup strategy under CentOS system This article will introduce in detail the various strategies for implementing MongoDB backup on CentOS system to ensure data security and business continuity. We will cover manual backups, timed backups, automated script backups, and backup methods in Docker container environments, and provide best practices for backup file management. Manual backup: Use the mongodump command to perform manual full backup, for example: mongodump-hlocalhost:27017-u username-p password-d database name-o/backup directory This command will export the data and metadata of the specified database to the specified backup directory.

MongoDB and relational database: a comprehensive comparison MongoDB and relational database: a comprehensive comparison Apr 08, 2025 pm 06:30 PM

MongoDB and relational database: In-depth comparison This article will explore in-depth the differences between NoSQL database MongoDB and traditional relational databases (such as MySQL and SQLServer). Relational databases use table structures of rows and columns to organize data, while MongoDB uses flexible document-oriented models to better suit the needs of modern applications. Mainly differentiates data structures: Relational databases use predefined schema tables to store data, and relationships between tables are established through primary keys and foreign keys; MongoDB uses JSON-like BSON documents to store them in a collection, and each document structure can be independently changed to achieve pattern-free design. Architectural design: Relational databases need to pre-defined fixed schema; MongoDB supports

How to set up users in mongodb How to set up users in mongodb Apr 12, 2025 am 08:51 AM

To set up a MongoDB user, follow these steps: 1. Connect to the server and create an administrator user. 2. Create a database to grant users access. 3. Use the createUser command to create a user and specify their role and database access rights. 4. Use the getUsers command to check the created user. 5. Optionally set other permissions or grant users permissions to a specific collection.

How to encrypt data in Debian MongoDB How to encrypt data in Debian MongoDB Apr 12, 2025 pm 08:03 PM

Encrypting MongoDB database on a Debian system requires following the following steps: Step 1: Install MongoDB First, make sure your Debian system has MongoDB installed. If not, please refer to the official MongoDB document for installation: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-debian/Step 2: Generate the encryption key file Create a file containing the encryption key and set the correct permissions: ddif=/dev/urandomof=/etc/mongodb-keyfilebs=512

What are the tools to connect to mongodb What are the tools to connect to mongodb Apr 12, 2025 am 06:51 AM

The main tools for connecting to MongoDB are: 1. MongoDB Shell, suitable for quickly viewing data and performing simple operations; 2. Programming language drivers (such as PyMongo, MongoDB Java Driver, MongoDB Node.js Driver), suitable for application development, but you need to master the usage methods; 3. GUI tools (such as Robo 3T, Compass) provide a graphical interface for beginners and quick data viewing. When selecting tools, you need to consider application scenarios and technology stacks, and pay attention to connection string configuration, permission management and performance optimization, such as using connection pools and indexes.

See all articles