Real-time Profiling a MongoDB Cluster
by A. Jesse Jiryu Davis, Python Evangelist at 10gen In a sharded cluster of replica sets, which server or servers handle each of your queries? What about each insert, update, or command? If you know how a MongoDB cluster routes operations
by A. Jesse Jiryu Davis, Python Evangelist at 10gen
In a sharded cluster of replica sets, which server or servers handle each of your queries? What about each insert, update, or command? If you know how a MongoDB cluster routes operations among its servers, you can predict how your application will scale as you add shards and add members to shards.
Operations are routed according to the type of operation, your shard key, and your read preference. Let’s set up a cluster and use the system profiler to see where each operation is run. This is an interactive, experimental way to learn how your cluster really behaves and how your architecture will scale.
Setup
You’ll need a recent install of MongoDB (I’m using 2.4.4), Python, a recent version of PyMongo (at least 2.4—I’m using 2.5.2) and the code in my cluster-profile repository on GitHub. If you install the Colorama Python package you’ll get cute colored output. These scripts were tested on my Mac.
Sharded cluster of replica sets
Run the cluster_setup.py
script in my repository. It sets up a standard sharded cluster for you running on your local machine. There’s a mongos
, three config servers, and two shards, each of which is a three-member replica set. The first shard’s replica set is running on ports 4000 through 4002, the second shard is on ports 5000 through 5002, and the three config servers are on ports 6000 through 6002:
For the finale, cluster_setup.py
makes a collection named sharded_collection
, sharded on a key named shard_key
.
In a normal deployment, we’d let MongoDB’s balancer automatically distribute chunks of data among our two shards. But for this demo we want documents to be on predictable shards, so my script disables the balancer. It makes a chunk for all documents with shard_key
less than 500 and another chunk for documents with shard_key
greater than or equal to 500. It moves the high chunk to replset_1
:
client = MongoClient() # Connect to mongos. admin = client.admin # admin database.
Pre-split.
admin.command( 'split', 'test.sharded_collection', middle={'shard_key': 500}) admin.command( 'moveChunk', 'test.sharded_collection', find={'shard_key': 500}, to='replset_1')
If you connect to mongos
with the MongoDB shell, sh.status()
shows there’s one chunk on each of the two shards:
{ "shard_key" : { "$minKey" : 1 } } -->> { "shard_key" : 500 } on : replset_0 { "t" : 2, "i" : 1 } { "shard_key" : 500 } -->> { "shard_key" : { "$maxKey" : 1 } } on : replset_1 { "t" : 2, "i" : 0 }
The setup script also inserts a document with a shard_key
of 0 and another with a shard_key
of 500. Now we’re ready for some profiling.
Profiling
Run the tail_profile.py
script from my repository. It connects to all the replica set members. On each, it sets the profiling level to 2 (“log everything”) on the test
database, and creates a tailable cursor on the system.profile
collection. The script filters out some noise in the profile collection—for example, the activities of the tailable cursor show up in the system.profile
collection that it’s tailing. Any legitimate entries in the profile are spat out to the console in pretty colors.
Experiments
Targeted queries versus scatter-gather
Let’s run a query from Python in a separate terminal:
>>> from pymongo import MongoClient >>> # Connect to mongos. >>> collection = MongoClient().test.sharded_collection >>> collection.find_one({'shard_key': 0}) {'_id': ObjectId('51bb6f1cca1ce958c89b348a'), 'shard_key': 0}
tail_profile.py
prints:
replset_0 primary on 4000: query test.sharded_collection {“shard_key”: 0}
The query includes the shard key, so mongos
reads from the shard that can satisfy it. Adding shards can scale out your throughput on a query like this. What about a query that doesn’t contain the shard key?:
>>> collection.find_one({})
mongos
sends the query to both shards:
replset_0 primary on 4000: query test.sharded_collection {“shard_key”: 0}
replset_1 primary on 5000: query test.sharded_collection {“shard_key”: 500}
For fan-out queries like this, adding more shards won’t scale out your query throughput as well as it would for targeted queries, because every shard has to process every query. But we can scale throughput on queries like these by reading from secondaries.
Queries with read preferences
We can use read preferences to read from secondaries:
>>> from pymongo.read_preferences import ReadPreference >>> collection.find_one({}, read_preference=ReadPreference.SECONDARY)
tail_profile.py
shows us that mongos
chose a random secondary from each shard:
replset_0 secondary on 4001: query test.sharded_collection {“$readPreference”: {“mode”: “secondary”}, “$query”: {}}
replset_1 secondary on 5001: query test.sharded_collection {“$readPreference”: {“mode”: “secondary”}, “$query”: {}}
Note how PyMongo passes the read preference to mongos
in the query, as the $readPreference
field. mongos
targets one secondary in each of the two replica sets.
Updates
With a sharded collection, updates must either include the shard key or be “multi-updates”. An update with the shard key goes to the proper shard, of course:
>>> collection.update({'shard_key': -100}, {'$set': {'field': 'value'}})
replset_0 primary on 4000: update test.sharded_collection {“shard_key”: -100}
mongos
only sends the update to replset_0
, because we put the chunk of documents with shard_key
less than 500 there.
A multi-update hits all shards:
>>> collection.update({}, {'$set': {'field': 'value'}}, multi=True)
replset_0 primary on 4000: update test.sharded_collection {}
replset_1 primary on 5000: update test.sharded_collection {}
A multi-update on a range of the shard key need only involve the proper shard:
>>> collection.update({'shard_key': {'$gt': 1000}}, {'$set': {'field': 'value'}}, multi=True)
replset_1 primary on 5000: update test.sharded_collection {“shard_key”: {“$gt”: 1000}}
So targeted updates that include the shard key can be scaled out by adding shards. Even multi-updates can be scaled out if they include a range of the shard key, but multi-updates without the shard key won’t benefit from extra shards.
Commands
In version 2.4, mongos
can use secondaries not only for queries, but also for some commands. You can run count
on secondaries if you pass the right read preference:
>>> cursor = collection.find(read_preference=ReadPreference.SECONDARY) >>> cursor.count()
replset_0 secondary on 4001: command count: sharded_collection
replset_1 secondary on 5001: command count: sharded_collection
Whereas findAndModify
, since it modifies data, is run on the primaries no matter your read preference:
>>> db = MongoClient().test >>> test.command( ... 'findAndModify', ... 'sharded_collection', ... query={'shard_key': -1}, ... remove=True, ... read_preference=ReadPreference.SECONDARY)
replset_0 primary on 4000: command findAndModify: sharded_collection
Go Forth And Scale
To scale a sharded cluster, you should understand how operations are distributed: are they scatter-gather, or targeted to one shard? Do they run on primaries or secondaries? If you set up a cluster and test your queries interactively like we did here, you can see how your cluster behaves in practice, and design your application for future growth.
Read Jesse’s blog, Emptysquare and follow him on Github
原文地址:Real-time Profiling a MongoDB Cluster, 感谢原作者分享。

热AI工具

Undresser.AI Undress
人工智能驱动的应用程序,用于创建逼真的裸体照片

AI Clothes Remover
用于从照片中去除衣服的在线人工智能工具。

Undress AI Tool
免费脱衣服图片

Clothoff.io
AI脱衣机

Video Face Swap
使用我们完全免费的人工智能换脸工具轻松在任何视频中换脸!

热门文章

热工具

记事本++7.3.1
好用且免费的代码编辑器

SublimeText3汉化版
中文版,非常好用

禅工作室 13.0.1
功能强大的PHP集成开发环境

Dreamweaver CS6
视觉化网页开发工具

SublimeText3 Mac版
神级代码编辑软件(SublimeText3)

在开发一个电商网站时,我遇到了一个棘手的问题:如何为用户提供个性化的商品推荐。最初,我尝试了一些简单的推荐算法,但效果并不理想,用户的满意度也因此受到影响。为了提升推荐系统的精度和效率,我决定采用更专业的解决方案。最终,我通过Composer安装了andres-montanez/recommendations-bundle,这不仅解决了我的问题,还大大提升了推荐系统的性能。可以通过一下地址学习composer:学习地址

直接通过 Navicat 查看 MongoDB 密码是不可能的,因为它以哈希值形式存储。取回丢失密码的方法:1. 重置密码;2. 检查配置文件(可能包含哈希值);3. 检查代码(可能硬编码密码)。

CentOS系统上GitLab数据库部署指南选择合适的数据库是成功部署GitLab的关键步骤。GitLab兼容多种数据库,包括MySQL、PostgreSQL和MongoDB。本文将详细介绍如何选择并配置这些数据库。数据库选择建议MySQL:一款广泛应用的关系型数据库管理系统(RDBMS),性能稳定,适用于大多数GitLab部署场景。PostgreSQL:功能强大的开源RDBMS,支持复杂查询和高级特性,适合处理大型数据集。MongoDB:流行的NoSQL数据库,擅长处理海

CentOS系统下MongoDB高效备份策略详解本文将详细介绍在CentOS系统上实施MongoDB备份的多种策略,以确保数据安全和业务连续性。我们将涵盖手动备份、定时备份、自动化脚本备份以及Docker容器环境下的备份方法,并提供备份文件管理的最佳实践。手动备份:利用mongodump命令进行手动全量备份,例如:mongodump-hlocalhost:27017-u用户名-p密码-d数据库名称-o/备份目录此命令会将指定数据库的数据及元数据导出到指定的备份目录。

MongoDB与关系型数据库:深度对比本文将深入探讨NoSQL数据库MongoDB与传统关系型数据库(如MySQL和SQLServer)的差异。关系型数据库采用行和列的表格结构组织数据,而MongoDB则使用灵活的面向文档模型,更适应现代应用的需求。主要区别数据结构:关系型数据库使用预定义模式的表格存储数据,表间关系通过主键和外键建立;MongoDB使用类似JSON的BSON文档存储在集合中,每个文档结构可独立变化,实现无模式设计。架构设计:关系型数据库需要预先定义固定的模式;MongoDB支持

要设置 MongoDB 用户,请按照以下步骤操作:1. 连接到服务器并创建管理员用户。2. 创建要授予用户访问权限的数据库。3. 使用 createUser 命令创建用户并指定其角色和数据库访问权限。4. 使用 getUsers 命令检查创建的用户。5. 可选地设置其他权限或授予用户对特定集合的权限。

在Debian系统上为MongoDB数据库加密,需要遵循以下步骤:第一步:安装MongoDB首先,确保您的Debian系统已安装MongoDB。如果没有,请参考MongoDB官方文档进行安装:https://docs.mongodb.com/manual/tutorial/install-mongodb-on-debian/第二步:生成加密密钥文件创建一个包含加密密钥的文件,并设置正确的权限:ddif=/dev/urandomof=/etc/mongodb-keyfilebs=512

连接MongoDB的工具主要有:1. MongoDB Shell,适用于快速查看数据和执行简单操作;2. 编程语言驱动程序(如PyMongo, MongoDB Java Driver, MongoDB Node.js Driver),适合应用开发,但需掌握其使用方法;3. GUI工具(如Robo 3T, Compass),提供图形化界面,方便初学者和快速数据查看。选择工具需考虑应用场景和技术栈,并注意连接字符串配置、权限管理及性能优化,如使用连接池和索引。
