Redis Cluster and limiting divergences.-mysql教程-PHP中文网

首页

数据库

mysql教程

Redis Cluster and limiting divergences.

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 07, 2016 pm 04:35 PM

and cluster div redis

Redis Cluster is finally on its road to reach the first stable release in a short timeframe as already discussed in the Redis google group [1]. However despite a design never proposed for the implementation of Redis Cluster was analyzed and discussed at long in the past weeks (unfortunately creating some confusion: many people, including notable personalities of the NoSQL movement, confused the analyzed proposal with Redis Cluster implementation), no attempt was made to analyze or categorize Redis Cluster itself.

I believe that putting in perspective the simple ideas that Redis Cluster implements, in a more formal way, is an interesting exercise for the following reason: Redis Cluster is not a design that tries to achieve “AP” or “CP” of the CAP theorem, since for its goals CAP Availability and CAP Consistency are too hard goals to reach without sacrificing other practical qualities.

Once a design does not try to maximize what is theoretically possible, the design space becomes much larger, and the implementation main goal is to try to provide some Availability and some reasonable form of Consistency, in the face of other conflicting design requirements like asynchronous replication of data.

The goal of this article is to reply to the following question: how Redis Cluster, as an asynchronous system, tries to limit divergences between nodes?

Nodes divergence
===

One of the main problems with asynchronous systems is that a master accepting requests never actually knows in a given moment if it is still the authoritative master or a stale one. For example imagine a cluster of three nodes A, B, C, with one replica each, A1, B1, C1. When a master is partitioned away with some client, A1 may be elected in the other side as the new master, but A is not able, for every request processed, to verify with the other nodes if the request should be accepted or not. At best node A can get asynchronous acknowledges from replicas.

Every time this happens, two parallel time lines for the same set of data is created, one in A, and one in A1. In Redis Cluster there is no way to merge data as explained in [2], so you can imagine the merge function between A and A1 data set when the partition heals as just picking one time line between all the time lines created (two in this case).

Another case that creates different time lines is the replication process itself.

A master A may have three replicas A1, A2, A3. Because of the very concept of asynchronous replication, each of the slaves may represent a point in time in the past history of the time line of A. Usually since replication in Redis is very fast and has minimal delay, as data is transmitted to slaves at the same time as the reply is transmitted to the writing client, the time “delta” between A and the slaves is small. However slaves may be lagging for some reason, so it is possible that, for example, A1 is one second in the past.

Asynchronously replicated nodes can’t really avoid diverging, however there is no need for the divergence between replicas to be unbound. Actually systems allowing replicas to diverge in an uncontrolled way may be hard to use in practice, even when for use case does not requiring strong consistency, that is the target of Redis Cluster.

Putting bounds to divergence
===

Redis cluster uses a few heuristics in order to limit divergence of nodes. The algorithms employed are very simple, consisting merely on the following four rules that nodes follow.

1) When a master is isolated from the majority for node-timeout (that is the user configured time after a non responding node is considered to be failing by the failure detection algorithm), it stops accepting queries from clients. It is easy to see how this helps in practice: in the majority side of the cluster, no slave is able to get elected and replace the master before node-timeout has elapsed. However, after node-timeout time has elapsed, the isolated master knows that it is possible that a parallel history was created in the other side of the cluster, and this history will win over its history, so it stops accepting data that will be otherwise lost. This means that if a master is partitioned away together with one or more clients, the window for data loss is node-timeout, that is usually in the order of 500 — 1000 milliseconds. If the partition heals before node-timeout, no data loss happens as the master rejoins the cluster as master.

2) A related problem is, when a master is failing, how to pick the “best” history among the available histories for the same data (depending on the number of slaves). An algorithm is used in order to give an advantage to the slave with the most updated replication offset to be elected, that is, the slave that likely has the most recent data compared to the master that went down. However if the best slave fails to be elected the other slaves will try an election as well.

3) If you think at “2” you’ll see that actually this means that the divergence between a master and its slaves is unbound. A slave can be lagging for hours in theory. So actually there is another heuristic in use, consisting on slaves to don’t try to get elected at all if the last time they received data from the master is too much in the past. This maximum time is currently set to ten times node-timeout, however it will be user-configurable in the stable release of Redis Cluster. While usually the lag between master and slaves is in the sub-millisecond figure, this time limit ensures that the worst case scenario is the following:

- A slave is stopped/unavailable for just a little less than ten times node-timeout.
- Its master fails.
- At the same time the slave returns back available.

Rejoins went wrong
===

There is another failure mode that was worth covering in Redis Cluster as it is in some way a different instance of the same problems covered so far. What happens if a master rejoins the cluster after it was already failed over, and there are still clients with non updated configuration writing to it?

This may happens in two main ways: rejoining the majority after a partition, and restarting the process. The failure mode is conceptually the same, the master is not able to get a synchronous acknowledge from other replicas about every write, and the other nodes may take a few milliseconds before being able to reconfigure a node that just rejoined (the node is usually reconfigured with an UPDATE message as soon as it is detected to have a stale configuration: this usually happens immediately once the rejoining instance pings another node or sends a pong in reply to a ping).

Rejoins are handled with another heuristic:

4) When a node rejoins the majority, or is restarted, it waits a small time (but yet a few order of magnitudes bigger than the usual network latency), before accepting writes again, in order to maximize the probability to get reconfigured before accepting writes from clients with stale information.

History re-play
===

Some of the Redis data structures and operations are commutative. Obvious examples are INCR and SADD, the order of operations does not matter and eventually the Set or the counter will have the same exact value as long as all the operations are executed.

Because of this observation, and since Redis instances get asynchronous acknowledges from slaves about how much data was processed, it is possible for partitioned masters to remember commands sent by clients that are still not acknowledged by all the replicas.

This trick is able to improve data safety in a way similar to AP systems, but while in AP systems merging values is used, Redis Cluster would reply commands from clients instead.

I proposed this idea in a blog post [3] some time ago, however this is a practical example of what would happen implementing it in a next version of Redis Cluster:

- A master gets partitioned away with clients.
- Clients write to the master for node-timeout time.
- The master starts returning errors.
- In the majority side, a slave is elected as the new master.
- When the old master rejoins it is reconfigured as a replica of the new master.

So far this is the description of what happens currently. With node replying what would happen is that all the writes not acknowledged, from the time the partition is created, to the time the master starts to reply with errors, are accumulated. When the partition heals, as part of turning the old master into a slave, the old master would connect with the new master, and re-play the accumulated stream of commands.

How all this is tested?
===

Some time ago I wrote about what I use in order to test Redis Cluster [4]. The most valuable tool I found so far is a simple consistency-test that is part of the redis-rb-cluster project [5] (a Ruby Redis Cluster client). Basically stress testing the system is as simple as keeping the consistency test running, while simulating different partitions, restarts, and other failures in the cluster.

This test was so useful and is so simple to run, that I’m actually starting to think that everybody running a distributed database in production, whatever it is Cassandra or Zookeeper or Redis or whatever else, should keep a similar test running against the production system as a way to monitor what is happening.

Such tests are lightweight to run, they can be made to just set a few keys per second. However they can easily detect issues with the implementation or other unexpected consistency issues. Especially systems merging data that are at the same time always available, such as AP systems, have the tendency to “mask” bugs: it takes some luck to discover some consistency leak in real data sets. With a simple consistency test running instead it is possible to monitor how the system is behaving.

The following for example is the output of constency-test.rb running against my testing environment:

27523967 R (7187 err) | 27523968 W (7186 err) | 12 noack |

So I know that I read/wrote 27 million of times during my testing, and 12 writes that received no acknowledge were actually materialized inside the database.

Notes:

[1] https://groups.google.com/d/msg/redis-db/2laQRKBKkYg/ssaiQLhasNkJ
[2] http://antirez.com/news/67
[3] http://antirez.com/news/68
[4] http://antirez.com/news/69
[5] https://github.com/antirez/redis-rb-cluster Comments

本站声明

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系admin@php.cn

热AI工具

Undresser.AI Undress

人工智能驱动的应用程序，用于创建逼真的裸体照片

AI Clothes Remover

用于从照片中去除衣服的在线人工智能工具。

Undress AI Tool

免费脱衣服图片

Clothoff.io

AI脱衣机

Video Face Swap

使用我们完全免费的人工智能换脸工具轻松在任何视频中换脸！

显示更多

热工具

记事本++7.3.1

好用且免费的代码编辑器

SublimeText3汉化版

中文版，非常好用

禅工作室 13.0.1

功能强大的PHP集成开发环境

Dreamweaver CS6

视觉化网页开发工具

SublimeText3 Mac版

神级代码编辑软件(SublimeText3)

显示更多

热门话题

Java教程

1675

CakePHP 教程

1429

Laravel 教程

1333

PHP教程

1278

C# 教程

1257

显示更多

Related knowledge

redis集群模式怎么搭建 Apr 10, 2025 pm 10:15 PM

Redis集群模式通过分片将Redis实例部署到多个服务器，提高可扩展性和可用性。搭建步骤如下：创建奇数个Redis实例，端口不同；创建3个sentinel实例，监控Redis实例并进行故障转移；配置sentinel配置文件，添加监控Redis实例信息和故障转移设置；配置Redis实例配置文件，启用集群模式并指定集群信息文件路径；创建nodes.conf文件，包含各Redis实例的信息；启动集群，执行create命令创建集群并指定副本数量；登录集群执行CLUSTER INFO命令验证集群状态；使

redis数据怎么清空 Apr 10, 2025 pm 10:06 PM

如何清空 Redis 数据：使用 FLUSHALL 命令清除所有键值。使用 FLUSHDB 命令清除当前选定数据库的键值。使用 SELECT 切换数据库，再使用 FLUSHDB 清除多个数据库。使用 DEL 命令删除特定键。使用 redis-cli 工具清空数据。

redis怎么读取队列 Apr 10, 2025 pm 10:12 PM

要从 Redis 读取队列，需要获取队列名称、使用 LPOP 命令读取元素，并处理空队列。具体步骤如下：获取队列名称：以 "queue:" 前缀命名，如 "queue:my-queue"。使用 LPOP 命令：从队列头部弹出元素并返回其值，如 LPOP queue:my-queue。处理空队列：如果队列为空，LPOP 返回 nil，可先检查队列是否存在再读取元素。

centos redis如何配置Lua脚本执行时间 Apr 14, 2025 pm 02:12 PM

在CentOS系统上，您可以通过修改Redis配置文件或使用Redis命令来限制Lua脚本的执行时间，从而防止恶意脚本占用过多资源。方法一：修改Redis配置文件定位Redis配置文件:Redis配置文件通常位于/etc/redis/redis.conf。编辑配置文件:使用文本编辑器（例如vi或nano）打开配置文件：sudovi/etc/redis/redis.conf设置Lua脚本执行时间限制:在配置文件中添加或修改以下行，设置Lua脚本的最大执行时间（单位：毫秒）

redis命令行怎么用 Apr 10, 2025 pm 10:18 PM

使用 Redis 命令行工具 (redis-cli) 可通过以下步骤管理和操作 Redis：连接到服务器，指定地址和端口。使用命令名称和参数向服务器发送命令。使用 HELP 命令查看特定命令的帮助信息。使用 QUIT 命令退出命令行工具。

redis计数器怎么实现 Apr 10, 2025 pm 10:21 PM

Redis计数器是一种使用Redis键值对存储来实现计数操作的机制，包含以下步骤：创建计数器键、增加计数、减少计数、重置计数和获取计数。Redis计数器的优势包括速度快、高并发、持久性和简单易用。它可用于用户访问计数、实时指标跟踪、游戏分数和排名以及订单处理计数等场景。

redis过期策略怎么设置 Apr 10, 2025 pm 10:03 PM

Redis数据过期策略有两种：定期删除：定期扫描删除过期键，可通过 expired-time-cap-remove-count、expired-time-cap-remove-delay 参数设置。惰性删除：仅在读取或写入键时检查删除过期键，可通过 lazyfree-lazy-eviction、lazyfree-lazy-expire、lazyfree-lazy-user-del 参数设置。

如何优化debian readdir的性能 Apr 13, 2025 am 08:48 AM

在Debian系统中，readdir系统调用用于读取目录内容。如果其性能表现不佳，可尝试以下优化策略：精简目录文件数量:尽可能将大型目录拆分成多个小型目录，降低每次readdir调用处理的项目数量。启用目录内容缓存:构建缓存机制，定期或在目录内容变更时更新缓存，减少对readdir的频繁调用。内存缓存（如Memcached或Redis）或本地缓存（如文件或数据库）均可考虑。采用高效数据结构:如果自行实现目录遍历，选择更高效的数据结构（例如哈希表而非线性搜索）存储和访问目录信

See all articles

Redis Cluster and limiting divergences.

热AI工具

Undresser.AI Undress

AI Clothes Remover

Undress AI Tool

Clothoff.io

Video Face Swap

热门文章

热工具

记事本++7.3.1

SublimeText3汉化版

禅工作室 13.0.1

Dreamweaver CS6

SublimeText3 Mac版

热门话题