Advanced Redis: High Availability with Sentinel (Summary)
This article brings you relevant knowledge about high-availability sentinels in Redis, including issues related to function architecture, deployment and configuration. I hope it will be helpful to everyone.
1. Function and structure
1. Function
Before introducing Sentinel, let’s first review the technologies related to Redis’ high availability from a macro perspective. They include: persistence, replication, sentinel and cluster. Their main functions and problems solved are:
- Persistence: Persistence is the simplest high-availability method (sometimes it is not even classified as a high-availability technique). Its main function is data backup, that is, storing data on the hard disk so that it is not lost when the process exits.
- Replication: Replication is the foundation of high-availability Redis. Both Sentinel and Cluster achieve high availability based on replication. Replication mainly implements multi-machine backup of data, as well as load balancing and simple fault recovery for read operations. Disadvantages: Failure recovery cannot be automated; write operations cannot be load balanced; storage capacity is limited by a single machine.
- Sentinel: Based on replication, Sentinel implements automated fault recovery. Disadvantages: Write operations cannot be load balanced; storage capacity is limited by a single machine.
- Cluster: Through clustering, Redis solves the problem that write operations cannot be load balanced and storage capacity is limited by a single machine, and implements a relatively complete high-availability solution.
Now let's turn to Sentinel.
Redis Sentinel was introduced in Redis 2.8. Its core function is automatic failover of the master node. The official Redis documentation describes Sentinel's functions as follows:
- Monitoring: The Sentinel will continuously check whether the master node and slave nodes are operating normally.
- Automatic failover: When the master node fails to work properly, Sentinel starts an automatic failover. It promotes one of the failed master's slave nodes to be the new master node and reconfigures the other slave nodes to replicate from the new master.
- Configuration provider: During initialization, the client obtains the master node address of the current Redis service by connecting to the sentinel.
- Notification: Sentinel can send the failover results to the client.
Among them, the monitoring and automatic failover functions allow Sentinel to detect master node failures in time and complete the transfer; while the configuration provider and notification functions need to be reflected in the interaction with the client.
A note on the word "client" in this article: in earlier articles, anything that accessed the Redis server through its API was called a client, including redis-cli and the Java client Jedis. To make the distinction clearer, "client" in this article does not include redis-cli and refers to something more sophisticated: redis-cli uses the low-level interface provided by Redis, while a client wraps these interfaces and functions so that it can take full advantage of Sentinel's configuration-provider and notification capabilities.
2. Architecture
The typical sentinel architecture diagram is as follows:
It consists of two parts, sentinel nodes and data nodes:
- Sentinel node: The sentinel system consists of one or more sentinel nodes. Sentinel nodes are special redis nodes that do not store data.
- Data node: The master node and the slave node are both data nodes.
2. Deployment
This part will deploy a simple sentinel system, including 1 master node, 2 slave nodes and 3 sentinel nodes. For convenience: all these nodes are deployed on one machine (LAN IP: 192.168.92.128), distinguished by port numbers; the configuration of the nodes is simplified as much as possible.
1. Deploy master-slave nodes
The master-slave nodes in the Sentinel system are configured the same as ordinary master-slave nodes and do not require any additional configuration. The following are the configuration files of the master node (port=6379) and the two slave nodes (port=6380/6381). The configurations are relatively simple and will not be described in detail.
    # redis-6379.conf
    port 6379
    daemonize yes
    logfile "6379.log"
    dbfilename "dump-6379.rdb"

    # redis-6380.conf
    port 6380
    daemonize yes
    logfile "6380.log"
    dbfilename "dump-6380.rdb"
    slaveof 192.168.92.128 6379

    # redis-6381.conf
    port 6381
    daemonize yes
    logfile "6381.log"
    dbfilename "dump-6381.rdb"
    slaveof 192.168.92.128 6379
After the configuration is completed, start the master node and slave nodes in sequence:
    redis-server redis-6379.conf
    redis-server redis-6380.conf
    redis-server redis-6381.conf
After the nodes start, connect to the master node to check whether the master-slave status is normal (for example, with redis-cli -p 6379 info replication).
2. Deploy the sentinel node
The sentinel node is essentially a special Redis node.
The configurations of the three sentinel nodes are almost identical; the main difference is the port number (26379/26380/26381). The following uses the 26379 node as an example to introduce the configuration and startup of a sentinel node. The configuration is kept as simple as possible; more options are introduced later.
    # sentinel-26379.conf
    port 26379
    daemonize yes
    logfile "26379.log"
    sentinel monitor mymaster 192.168.92.128 6379 2
Here, sentinel monitor mymaster 192.168.92.128 6379 2 means that this sentinel node monitors the master node at 192.168.92.128:6379, which is named mymaster. The final 2 relates to failure detection for the master node: at least 2 sentinel nodes must agree before the master node is judged to have failed and a failover is performed.
There are two ways to start a sentinel node, and they are completely equivalent:
    redis-sentinel sentinel-26379.conf
    redis-server sentinel-26379.conf --sentinel
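3. Summary
After configuring and starting the nodes as described above, the whole Sentinel system is up and running. You can verify this by connecting to a sentinel node with redis-cli.
A few points are worth noting about building a Sentinel system:
(1) The master and slave nodes in a Sentinel system are no different from ordinary master and slave nodes; failure detection and failover are controlled and carried out by the sentinels.
(2) A sentinel node is essentially a Redis node.
(3) Each sentinel node only needs to be configured to monitor the master node; it then discovers the other sentinel nodes and the slave nodes automatically.
(4) During sentinel startup and failover, the configuration files of the nodes are rewritten (config rewrite).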
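3. Client access to the Sentinel system
The previous section demonstrated two of Sentinel's main functions, monitoring and automatic failover. This section uses a client to demonstrate the other two: configuration provider and notification.
1. Code example
Before explaining how the client works, here is how to use one, taking the Java client Jedis as an example. The following code connects to the Sentinel system we just built and performs a write (it only demonstrates how to connect through Sentinel; exception handling is not covered).
    import java.util.HashSet;
    import java.util.Set;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisSentinelPool;

    public class SentinelDemo {
        public static void testSentinel() throws Exception {
            String masterName = "mymaster";
            // Addresses of the three sentinel nodes deployed above
            Set<String> sentinels = new HashSet<>();
            sentinels.add("192.168.92.128:26379");
            sentinels.add("192.168.92.128:26380");
            sentinels.add("192.168.92.128:26381");
            // The pool's initialization does a lot of work
            try (JedisSentinelPool pool = new JedisSentinelPool(masterName, sentinels);
                 Jedis jedis = pool.getResource()) {
                jedis.set("key1", "value1");
            }
        }
    }
Jedis provides good support for Sentinel. As the code above shows, we only need to give Jedis the set of sentinel nodes and the masterName to construct a JedisSentinelPool object; after that it can be used just like an ordinary Redis connection pool: obtain a connection with pool.getResource() and execute commands.
2. Client principles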
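Throughout this process, our code never specifies the master node's address explicitly, yet it connects to the master node; the code contains nothing about failover, yet it switches to the new master node automatically after Sentinel completes a failover. This is possible because of the work done in the JedisSentinelPool constructor, which mainly includes the following two steps:
(1) Iterate over the sentinel nodes to obtain the master node's information: the client iterates over the sentinel nodes and obtains the master node's information from one of them using the masterName. This is implemented by calling the sentinel node's sentinel get-master-addr-by-name command, which returns the master node's IP address and port.
Once the master node's information is obtained, the iteration stops (so in general the loop stops at the first sentinel node).
(2) Add listeners on the sentinels: this way, when a failover occurs, the client is notified by the sentinels and can switch to the new master node. Concretely, using the publish/subscribe feature provided by Redis, a separate thread is started for each sentinel node to subscribe to that sentinel's +switch-master channel; when a message arrives, the connection pool is reinitialized.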
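Step (1) above can be sketched without any Redis dependency. The following is a minimal illustration under stated assumptions, not the actual Jedis implementation: MasterResolver and resolveMaster are hypothetical names, and the lookup function stands in for sending sentinel get-master-addr-by-name to a single sentinel.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of how a Sentinel-aware client could locate the master.
// The lookup function is a stand-in for sending
// "sentinel get-master-addr-by-name <masterName>" to one sentinel.
class MasterResolver {

    // Ask each sentinel in turn and return the first master address obtained.
    static Optional<String> resolveMaster(List<String> sentinels,
                                          Function<String, Optional<String>> lookup) {
        for (String sentinel : sentinels) {
            try {
                Optional<String> master = lookup.apply(sentinel);
                if (master.isPresent()) {
                    return master; // stop at the first sentinel that answers
                }
            } catch (RuntimeException e) {
                // This sentinel is unreachable; fall through to the next one.
            }
        }
        return Optional.empty(); // no sentinel could provide the master address
    }

    public static void main(String[] args) {
        List<String> sentinels = List.of(
                "192.168.92.128:26379", "192.168.92.128:26380", "192.168.92.128:26381");
        // Stubbed lookup: the first sentinel is down, the others answer.
        Function<String, Optional<String>> lookup = s -> {
            if (s.endsWith(":26379")) {
                throw new IllegalStateException("connection refused");
            }
            return Optional.of("192.168.92.128:6379");
        };
        System.out.println(resolveMaster(sentinels, lookup).orElse("no master found"));
    }
}
```

Passing the lookup in as a function keeps the sketch testable with a stub; a real client would open a connection to each sentinel inside that function.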
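3. Summary
Understanding how the client works deepens our understanding of Sentinel's functions:
(1) Configuration provider: the client obtains the master node's information from a sentinel node using the masterName; here Sentinel acts as a configuration provider.
Note that Sentinel is only a configuration provider, not a proxy. The difference is this: with a configuration provider, once the client has obtained the master node's information from a sentinel, it connects directly to the master node, and subsequent requests (such as set/get) go straight to the master node; with a proxy, every client request would be sent to the sentinel, which would then have the master node process it.
An example makes it clear that Sentinel is a configuration provider rather than a proxy. In the Sentinel system deployed earlier, modify the sentinel nodes' configuration files as follows:
Change sentinel monitor mymaster 192.168.92.128 6379 2 to sentinel monitor mymaster 127.0.0.1 6379 2
Then run the client code above on another machine in the LAN. The client cannot connect to the master node: because Sentinel is a configuration provider, the client learns from it that the master node's address is 127.0.0.1:6379 and tries to open a Redis connection to 127.0.0.1:6379, which naturally fails. If Sentinel were a proxy, this problem would not occur.
(2) Notification: after a failover completes, the sentinel nodes send the new master node's information to the client so that the client can switch to the new master node promptly.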
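4. Basic principles
The previous sections introduced the basic methods of deploying and using Sentinel; this section introduces the basic principles behind its implementation.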
1. Commands supported by sentinel nodes
As a Redis node running in a special mode, a sentinel node supports a different set of commands from ordinary Redis nodes. In operations and maintenance, we can query or modify the Sentinel system through these commands. More importantly, functions such as failure detection and failover depend on communication between sentinel nodes, and much of that communication is carried out through the commands that sentinel nodes support. The main commands are introduced below.
(1) Basic query: Through these commands, you can query the topology, node information, configuration information, etc. of the Sentinel system.
- info sentinel: Get basic information of all monitored master nodes
- sentinel masters: Get detailed information of all monitored master nodes
- sentinel master mymaster: Get detailed information about the monitored master node mymaster
- sentinel slaves mymaster: Get detailed information about the slave nodes of the monitored master node mymaster
- sentinel sentinels mymaster: Get detailed information about the sentinel nodes monitoring the master node mymaster
- sentinel get-master-addr-by-name mymaster: Get the address of the monitored master node mymaster, as introduced earlier
- sentinel is-master-down-by-addr: Sentinel nodes use this command to ask each other whether the master node is offline, in order to judge whether it is objectively offline
(2) Add/remove monitoring of the master node
sentinel monitor mymaster2 192.168.92.128 16379 2: The function is exactly the same as the sentinel monitor function in the configuration file when deploying the sentinel node, and will not be described in detail
sentinel remove mymaster2: Cancel the monitoring of the master node mymaster2 by the current sentinel node
(3) Forced failover
sentinel failover mymaster: This command forces a failover of mymaster even if the current master node is running well. For example, if the machine hosting the current master node is about to be decommissioned, you can use this command to fail over in advance.
2. Basic Principles
To understand how Sentinel works, the key is to understand the following concepts.
(1) Scheduled tasks: Each sentinel node maintains 3 scheduled tasks. The functions of the scheduled tasks are as follows: obtain the latest master-slave structure by sending the info command to the master-slave node; obtain the information of other sentinel nodes through the publish and subscribe function; perform heartbeat detection by sending the ping command to other nodes to determine whether they are offline.
(2) Subjective offline: In the heartbeat scheduled task, if another node does not reply within a certain period of time, the sentinel node marks it as subjectively offline. As the name suggests, subjective offline means that a single sentinel node "subjectively" judges the node to be offline; its counterpart is objective offline.
(3) Objective offline: After a sentinel node marks the master node as subjectively offline, it asks the other sentinel nodes about the master node's status through the sentinel is-master-down-by-addr command. If the number of sentinels that judge the master node to be offline reaches a certain value, the master node is marked as objectively offline.
Note that objective offline is a concept that applies only to the master node. If a slave node or a sentinel node fails, it is marked as subjectively offline by the sentinels, but no objective offline judgment or failover follows.
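The objective offline rule can be written as a simple count. The sketch below is only an illustration of the rule as described here, not Sentinel's real code (which is implemented in C inside Redis); the class name, method name, and sentinel labels are made up for the example.

```java
import java.util.Map;

// Illustrative sketch of the objective offline judgment described above.
class ObjectiveDownCheck {

    // downOpinions maps each sentinel (including the one asking) to whether it
    // currently considers the master offline. The master is objectively offline
    // once the number of "offline" opinions reaches the quorum.
    static boolean isObjectivelyDown(Map<String, Boolean> downOpinions, int quorum) {
        long downVotes = downOpinions.values().stream()
                .filter(Boolean::booleanValue)
                .count();
        return downVotes >= quorum;
    }

    public static void main(String[] args) {
        // Opinions gathered from three sentinels (labels are hypothetical).
        Map<String, Boolean> opinions = Map.of(
                "sentinel-26379", true,
                "sentinel-26380", true,
                "sentinel-26381", false);
        System.out.println(isObjectivelyDown(opinions, 2));
    }
}
```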
(4) Electing the leader sentinel node: When the master node is judged to be objectively offline, the sentinel nodes negotiate to elect a leader sentinel node, and the leader performs the failover.
Any sentinel monitoring the master node may be elected leader. The election uses the Raft algorithm, whose basic idea is first come, first served: in a given round of the election, sentinel A sends B a request to become leader, and if B has not yet agreed to another sentinel's request, it agrees to A becoming leader. The detailed election process is not described here. In general the election is very fast, and whichever sentinel completes the objective offline judgment first usually becomes the leader.
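The first-come, first-served idea can be modeled in a few lines. This is a deliberately simplified, hypothetical model rather than Sentinel's actual election code: LeaderVote, requestVote, and hasMajority are illustrative names, an "epoch" stands for one round of the election, and the majority check is a simplification of how a winner is decided.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of first-come-first-served voting
// as seen from a single sentinel that receives vote requests.
class LeaderVote {
    // epoch -> candidate this sentinel has already voted for in that epoch
    private final Map<Long, String> votedFor = new HashMap<>();

    // Grant the vote to the first candidate that asks in a given epoch;
    // later requests in the same epoch get the earlier choice back.
    String requestVote(long epoch, String candidateId) {
        return votedFor.computeIfAbsent(epoch, e -> candidateId);
    }

    // Simplification: a candidate wins once it holds more than half of the votes.
    static boolean hasMajority(int votes, int totalSentinels) {
        return votes >= totalSentinels / 2 + 1;
    }

    public static void main(String[] args) {
        LeaderVote sentinelB = new LeaderVote();
        System.out.println(sentinelB.requestVote(1, "A")); // B has not voted yet in epoch 1
        System.out.println(sentinelB.requestVote(1, "C")); // B already voted in epoch 1
    }
}
```

Because each sentinel votes at most once per round, at most one candidate can collect a majority in that round.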
(5) Failover: The elected leader sentinel starts the failover operation, which can be roughly divided into 3 steps:
- Select a new master node from the slave nodes: first filter out unhealthy slave nodes; then select the slave node with the highest priority (specified by slave-priority, where a lower value means a higher priority); if the priorities cannot be distinguished, select the slave node with the largest replication offset; if it still cannot be distinguished, select the slave node with the smallest runid.
- Update the master-slave status: Use the slaveof no one command to make the selected slave node the master node; and use the slaveof command to make other nodes its slave nodes.
- Set the offline master node (i.e. 6379) as the slave node of the new master node. When 6379 comes back online, it will become the slave node of the new master node.
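The selection rule in the first step maps naturally onto a comparator chain. The following is a minimal sketch, assuming a hypothetical Replica type with just the fields the rule needs; it treats a lower slave-priority value as a higher priority and excludes priority 0, which in Redis marks a node that must never be promoted. The "healthy" flag stands in for Sentinel's own health checks, which are not modeled here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of choosing the new master among the slave nodes.
class ReplicaSelection {

    // Minimal view of a slave node; field names are illustrative.
    static final class Replica {
        final String runId;
        final int priority;     // slave-priority: lower value = preferred, 0 = never promote
        final long replOffset;  // replication offset: larger = more up to date
        final boolean healthy;  // stand-in for Sentinel's health filtering

        Replica(String runId, int priority, long replOffset, boolean healthy) {
            this.runId = runId;
            this.priority = priority;
            this.replOffset = replOffset;
            this.healthy = healthy;
        }
    }

    static Optional<Replica> selectNewMaster(List<Replica> replicas) {
        Comparator<Replica> order = Comparator
                .comparingInt((Replica r) -> r.priority)                    // 1. best priority
                .thenComparing(Comparator
                        .comparingLong((Replica r) -> r.replOffset).reversed()) // 2. largest offset
                .thenComparing(Comparator.comparing((Replica r) -> r.runId));   // 3. smallest runid
        return replicas.stream()
                .filter(r -> r.healthy && r.priority > 0)
                .min(order);
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
                new Replica("run-6380", 100, 500L, true),
                new Replica("run-6381", 100, 900L, true));
        System.out.println(selectNewMaster(replicas).map(r -> r.runId).orElse("none"));
    }
}
```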
5. Configuration and practical suggestions
1. Configuration
The following introduces several configurations related to Sentinel.
(1) sentinel monitor {masterName} {masterIp} {masterPort} {quorum}
sentinel monitor is Sentinel's core configuration and was explained earlier when deploying the sentinel nodes. masterName specifies the name of the master node, masterIp and masterPort specify its address, and quorum is the threshold on the number of sentinels needed to judge the master node objectively offline: when the number of sentinels that judge the master node to be offline reaches quorum, the master node is marked as objectively offline. The recommended value is half the number of sentinels plus 1.
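As a quick check of the "half the number of sentinels plus 1" recommendation, the arithmetic can be written out as below. This is only a sketch of that rule of thumb using integer division; QuorumSketch and recommendedQuorum are hypothetical names.

```java
// Sketch of the recommended quorum: half the number of sentinels plus 1,
// computed with integer division.
class QuorumSketch {

    static int recommendedQuorum(int sentinelCount) {
        if (sentinelCount < 1) {
            throw new IllegalArgumentException("at least one sentinel is required");
        }
        return sentinelCount / 2 + 1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.println(n + " sentinels -> quorum " + recommendedQuorum(n));
        }
    }
}
```

For the 3-sentinel deployment used in this article, the rule gives 2, which matches the value in the sentinel monitor line.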
(2) sentinel down-after-milliseconds {masterName} {time}
sentinel down-after-milliseconds is related to the judgment of subjective offline: a sentinel uses the ping command for heartbeat detection of other nodes, and if a node does not reply within the time configured by down-after-milliseconds, the sentinel marks it as subjectively offline. This configuration applies to the subjective offline judgment of master nodes, slave nodes, and sentinel nodes.
The default value of down-after-milliseconds is 30000, that is, 30s. It can be adjusted according to the network environment and application requirements: the larger the value, the looser the judgment of subjective offline. The advantage is a lower chance of misjudgment; the disadvantage is that failure detection and failover take longer, so clients also wait longer. For example, if the application has high availability requirements, the value can be reduced so that failover completes as soon as possible after a failure; if the network environment is relatively poor, the value can be increased to avoid frequent misjudgments.
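The subjective offline test amounts to comparing the time since the last valid reply with down-after-milliseconds. The sketch below illustrates that comparison only; the class, method, and parameter names are assumptions made for the example, and timestamps are passed in explicitly so the logic is easy to follow.

```java
// Illustrative sketch of the subjective offline check driven by
// down-after-milliseconds.
class SubjectiveDownCheck {

    static final long DEFAULT_DOWN_AFTER_MILLIS = 30_000L; // default: 30s

    // A node is subjectively offline when its last valid reply is older
    // than the configured threshold.
    static boolean isSubjectivelyDown(long lastValidReplyMillis,
                                      long nowMillis,
                                      long downAfterMillis) {
        return nowMillis - lastValidReplyMillis > downAfterMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long lastReply = now - 45_000L; // pretend the last reply arrived 45s ago
        System.out.println(isSubjectivelyDown(lastReply, now, DEFAULT_DOWN_AFTER_MILLIS));
    }
}
```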
(3) sentinel parallel-syncs {masterName} {number}
sentinel parallel-syncs is related to replication by the slave nodes after a failover: it specifies how many slave nodes may initiate replication from the new master node at the same time. For example, suppose that after the master switch is completed, 3 slave nodes need to replicate from the new master node; if parallel-syncs=1, the slave nodes start replicating one by one; if parallel-syncs=3, the 3 slave nodes start replicating together.
The larger the value of parallel-syncs, the sooner the slave nodes complete replication, but the greater the network and disk load on the master node; it should be set according to the actual situation. For example, if the load on the master node is low and the slave nodes have high service availability requirements, you can increase parallel-syncs appropriately. The default value of parallel-syncs is 1.
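The effect of parallel-syncs can be pictured by grouping the slave nodes into rounds. This is a simplified model for illustration, assuming replication proceeds in fixed groups of at most parallel-syncs nodes; Sentinel's real scheduling is not reproduced here, and ParallelSyncPlan and batches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: split the slave nodes into groups of at most
// parallelSyncs nodes that start replicating from the new master together.
class ParallelSyncPlan {

    static List<List<String>> batches(List<String> replicas, int parallelSyncs) {
        if (parallelSyncs < 1) {
            throw new IllegalArgumentException("parallel-syncs must be at least 1");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < replicas.size(); i += parallelSyncs) {
            int end = Math.min(i + parallelSyncs, replicas.size());
            result.add(new ArrayList<>(replicas.subList(i, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("slave-1", "slave-2", "slave-3");
        System.out.println(batches(replicas, 1)); // one node per round
        System.out.println(batches(replicas, 3)); // all nodes in a single round
    }
}
```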
(4) sentinel failover-timeout {masterName} {time}
sentinel failover-timeout is related to judging whether a failover has timed out. However, this parameter is not used to judge the timeout of the entire failover phase, but of several of its sub-stages. For example, if promoting the slave node to master takes longer than the timeout, or if a slave node takes longer than the timeout to initiate replication from the new master node (excluding the time to copy data), the failover fails because of the timeout.
The default value of failover-timeout is 180000, which is 180s; if it times out, the value will become twice the original value next time.
(5) In addition to the above parameters, there are some other parameters, such as parameters related to security verification, which will not be introduced here.
2. Practical Suggestions
(1) There should be more than one sentinel node. On the one hand, this adds redundancy so that Sentinel itself does not become a high-availability bottleneck; on the other hand, it reduces misjudgments of offline status. Furthermore, the sentinel nodes should be deployed on different physical machines.
(2) The number of sentinel nodes should be odd, so that the sentinels can make "decisions" by voting: decisions on leader election, on objective offline, and so on.
(3) The configuration of each sentinel node should be consistent, including hardware, parameters, etc.; in addition, all nodes should use ntp or similar services to ensure accurate and consistent time.
(4) Sentinel's configuration provider and notification functions require client support, such as Jedis mentioned above; if the library used by the developer does not provide such support, the developer may need to implement it.
(5) When the nodes in a Sentinel system are deployed in docker (or other software that may perform port mapping), pay special attention: port mapping may prevent the Sentinel system from working properly, because Sentinel's work is based on communication with other nodes, and docker's port mapping may leave Sentinel unable to connect to them. For example, sentinels discover each other through the IP and port that each one announces; if sentinel A is deployed in a docker container with port mapping, other sentinels cannot connect to A using the port that A announces.
6. Summary
This article first introduced the roles of Sentinel: monitoring, failover, configuration provider, and notification. It then described how to deploy a Sentinel system and how to access it through a client, briefly explained the basic principles of Sentinel's implementation, and finally gave some practical suggestions.
Based on master-slave replication, Sentinel introduces automatic failover of the master node, further improving the high availability of Redis; however, the defect of Sentinel is also obvious: Sentinel cannot automatically failover the slave node. In the read-write separation scenario, failure of the slave node will cause the read service to be unavailable, requiring us to perform additional monitoring and switching operations on the slave node.
In addition, Sentinel still does not solve the problems that write operations cannot be load balanced and that storage capacity is limited by a single machine. Solving these problems requires clusters, which I will introduce in a later article.