


Analyze hot key storage issues in Redis and talk about solutions to cache exceptions
This article covers three common cache anomalies in Redis: cache penetration, cache breakdown, and cache avalanche, and uses them to examine hot key storage issues in Redis. I hope it helps!
Related recommendations: "Analysis of Redis cache consistency, cache penetration, cache breakdown and cache avalanche issues"
Cache penetration, cache breakdown, and cache avalanche come up frequently in Redis interviews and in day-to-day development, yet many people are still unclear about how these problems arise, what causes them, and how to solve them. In fact, by carefully analyzing how each one occurs, we can find a good solution for all three situations. [Related recommendations: Redis Video Tutorial]
This article helps you quickly understand these three problems through definitions, cases, hazards, and solutions.
You have probably seen many solutions to these three problems on the Internet. Are all of them actually correct? This article also analyzes the advantages and disadvantages of each solution one by one.
The picture below shows the outline of this article; each of its points is analyzed and summarized in turn.
Comparison of the three
Cache penetration, cache breakdown, and cache avalanche are all caused by the requested data being absent from the cache, which forces requests through to the database.
Because the cached data does not exist, all requests go to the database, putting excessive pressure on it or even crashing the service and making the entire system unusable.
Cache penetration
Definition: Cache penetration occurs when the data requested by the client exists neither in the cache nor in the database, so every request falls through the cache and triggers a database query. The root of the problem is that the data itself does not exist.
Example: A client requests product details with a product ID that exists neither in the cache nor in the database. As a result, every request for that ID goes to the database.
Hazards: Since the data for the requested parameters does not exist at all, every request hits the database, increasing its load or even crashing the service and affecting other business modules. This often happens when a user makes malicious requests.
Solution:
1. Cache a null value for the requested key, and give that value a short expiration time.
2. Use a Bloom filter. Check the filter first: if the key may exist, query the database and add the result to the cache; if the filter says it does not exist, return directly to the client that the data does not exist.
3. Since cache penetration may come from malicious requests, record user IPs and block requests from malicious IPs.
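To make the first solution concrete, here is a minimal Python sketch of caching a null placeholder with a short TTL. The dict-based `cache`, the `fake_db`, and all names here are illustrative stand-ins for Redis and a real database, not a production implementation.

```python
import time

NULL_PLACEHOLDER = "__NULL__"
cache = {}                    # key -> (value, expires_at); stands in for Redis
fake_db = {"1001": "laptop"}  # product_id -> details; stands in for the database
db_hits = 0                   # counts how often the "database" is queried

def get_product(product_id, null_ttl=60):
    global db_hits
    entry = cache.get(product_id)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return None if value is NULL_PLACEHOLDER else value
        del cache[product_id]                  # entry expired, drop it
    db_hits += 1                               # cache miss: query the database
    value = fake_db.get(product_id)
    if value is None:
        # Cache a short-lived placeholder so the next request for this
        # non-existent ID is answered from the cache, not the database.
        cache[product_id] = (NULL_PLACEHOLDER, time.time() + null_ttl)
        return None
    cache[product_id] = (value, time.time() + 300)
    return value

get_product("9999")   # first miss goes to the database
get_product("9999")   # second miss is served by the cached placeholder
```

Note that the placeholder TTL is kept short so that, if the data is later created, the stale "does not exist" answer disappears quickly.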
Scheme analysis:
The first scheme caches an empty value for each non-existent key. If there are too many such requests, do we really set a null-value cache entry for every one of them? Redis would then hold a large number of useless null entries. Suppose such a key is a product or article ID: after caching the null value, if the data is later added in the background, we must update the cached value for that ID and set a reasonable expiration time.
The second solution is the most commonly used in the industry. A Bloom filter's advantages are that it can be built on Redis, operates in memory, and its underlying bit-array representation is very memory-efficient. When data is added in the background, its ID is added to the Bloom filter, and incoming requests are first checked against the filter. But Bloom filters have a drawback: hash collisions. When multiple IDs hash to the same bit positions, the filter may misjudge existence, reporting that a key is present when it actually is not.
In short, a Bloom filter's answers are asymmetric: if it says a key exists, the key may not actually exist; but if it says a key does not exist, it definitely does not.
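This asymmetry is easy to see in a toy Bloom filter. The sketch below is a minimal, self-contained illustration (real deployments would use the RedisBloom module or a mature library); the sizes and IDs are arbitrary example values.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: 'present' answers may be false positives, but
    'absent' answers are always correct."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes independent bit positions from SHA-256.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
for product_id in ("1001", "1002", "1003"):  # IDs added when data is created
    bf.add(product_id)

bf.might_contain("1001")   # True: worth querying the cache/database
bf.might_contain("9999")   # almost certainly False: reject immediately
```

A request whose ID fails the filter check can be rejected without touching Redis or the database at all, which is exactly what defeats penetration.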
The third option targets the case where one user initiates a large number of requests within a short period, triggering cache penetration; we can then restrict that client's access. However, if an attacker launches a DDoS attack, this cannot completely prevent it, so on its own it is not a complete solution.
Summary of the plan:
We first apply the third solution at the request level, adding a rate-limiting mechanism and an IP blacklist to control malicious requests, with the ability to unblock an IP in case of misjudgment. At the cache layer we apply the first solution, setting a reasonable cache time.
For business scenarios that can tolerate misjudgment, the second solution can be used directly; it is entirely based on Redis and reduces system complexity.
Cache breakdown
Definition: Cache breakdown occurs when a hot key is absent from the cache, so requests fall through to the database and increase its load, either momentarily or for a longer period. The root of the problem is that the key exists in the database but not in the cache, so requests end up as database operations.
Example: There is a popular product, and users viewing its details request them by product ID. When the cached entry expires, all incoming requests must query the database.
Hazards: Unlike cache penetration, the data does exist in the database. Because the cached copy has expired, the database must be queried once and the result re-added to the cache, after which subsequent requests proceed normally. The harm is again borne by the database.
Solution:
1. Add a mutex lock. The first request finds no data in the cache, queries the database while holding the lock, and writes the result to the cache, so subsequent requests need not query the database.
2. Add a business-level (logical) expiration time. When writing to the cache, store an expiration timestamp alongside the value. On every read, compare it with the current time; if it has passed, trigger a background thread to pull fresh data from the database and update both the cached value and its expiration timestamp. In effect, this extends the cache lifetime at the code level.
3. Data warm-up. Load data into the cache in advance from the background. For example, before a flash sale starts, add the product inventory to the cache so that user requests go straight to the cache.
4. Never expire. Set the cache entries to never expire in Redis, and run a separate background thread to maintain their logical expiration and data updates.
Scheme analysis:
The advantage of the mutex lock is that it guarantees only one request reaches the database. However, a distributed system needs a distributed lock, and implementing one has its own difficulties, which increases system complexity.
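The mutex approach can be sketched in plain Python as follows. Here `threading.Lock` stands in for the distributed lock (e.g. Redis `SET key value NX PX milliseconds`) that a multi-node deployment would need, and `load_from_db` is a hypothetical stand-in for the real query.

```python
import threading
import time

cache = {}                      # stands in for Redis
rebuild_lock = threading.Lock() # stands in for a distributed lock
db_queries = 0                  # counts how often the "database" is queried

def load_from_db(key):
    global db_queries
    db_queries += 1
    time.sleep(0.05)            # simulate a slow database query
    return f"value-for-{key}"

def get(key):
    value = cache.get(key)
    if value is not None:
        return value
    with rebuild_lock:
        # Double-check: another thread may have rebuilt the entry
        # while we were waiting for the lock.
        value = cache.get(key)
        if value is None:
            value = load_from_db(key)
            cache[key] = value
    return value

# Simulate ten concurrent requests for the same expired hot key.
threads = [threading.Thread(target=get, args=("hot-key",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Despite ten concurrent requests, the database was queried only once.
```

The double-check inside the lock is what turns "one database query per waiting thread" into "one database query total".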
The second solution keeps the key alive in Redis while expiring it at the business level. This guarantees that every request gets data, and a background thread refreshes it. The drawback is that until the background thread finishes updating, requests receive stale data, which may be unacceptable in business scenarios with high real-time requirements.
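The logical-expiration read path described above can be sketched like this, assuming a single-process cache: a stale read returns immediately while a background thread refreshes the entry. All names and values here are illustrative.

```python
import threading
import time

# The entry stores its own logical deadline; here it is already stale.
cache = {"stock": {"value": 100, "expires_at": time.time() - 1}}
refreshing = threading.Event()   # avoids piling up refresh threads

def load_from_db():
    time.sleep(0.01)             # simulate a slow database query
    return 95                    # pretend the database now holds updated stock

def get_stock(ttl=30):
    entry = cache["stock"]
    if time.time() >= entry["expires_at"] and not refreshing.is_set():
        refreshing.set()
        def refresh():
            # Rebuild the entry and push its logical deadline forward.
            cache["stock"] = {"value": load_from_db(),
                              "expires_at": time.time() + ttl}
            refreshing.clear()
        threading.Thread(target=refresh).start()
    return entry["value"]        # may be stale, but the read never blocks

stale = get_stock()              # returns the old value immediately
time.sleep(0.1)                  # give the background refresh time to finish
fresh = get_stock()              # now returns the refreshed value
```

This makes the staleness window explicit: between the expired read and the refresh completing, every caller sees the old value.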
The third solution uses cache preheating, loading data into the cache ahead of time, and is similar to the second. It still has the problem of keeping hot data up to date, so it suits data with low real-time requirements.
The fourth solution is similar to the second and third but optimizes on their basis, using a background asynchronous thread to actively update cached data. The difficulty lies in controlling the update frequency.
Scheme summary:
For data with high real-time requirements, the first scheme is recommended: although technically harder, it serves fresh data. If some requests wait too long, an exception can be returned and the client can retry.
For data with low real-time requirements, the fourth solution can be used.
Cache avalanche
Definition: The cache breakdown described earlier is caused by a single hot key failing, sending a large number of requests to the database. A cache avalanche is the same mechanism, but more serious: most of the cached keys become invalid at once, not just one or two.
Example: In an e-commerce system, the cached product data for an entire category expires at the same time, while most current requests target products in that category, so every request ends up querying the database.
Hazards: A flood of requests arrives in an instant and every one must query the database. This instantaneous surge seriously increases the database's load and can easily paralyze it outright.
Solution:
1. Randomize cache times. A large number of caches expiring at once means the expiration times are too concentrated, so we spread them out by adding a random offset. Expirations are then no longer clustered, and large numbers of requests will not hit the database simultaneously.
2. Multi-level cache. Instead of relying solely on Redis, also cache in memcached (just an example; other caching services work too). Write each entry to both Redis and memcached; if Redis fails, fall back to memcached.
3. Mutex lock. The mutex lock described for cache breakdown also applies in the avalanche case.
4. Set an expiration flag. This reuses the never-expire idea from cache breakdown: on each request, check the logical expiration time; if it is approaching, set an expiration flag and trigger an independent thread to refresh the cache.
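The first solution above boils down to a TTL helper that adds random jitter; the base TTL and jitter window below are arbitrary example values.

```python
import random

BASE_TTL = 600   # base expiration of 10 minutes (example value)
JITTER = 120     # up to 2 extra minutes, spread uniformly (example value)

def ttl_with_jitter():
    """Return a TTL with a random offset so that keys cached at the same
    moment do not all expire at the same moment."""
    return BASE_TTL + random.randint(0, JITTER)

# With redis-py the call site would look roughly like:
#   r.setex(key, ttl_with_jitter(), value)
ttls = [ttl_with_jitter() for _ in range(1000)]
```

Even this tiny spread is enough to turn one synchronized expiration spike into a smear of expirations across the jitter window.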
Scheme analysis:
The first scheme uses randomized cache times to disperse key expirations. The difficulty lies in choosing the times: for data that needs a short cache time and exists in very large volume, the time spread must be controlled carefully.
The second solution uses a multi-level cache to ensure requests are always served from some cache, but it increases architectural complexity and introduces problems of its own, such as keeping the cache levels consistent.
The third option uses mutex locks, as discussed under cache breakdown. They work in avalanche scenarios too, but at the cost of creating a large number of distributed locks.
The fourth solution uses logical cache times, which keeps the pressure on the cache and database well under control.
Summary of the plan:
In actual projects, it is recommended to try the first, second, and fourth solutions in combination for better results.
Summary
Cache penetration occurs because the database itself does not contain the data.
Cache breakdown and cache avalanche occur when the data exists in the database but the cached copy has expired, forcing a database query and a cache refill.
Cache breakdown targets a few hot keys, while cache avalanche is a large-scale cache failure; the underlying principle is the same, differing only in the scope of the affected keys.
The above is the detailed content of Analyze hot key storage issues in Redis and talk about solutions to cache exceptions. For more information, please follow other related articles on the PHP Chinese website!
