How to remove duplicates in Redis? A brief analysis of 4 methods to remove duplicates

Nov 09, 2021 am 10:03 AM
How do you remove duplicates in Redis? This article introduces four methods of Redis deduplication.

This article first covers three methods for unique counting in Redis, based on SET, on bits, and on HyperLogLog, and then a fourth deduplication method based on a Bloom filter.

Unique counting is a very common feature in website systems. For example, a website needs to count the number of unique visitors (UV) it receives each day. Counting problems are common, but they can be complicated to solve. First, the amount of data to count may be very large: a large site is visited by millions of people every day. Second, it is usually desirable to extend the counting to other dimensions. For example, in addition to daily UV, you may also want weekly or monthly UV, which makes the calculation much more complicated.

In a system backed by a relational database, unique counting is done with select count(distinct ). This is very simple, but if the amount of data is large, the statement executes very slowly. Another problem with relational databases is that insert performance is not high.

Redis handles this kind of counting problem easily. It is faster and consumes fewer resources than a relational database, and it even provides three different methods.

1. Based on set

A Redis set stores a collection of unique elements. With it you can quickly determine whether an element exists in the collection, quickly count the number of elements in a set, and merge sets into a new set. The commands involved are as follows:

SISMEMBER key member  # check whether member exists in the set
SADD key member  # add member to the set
SCARD key   # get the number of elements in the set

The set-based method is simple and effective: counting is exact, it applies widely, and it is easy to understand. Its disadvantage is that it consumes a lot of resources (though far less than a relational database). If the number of elements is large (hundreds of millions, for example), the memory consumption becomes prohibitive.
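The set semantics described above can be sketched in a few lines of Python. This is only an in-memory stand-in, not a Redis client: the class name UniqueVisitors and its method names are hypothetical and merely mirror the SADD, SISMEMBER, and SCARD commands, assuming that SADD reports 1 for a newly added member and 0 for a duplicate.

```python
class UniqueVisitors:
    """In-memory stand-in mirroring SADD / SISMEMBER / SCARD (hypothetical helper)."""

    def __init__(self):
        self._members = set()

    def sadd(self, member):
        # Like SADD with one member: 1 if it was newly added, 0 if already present.
        if member in self._members:
            return 0
        self._members.add(member)
        return 1

    def sismember(self, member):
        # Like SISMEMBER, but returns a Python bool instead of 1/0.
        return member in self._members

    def scard(self):
        # Like SCARD: number of distinct members.
        return len(self._members)


uv = UniqueVisitors()
visits = ["alice", "bob", "alice", "carol", "bob"]
added_flags = [uv.sadd(user) for user in visits]  # duplicates report 0
unique_count = uv.scard()
```

The return value of the add operation is what makes deduplication cheap: a caller can treat 0 as "already seen" without a separate membership check.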

2. Based on bit

Redis bitmaps can implement counting with far more compact memory use than a set. A single bit, 1 or 0, records whether an element is present. For example, to count unique visitors to a website, user_id can be used as the bit offset, and the bit is set to 1 to indicate a visit. 1 MB of space can then record one day of visits for more than 8 million users. The commands involved are as follows:

SETBIT key offset value  # set the bit at offset
GETBIT key offset        # get the bit at offset
BITCOUNT key [start end] # count the bits set to 1
BITOP operation destkey key [key ...]  # combine bitmaps

The bit-based method consumes much less space than the set method, but it requires that elements map simply to bit offsets, so its applicable scope is much narrower. In addition, the space it consumes depends on the maximum offset, not on the count. If the maximum offset is large, the memory consumption is also considerable.
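The following is a minimal sketch of the bitmap idea in pure Python, assuming user IDs are small non-negative integers. The Bitmap class is a hypothetical in-memory model, not a Redis client; its methods only imitate SETBIT, GETBIT, and BITCOUNT, including the way the underlying string grows to cover the largest offset.

```python
class Bitmap:
    """In-memory model of a Redis-style bitmap (hypothetical helper)."""

    def __init__(self):
        self._bytes = bytearray()

    def setbit(self, offset, value):
        # Like SETBIT: grow the buffer to cover the offset, return the previous bit.
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self._bytes):
            self._bytes.extend(bytes(byte_index + 1 - len(self._bytes)))
        mask = 0x80 >> bit_index  # bit 0 is the most significant bit of byte 0
        previous = 1 if self._bytes[byte_index] & mask else 0
        if value:
            self._bytes[byte_index] |= mask
        else:
            self._bytes[byte_index] &= ~mask & 0xFF
        return previous

    def getbit(self, offset):
        # Like GETBIT: offsets beyond the buffer read as 0.
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self._bytes):
            return 0
        return 1 if self._bytes[byte_index] & (0x80 >> bit_index) else 0

    def bitcount(self):
        # Like BITCOUNT without a range: number of bits set to 1.
        return sum(bin(b).count("1") for b in self._bytes)

    def size_bytes(self):
        return len(self._bytes)


visits = Bitmap()
for user_id in (3, 42, 42, 1000):  # user 42 visits twice
    visits.setbit(user_id, 1)
unique_visitors = visits.bitcount()
small_size = visits.size_bytes()   # driven by the largest offset, 1000

visits.setbit(8_388_607, 1)        # a single user with a large ID
large_size = visits.size_bytes()
```

The last two lines illustrate the drawback noted above: one extra visitor with a large ID pushes the buffer to a full 1,048,576 bytes, even though only four bits are set.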

3. Based on HyperLogLog

Exact unique counting over extremely large amounts of data is difficult, but if an approximation is enough, computer science offers many efficient algorithms, of which HyperLogLog counting is one of the best known. It needs only about 12 KB of memory to count hundreds of millions of unique elements, with the error held to about one percent. The commands involved are as follows:

PFADD key element [element ...]  # add elements
PFCOUNT key [key ...]   # count

This counting method is really remarkable. It draws on uniform distributions, random probability, Bernoulli distributions, and other ideas from statistics. I have not completely understood it; if you are interested, you can dig into the relevant articles.
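The "about 12 KB" and "about one percent" figures can be checked with a little arithmetic. The sketch below assumes two details that are not stated in this article: that Redis's dense HyperLogLog representation uses 16384 registers of 6 bits each, and that the usual HyperLogLog standard error of 1.04 / sqrt(m) applies for m registers.

```python
import math

REGISTERS = 16384        # assumed register count for Redis's HyperLogLog
BITS_PER_REGISTER = 6    # assumed register width in bits

# Memory for the registers alone, in bytes.
dense_bytes = REGISTERS * BITS_PER_REGISTER // 8

# Standard error of the HyperLogLog estimate for m registers.
std_error = 1.04 / math.sqrt(REGISTERS)

print(f"{dense_bytes} bytes, standard error {std_error:.2%}")
```

Under these assumptions the registers occupy 12288 bytes and the standard error is a little under one percent, regardless of how many elements are added.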

The three unique counting methods provided by Redis each have their own advantages and disadvantages, and between them they cover the counting requirements of different situations.

4. Based on Bloom filter

A Bloom filter stores data in a structure similar to a bitmap or bit set, using a bit array to represent a set compactly, and it can quickly determine whether an element may already be in the set. Although a Bloom filter is not 100% accurate, the error rate can be lowered by adjusting its parameters: the number of hash functions used and the size of the bit array. With suitable settings the error rate can be brought close to 0, which is enough for most scenarios.

Given a set S = {x1, x2, …, xn}, a Bloom filter uses k independent hash functions to map each element of the set into the range {1, …, m}. For each element, every mapped number is used as an index into the bit array, and the bit at that index is set to 1. For example, if a hash function maps element x1 to the number 8, the 8th bit of the bit array is set to 1. In the figure below, the set S has only two elements, x and y, each mapped by three hash functions. The mapped positions are (0, 3, 6) and (4, 7, 10) respectively, and the corresponding bits are set to 1:

[Figure: bit array with the bits at the mapped positions of x and y set to 1]

Now, to determine whether another element is in this set, map it with the same three hash functions and check whether any of the corresponding positions holds a 0. If one does, the element is definitely not in the set; otherwise it might be.
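The add and lookup steps described above can be sketched in pure Python. This is an illustrative in-memory model, not the Redis plug-in: the BloomFilter class and its parameter names are hypothetical, positions are 0-based, and instead of k truly independent hash functions it derives k positions from one SHA-256 digest by double hashing, a common substitute.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (hypothetical helper)."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: position_i = (h1 + i * h2) mod m, for i in 0..k-1.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be odd
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        # Set every mapped bit to 1.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # Any 0 bit means "definitely absent"; all 1s means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter(m_bits=1024, k_hashes=3)
empty_lookup = seen.might_contain("x")  # nothing added yet
for element in ("x", "y"):
    seen.add(element)
found = [seen.might_contain(e) for e in ("x", "y")]
```

An added element is never reported as absent, because its own bits are all set. The only possible error is a false positive, when the bits for an unseen element happen to have been set by other elements.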

To use Bloom filters, Redis needs a plug-in installed: https://blog.csdn.net/u013030276/article/details/88350641.
