A brief analysis of HyperLogLog for Redis data type learning-Redis-php.cn

Table of Contents

HyperLogLog Algorithm

PFADD

PFCOUNT

PFMERGE

Business Scenario

A brief analysis of HyperLogLog for Redis data type learning

青灯夜游

Jan 21, 2022, 10:00 AM

This article introduces HyperLogLog, a Redis data type that is typically used to count the number of unique elements in a set. I hope you find it helpful!

It is Friday and you are happily slacking off when the product manager emails you a requirements document. The gist of the requirement: the company needs to count the website's daily visitor IPs, and this counting will be long-running, lasting anywhere from a few months to a few years.

After reading the requirements, you think this is easy. The Redis set type can implement it directly: create one set key per day, use SADD to store that day's visitor IPs, and use SCARD to get the number of distinct visitor IPs for the day.

You quickly write the code, it passes testing, and the feature goes live. After it has been running for a while, the server hosting Redis starts raising alarms because some keys are using too much memory. You take a look and find that these are all the set keys that store visitor IPs. Only then do you slap your forehead, realizing you have dug a big hole for yourself.

Assume that storing an IPv4 address as text takes up to 15 bytes and that the website has up to 1 million distinct visitors per day. These set keys will then use about 0.45 GB of memory per month and 5.4 GB per year, counting only the raw address bytes. This estimate covers IPv4 only; IPv6 addresses would take even more memory. Although SADD (per element) and SCARD both run in O(1) time, their memory consumption is unacceptable.
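The estimate above can be reproduced with a few lines of arithmetic. This is only a rough sketch: the function name is made up for illustration, it counts raw address bytes and ignores whatever per-element overhead Redis adds, and the 39-byte IPv6 figure is an assumption based on the full textual form of an address.

```python
def set_payload_bytes(bytes_per_element, elements_per_day, days):
    """Raw payload only; ignores Redis's per-element and per-key overhead."""
    return bytes_per_element * elements_per_day * days


IPV4_MAX_TEXT = 15  # len("255.255.255.255")
IPV6_MAX_TEXT = 39  # assumption: 8 groups of 4 hex digits plus 7 colons

VISITORS_PER_DAY = 1_000_000

month_v4 = set_payload_bytes(IPV4_MAX_TEXT, VISITORS_PER_DAY, 30)
year_v4 = 12 * month_v4
month_v6 = set_payload_bytes(IPV6_MAX_TEXT, VISITORS_PER_DAY, 30)

print(f"IPv4, one month: {month_v4 / 1e9:.2f} GB")
print(f"IPv4, one year:  {year_v4 / 1e9:.2f} GB")
print(f"IPv6, one month: {month_v6 / 1e9:.2f} GB")
```

The real footprint of a Redis set is larger than this payload-only figure, so the numbers are best read as a lower bound.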

You browse the official Redis website and find that Redis also provides a data type called HyperLogLog, which meets the product's needs while using far less memory.

HyperLogLog Algorithm

HyperLogLog is a probabilistic algorithm designed specifically for estimating the cardinality of a set, that is, the number of distinct elements it contains. It computes an approximate cardinality for a given set.

The approximate cardinality is not the actual cardinality of the set. It may be slightly smaller or larger, but the error between the estimated and actual cardinality stays within a reasonable range. Statistics that do not need to be exact can therefore be computed with HyperLogLog.

The advantage of HyperLogLog is that the memory needed to compute the approximate cardinality does not grow with the size of the set. No matter how many elements the set contains, the memory HyperLogLog needs is fixed, and it is very small.
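To make the fixed-memory property concrete, here is a toy HyperLogLog in Python. It is an illustrative sketch, not Redis's implementation: the class name is made up, SHA-1 is an arbitrary stand-in for the hash function, and Redis uses its own hash, register encoding, and bias corrections. The point is only that the register array never grows, however many elements are added.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Illustrative HyperLogLog; not Redis's actual encoding or hash."""

    def __init__(self, precision=14):
        self.p = precision
        self.m = 1 << precision              # number of registers (16384 for p=14)
        self.registers = bytearray(self.m)   # fixed size, independent of input

    def add(self, element):
        """Record an element; return 1 if a register changed, else 0."""
        digest = hashlib.sha1(element.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash value
        width = 64 - self.p
        index = h >> width                          # first p bits pick a register
        rest = h & ((1 << width) - 1)               # remaining bits
        rank = width - rest.bit_length() + 1        # leading zeros in rest, plus 1
        if rank > self.registers[index]:
            self.registers[index] = rank
            return 1
        return 0

    def count(self):
        """Return the approximate number of distinct elements added."""
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting on empty registers.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


hll = ToyHyperLogLog()
size_before = len(hll.registers)
for i in range(100_000):
    hll.add(f"visitor-{i}")
size_after = len(hll.registers)
estimate = hll.count()

print(size_before, size_after)   # same register count before and after
print(estimate)                  # close to, but usually not exactly, 100000
```

The estimate is computed only from the registers, which is why the structure cannot list or remove the elements it has seen.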

Each Redis HyperLogLog key needs only 12 KB of memory to count close to 2⁶⁴ elements, and the standard error of the algorithm is only 0.81%.

If you implement the feature above with HyperLogLog, using one key per day, then even with 1 million visitors per day it occupies only 360 KB of memory per month (30 keys × 12 KB).
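The 12 KB, 0.81% and 360 KB figures can be checked with a short calculation. The register count (16384) and register width (6 bits) used here are not stated in this article; they are taken as assumptions from how Redis's HyperLogLog is commonly documented, and the 1.04/√m formula is the usual standard-error estimate for the algorithm.

```python
import math

REGISTERS = 16_384        # assumption: number of registers in a Redis HyperLogLog
BITS_PER_REGISTER = 6     # assumption: width of each register

hll_bytes = REGISTERS * BITS_PER_REGISTER // 8   # register storage per key
std_error = 1.04 / math.sqrt(REGISTERS)          # standard error for m registers
month_bytes = 30 * hll_bytes                     # one key per day for 30 days

print(f"per key:   {hll_bytes / 1024:.0f} KB")
print(f"std error: {std_error:.2%}")
print(f"per month: {month_bytes / 1024:.0f} KB")
```

Unlike the set-based estimate, this figure does not depend on the number of visitors per day.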

PFADD

The PFADD command adds one or more given elements to a HyperLogLog so that they are counted.

PFADD key element [element...]

Depending on whether the given element has been counted, the PFADD command may return 0 or 1:

If all the given elements have already been counted, PFADD returns 0, indicating that the approximate cardinality computed by HyperLogLog has not changed.
If at least one of the given elements has not been counted before and the approximate cardinality computed by HyperLogLog changes as a result, PFADD returns 1.

For example:

redis> PFADD letters a b c -- first addition
(integer) 1
redis> PFADD letters a     -- second addition
(integer) 0

It is also valid to call this command with only a key and no elements. If the key already exists, no operation is performed. If it does not exist, an empty HyperLogLog is created and 1 is returned.

PFCOUNT

The PFCOUNT command returns the approximate cardinality that HyperLogLog has computed for the set. If the given key does not exist, it returns 0.

PFCOUNT key [key...]

For example:

redis> PFCOUNT letters
(integer) 3

When multiple HyperLogLogs are passed to PFCOUNT, the command first computes the union of all the given HyperLogLogs and then returns the approximate cardinality of that union.

redis> PFADD letters1 a b c
(integer) 1
redis> PFADD letters2 c d e
(integer) 1
redis> PFCOUNT letters1 letters2
(integer) 5
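The result in the session above is 5 rather than 6 because PFCOUNT estimates the size of the union, not the sum of the individual counts. The same example with exact Python sets, used here purely as an analogy for what the HyperLogLogs approximate:

```python
letters1 = {"a", "b", "c"}
letters2 = {"c", "d", "e"}

union_size = len(letters1 | letters2)          # "c" is counted once
sum_of_sizes = len(letters1) + len(letters2)   # "c" would be counted twice

print(union_size, sum_of_sizes)
```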

PFMERGE

The PFMERGE command can perform a union calculation on multiple HyperLogLogs, and then save the calculated union HyperLogLog to the specified key.

PFMERGE destKey sourceKey [sourceKey...]

If the destination key already exists, PFMERGE treats it as one of the sources: its contents are included in the union, and the merged result then replaces the key's previous value.

redis> PFADD letters1 a b c
(integer) 1
redis> PFADD letters2 c d e
(integer) 1
redis> PFMERGE res letters1 letters2
OK
redis> PFCOUNT res
(integer) 5

As you can see, the PFMERGE and PFCOUNT commands are closely related. In fact, when PFCOUNT computes the approximate cardinality of multiple HyperLogLogs, it behaves as if it performed the following steps:

Compute the union of all the given HyperLogLogs, as PFMERGE does, and store it in a temporary HyperLogLog.
Run PFCOUNT on the temporary HyperLogLog to get its approximate cardinality.
Delete the temporary HyperLogLog.
Return the resulting approximate cardinality.

When a program needs to call PFCOUNT on the same group of HyperLogLogs repeatedly, consider replacing those calls with a single PFMERGE call: by storing the union in a designated HyperLogLog and running PFCOUNT on that key, rather than recomputing the union every time, the program avoids unnecessary union computations.
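A minimal sketch of that pattern follows. So that it runs without a Redis server, it uses a hypothetical in-memory stand-in (ExactStub) that keeps exact Python sets and counts how often a union is computed; the method names pfadd, pfcount and pfmerge are chosen to mirror the Redis commands, and cache_union is an illustrative helper name, not part of any library.

```python
class ExactStub:
    """Hypothetical stand-in for a Redis client: exact sets, not HyperLogLogs."""

    def __init__(self):
        self.data = {}
        self.union_calls = 0  # how many times a multi-key union was computed

    def pfadd(self, key, *elements):
        target = self.data.setdefault(key, set())
        before = len(target)
        target.update(elements)
        return int(len(target) != before)

    def pfcount(self, *keys):
        if len(keys) > 1:
            self.union_calls += 1
        return len(set().union(*(self.data.get(k, set()) for k in keys)))

    def pfmerge(self, dest, *sources):
        self.union_calls += 1
        existing = self.data.get(dest, set())
        merged = existing.union(*(self.data.get(k, set()) for k in sources))
        self.data[dest] = merged
        return True


def cache_union(client, dest_key, source_keys):
    """Compute the union once and store it under dest_key."""
    client.pfmerge(dest_key, *source_keys)


stub = ExactStub()
stub.pfadd("letters1", "a", "b", "c")
stub.pfadd("letters2", "c", "d", "e")

cache_union(stub, "res", ["letters1", "letters2"])
counts = [stub.pfcount("res") for _ in range(3)]  # single-key reads, no union

print(counts, stub.union_calls)
```

One trade-off to keep in mind: the stored union is a snapshot, so it has to be merged again if elements are later added to the source keys.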

Business Scenario

HyperLogLog is well suited to scenarios such as counting (monthly or annual statistics) and deduplication (for example, spam SMS detection).
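For the visitor-counting requirement from the start of this article, one possible layout is a HyperLogLog key per day, with PFCOUNT over several keys for longer periods. The sketch below assumes a client object that exposes pfadd and pfcount methods mirroring the Redis commands; the key naming scheme, the function names and the DictStub class (an exact-set stand-in so the example runs without a server) are all hypothetical.

```python
from datetime import date


def day_key(day):
    # Hypothetical key naming scheme: one key per calendar day.
    return f"visitors:{day.isoformat()}"


def record_visit(client, day, ip):
    """Count one visit; returns 1 if the day's estimate changed, else 0."""
    return client.pfadd(day_key(day), ip)


def unique_visitors(client, days):
    """Approximate number of distinct IPs seen across the given days."""
    return client.pfcount(*(day_key(d) for d in days))


class DictStub:
    """Hypothetical stand-in for a Redis client, backed by exact sets."""

    def __init__(self):
        self.data = {}

    def pfadd(self, key, *elements):
        target = self.data.setdefault(key, set())
        before = len(target)
        target.update(elements)
        return int(len(target) != before)

    def pfcount(self, *keys):
        return len(set().union(*(self.data.get(k, set()) for k in keys)))


client = DictStub()
jan1, jan2 = date(2022, 1, 1), date(2022, 1, 2)

first = record_visit(client, jan1, "192.0.2.1")
record_visit(client, jan1, "192.0.2.2")
repeat = record_visit(client, jan1, "192.0.2.1")   # same IP again
record_visit(client, jan2, "192.0.2.2")
record_visit(client, jan2, "192.0.2.3")

print(unique_visitors(client, [jan1]))         # distinct IPs on Jan 1
print(unique_visitors(client, [jan1, jan2]))   # distinct IPs across both days
```

With a real Redis server the counts would be approximate rather than exact, which is acceptable for this kind of statistic.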
