How to use the HyperLogLog data type in Redis

PHPz

May 29, 2023 am 09:29 AM

1. Principle of HyperLogLog

Redis HyperLogLog uses a probabilistic algorithm, the HyperLogLog algorithm, to estimate cardinality, that is, the number of unique elements in a set. It does this with a hash function and an array of m small registers rather than by storing the elements themselves.

In the HyperLogLog algorithm, each element is hashed and the hash value is treated as a binary string. The first few bits select one of the m registers, and the remaining bits are scored by the position of the leftmost 1, that is, the number of leading 0s plus one. For example, if the remaining bits are 00010100011, there are three leading 0s, so the score of this element is 4. Each register keeps the largest score it has seen.
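The scoring step can be sketched in a few lines of Java. This is only an illustration using the leading-zero convention (score = number of leading 0s plus one); the class name HllRank and the bit-string input are assumptions made for readability, and a real implementation works on integer hash values instead of strings.

```java
public class HllRank {
    // Score of a bit string: the number of leading '0' characters, plus one.
    // An all-zero string scores length + 1.
    static int rank(String bits) {
        int zeros = 0;
        while (zeros < bits.length() && bits.charAt(zeros) == '0') {
            zeros++;
        }
        return zeros + 1;
    }

    public static void main(String[] args) {
        System.out.println(rank("00010100011")); // prints 4 (three leading 0s)
        System.out.println(rank("01110100011")); // prints 2 (one leading 0)
    }
}
```

A long run of leading 0s is rare, so seeing a high score suggests that many distinct hash values have passed through.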

After all elements have been processed, compute 2^(-score) for each register (each slot of the length-m array), add these values, and take the reciprocal of the sum. This harmonic-mean step, scaled by m^2 and a bias-correction constant, yields the cardinality estimate produced by the HyperLogLog algorithm.

The length m of the array controls the trade-off between the memory the data structure occupies and the accuracy of the estimate: the standard error is roughly 1.04/sqrt(m), so a larger m lowers the error at the cost of more memory. Redis uses 16384 registers, which gives a standard error of about 0.81% in at most 12 KB per key.
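To make the memory/accuracy trade-off concrete, the following sketch computes the commonly quoted standard error of 1.04/sqrt(m) and the register storage for a few values of m. It assumes 6 bits per register, as in Redis's dense encoding, and ignores any header bytes; the class and method names are hypothetical.

```java
public class HllSizing {
    // Approximate relative standard error of a HyperLogLog with m registers.
    static double standardError(int m) {
        return 1.04 / Math.sqrt(m);
    }

    // Bytes needed for m registers at 6 bits each (assumed register width).
    static int denseBytes(int m) {
        return m * 6 / 8;
    }

    public static void main(String[] args) {
        for (int p = 10; p <= 14; p += 2) {
            int m = 1 << p; // m is a power of two: 1024, 4096, 16384
            System.out.printf("m=%d error=%.4f%% bytes=%d%n",
                    m, 100 * standardError(m), denseBytes(m));
        }
    }
}
```

With m = 16384 this works out to 12288 bytes of registers and an error of about 0.81%, matching the figures usually given for Redis.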

In short, the core idea of the HyperLogLog algorithm is based on hash functions and bit operations. By treating each hash value as a bit stream and counting the number of leading 0s, it can quickly estimate the number of unique values in a large data set. For example, it can estimate how many distinct web pages appear in a very large dataset without storing every URL.
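The whole idea fits in a small, self-contained Java class. This is a teaching sketch under stated assumptions, not the Redis implementation: the hash (64-bit FNV-1a plus a mixing step), the class name MiniHyperLogLog, and the small-range correction by linear counting are choices made here for illustration.

```java
import java.nio.charset.StandardCharsets;

public class MiniHyperLogLog {
    private static final int P = 14;      // number of index bits (assumed)
    private static final int M = 1 << P;  // 16384 registers
    private final byte[] registers = new byte[M];

    // Illustrative 64-bit hash: FNV-1a followed by a splitmix64-style finalizer.
    static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 30; h *= 0xbf58476d1ce4e5b9L;
        h ^= h >>> 27; h *= 0x94d049bb133111ebL;
        h ^= h >>> 31;
        return h;
    }

    public void add(String element) {
        long h = hash64(element);
        int index = (int) (h >>> (64 - P)); // top P bits pick the register
        long rest = h << P;                 // remaining bits, left-aligned
        // Score = leading zeros in the remaining bits + 1, capped at 64 - P + 1.
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - P) + 1;
        if (rank > registers[index]) {
            registers[index] = (byte) rank; // keep the largest score seen
        }
    }

    public long count() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / M); // bias-correction constant
        double estimate = alpha * M * (double) M / sum;
        // Small-range correction: fall back to linear counting while registers are empty.
        if (estimate <= 2.5 * M && zeros > 0) {
            estimate = M * Math.log((double) M / zeros);
        }
        return Math.round(estimate);
    }

    public static void main(String[] args) {
        MiniHyperLogLog hll = new MiniHyperLogLog();
        for (int i = 0; i < 100_000; i++) {
            hll.add("user-" + i);
            hll.add("user-" + i); // duplicates do not change the registers
        }
        System.out.println("estimate: " + hll.count());
    }
}
```

Adding the same element twice hashes to the same register and score, which is why duplicates never increase the estimate.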

2. Usage steps:

Redis HyperLogLog is a data structure for estimating the number of distinct elements in a collection. It can track massive amounts of data in very little memory (at most 12 KB per key), with a standard error of about 0.81%, and it stays fast when processing large amounts of data.

As a simple example, we can use HyperLogLog to count the number of unique IPs visiting a website. Specifically, follow these steps:

Create the HyperLogLog by adding its first element (PFADD creates the key if it does not exist): PFADD hll:unique_ips 127.0.0.1
Add the IP of each subsequent visit to the same key: PFADD hll:unique_ips 192.168.1.1
Get the approximate number of unique elements: PFCOUNT hll:unique_ips
You can also pass several HyperLogLog keys (for example, one per day or per hour) to PFCOUNT to get the approximate cardinality of their union.

Note that although HyperLogLog saves a lot of memory, it is an estimation algorithm: the count it returns is approximate, not exact. Keep this in mind and use it only where a small error is acceptable.
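When several HyperLogLog keys are counted together, the result approximates the size of the union, not the sum of the individual counts. The sketch below shows that semantics with exact Java sets standing in for HyperLogLog keys; the class UnionCount and the sample IPs are assumptions for illustration only, and no Redis connection is involved.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnionCount {
    // Exact stand-in for counting several keys together: size of the union.
    static int unionCount(List<Set<String>> days) {
        Set<String> union = new HashSet<>();
        for (Set<String> day : days) {
            union.addAll(day);
        }
        return union.size();
    }

    public static void main(String[] args) {
        Set<String> monday = Set.of("127.0.0.1", "192.168.1.1");
        Set<String> tuesday = Set.of("192.168.1.1", "10.0.0.7");
        // 192.168.1.1 visited on both days but is counted once.
        System.out.println(unionCount(List.of(monday, tuesday))); // prints 3
    }
}
```

This is why per-day or per-hour keys are convenient: a weekly or monthly unique count can be derived from them without double counting repeat visitors.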

3. Example of using page views to implement request IP deduplication

[Figure: example of deduplicating request IPs when counting page views; image not available]

4. Using the Jedis client

1. Add the Jedis dependency:

<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
    <version>3.6.0</version>
</dependency>

2. Create a Jedis object:

Jedis jedis = new Jedis("localhost");

3. Add elements to the HyperLogLog data structure:

jedis.pfadd("hll:unique_ips", "127.0.0.1");

4. Get the approximate number of unique elements:

Long count = jedis.pfcount("hll:unique_ips");
System.out.println(count);

5. To count the unique elements across several HyperLogLog structures, merge them. In Jedis, the pfmerge method issues the PFMERGE command, which merges the source keys into the destination key:

jedis.pfmerge("hll:unique_ips", "hll:unique_ips1", "hll:unique_ips2", "hll:unique_ips3");

5. Using the Redisson client

1. Create a RedissonClient object:

Config config = new Config();
config.useSingleServer().setAddress("redis://localhost:6379");
RedissonClient redisson = Redisson.create(config);

2. Create an RHyperLogLog object:

RHyperLogLog<String> uniqueIps = redisson.getHyperLogLog("hll:unique_ips");

3. Add elements:

uniqueIps.add("127.0.0.1");

4. Get the approximate count:

long approximateCount = uniqueIps.count();
System.out.println(approximateCount);

5. Merge multiple HyperLogLog objects:

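RHyperLogLog<String> uniqueIps1 = redisson.getHyperLogLog("hll:unique_ips1");
RHyperLogLog<String> uniqueIps2 = redisson.getHyperLogLog("hll:unique_ips2");
// mergeWith takes the key names of the other HyperLogLog structures, not the objects
uniqueIps.mergeWith(uniqueIps1.getName(), uniqueIps2.getName());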
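Conceptually, merging two HyperLogLog structures keeps, for every register position, the larger of the two values. The following sketch shows that step on plain byte arrays; it assumes both structures have the same number of registers, and the class name HllMerge is hypothetical and unrelated to the Jedis or Redisson APIs.

```java
public class HllMerge {
    // Merge two register arrays of equal length by taking the per-register maximum.
    static byte[] merge(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("register counts differ");
        }
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) Math.max(a[i], b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] merged = merge(new byte[] {1, 0, 3}, new byte[] {0, 2, 2});
        System.out.println(java.util.Arrays.toString(merged)); // prints [1, 2, 3]
    }
}
```

Because the maximum is what the registers would hold if all elements had been added to a single structure, the merged estimate describes the union of the inputs.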

6. What features and commands does HyperLogLog provide?

Features:

The count is approximate (standard error of about 0.81%), but it takes up very little memory.
Supports inserting new elements; adding the same element again does not increase the count.
Provides dedicated commands for adding, counting, and merging: PFADD, PFCOUNT, and PFMERGE.
Estimates the number of distinct elements in a data set, that is, the cardinality of the set.
Supports merging multiple HyperLogLog objects to obtain an approximation of the cardinality of the union of these collections.

Commonly used HyperLogLog commands:

PFADD key element [element ...]: Add one or more elements to the HyperLogLog structure.
PFCOUNT key [key ...]: Get the cardinality estimate of one HyperLogLog structure, or of the union of several.
PFMERGE destkey sourcekey [sourcekey ...]: Merge one or more HyperLogLog structures into a target structure.
PFSELFTEST: An internal command for testing the HyperLogLog implementation inside Redis; it is not intended for use in applications.

Note that although HyperLogLog saves a lot of memory, it is still an estimation algorithm: its results are approximate, and each operation has a computational cost. Depending on the application, consider whether HyperLogLog or an exact data structure such as a set is the better way to count elements.

7. Summary of usage scenarios:

The main use of HyperLogLog in Redis is deduplicated counting over large data streams (for example page views, IPs, or cities).

Specifically, the following are some scenarios where Redis HyperLogLog is used for deduplication and counting:

Count page views - In web applications, HyperLogLog can be used to count how many unique visitors each page has. Keeping one HyperLogLog per page and time period lets you estimate unique visitors for each period and merge periods for a combined figure.
Count users - HyperLogLog is useful for analyzing the number of users in big data collections, and is particularly effective for data such as unique user IDs. Instead of saving the IDs themselves, it hashes each one and keeps only a fixed number of small registers, from which it estimates the size of the data set.
Count advertising clicks - For advertising analysis on a website or application, HyperLogLog can be used to estimate the number of effective clicks, that is, the number of distinct or unique clicks.
