What is a bloom filter? How to use it in Redis?

Jun 24, 2021 pm 07:10 PM
redis bloom filter

The Bloom filter is a remarkable data structure. This article explains in depth how it works and shows how to use one in Redis.

What is "Bloom filter"

The Bloom filter is a remarkable data structure that can be used to determine whether an element is in a set. One of its most common uses is deduplication. Consider a common requirement in web crawlers: there are thousands upon thousands of target URLs, so how do you determine whether the crawler has already visited a given URL? The simple approach is to store each URL in a database as the crawler collects it. Whenever a new URL comes in, the crawler queries the database to check whether it has been visited before:

select id from table where url = 'https://jaychen.cc'
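As a rough illustration, the database approach could look like the following Python sketch using the standard-library sqlite3 module. The table name visited and the helper names are assumptions made for this example, not part of the original article.

```python
import sqlite3

# Hypothetical schema: one table of visited URLs. Making the URL the primary
# key gives the lookup an index instead of a full table scan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visited (url TEXT PRIMARY KEY)")


def mark_visited(url: str) -> None:
    """Record a URL; INSERT OR IGNORE makes repeated inserts harmless."""
    conn.execute("INSERT OR IGNORE INTO visited (url) VALUES (?)", (url,))
    conn.commit()


def was_visited(url: str) -> bool:
    """Return True if the URL has been recorded before."""
    row = conn.execute("SELECT 1 FROM visited WHERE url = ?", (url,)).fetchone()
    return row is not None


mark_visited("https://jaychen.cc")
print(was_visited("https://jaychen.cc"))   # True
print(was_visited("https://example.com"))  # False
```

Every check is still a round trip to the database, which is the cost discussed next.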

But as the crawler collects more and more URLs, it has to hit the database once before every request, and SQL lookups on strings like this are not efficient. Besides a database, a Redis set can also meet this requirement, and it performs better than the database. But the Redis set has a problem of its own: it consumes too much memory. This is where the Bloom filter comes in.

Compared with a database or a Redis set, a Bloom filter can effectively avoid both the performance problem and the memory problem.

The Bloom filter is essentially a bit array: each element of the array occupies only 1 bit and can only be 0 or 1. Allocating a bit array of 10000 elements therefore takes only 10000 / 8 = 1250 B of space. In addition to the bit array, the Bloom filter has K hash functions. When an element is added to the Bloom filter, the following operations are performed:

  • Hash the element with each of the K hash functions to obtain K hash values.
  • For each hash value, set the bit at the corresponding index of the bit array to 1.
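The bit array described above can be sketched in Python by packing eight bits into each byte of a bytearray. The class name BitArray and its methods are assumptions made for this illustration.

```python
class BitArray:
    """Fixed-size array of bits, packed eight to a byte."""

    def __init__(self, size: int) -> None:
        self.size = size
        # Round up so that sizes not divisible by 8 still fit.
        self.data = bytearray((size + 7) // 8)

    def set(self, index: int) -> None:
        """Set the bit at the given index to 1."""
        self.data[index // 8] |= 1 << (index % 8)

    def get(self, index: int) -> bool:
        """Return True if the bit at the given index is 1."""
        return bool(self.data[index // 8] & (1 << (index % 8)))


bits = BitArray(10000)
print(len(bits.data))              # 1250 bytes, matching 10000 / 8
bits.set(42)
print(bits.get(42), bits.get(43))  # True False
```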

For example, assume the Bloom filter has 3 hash functions f1, f2, f3 and a bit array arr. To insert https://jaychen.cc into the Bloom filter:

  • Hash the value with each of the three functions to get three values n1, n2, n3.
  • Set the three elements arr[n1], arr[n2], arr[n3] in the bit array to 1.

To determine whether a value is in the Bloom filter, hash the element with the same K hash functions again, then check whether the bit at each resulting index is 1. If all of them are 1, the value is considered to be in the Bloom filter. If any of them is not 1, the element is not in the Bloom filter.
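The insert and lookup steps above can be put together in a minimal Python sketch. It is not how Redis implements its filter: as an assumption for this example, the K hash functions are simulated by salting SHA-256 with a seed, and the names BloomFilter, add and might_contain are illustrative.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter: a bit array plus K salted hash functions."""

    def __init__(self, size: int, num_hashes: int) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray((size + 7) // 8)

    def _positions(self, item: str):
        # Assumption: K hash functions are simulated by prefixing the item
        # with a seed 0..K-1 before hashing it with SHA-256.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        """Set the K bits that correspond to the item."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter(size=10000, num_hashes=3)
print(bf.might_contain("https://jaychen.cc"))  # False: no bits are set yet
bf.add("https://jaychen.cc")
print(bf.might_contain("https://jaychen.cc"))  # True
```

The method is named might_contain rather than contains because a True result only says that all K bits happen to be set.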

If the text is hard to follow, the author's hand-drawn illustration of the process may help.

[Figure: the author's hand-drawn illustration of inserting into and querying a Bloom filter]

After reading the explanation above, you will probably spot a problem: the more elements are inserted, the more positions in the bit array are set to 1. When an element that is not in the Bloom filter is hashed and its positions are looked up in the bit array, those positions may already have been set to 1 by other elements. An element that was never added may therefore be misjudged as being in the Bloom filter. But if the Bloom filter reports that an element is not present, then it is definitely not present. To put it simply:

  • If the Bloom filter says an element is present, that may be a misjudgment.
  • If the Bloom filter says an element is not present, it is definitely not present.

Applied to the crawler requirement above, this limitation means that some unvisited URLs may be misjudged as visited, but a URL that has been visited will never be misjudged as unvisited.
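How often such misjudgments happen can be estimated with the standard approximation p ≈ (1 - e^(-K·n/m))^K, where m is the number of bits, K the number of hash functions and n the number of inserted elements. The sketch below assumes the hash functions are independent and uniformly distributed; the function name is illustrative.

```python
import math


def false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Approximate the probability that an absent element is reported present."""
    fraction_set = 1.0 - math.exp(-num_hashes * num_items / num_bits)
    return fraction_set ** num_hashes


# The 10000-bit filter with 3 hash functions from the earlier example:
for n in (100, 1000, 5000):
    print(n, false_positive_rate(10000, 3, n))
```

With 1000 elements the estimate is roughly 1.7%, and it climbs quickly as the bit array fills up.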

Bloom filters in Redis

Redis added module support in version 4.0, and Bloom filters can be added to Redis as a module. So with Redis 4.0 or later, you can use Bloom filters by loading the module. That is not the simplest way, though: you can use Docker to try Bloom filters in Redis directly.

> docker run -d -p 6379:6379 --name bloomfilter redislabs/rebloom
> docker exec -it bloomfilter redis-cli

The Redis Bloom filter has two main commands:

  • bf.add adds an element to the Bloom filter: bf.add urls https://jaychen.cc
  • bf.exists checks whether an element is in the filter: bf.exists urls https://jaychen.cc
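From application code, the same two commands can be sent through any client that can issue raw commands. The sketch below assumes a client object with an execute_command method, as redis-py's Redis class provides; the helper names mark_visited and maybe_visited are hypothetical.

```python
def mark_visited(client, url: str, key: str = "urls") -> bool:
    """Add the URL to the filter.

    BF.ADD replies 1 if the item was newly added and 0 if the filter
    considers it already present.
    """
    return client.execute_command("BF.ADD", key, url) == 1


def maybe_visited(client, url: str, key: str = "urls") -> bool:
    """Return True if the filter reports the URL as possibly present.

    BF.EXISTS replies 1 for "may exist" and 0 for "definitely not".
    """
    return client.execute_command("BF.EXISTS", key, url) == 1
```

With redis-py installed and the Docker container above running, the client could be created as redis.Redis(host="localhost", port=6379) and passed to both helpers.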

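As mentioned above, the Bloom filter can produce misjudgments. Two values in Redis determine the accuracy of the Bloom filter:

  • error_rate: the error rate allowed for the Bloom filter. The lower this value, the larger the filter's bit array and the more space it takes up.
  • initial_size: the number of elements the Bloom filter can store. Once the number of stored elements exceeds this value, the filter's accuracy declines.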
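The relationship between these two values and the size of the bit array can be estimated with the textbook Bloom filter formulas m = -n·ln(p) / (ln 2)² and K = (m / n)·ln 2. This is a sketch of the theory only: the function name is illustrative, and the memory Redis actually allocates for a filter may differ.

```python
import math


def optimal_size(capacity: int, error_rate: float) -> tuple[int, int]:
    """Return (number of bits, number of hash functions) for a classic Bloom filter."""
    num_bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    num_hashes = max(1, round(num_bits / capacity * math.log(2)))
    return num_bits, num_hashes


print(optimal_size(100, 0.01))   # (959, 7)
print(optimal_size(100, 0.001))  # a lower error rate needs more bits
```

For 100 elements at a 1% error rate this comes to about 959 bits, or roughly 120 bytes.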

Redis has a command for setting these two values:

bf.reserve urls 0.01 100

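The three arguments mean:

  • The first is the name of the filter.
  • The second is the error_rate value.
  • The third is the initial_size value.

One caveat when using this command: the filter name must not already exist before the command runs. If it does, the command fails with: (error) ERR item exists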
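One way to avoid that error is to check for the key before reserving the filter. The sketch below is an assumption-laden illustration: ensure_filter is a hypothetical helper, and client is any object with an execute_command method, such as a redis-py client.

```python
def ensure_filter(client, key: str, error_rate: float, capacity: int) -> bool:
    """Create the filter unless the key already exists.

    Returns True if BF.RESERVE was issued, False if the key was already there.
    Note: another client could create the key between the two commands, so
    callers that need certainty should still be ready to handle the error.
    """
    if client.execute_command("EXISTS", key):
        return False
    client.execute_command("BF.RESERVE", key, error_rate, capacity)
    return True
```

Calling ensure_filter(client, "urls", 0.01, 100) corresponds to the bf.reserve urls 0.01 100 example above.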
