Table of Contents
1. What is a Bloom filter?
2. Implementation principle
3. Function
4. Specific implementation
5. Code implementation
6. Practical combat

How to quickly determine whether an element is in a collection in Java

Apr 19, 2023 pm 05:37 PM

1. What is a Bloom filter?

The Bloom filter was proposed by Burton Howard Bloom in 1970.

It is a data structure made up of a binary vector (a bit array) and a series of random mapping functions (hash functions).

Its advantage is that both space efficiency and query time are far better than those of ordinary lookup structures. Its disadvantages are a certain false positive rate and the difficulty of deleting elements.
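To see why deletion is difficult, consider two elements whose index positions overlap. The sketch below is only an illustration: the index positions for A and B are made-up values, not the output of real hash functions. Clearing A's bits also clears a bit that B relies on, so B would wrongly be reported as absent.

```java
import java.util.BitSet;

class DeletionProblem {
    // Sets every bit listed in indexes.
    static void setAll(BitSet bits, int[] indexes) {
        for (int i : indexes) bits.set(i);
    }

    // Clears every bit listed in indexes (a naive "delete").
    static void clearAll(BitSet bits, int[] indexes) {
        for (int i : indexes) bits.clear(i);
    }

    // True only if every listed bit is 1.
    static boolean allSet(BitSet bits, int[] indexes) {
        for (int i : indexes) if (!bits.get(i)) return false;
        return true;
    }

    public static void main(String[] args) {
        int[] a = {1, 4, 7};   // hypothetical index positions for element A
        int[] b = {4, 9, 12};  // hypothetical index positions for element B (shares bit 4 with A)
        BitSet bits = new BitSet(16);
        setAll(bits, a);
        setAll(bits, b);
        clearAll(bits, a);     // naive attempt to delete A
        System.out.println(allSet(bits, b)); // false: B now looks absent although it was never removed
    }
}
```

Because a single bit can belong to several elements, a plain Bloom filter cannot tell which bits are safe to clear.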


2. Implementation principle

[Figure: diagram of the Bloom filter principle]

The main idea of the Bloom filter algorithm is to hash each element with k different hash functions. Each hash value is mapped to an index position in a bit array (which may be very long), and the bit at each of those positions is set to 1.

To test whether an element is in the set, compute the same k hash values and check the bits at the corresponding index positions. If any of them is not 1, the element is definitely not in the set.

The opposite answer is less certain: the filter may report that an element is in the set when it is not, because the 1s at all of that element's index positions were set by other elements. Such hash collisions are the root cause of false positives, and they are why a positive answer only means the element may be in the set.
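The principle can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not a production implementation: the class name SimpleBloomFilter, the FNV-1a style hash, and the trick of deriving the k index positions from two base hashes are all choices made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Minimal Bloom filter sketch: a bit array plus k index positions per element. */
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;       // number of bits in the array
    private final int hashCount;  // k: number of index positions per element

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Sets the k bits that belong to the element. */
    void put(String element) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(element, i));
        }
    }

    /** Returns false if any of the k bits is 0 (definitely absent), true otherwise (possibly present). */
    boolean mightContain(String element) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(element, i))) {
                return false;
            }
        }
        return true;
    }

    // Derives the i-th index as h1 + i * h2 (an assumed simplification of "k hash functions").
    private int index(String element, int i) {
        byte[] data = element.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193);
        return Math.floorMod(h1 + i * h2, size);
    }

    // FNV-1a style hash with a configurable starting value.
    private static int fnv1a(byte[] data, int seed) {
        int hash = seed;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 16777619;
        }
        return hash;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
        filter.put("Wangcai");
        System.out.println(filter.mightContain("Wangcai")); // true: an inserted element is never reported absent
        System.out.println(filter.mightContain("9527"));    // usually false, but a false positive is possible
    }
}
```

Note that put only ever sets bits and mightContain only reads them, which is why a negative answer is always reliable while a positive answer is not.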

Note: The lower the required false positive rate, the more bits and hash functions the filter needs, so memory use and the work per operation both go up.
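This trade-off can be made concrete with the standard Bloom filter sizing formulas, which the article itself does not state: for n expected elements and a target false positive rate p, the optimal bit count is m = -n ln(p) / (ln 2)^2 and the optimal number of hash functions is k = (m / n) ln 2. The class and method names below are chosen for this sketch only.

```java
class BloomSizing {
    /** Optimal number of bits: m = -n * ln(p) / (ln 2)^2. */
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** Optimal number of hash functions: k = (m / n) * ln 2, at least 1. */
    static int optimalHashCount(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000; // expected number of elements (example value)
        for (double p : new double[] {0.01, 0.001, 0.0001}) {
            long m = optimalBits(n, p);
            System.out.printf("p=%.4f  bits=%d (%.1f per element)  hashes=%d%n",
                    p, m, (double) m / n, optimalHashCount(n, m));
        }
    }
}
```

Going from a 1% to a 0.01% false positive rate roughly doubles the bits per element (about 9.6 to about 19.2) and raises the hash count from 7 to 13.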

3. Function

A Bloom filter can be used to determine whether an element is (possibly) in a set. Compared with other data structures, it has huge advantages in both space and time.

Note the word "possibly": a positive answer is not a guarantee. The worked example below shows why.

Typical uses are checking whether a given piece of data exists, preventing cache penetration (checking whether the requested data is valid, so that invalid requests do not bypass the cache and hit the database directly), filtering spam email, and maintaining blacklists.

4. Specific implementation

Now that the idea behind the Bloom filter algorithm is clear, let's walk through a concrete implementation.

Let's start with an example. Suppose there are two strings, Wangcai and Xiaoqiang. Each is hashed three times, each hash result is mapped to an index in the array (assume the array length is 16), and the bit at that index is set to 1. First, look at Wangcai:

[Figure: 16-bit array after inserting Wangcai, with the bits at indexes 2, 4, and 6 set to 1]

Suppose the three hashes of Wangcai produce 2, 4, and 6. The index values are then 2, 4, and 6, so the bits at those positions are set to 1 and the rest remain 0. To look up Wangcai later, run the same three hashes; the bits at indexes 2, 4, and 6 are all 1, so the filter concludes that Wangcai may exist.

Then insert Xiaoqiang into the Bloom filter. The process is the same as above. Assume the resulting indexes are 1, 3, and 5.

[Figure: 16-bit array showing only Xiaoqiang, with the bits at indexes 1, 3, and 5 set to 1]

Leaving Wangcai aside, that is how Xiaoqiang alone looks in the Bloom filter. With both Wangcai and Xiaoqiang inserted, the actual array looks like this:

[Figure: 16-bit array after inserting both strings, with the bits at indexes 1 through 6 set to 1]

Now take another value: 9527. The task is to determine whether 9527 exists. Assume the three hashes of 9527 give the indexes 5, 6, and 7. The bit at index 7 is 0, so 9527 definitely does not exist.

Next comes another string, Domestic 007. Its three hashes give the indexes 2, 3, and 5. The bits at indexes 2, 3, and 5 are all 1, so the filter concludes that Domestic 007 may exist. But as the walkthrough shows, Domestic 007 was never inserted. The bits at indexes 2, 3, and 5 are 1 only because other data set them.

At this point, the role of the Bloom filter should be clear.
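The walkthrough above can be replayed directly on a 16-bit array. The sketch below hard-codes the index positions assumed in the text instead of computing real hashes; the class and helper names are made up for this example.

```java
import java.util.BitSet;

class WorkedExample {
    // Sets the bit at each given index, mimicking an insert.
    static void insert(BitSet bits, int... indexes) {
        for (int i : indexes) bits.set(i);
    }

    // True only if every given index holds a 1.
    static boolean mightContain(BitSet bits, int... indexes) {
        for (int i : indexes) if (!bits.get(i)) return false;
        return true;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(16);
        insert(bits, 2, 4, 6); // Wangcai
        insert(bits, 1, 3, 5); // Xiaoqiang
        System.out.println(bits);                        // {1, 2, 3, 4, 5, 6}
        System.out.println(mightContain(bits, 5, 6, 7)); // 9527: false, bit 7 is 0
        System.out.println(mightContain(bits, 2, 3, 5)); // Domestic 007: true, a false positive
    }
}
```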

5. Code implementation

As Java programmers we are fortunate: many frameworks and tools already provide well-packaged implementations. For Bloom filters we will use the BloomFilter class from Google's Guava library. There are other implementations that you can explore as well.

First, add the dependency:

<!-- Bloom filter dependency (Guava) -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>25.1-jre</version>
</dependency>

Code implementation

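import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class BloomFilterDemo {
    public static void main(String[] args) {
        /*
         * Create a Bloom filter sized for 100 million insertions with a
         * false positive rate of 0.01%.
         * If the filter reports an element as absent, it is definitely absent.
         * If the filter reports an element as present, it may still be absent.
         * ----------------
         * BloomFilter.create arguments: funnel, expected number of elements,
         * false positive rate.
         * mightContain: tests whether an element may be present.
         */
        BloomFilter<CharSequence> bloomFilter =
                BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 100000000, 0.0001);
        bloomFilter.put("死");
        bloomFilter.put("磕");
        bloomFilter.put("Redis");
        System.out.println(bloomFilter.mightContain("Redis")); // true: "Redis" was inserted
        System.out.println(bloomFilter.mightContain("Java"));  // false unless a false positive occurs
    }
}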

The explanation is in the code comments. At this point, both the Bloom filter and its usage should be clear.

6. Practical combat

Let's simulate a scenario: using a Bloom filter to solve cache penetration.

First, what is cache penetration?

Cache penetration occurs when users request data that exists in neither the cache nor the database. Because the data is not in the cache, every such request goes to the database. Under high concurrency, this can easily overwhelm the database.

So how does a Bloom filter solve this problem?

It works like this: put all the query keys that exist in the database into the Bloom filter. When a query request arrives, check it against the Bloom filter first. If the filter reports that the requested value may exist, continue with the lookup; if it reports that the value does not exist, reject the request immediately.

The code is as follows:

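// Assumes redis, bloomfilter, and db are fields initialized elsewhere.
String get(String key) {
    // 1. Try the cache first.
    String value = redis.get(key);
    if (value == null) {
        // 2. Cache miss: if the filter says the key does not exist, skip the database.
        if (!bloomfilter.mightContain(key)) {
            return null;
        }
        // 3. The key may exist (false positives are possible), so query the database.
        value = db.get(key);
        if (value != null) {
            // 4. Cache only real values; after a false positive, value is still null.
            redis.set(key, value);
        }
    }
    return value;
}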
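To try the idea without Redis or a real database, the flow can be simulated with in-memory maps. Everything in this sketch is an assumption made for illustration: the two HashMap fields stand in for the cache and the database, the small built-in bit-array filter stands in for a real Bloom filter library, and the key names are invented.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class CachePenetrationDemo {
    // In-memory stand-ins for Redis and the database (assumptions for this sketch).
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> database = new HashMap<>();
    int databaseQueries = 0;

    // Tiny Bloom filter: 2^20 bits, 3 index positions per key.
    private static final int SIZE = 1 << 20;
    private final BitSet bits = new BitSet(SIZE);

    // Mixes the key's hashCode with a per-function offset to get the i-th index.
    private static int index(String key, int i) {
        int h = key.hashCode() + i * 0x9E3779B9;
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        h *= 0xC2B2AE35;
        h ^= h >>> 16;
        return h & (SIZE - 1);
    }

    boolean mightContain(String key) {
        for (int i = 0; i < 3; i++) if (!bits.get(index(key, i))) return false;
        return true;
    }

    /** Stores a row in the database and registers its key in the filter. */
    void save(String key, String value) {
        database.put(key, value);
        for (int i = 0; i < 3; i++) bits.set(index(key, i));
    }

    String get(String key) {
        String value = cache.get(key);
        if (value != null) return value;       // cache hit
        if (!mightContain(key)) return null;   // definitely not in the database
        databaseQueries++;                     // only now touch the database
        value = database.get(key);
        if (value != null) cache.put(key, value);
        return value;
    }

    public static void main(String[] args) {
        CachePenetrationDemo demo = new CachePenetrationDemo();
        for (int i = 0; i < 1000; i++) demo.save("user:" + i, "name-" + i);
        for (int i = 0; i < 10_000; i++) demo.get("missing:" + i); // simulated bogus requests
        System.out.println("database queries for 10000 bogus keys: " + demo.databaseQueries);
        System.out.println(demo.get("user:42")); // name-42
    }
}
```

With the filter in front, the database is queried only for keys the filter cannot rule out, so the bogus requests cause at most a handful of queries (one per false positive) instead of one per request.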
