How to quickly determine whether an element is in a collection in java
1. What is a Bloom filter?
The Bloom Filter was proposed by a guy named Bloom in 1970.
It can actually be viewed as a data structure consisting of a binary vector (or bit array) and a series of random mapping functions (hash functions).
Its advantage is that space efficiency and query time are much better than ordinary algorithms. Its disadvantage is that it has a certain misrecognition rate and difficulty in deletion.
2. Implementation principle
Let’s take a picture first
Bloom filter algorithm The main idea is to use n hash functions to perform hashing to obtain different hash values. According to the hash, they are mapped to different index positions of the array (the length of this array may be very long), and then the corresponding index bits are The value on is set to 1.
To determine whether the element appears in the set is to use k different hash functions to calculate the hash value and see whether the value at the corresponding index position of the hash value is 1. If there is one that is not 1 , indicating that the element does not exist in the collection.
But it is also possible to judge that the element is in the set, but the element is not. The 1s above all index positions of this element are set by other elements, which leads to a certain probability of misjudgment (this is why the above is live The root cause may be in a collection, because there will be certain hash conflicts).
Note: The lower the false positive rate, the lower the corresponding performance will be.
3. Function
The Bloom filter can be used to determine whether an element is (possibly) in a set, and compared to other data structures, the Bloom filter There are huge advantages in both space and time.
Note the word above: possible. There is a suspense reserved here, which will be analyzed in detail below.
Judge whether the given data exists
Prevent cache penetration (judge whether the requested data is valid to avoid directly bypassing the cache to request the database), etc., mailbox spam filtering, blacklist function, etc. wait.
4. Specific implementation
After reading the algorithm idea of Bloom filter, let’s start to explain the specific implementation.
Let me first give an example. Suppose there are two strings, Wangcai and Xiaoqiang. They have been hashed three times respectively, and then the index of the corresponding array (assuming the array length is 16) is calculated based on the hash result. The value of the position is set to 1. Let’s first look at the phrase Wangcai:
After three hashes of Wangcai, the values are 2, 4, and 6 respectively. Then we can get The index values are 2, 4, and 6 respectively, so the values of the index (2, 4, 6) positions of the array are set to 1, and the rest are regarded as 0. Now suppose that you need to find Wangcai, and also go through these three hashes and then It is found that the values of the positions corresponding to indexes 2, 4, and 6 are all 1, then it can be judged that prosperous wealth may exist.
Then insert Xiaoqiang into the Bloom filter. The actual process is the same as above. Assume that the obtained subscripts are 1, 3, 5
Putting aside the existence of Wangcai, Xiaoqiang looks like this in the Bloom filter at this time. The actual array combining Wangcai and Xiaoqiang looks like this:
Now there is a data: 9527. The current requirement is to determine whether 9527 exists. Assume that the subscripts obtained by hashing 9527 three times are: 5, 6, and 7. It turns out that the value of the position with subscript 7 is 0, so it can be definitely judged that 9527 must not exist.
Then came another domestic 007. After three hashes, the subscripts obtained were: 2, 3, and 5. It turned out that the values corresponding to the subscripts 2, 3, and 5 were all 1, so it can be roughly It is judged that domestic 007 may exist. But in fact, after our demonstration just now, domestic 007 does not exist at all. The reason why the values of index positions 2, 3, and 5 are 1 is because of other data settings.
Speaking of which, I wonder if everyone understands the role of the Bloom filter.
5. Code implementation
As Java programmers, we are really happy. We use many frameworks and tools, and they are basically encapsulated. Bloom filters , we will use the tool class packaged by Google. Of course there are other methods that you can explore.
First add dependencies
<!--布隆过滤依赖--> <dependency> <groupId>com.google.guava</groupId> <artifactId>guava</artifactId> <version>25.1-jre</version> </dependency>
Code implementation
import com.google.common.hash.BloomFilter; import com.google.common.hash.Funnels; import java.nio.charset.Charset; public class BloomFilterDemo { public static void main(String[] args) { /** * 创建一个插入对象为一亿,误报率为0.01%的布隆过滤器 * 不存在一定不存在 * 存在不一定存在 * ---------------- * Funnel 对象:预估的元素个数,误判率 * mightContain :方法判断元素是否存在 */ BloomFilter<CharSequence> bloomFilter = BloomFilter.create(Funnels.stringFunnel(Charset.forName("utf-8")), 100000000, 0.0001); bloomFilter.put("死"); bloomFilter.put("磕"); bloomFilter.put("Redis"); System.out.println(bloomFilter.mightContain("Redis")); System.out.println(bloomFilter.mightContain("Java")); } }
The specific explanation has been written in the comments. By now I believe everyone must understand the Bloom filter and how to use it.
6. Practical combat
Let’s simulate this scenario: solving cache penetration through Bloom filters.
First of all, do you know what cache penetration is?
Cache penetration means that the user accesses a data that is not in the cache or the database. Because it does not exist in the cache, it will access the database if the concurrency is high. It is easy to defeat the database
So how does the Bloom filter solve this problem? he
的原理是这样子的:将数据库中所有的查询条件,放入布隆过滤器中,当一个查询请求过来时,先经过布隆过滤器进行查,如果判断请求查询值存在,则继续查;如果判断请求查询不存在,直接丢弃。
其代码如下:
String get(String key) { String value = redis.get(key); if (value == null) { if(!bloomfilter.mightContain(key)){ return null; }else{ value = db.get(key); redis.set(key, value); } } return value; }
The above is the detailed content of How to quickly determine whether an element is in a collection in java. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics











PHP is a scripting language widely used on the server side, especially suitable for web development. 1.PHP can embed HTML, process HTTP requests and responses, and supports a variety of databases. 2.PHP is used to generate dynamic web content, process form data, access databases, etc., with strong community support and open source resources. 3. PHP is an interpreted language, and the execution process includes lexical analysis, grammatical analysis, compilation and execution. 4.PHP can be combined with MySQL for advanced applications such as user registration systems. 5. When debugging PHP, you can use functions such as error_reporting() and var_dump(). 6. Optimize PHP code to use caching mechanisms, optimize database queries and use built-in functions. 7

PHP and Python each have their own advantages, and the choice should be based on project requirements. 1.PHP is suitable for web development, with simple syntax and high execution efficiency. 2. Python is suitable for data science and machine learning, with concise syntax and rich libraries.

Java 8 introduces the Stream API, providing a powerful and expressive way to process data collections. However, a common question when using Stream is: How to break or return from a forEach operation? Traditional loops allow for early interruption or return, but Stream's forEach method does not directly support this method. This article will explain the reasons and explore alternative methods for implementing premature termination in Stream processing systems. Further reading: Java Stream API improvements Understand Stream forEach The forEach method is a terminal operation that performs one operation on each element in the Stream. Its design intention is

PHP is suitable for web development, especially in rapid development and processing dynamic content, but is not good at data science and enterprise-level applications. Compared with Python, PHP has more advantages in web development, but is not as good as Python in the field of data science; compared with Java, PHP performs worse in enterprise-level applications, but is more flexible in web development; compared with JavaScript, PHP is more concise in back-end development, but is not as good as JavaScript in front-end development.

PHP and Python each have their own advantages and are suitable for different scenarios. 1.PHP is suitable for web development and provides built-in web servers and rich function libraries. 2. Python is suitable for data science and machine learning, with concise syntax and a powerful standard library. When choosing, it should be decided based on project requirements.

PHPhassignificantlyimpactedwebdevelopmentandextendsbeyondit.1)ItpowersmajorplatformslikeWordPressandexcelsindatabaseinteractions.2)PHP'sadaptabilityallowsittoscaleforlargeapplicationsusingframeworkslikeLaravel.3)Beyondweb,PHPisusedincommand-linescrip

The reasons why PHP is the preferred technology stack for many websites include its ease of use, strong community support, and widespread use. 1) Easy to learn and use, suitable for beginners. 2) Have a huge developer community and rich resources. 3) Widely used in WordPress, Drupal and other platforms. 4) Integrate tightly with web servers to simplify development deployment.

PHP is suitable for web development and content management systems, and Python is suitable for data science, machine learning and automation scripts. 1.PHP performs well in building fast and scalable websites and applications and is commonly used in CMS such as WordPress. 2. Python has performed outstandingly in the fields of data science and machine learning, with rich libraries such as NumPy and TensorFlow.
