How to implement bloom filter using php+redis
First define a hash function collection class. These hash functions are not necessarily used. In fact, 3 32-bit hash values are enough. The specific number can be based on the total amount of your bit sequence and your need to store it. Determined by the amount, the optimal value has been given above.
class BloomFilterHash { /** * 由Justin Sobel编写的按位散列函数 */ public function JSHash($string, $len = null) { $hash = 1315423911; $len || $len = strlen($string); for ($i=0; $i<$len; $i++) { $hash ^= (($hash << 5) + ord($string[$i]) + ($hash >> 2)); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * 该哈希算法基于AT&T贝尔实验室的Peter J. Weinberger的工作。 * Aho Sethi和Ulman编写的“编译器(原理,技术和工具)”一书建议使用采用此特定算法中的散列方法的散列函数。 */ public function PJWHash($string, $len = null) { $bitsInUnsignedInt = 4 * 8; //(unsigned int)(sizeof(unsigned int)* 8); $threeQuarters = ($bitsInUnsignedInt * 3) / 4; $oneEighth = $bitsInUnsignedInt / 8; $highBits = 0xFFFFFFFF << (int) ($bitsInUnsignedInt - $oneEighth); $hash = 0; $test = 0; $len || $len = strlen($string); for($i=0; $i<$len; $i++) { $hash = ($hash << (int) ($oneEighth)) + ord($string[$i]); } $test = $hash & $highBits; if ($test != 0) { $hash = (($hash ^ ($test >> (int)($threeQuarters))) & (~$highBits)); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * 类似于PJW Hash功能,但针对32位处理器进行了调整。它是基于UNIX的系统上的widley使用哈希函数。 */ public function ELFHash($string, $len = null) { $hash = 0; $len || $len = strlen($string); for ($i=0; $i<$len; $i++) { $hash = ($hash << 4) + ord($string[$i]); $x = $hash & 0xF0000000; if ($x != 0) { $hash ^= ($x >> 24); } $hash &= ~$x; } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * 这个哈希函数来自Brian Kernighan和Dennis Ritchie的书“The C Programming Language”。 * 它是一个简单的哈希函数,使用一组奇怪的可能种子,它们都构成了31 .... 31 ... 31等模式,它似乎与DJB哈希函数非常相似。 */ public function BKDRHash($string, $len = null) { $seed = 131; # 31 131 1313 13131 131313 etc.. $hash = 0; $len || $len = strlen($string); for ($i=0; $i<$len; $i++) { $hash = (int) (($hash * $seed) + ord($string[$i])); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * 这是在开源SDBM项目中使用的首选算法。 * 哈希函数似乎对许多不同的数据集具有良好的总体分布。它似乎适用于数据集中元素的MSB存在高差异的情况。 */ public function SDBMHash($string, $len = null) { $hash = 0; $len || $len = strlen($string); for ($i=0; $i<$len; $i++) { $hash = (int) (ord($string[$i]) + ($hash << 6) + ($hash << 16) - $hash); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * 由Daniel J. Bernstein教授制作的算法,首先在usenet新闻组comp.lang.c上向世界展示。 * 它是有史以来发布的最有效的哈希函数之一。 */ public function DJBHash($string, $len = null) { $hash = 5381; $len || $len = strlen($string); for ($i=0; $i<$len; $i++) { $hash = (int) (($hash << 5) + $hash) + ord($string[$i]); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * Donald E. Knuth在“计算机编程艺术第3卷”中提出的算法,主题是排序和搜索第6.4章。 */ public function DEKHash($string, $len = null) { $len || $len = strlen($string); $hash = $len; for ($i=0; $i<$len; $i++) { $hash = (($hash << 5) ^ ($hash >> 27)) ^ ord($string[$i]); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * 参考 http://www.isthe.com/chongo/tech/comp/fnv/ */ public function FNVHash($string, $len = null) { $prime = 16777619; //32位的prime 2^24 + 2^8 + 0x93 = 16777619 $hash = 2166136261; //32位的offset $len || $len = strlen($string); for ($i=0; $i<$len; $i++) { $hash = (int) ($hash * $prime) % 0xFFFFFFFF; $hash ^= ord($string[$i]); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } }
The next step is to connect redis to perform operations
/** * 使用redis实现的布隆过滤器 */ abstract class BloomFilterRedis { /** * 需要使用一个方法来定义bucket的名字 */ protected $bucket; protected $hashFunction; public function __construct($config, $id) { if (!$this->bucket || !$this->hashFunction) { throw new Exception("需要定义bucket和hashFunction", 1); } $this->Hash = new BloomFilterHash; $this->Redis = new YourRedis; //假设这里你已经连接好了 } /** * 添加到集合中 */ public function add($string) { $pipe = $this->Redis->multi(); foreach ($this->hashFunction as $function) { $hash = $this->Hash->$function($string); $pipe->setBit($this->bucket, $hash, 1); } return $pipe->exec(); } /** * 查询是否存在, 如果曾经写入过,必定回true,如果没写入过,有一定几率会误判为存在 */ public function exists($string) { $pipe = $this->Redis->multi(); $len = strlen($string); foreach ($this->hashFunction as $function) { $hash = $this->Hash->$function($string, $len); $pipe = $pipe->getBit($this->bucket, $hash); } $res = $pipe->exec(); foreach ($res as $bit) { if ($bit == 0) { return false; } } return true; } }
The above definition is an abstract class. If you want to use it, you can use it according to the specific business. For example, below is a filter that filters out duplicate content.
/** * 重复内容过滤器 * 该布隆过滤器总位数为2^32位, 判断条数为2^30条. hash函数最优为3个.(能够容忍最多的hash函数个数) * 使用的三个hash函数为 * BKDR, SDBM, JSHash * * 注意, 在存储的数据量到2^30条时候, 误判率会急剧增加, 因此需要定时判断过滤器中的位为1的的数量是否超过50%, 超过则需要清空. */ class FilteRepeatedComments extends BloomFilterRedis { /** * 表示判断重复内容的过滤器 * @var string */ protected $bucket = 'rptc'; protected $hashFunction = array('BKDRHash', 'SDBMHash', 'JSHash'); }
The above is the detailed content of How to implement bloom filter using php+redis. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

PHP is used to build dynamic websites, and its core functions include: 1. Generate dynamic content and generate web pages in real time by connecting with the database; 2. Process user interaction and form submissions, verify inputs and respond to operations; 3. Manage sessions and user authentication to provide a personalized experience; 4. Optimize performance and follow best practices to improve website efficiency and security.

PHP is mainly procedural programming, but also supports object-oriented programming (OOP); Python supports a variety of paradigms, including OOP, functional and procedural programming. PHP is suitable for web development, and Python is suitable for a variety of applications such as data analysis and machine learning.

PHP and Python have their own advantages and disadvantages, and the choice depends on project needs and personal preferences. 1.PHP is suitable for rapid development and maintenance of large-scale web applications. 2. Python dominates the field of data science and machine learning.

PHP is suitable for web development and rapid prototyping, and Python is suitable for data science and machine learning. 1.PHP is used for dynamic web development, with simple syntax and suitable for rapid development. 2. Python has concise syntax, is suitable for multiple fields, and has a strong library ecosystem.

The core benefits of PHP include ease of learning, strong web development support, rich libraries and frameworks, high performance and scalability, cross-platform compatibility, and cost-effectiveness. 1) Easy to learn and use, suitable for beginners; 2) Good integration with web servers and supports multiple databases; 3) Have powerful frameworks such as Laravel; 4) High performance can be achieved through optimization; 5) Support multiple operating systems; 6) Open source to reduce development costs.

PHP uses MySQLi and PDO extensions to interact in database operations and server-side logic processing, and processes server-side logic through functions such as session management. 1) Use MySQLi or PDO to connect to the database and execute SQL queries. 2) Handle HTTP requests and user status through session management and other functions. 3) Use transactions to ensure the atomicity of database operations. 4) Prevent SQL injection, use exception handling and closing connections for debugging. 5) Optimize performance through indexing and cache, write highly readable code and perform error handling.

PHP originated in 1994 and was developed by RasmusLerdorf. It was originally used to track website visitors and gradually evolved into a server-side scripting language and was widely used in web development. Python was developed by Guidovan Rossum in the late 1980s and was first released in 1991. It emphasizes code readability and simplicity, and is suitable for scientific computing, data analysis and other fields.

PHPhassignificantlyimpactedwebdevelopmentandextendsbeyondit.1)ItpowersmajorplatformslikeWordPressandexcelsindatabaseinteractions.2)PHP'sadaptabilityallowsittoscaleforlargeapplicationsusingframeworkslikeLaravel.3)Beyondweb,PHPisusedincommand-linescrip
