Home Java javaTutorial Hash--Introduction to common algorithms

Hash--Introduction to common algorithms

Jun 29, 2017 am 11:29 AM
Hash

Hash (Hash), also known as hashing, is a very common algorithm. Hash is mainly used in JavaHashMap data structure. The hash algorithm consists of two parts: hash function and hash table. We can know the characteristics of our array. We can quickly locate elements through subscripts (O(1)). Similarly, in the hash table, we can use the key (hash value) To quickly locate a certain value, the hash value is calculated through the hash function (hash(key) = address). The element [address] = value can be located through the hash value. The principle is similar to that of an array.

The best hash function is of course one that can calculate a unique hash value for each key value, but there may often be different keyvalue hash value, which causes conflicts. There are two aspects to judge whether a hash function is well designed:

 1. Few conflicts .

 2. Calculation is fast.

The following are several commonly used hash functions. They all have certain mathematical principles behind them and have undergone a lot of practice. Their mathematical principles will not be explored here.

BKDRHash function (h = 31 * h + c)

This hash function is applied to string hash value calculation in Java.

//String#hashCodepublic int hashCode() {int h = hash;if (h == 0 && value.length > 0) {char val[] = value;for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];    //BKDR 哈希函数,常数可以是131、1313、13131……        }
        hash = h;
    }return h;
}
Copy after login

##DJB2 Hash function (h = h << 5 + h + c = h = 33 * h + c

ElasticSearch

uses DJB2The hash function hashes the specified key of the document to be indexed.

##SDBMHash function (h = h << 6 + h << 16 - h + c = 65599 * h + c) is applied in

SDBM

(a simple database engine). The above is just a list of three hash functions. Let’s do an experiment to see how they conflict.

 

Java

 1 package com.algorithm.hash; 2  3 import java.util.HashMap; 4 import java.util.UUID; 5  6 /** 7  * 三种哈希函数冲突数比较 8  * Created by yulinfeng on 6/27/17. 9  */10 public class HashFunc {11 12     public static void main(String[] args) {13         int length = 1000000;   //100万字符串14         //利用HashMap来计算冲突数,HashMap的键值不能重复所以length - map.size()即为冲突数15         HashMap<String, String> bkdrMap = new HashMap<String, String>();16         HashMap<String, String> djb2Map = new HashMap<String, String>();17         HashMap<String, String> sdbmMap = new HashMap<String, String>();18         getStr(length, bkdrMap, djb2Map, sdbmMap);19         System.out.println("BKDR哈希函数100万字符串的冲突数:" + (length - bkdrMap.size()));20         System.out.println("DJB2哈希函数100万字符串的冲突数:" + (length - djb2Map.size()));21         System.out.println("SDBM哈希函数100万字符串的冲突数:" + (length - sdbmMap.size()));22     }23 24     /**25      * 生成字符串,并计算冲突数26      * @param length27      * @param bkdrMap28      * @param djb2Map29      * @param sdbmMap30      */31     private static void getStr(int length, HashMap<String, String> bkdrMap, HashMap<String, String> djb2Map, HashMap<String, String> sdbmMap) {32         for (int i = 0; i < length; i++) {33             System.out.println(i);34             String str = UUID.randomUUID().toString();35             bkdrMap.put(String.valueOf(str.hashCode()), str);   //Java的String.hashCode就是BKDR哈希函数, h = 31 * h + c36             djb2Map.put(djb2(str), str);    //DJB2哈希函数37             sdbmMap.put(sdbm(str), str);    //SDBM哈希函数38         }39     }40 41     /**42      * djb2哈希函数43      * @param str44      * @return45      */46     private static String djb2(String str) {47         int hash = 0;48         for (int i = 0; i != str.length(); i++) {49             hash = hash * 33 + str.charAt(i);    //h = h << 5 + h + c = h = 33 * h + c50         }51         return String.valueOf(hash);52     }53 54     /**55      * sdbm哈希函数56      * @param str57      * @return58      */59     private static String sdbm(String str) {60         int hash = 0;61         for (int i = 0; i != str.length(); i++) {62             hash = 65599 * hash + str.charAt(i);    //h = h << 6 + h << 16 - h + c = 65599 * h + c63         }64         return String.valueOf(hash);65     }66 }
Copy after login
The following are
10

万、100 Ten thousand, 200 Conflict number situation:

After repeated trials, the number of collisions of the three hash functions is actually about the same.

 

Python3

 1 import uuid 2  3 def hash_test(length, bkdrDic, djb2Dic, sdbmDic): 4     for i in range(length): 5         string = str(uuid.uuid1())  #基于时间戳 6         bkdrDic[bkdr(string)] = string 7         djb2Dic[djb2(string)] = string 8         sdbmDic[sdbm(string)] = string 9 10 #BDKR哈希函数11 def bkdr(string):12     hash = 013     for i in range(len(string)):14         hash = 31 * hash + ord(string[i])   # h = 31 * h + c15     return hash16 17 #DJB2哈希函数18 def djb2(string):19     hash = 020     for i in range(len(string)):21         hash = 33 * hash + ord(string[i])   # h = h << 5 + h + c22     return hash23 24 #SDBM哈希函数25 def sdbm(string):26     hash = 027     for i in range(len(string)):28         hash = 65599 * hash + ord(string[i])    # h = h << 6 + h << 16 - h + c29     return hash30 31 length = 10032 bkdrDic = dict() #bkdrDic = {}33 djb2Dic = dict()34 sdbmDic = dict()35 hash_test(length, bkdrDic, djb2Dic, sdbmDic)36 print("BKDR哈希函数100万字符串的冲突数:%d"%(length - len(bkdrDic)))37 print("DJB2哈希函数100万字符串的冲突数:%d"%(length - len(djb2Dic)))38 print("SDBM哈希函数100万字符串的冲突数:%d"%(length - len(sdbmDic)))
Copy after login

A hash table is a data structure that needs to be used with a hash function to create an index for quick search ——"Algorithm Notes". Generally speaking, it is a fixed-length storage space. For example, HashMapThe default hash table is a fixed-length hash table of 16EntryArray. After having a fixed-length storage space, the remaining question is how to put the value into which position. Usually if the hash value is m, the length is n, then this value is placed at the m mod n position.

The picture above shows the hash and hash table, as well as the solution to the conflict (zipper method). There are many solutions to conflicts, including hashing again until there is no conflict, and also using the zipper method to use linked lists to concatenate elements at the same position as shown in the figure above.

Imagine that the length of the hash table in the above example is 10, resulting in 1 conflicts. If the hash The table length is 20, then there will be no conflict. The search is faster but will waste more space. If the hash table length is 2, the conflict search will be inverted 3 times slower, but this will save a lot of space. So the length selection of the hash table is crucial, but it is also an important problem.

Supplement:

Hashes are used in many aspects. For example, different values ​​have different hash values, but The hash algorithm can also be designed so that similar or identical values ​​have similar or identical hash values. That is to say, if two objects are completely different, then their hash values ​​are completely different; if two objects are exactly the same, then their hash values ​​are also exactly the same; the more similar the two objects are, then their hash values ​​are also completely different. The more similar they are. This is actually a similarity problem, which means that this idea can be generalized and applied to similarity calculations (such as the Jaccard distance problem), and ultimately applied to accurate advertising placement, product recommendation, etc.

In addition, consistent hashing can also be applied in load balancing. How to ensure that each server can evenly distribute the load pressure, a good hash algorithm can also do it.

The above is the detailed content of Hash--Introduction to common algorithms. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1662
14
PHP Tutorial
1262
29
C# Tutorial
1235
24
How to use hash search algorithm in C++ How to use hash search algorithm in C++ Sep 19, 2023 pm 02:49 PM

How to use the hash search algorithm in C++ The hash search algorithm is an efficient search and storage technology. It converts keywords into a fixed-length index through a hash function, and then uses this index in the data structure Search. In C++, we can implement hash search algorithms by using hash containers and hash functions from the standard library. This article will introduce how to use the hash search algorithm in C++ and provide specific code examples. Introducing header files and namespaces First, before using the hash search algorithm in C++

How to write a hash lookup algorithm in Python? How to write a hash lookup algorithm in Python? Sep 21, 2023 pm 02:37 PM

How to write a hash lookup algorithm in Python? Hash search algorithm, also known as hash search algorithm, is a data search method based on hash table. Compared with traditional search algorithms such as linear search and binary search, the hash search algorithm has higher search efficiency. In Python, we can use a dictionary to implement a hash table and then implement a hash lookup. The basic idea of ​​the hash search algorithm is to convert the keyword to be searched into an index value through a hash function, and then search it in the hash table based on the index value.

Python underlying technology revealed: how to implement hash algorithm Python underlying technology revealed: how to implement hash algorithm Nov 08, 2023 pm 06:40 PM

Revealing the underlying technology of Python: How to implement the hash algorithm, specific code examples are required Summary: The hash algorithm is one of the commonly used technologies in the computer field and is used to quickly determine the unique identification of data. As a high-level language, Python provides many built-in hash functions, such as the hash() function and the implementation of various hash algorithms. This article will reveal the principles of hashing algorithms and the details of Python's underlying implementation, and provide specific code examples. Introduction to Hash Algorithm Hash algorithm, also known as hash algorithm, is a method of converting data of any length into

A simple article explaining what a hash algorithm is! What is a hash algorithm? A simple article explaining what a hash algorithm is! What is a hash algorithm? Mar 14, 2024 am 11:46 AM

When understanding Bitcoin investment and blockchain technology, hash algorithms can be said to appear frequently. It is said in the currency circle that hip-hop has hip-hop and algorithms have hashes. As for the word "algorithm", it is currently used vaguely by domestic users. Sometimes it refers to the consensus mechanism, and sometimes it refers to the specific Hash algorithm. As a blockchain algorithm, the Hash algorithm has always been obscure to the general public. So, what is Hash algorithm? Hash algorithm? Next, the editor of the currency circle will give you a simple explanation of what a hash algorithm is? I hope that investors can understand the hash algorithm after reading this article. What is a hash algorithm? Hash is transliterated from "Hash", also known as "hash". Essentially a computer program that accepts any

Best encryption and hashing techniques in PHP development Best encryption and hashing techniques in PHP development May 27, 2023 am 08:21 AM

In today's digital age, with the development of the Internet and the increasing importance of information, data confidentiality and security have become increasingly important. To ensure that data is not stolen or tampered with during transmission, PHP developers often use encryption and hashing techniques to protect sensitive data. This article will introduce the most commonly used encryption and hashing technologies in PHP development, as well as their advantages and disadvantages. 1. Encryption technology Encryption is a technology that protects data security. It uses algorithms to convert data into meaningless forms. Only the person holding the key can restore it to readable

In PHP, what does hash function mean? In PHP, what does hash function mean? Sep 03, 2023 am 08:49 AM

A hash function is any function that can be used to map data of any size to data of a fixed size. The value returned by a hash function is called a hash value, hash code, digest, or simply hash. Syntax stringhash(string$algo,string$data[,bool$raw_output=FALSE]) Parameter algo The name of the selected hash algorithm (such as "md5", "sha256", "haval160,4", etc.) data to be hashed information. When raw_output is set to TRUE, raw binary data is output. FALSE outputs lowercase hexadecimal. Example<?php &nbsp

How to handle hashing and encryption functions in accounting systems - Development methods for hashing and encryption using PHP How to handle hashing and encryption functions in accounting systems - Development methods for hashing and encryption using PHP Sep 26, 2023 pm 01:15 PM

How to handle hashing and encryption functions in accounting systems - Development methods for hashing and encryption using PHP Introduction: With the advent of the digital age, the security of various information systems has become more and more important. When designing and developing accounting systems, protecting user privacy data is crucial. Among them, the use of hashing and encryption functions can effectively protect users' sensitive information. This article will introduce how to use PHP to implement hashing and encryption functions in accounting systems, and provide specific code examples. 1. Implementation of hash function Hash is a one-way encryption algorithm

PHP Load Balancing Diversity: Understanding the Pros and Cons of Different Technologies PHP Load Balancing Diversity: Understanding the Pros and Cons of Different Technologies Mar 02, 2024 pm 02:50 PM

Introduction In today's fast-paced and connected digital world, ensuring high availability of applications is critical. Load balancing technology enables applications to distribute incoming traffic across multiple servers, improving performance and reliability. PHP provides support for a range of load balancing technologies, each with its own unique advantages and limitations. Round Robin Round Robin is a simple and effective load balancing technique that distributes requests to a server pool in order. This approach is easy to implement and ensures that requests are evenly distributed among servers. $servers=array("server1","server2","server3");$index=0;while(true)

See all articles