Table of Contents
字典数据结构
字典的API函数
创建新字典
字典添加元素
字典Rehash解析
Rehash操作核心函数:
小结
Home Database Mysql Tutorial Redis内部数据结构详解之字典(dict)

Redis内部数据结构详解之字典(dict)

Jun 07, 2016 pm 03:22 PM
dict redis internal dictionary data structure Detailed explanation

字典,简单说就是存储key-value键值数据,当然value=NULL那么就是集合了。字典通俗来说就是C++ STL中的map,STL中的map是用red-black tree实现的,因为map不仅能够保证key不重复,而且key还是按照字典序存储的,而Redis中的字典并不要求有序,因此为了降低编

字典,简单说就是存储key-value键值数据,当然value=NULL那么就是集合了。字典通俗来说就是C++ STL中的map,STL中的map是用red-black tree实现的,因为map不仅能够保证key不重复,而且key还是按照字典序存储的,而Redis中的字典并不要求有序,因此为了降低编码的难度使用哈希表作为字典的底层实现。Redis的字典是使用一个桶bucket,通过对key进行hash得到的索引值index,然后将key-value的数据存在桶的index位置,Redis处理hash碰撞的方式是链表,两个不同的key hash得到相同的索引值,那么就使用链表解决冲突。使用链表自然当存储的数据巨大的时候,字典不免会退化成多个链表,效率大大降低,Redis采用rehash的方式对桶进行扩容来解决这种退化。

Redis使用的hash算法有以下两种:

1. MurmurHash2 32 bit 算法:这种算法的分布率和速度都非常好,具体信息请参考 MurmurHash 的主页:http://code.google.com/p/smhasher/ 。
2. 基于djb算法实现的一个大小写无关散列算法:具体信息请参考
http://www.cse.yorku.ca/~oz/hash.html 。

字典数据结构

typedef struct dictEntry {//字典的节点
    void *key;
    union {//使用的联合体
        void *val;
        uint64_t u64;//这两个参数很有用
        int64_t s64;
    } v;
    struct dictEntry *next;//下一个节点指针
} dictEntry;

typedef struct dictType {
    unsigned int (*hashFunction)(const void *key); //hash函数指针
    void *(*keyDup)(void *privdata, const void *key); //键复制函数指针
    void *(*valDup)(void *privdata, const void *obj); //值复制函数指针
    int (*keyCompare)(void *privdata, const void *key1, const void *key2); //键比较函数指针
    void (*keyDestructor)(void *privdata, void *key); //键构造函数指针
    void (*valDestructor)(void *privdata, void *obj); //值构造函数指针
} dictType;

/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
typedef struct dictht { //字典hash table
    dictEntry **table;//可以看做字典数组,俗称桶bucket
    unsigned long size; //指针数组的大小,即桶的层数
    unsigned long sizemask;
    unsigned long used; //字典中当前的节点数目
} dictht;

typedef struct dict {
    dictType *type;
    void *privdata; //私有数据
    dictht ht[2];   //两个hash table
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */ //rehash 索引
    int iterators; /* number of iterators currently running */ //当前该字典迭代器个数
} dict;
Copy after login
dict数据结构中声明了两个字典hashtable结构dictht,ht[1]在rehash时候使用,后面具体分析。

下图给出整个字典结构,图片来自Redis设计与实现一书:

\

上图ht[1]为空,说明当然字典没在Rehash状态。

字典的API函数

函数名称

作用

复杂度

dictCreate

创建一个新字典

O(1)

dictResize

重新规划字典的大小

O(1)

dictExpand

扩展字典

O(1)

dictRehash

对字典进行N步渐进式Rehash

O(N)

_dictRehashStep

对字典进行1步尝试Rehash

O(N)

dictAdd

添加一个元素

O(1)

dictReplace

替换给定key的value值

O(1)

dictDelete

删除一个元素

O(N)

dictRelease

释放字典

O(1)

dictFind

查找一个元素

O(N)

dictFetchValue

通过key查找value

O(N)

dictGetRandomKey

随机返回字典中一个元素

O(1)

创建新字典

通过dictCreate函数创建一个新字典dict *dictCreate(dictType *type, void *privDataPtr),一个空字典的示意图如下(图片来自Redis设计与实现一书): \上面已经提起过,ht[1]只在Rehash时使用。

字典添加元素

根据字典当前的状态,将一个key-value元素添加到字典中可能会引起一系列复制的操作:

如果字典未初始化(即字典的0号哈希表ht[0]的table为空),那么需要调用dictExpand函数对它初始化;

如果插入的元素key已经存在,那么添加元素失败;

如果插入元素时,引起碰撞,需要使用链表来处理碰撞;

如果插入元素时,引起程序满足Rehash的条件时,先调用dictExpand函数扩展哈希表的size,然后准备渐进式Rehash操作。

字典添加元素的流程图,来自Redis设计与实现一书

\

/* Expand or create the hash table */
int dictExpand(dict *d, unsigned long size)
{
    dictht n; /* the new hash table */
    unsigned long realsize = _dictNextPower(size); //得到需要扩展到的size

    /* the size is invalid if it is smaller than the number of
     * elements already inside the hash table */
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    /* Allocate the new hash table and initialize all pointers to NULL */
    n.size = realsize;
    n.sizemask = realsize-1;
    n.table = zcalloc(realsize * sizeof(dictEntry*));
    n.used = 0;

    /* Is this the first initialization? If so it's not really a rehashing
     * we just set the first hash table so that it can accept keys. */
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    /* Prepare a second hash table for incremental rehashing */
    //准备渐进式rehash,rehash的字典table为0号
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}

/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *d)
{
    /* Incremental rehashing already in progress. Return. */
    if (dictIsRehashing(d)) return DICT_OK;

    // 如果哈希表为空,那么将它扩展为初始大小
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /*如果哈希表的已用节点数 >= 哈希表的大小,并且以下条件任一个为真:
       1) dict_can_resize 为真
       2) 已用节点数除以哈希表大小之比大于 dict_force_resize_ratio
       那么调用 dictExpand 对哈希表进行扩展,扩展的体积至少为已使用节点数的两倍
    */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        return dictExpand(d, d->ht[0].used*2);
    }
    return DICT_OK;
}

static int _dictKeyIndex(dict *d, const void *key)
{
    unsigned int h, idx, table;
    dictEntry *he;

    /* Expand the hash table if needed */
    if (_dictExpandIfNeeded(d) == DICT_ERR)
        return -1;
    /* Compute the key hash value */
    h = dictHashKey(d, key);//通过hash函数得到key所在的bucket索引位置
    //查找在现有字典中是否出现了该key
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        /* Search if this slot does not already contain the given key */
        he = d->ht[table].table[idx];
        while(he) {
            if (dictCompareKeys(d, key, he->key))
                return -1;
            he = he->next;
        }
        //如果系统没在rehash则不需要查找ht[1]
        if (!dictIsRehashing(d)) break;
    }
    return idx;
}

dictEntry *dictAddRaw(dict *d, void *key)
{
    int index;
    dictEntry *entry;
    dictht *ht;

    if (dictIsRehashing(d)) _dictRehashStep(d);// 尝试渐进式地 rehash 桶中一组元素

    /* Get the index of the new element, or -1 if
     * the element already exists. */
    // 查找可容纳新元素的索引位置,如果元素已存在, index 为 -1
    if ((index = _dictKeyIndex(d, key)) == -1)
        return NULL;

    /* Allocate the memory and store the new entry */
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    // 决定该把新元素放在那个哈希表
    entry = zmalloc(sizeof(*entry));
    //头插法,插入节点
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;

    /* Set the hash entry fields. */
    dictSetKey(d, entry, key);//关联起key
    return entry;
}

/* Add an element to the target hash table */
//添加一个元素
int dictAdd(dict *d, void *key, void *val)
{
    dictEntry *entry = dictAddRaw(d,key);

    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val);//关联起value
    return DICT_OK;
}
Copy after login

字典Rehash解析

Rehash的触发机制:当每次添加新元素时,都会对工作哈希表ht[0]进行检查,如果used(哈希表中元素的数目)与size(桶的大小)比率ratio满足以下任一条件,将激活字典的Rehash机制:ratio=used / size, ratio >= 1并且dict_can_resize 为真;ratio 大 于 变 量 dict_force_resize_ratio 。
Rehash执行过程:创建一个比ht[0].used至少两倍的ht[1].table;将原ht[0].table中所有元素迁移到ht[1].table;清空原来ht[0],将ht[1]替换成ht[0] 渐进式Rehash主要由两个函数来进行: _dictRehashStep:当对字典进行添加、查找、删除、随机获取元素都会执行一次,其每次在开始Rehash后,将ht[0].table的第一个不为空的索引上的所有节点全部迁移到ht[1].table; dictRehashMilliseconds:由Redis服务器常规任务程序(serverCron)执行,以毫秒为单位,在一定时间内,以每次执行100步rehash操作。

Rehash操作核心函数:

int dictRehash(dict *d, int n) {
    if (!dictIsRehashing(d)) return 0;

    while(n--) {
        dictEntry *de, *nextde;

        /* Check if we already rehashed the whole table... */
        if (d->ht[0].used == 0) {//已经完成
            zfree(d->ht[0].table);//释放ht[0].table
            d->ht[0] = d->ht[1]; //这里ht[0]与ht[1]都不是指针,直接赋值就替换了
            _dictReset(&d->ht[1]);//将ht[1].table设置为null
            d->rehashidx = -1;
            return 0;
        }

        /* Note that rehashidx can&#39;t overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned)d->rehashidx);
        //找到第一个不为空的数组
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        //指向该链表头
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {//遍历链表
            unsigned int h;

            nextde = de->next;
            /* Get the index in the new hash table */
            //得到在ht[1]中的索引号,通过相应的hash函数
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;

            // 添加节点到 ht[1] ,调整指针,采用的是头插法
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;//设置为空
        d->rehashidx++;
    }
    return 1;
}
Copy after login

小结

Redis中的字典数据结构使用哈希表来实现,用来存储key-value键值元素;
字典使用两个哈希表,一般只使用ht[0],只有当Rehash时候才使用ht[0];
哈希表采用链表的方式解决键碰撞问题;
Redis的Rehash操作是渐进式的,服务器程序会主动Rehash,在查找、添加、删除元素等操作时也会在Rehash进行时执行一次rehash操作。

字典的内容实在太多,操作比较繁琐,应该是Redis中最复杂的底层数据结构了,本文分析的绝对不够深入,希望以后有时间再修改吧,暂时先这样。到目前为止,Redis六种内部数据结构,同时也是底层操作的实现讲解全部结束,后面的文章将进入五种基本数据类型指令的实现,字符串(String)、哈希表(Hash)、列表(List)、集合(Set)、有序集合(Sorted Set)的各种指令的实现。

我自己对Redis2.8.2源码的注释,有时间找个机会放出来。


最后感谢黄健宏(huangz1990)的Redis设计与实现及其他对Redis2.6源码的相关注释对我在研究Redis2.8源码方面的帮助。
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

How to build the redis cluster mode How to build the redis cluster mode Apr 10, 2025 pm 10:15 PM

Redis cluster mode deploys Redis instances to multiple servers through sharding, improving scalability and availability. The construction steps are as follows: Create odd Redis instances with different ports; Create 3 sentinel instances, monitor Redis instances and failover; configure sentinel configuration files, add monitoring Redis instance information and failover settings; configure Redis instance configuration files, enable cluster mode and specify the cluster information file path; create nodes.conf file, containing information of each Redis instance; start the cluster, execute the create command to create a cluster and specify the number of replicas; log in to the cluster to execute the CLUSTER INFO command to verify the cluster status; make

How to read redis queue How to read redis queue Apr 10, 2025 pm 10:12 PM

To read a queue from Redis, you need to get the queue name, read the elements using the LPOP command, and process the empty queue. The specific steps are as follows: Get the queue name: name it with the prefix of "queue:" such as "queue:my-queue". Use the LPOP command: Eject the element from the head of the queue and return its value, such as LPOP queue:my-queue. Processing empty queues: If the queue is empty, LPOP returns nil, and you can check whether the queue exists before reading the element.

How to clear redis data How to clear redis data Apr 10, 2025 pm 10:06 PM

How to clear Redis data: Use the FLUSHALL command to clear all key values. Use the FLUSHDB command to clear the key value of the currently selected database. Use SELECT to switch databases, and then use FLUSHDB to clear multiple databases. Use the DEL command to delete a specific key. Use the redis-cli tool to clear the data.

How to configure Lua script execution time in centos redis How to configure Lua script execution time in centos redis Apr 14, 2025 pm 02:12 PM

On CentOS systems, you can limit the execution time of Lua scripts by modifying Redis configuration files or using Redis commands to prevent malicious scripts from consuming too much resources. Method 1: Modify the Redis configuration file and locate the Redis configuration file: The Redis configuration file is usually located in /etc/redis/redis.conf. Edit configuration file: Open the configuration file using a text editor (such as vi or nano): sudovi/etc/redis/redis.conf Set the Lua script execution time limit: Add or modify the following lines in the configuration file to set the maximum execution time of the Lua script (unit: milliseconds)

How to set the redis expiration policy How to set the redis expiration policy Apr 10, 2025 pm 10:03 PM

There are two types of Redis data expiration strategies: periodic deletion: periodic scan to delete the expired key, which can be set through expired-time-cap-remove-count and expired-time-cap-remove-delay parameters. Lazy Deletion: Check for deletion expired keys only when keys are read or written. They can be set through lazyfree-lazy-eviction, lazyfree-lazy-expire, lazyfree-lazy-user-del parameters.

How to use the redis command line How to use the redis command line Apr 10, 2025 pm 10:18 PM

Use the Redis command line tool (redis-cli) to manage and operate Redis through the following steps: Connect to the server, specify the address and port. Send commands to the server using the command name and parameters. Use the HELP command to view help information for a specific command. Use the QUIT command to exit the command line tool.

How to implement redis counter How to implement redis counter Apr 10, 2025 pm 10:21 PM

Redis counter is a mechanism that uses Redis key-value pair storage to implement counting operations, including the following steps: creating counter keys, increasing counts, decreasing counts, resetting counts, and obtaining counts. The advantages of Redis counters include fast speed, high concurrency, durability and simplicity and ease of use. It can be used in scenarios such as user access counting, real-time metric tracking, game scores and rankings, and order processing counting.

How to optimize the performance of debian readdir How to optimize the performance of debian readdir Apr 13, 2025 am 08:48 AM

In Debian systems, readdir system calls are used to read directory contents. If its performance is not good, try the following optimization strategy: Simplify the number of directory files: Split large directories into multiple small directories as much as possible, reducing the number of items processed per readdir call. Enable directory content caching: build a cache mechanism, update the cache regularly or when directory content changes, and reduce frequent calls to readdir. Memory caches (such as Memcached or Redis) or local caches (such as files or databases) can be considered. Adopt efficient data structure: If you implement directory traversal by yourself, select more efficient data structures (such as hash tables instead of linear search) to store and access directory information

See all articles