C++ hashmap benchmark

c hashmap性能对比

2022年最新的 hashmap 性能对比结果出来了。作者是 Martin Leitner-Ankerl，ankerl::unordered_dense::map 的作者。之前在2019年有一个测试，今年更新了最新的测试，测试数据非常全面。如果大家想选择一个高效的 hashmap ，不妨参考一下。

hashmap性能对比

测试数据类型主要分为 Numbers 和 String 类型。测试的操作有Insert， Random Insert， Iterate， Find， Copy 等常见的 hashmap 操作，Find 时测试不同的选择率，不同的插入查找频率下的性能表现。另外还包括 memory usage 的对比，整体结果使用算术平均值来计算。

测试结果：ankerl::unordered_dense::map 的综合性能，也就是以上所有操作的算术平均值最优，其中 Copy，Iterate 的性能无出其右，Find性能也非常不错。对于不同的hash函数，表现差异不大。ankerl 继承自 robin_hood ，都是本文的作者。但性能相比 robin_hood 更好。

gtl::flat_hash_map 和 gtl::parallel_flat_hash_map 也就是很多开源软件比较喜欢用的 phmap::flat_hash_map 和 phmap::parallel_flat_hash_map，关于phmap的介绍可以参考https://byronhe.com/post/2020/11/10/parallel-hashmap-btree-fast-multi-thread-intro/。gtl::parallel_flat_hash_map 是 gtl::flat_hash_map的并行版本，它继承自 Google’s Abseil 的 absl::flat_hash_map ，但性能也比较优秀。仓库里有 btree containers 的实现和 hash containers 的实现，headler only 的形式，方便使用。gtl::flat_hash_map 版本实现在 Find 1 – 500k uint64_t 这个测试集表现最好

emhash8::HashMap 和 emhash7::HashMap 是仅次于ankerl的性能表现。

folly::F14NodeMap 和 folly::F14ValueMap 的性能也非常好，但 folly 依赖太多，不过如果你已经使用了 folly 库，那这两个不妨一试。

absl::flat_hash_map 是 Google’s Abseil 实现的，是一个最好的性能表现之一。尤其对于大的map性能表现更好，字符串查找的性能也很好，Copy 和 Iterate 相对较慢。

boost::unordered_map 的 lookup 性能比 std::unordered_map 好两倍，可以平替 std 版本，但内存使用很多，插入性能也一般。

hash函数

关于 hash 函数的比较也分为 Integral types 和 String 两种类型。std::hash 和 boot::hash 的 Integral types 的 hash 函数直接采用的原值，所以性能会因为输入的不同相差很大，不推荐。

std::hash 对于 String 类型的性能比较不错。boot::hash 的 String 类型 hash 性能较差。

ankerl::unordered_dense::hash 和 absl::hash 比较像，String 都是使用的 wyhash，但 Integral types 处理不同，ankerl::unordered_dense::hash 使用了 XOR input，但两者的性能都不错。

robin_hood::hash 的 Integral types 基于 murmurhash3 ，string 类型基于 MurmurHash2 。性能也很不错。

mumx 的 Integral 跟 ankerl::unordered_dense::hash 一致，string 类型跟 std::hash 一致。

以上所有结论参考https://martin.ankerl.com/2022/08/27/hashmap-bench-01/#ankerl__unordered_dense__map

以下我们来看看性能最好的Ankerl::unordered_dense::map和emhash8::HashMap 是如何实现的。两者都是header only的形式，方便作为第三方头文件引入。

Ankerl::unordered_dense::map

Ankerl::unordered_dense::map 的核心是Robin hood probing。 github: https://github.com/martinus/unordered_dense

design

其hash 函数选择是 ankerl::unordered_dense::hash，string类型选择的是wyhash，hash函数的实现是非常高效的。Hashmap主要结构包括一个 vector 数组，存储 pair{key, value} 数据，一个 array 数组存储 Buckets 。

Buckets 的主要数据结构是

代码语言：javascript复制

struct Bucket {
    uint32_t dist_and_fingerprint;
    uint32_t value_idx;
};

一共有8个字节，最高的三个字节存储的是距离第一次 hash 函数的位置。接下来一个字节存储的是 fingerprint ，低4个字节存储的是在 vector 中的位置。

这个 Buckets 是为 robin-hood hashing 特别设计的冲突解决方案。bucket 数组是局部有序的，由大到小排序，而且是连续内存查找，cache-locality对于cache十分友好。

Insert

Insert 操作首先通过 key 计算 hash，通过移位获取 bucket_index, 访问当前 bucket，如果bucket没有值，直接插入即可，否则通过对比 dist_and_fingerprint 将先插入的值与 bucket 中的 dist_and_fingerprint 对比，将更大的值排在前面，依次向后移动 bucket_index 以及 dist_inc dist_and_fingerprint（dist_and_fingerprint 1UL<<8）,找到对应的 buckets index 中的位置, 需要注意的是buckets index是相对位置，插入时先插入 vector 中然后在插入buckets 中。

Find

Find 操作首先通过对 key 计算 hash，根据 Buckets 的个数，通过移位获得 bucket_index , 然后通过对比前32位 dist_and_fingerprint ，如果相等在比较通过 value index 找到 vector 中的 key 和当前 key，如果相等则返回 values，否则 buckets 数组向后移位，继续上述过程，直到 dist_and_fingerprint > bucket.dist_and_fingerprint 后退出。查找过程主要耗时在对 buckets 内的有限次顺序查找，充分利用cache-locality，性能非常不错。

Iterate

Iterate 非常简单，遍历vector即可。性能极好。

emhash8::HashMap

Emhash8 是 linear probing 的“集大成者”。github: https://github.com/ktprime/emhash

design

Emhash8的hash函数整型选择的是 murmurhash3mixer，string 类型是 wyhash 。传统的hashmap 根据寻址方式可以分为 closed addresssing（链表法）和 open addressing (开放寻址法，Linear Probing, Quadratic Probing,Double Hashing )。开放寻址法有更好的cache locality，但可能会发生 Primary Clustering 和 Secondary Clustering 的问题。Emhash8 通过 3-way combined probing 的方式来优化 probing 。得益于 3-way probing , emhash8 在 load factor > 0.9 的情况下性能依然非常好。

emhash 的关键设计

1. 关键数据结构： Index 和 value_type 。数据存储在value_type = std::pair<KeyT, ValueT>; index 中 bucket 表示 Index 中冲突的位置，slot 表示数据的位置。

代码语言：javascript复制

struct Index
{
    size_type bucket;
    size_type slot;
};

2. 3-way combined probing 技术来搜索empty slot

在2-3个cache line之内采用 linear probing
超过之后采用有限的 quadratic probing
记录上一个 empty 位置 _last，通过 _last 来计算 empty slot ：(_mask / 2 _last ) & _mask

3. smart collision resolution, 冲突节点是通过Index中的bucket字段连接起来，bucket就是下一个索引的位置。所以可以避免Primary Clustering 和 Secondary Clustering 造成的性能下降。

高性能计算性能测试 hashmap c++

0 人点赞