Unordered map of unordered sets in C++
I want to implement something that maps an unordered set of integers to an integer value: some kind of C++ equivalent of a Python dict that has sets as keys and ints as values.
So far I have used std::map<std::set<int>, int> set_lookup;
but from what I understand this is unnecessarily slow because it uses trees. I don't care about the ordering; only speed is important.
From what I have understood, the desired structure is std::unordered_map<std::unordered_set<int>, int, hash> set_lookup;
which needs a hash function to work.
Is this the right approach? And what would a minimal running example look like? I couldn't find what the hash part should look like.
Comments (2)
It isn't clear whether you are asking about the syntax for defining a hash function, or about how to define a mathematically good hash for a set of ints.
Anyway, in case it is the former, here is how you would technically define a hash function for your case:
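A minimal sketch of what such a definition could look like, not the answer's original code. It assumes an order-independent combination is wanted, because two equal unordered_sets may iterate in different orders. The names mix64, UnorderedSetHash, SetLookup and example_usage are hypothetical, and the mixer is a splitmix64-style finalizer chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Hypothetical helper: splitmix64-style finalizer that scrambles one element.
inline std::uint64_t mix64(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Hypothetical functor name. The scrambled element hashes are summed;
// addition is commutative, so iteration order does not affect the result.
struct UnorderedSetHash {
    std::size_t operator()(const std::unordered_set<int>& s) const {
        std::uint64_t h = 0;
        for (int x : s) {
            h += mix64(static_cast<std::uint64_t>(x));
        }
        return static_cast<std::size_t>(h);
    }
};

// The hash functor goes in the third template argument. Key equality uses
// the operator== that std::unordered_set already provides.
using SetLookup =
    std::unordered_map<std::unordered_set<int>, int, UnorderedSetHash>;

inline void example_usage() {
    SetLookup set_lookup;
    set_lookup[std::unordered_set<int>{1, 2, 3}] = 42;
    // Same elements written in a different order find the same entry.
    assert(set_lookup.at(std::unordered_set<int>{3, 2, 1}) == 42);
    // A different set is not present.
    assert(set_lookup.count(std::unordered_set<int>{1, 2}) == 0);
}
```

Calling example_usage() from main exercises the map. Summing scrambled element hashes is one simple order-independent choice among several; it is not claimed to be the best hash for any particular data.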
Having written that, I join the other commenters in asking whether this is a good direction for solving your problem.
You haven't described your problem, so I cannot answer that.
It depends. If your keys are created and then not changed, and you want to be able to do a lot of lookups very fast, then a hash-table-based approach would indeed be good, but you'll need two things for that:
To hash keys, deciding on a good hash function is a bit of an art form. An approach that is rarely bad, though sometimes slower than necessary, is to use boost hash_combine (which is short enough that you can copy it into your code). If your integer values are already quite random across most of their bits, though, simply XORing them together would produce a great hash. If you're not sure, use hash_combine or a better hash (e.g. MURMUR32). The time taken to hash will depend on the time to traverse, and traversing an unordered_set typically involves a linked-list traversal (which typically jumps around in memory pages and is CPU-cache unfriendly). The best way to store the values for fast traversal is in contiguous memory, i.e. a std::vector<>, or a std::array<> if the size is known at compile time.
The other thing you need to do is compare keys for equality: that also works fastest when the elements in the key are contiguous in memory and consistently ordered. Again, a sorted std::vector<> or std::array<> would be best.
That said, if the sets for your keys are large, and you can compromise on a statistical guarantee of key equality, you could use e.g. a 256-bit hash and code as if hash collisions always correspond to key equality. That's often not an acceptable risk, but if your hash is not collision-prone and you have e.g. a 256-bit hash, a CPU could run flat out for millennia hashing distinct keys and still be unlikely to produce the same hash even once. It's an approach I've seen even financial firms use in their core in-house database products, as it can save so much time.
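As an illustration of the sorted-vector key described above, here is a minimal sketch. It assumes keys are normalised (sorted and deduplicated) before every insert and lookup. The local hash_combine follows the classic boost::hash_combine mixing formula; SortedVectorHash, make_key, VectorLookup and example_usage_sorted are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Local copy of the classic boost::hash_combine mixing step.
inline void hash_combine(std::size_t& seed, int value) {
    seed ^= std::hash<int>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical functor: hashes the elements in stored order, so it only
// behaves like a "set hash" when every key is kept sorted.
struct SortedVectorHash {
    std::size_t operator()(const std::vector<int>& key) const {
        std::size_t seed = 0;
        for (int element : key) hash_combine(seed, element);
        return seed;
    }
};

// Hypothetical helper: bring any list of ints into the canonical key form
// (ascending order, duplicates removed).
inline std::vector<int> make_key(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}

// Key equality uses std::vector's operator==, a linear scan over
// contiguous memory.
using VectorLookup = std::unordered_map<std::vector<int>, int, SortedVectorHash>;

inline void example_usage_sorted() {
    VectorLookup lookup;
    lookup[make_key({3, 1, 2})] = 42;
    // Different input order and a repeated element normalise to the same key.
    assert(lookup.at(make_key({2, 3, 1, 1})) == 42);
    assert(lookup.count(make_key({1, 2})) == 0);
}
```

The sort happens once per key, when the key is created. Hashing and comparison afterwards only walk contiguous memory, which is the property the answer is after.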
If you're tempted by that compromise, you'd want std::unordered_map<HashValue256, std::pair<int, std::vector<int>>>. To find the int associated with a set of integers, you'd hash them first, then do a lookup. It's easy to write a hash function that produces the same output for a set, a sorted vector<>, or an array<>, as you can present the elements to something like hash_combine in the same sorted order during traversal (i.e. just size_t seed = 0; for (auto& element : any_sorted_container) hash_combine(seed, element);). Storing the vector<int> means you can traverse the unordered_map later if you want to find all the key "sets". If you don't need to do that (e.g. you're only ever looking up the ints by keys known to the code at the time, and you're comfortable with the statistical improbability of a good hash colliding), you don't even need to store the keys/vectors: std::unordered_map<HashValue256, int>.
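A rough sketch of the digest-keyed map, under loud assumptions: HashValue256 is modelled here as four 64-bit words, and digest256 is a placeholder built from a splitmix64-style mixer with four differently seeded lanes. It is NOT a vetted 256-bit hash; real code would substitute an established one. DigestHash, DigestLookup, mix_word and example_usage_digest are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

// Assumed representation of the HashValue256 type named in the answer.
using HashValue256 = std::array<std::uint64_t, 4>;

// Placeholder mixer (splitmix64-style finalizer), for illustration only.
inline std::uint64_t mix_word(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Accepts any container that iterates its ints in sorted order, so a
// std::set, a sorted std::vector and a sorted std::array with the same
// elements all produce the same digest.
template <typename SortedContainer>
HashValue256 digest256(const SortedContainer& c) {
    HashValue256 h{{1, 2, 3, 4}};  // arbitrary, distinct per-lane seeds
    for (int element : c) {
        for (std::size_t lane = 0; lane < h.size(); ++lane) {
            h[lane] = mix_word(h[lane] ^ static_cast<std::uint64_t>(element));
        }
    }
    return h;
}

// std::hash has no specialisation for std::array, so the map needs a
// hasher; the digest is already mixed, so one word serves as bucket hash.
struct DigestHash {
    std::size_t operator()(const HashValue256& h) const {
        return static_cast<std::size_t>(h[0]);
    }
};

// Keys are digests only; the original sets are not stored.
using DigestLookup = std::unordered_map<HashValue256, int, DigestHash>;

inline void example_usage_digest() {
    DigestLookup lookup;
    lookup[digest256(std::set<int>{1, 2, 3})] = 42;
    // Same elements in the same sorted order, held in other containers.
    assert(lookup.at(digest256(std::vector<int>{1, 2, 3})) == 42);
    assert(lookup.at(digest256(std::array<int, 3>{{1, 2, 3}})) == 42);
}
```

Compared with the types written in the answer, the only structural difference is the extra hasher template argument. To keep the keys recoverable, the mapped type would become std::pair<int, std::vector<int>> as the answer describes.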