
Unordered map of unordered sets in C++

Posted on 2025-01-19 21:03:59

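I want to implement something that maps an unordered set of integers to an integer value: some kind of C++ equivalent of a Python dict that has sets of ints as keys and ints as values.

So far I have used std::map<std::set<int>, int> set_lookup;, but as I understand it, this is unnecessarily slow because it uses trees. I don't care about ordering; only speed matters.

As I understand it, the structure I want is std::unordered_map<std::unordered_set<int>, int, hash> set_lookup;, which needs a hash function to work.

Is this the right approach? What would a minimal example look like? I couldn't find what the hash part should look like.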

廻憶裏菂餘溫 2025-01-26 21:03:59

It isn't clear whether you ask about the syntax for defining a hash function, or about how to define a mathematically good hash for a set of ints.

Anyway - in case it is the former, here is how you should technically define a hash function for your case:

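#include <cstddef>
#include <functional>
#include <unordered_map>
#include <unordered_set>

// A specialization of std::hash has to be declared inside namespace std.
// Note: the standard only permits such specializations for program-defined
// types, so a separate functor passed as the map's third template argument
// is the strictly conforming alternative.
namespace std
{
    template <>
    struct hash<std::unordered_set<int>>
    {
        std::size_t operator()(const std::unordered_set<int>& k) const
        {
            // ...
            // Here you should create and return a meaningful hash value:
            return 5;
        }
    };
}

int main()
{
    // The specialization above is picked up as the default hasher.
    std::unordered_map<std::unordered_set<int>, int> m;
}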

Having written that, I join the other comments about whether it is a good direction to solve your problem.
You haven't described your problem, so I cannot answer that.
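
For the other reading of the question (what a reasonable hash for a set of ints might look like), here is a minimal sketch. It is not part of the answer above: IntSetHash, mix64, SetLookup and lookup_demo are hypothetical names, and the mixing constants are borrowed from the splitmix64 generator on the assumption that this is good enough mixing for the data at hand. Two equal unordered_sets may iterate in different orders, so the per-element values are combined with addition, which does not depend on order. The hasher is passed as the third template argument instead of specializing std::hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Hypothetical helper: scrambles one value so that nearby integers end up
// far apart (constants taken from the splitmix64 generator).
inline std::uint64_t mix64(std::uint64_t x)
{
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Hypothetical hasher for the map's third template argument.
// Addition is commutative, so iteration order does not affect the result.
struct IntSetHash
{
    std::size_t operator()(const std::unordered_set<int>& s) const
    {
        std::uint64_t sum = 0;
        for (int v : s)
            sum += mix64(static_cast<std::uint32_t>(v));
        return static_cast<std::size_t>(mix64(sum + s.size()));
    }
};

// Key equality uses std::unordered_set's own operator==.
using SetLookup = std::unordered_map<std::unordered_set<int>, int, IntSetHash>;

// Small usage check: the same elements inserted in another order find the entry.
inline int lookup_demo()
{
    SetLookup set_lookup;
    const std::unordered_set<int> key{1, 2, 3};
    const std::unordered_set<int> same_elements{3, 1, 2};
    set_lookup[key] = 42;
    assert(set_lookup.count(same_elements) == 1);
    return set_lookup.at(same_elements);
}
```

Whether this is fast enough depends on the data; every lookup still has to walk the whole key set once to hash it.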

荒人说梦 2025-01-26 21:03:59

I understood [std::map<std::set<int>, int> set_lookup;] is unnecessarily slow as it uses trees.

Is [std::unordered_map<std::unordered_set<int>, int, hash>] the right approach?

It depends. If your keys are created then not changed, and you want to be able to do a lot of lookups very fast, then a hash-table based approach would indeed be good, but you'll need two things for that:

to be able to hash keys
to be able to compare keys

To hash keys, deciding on a good hash function is a bit of an art form. An approach that is rarely bad, but sometimes slower than necessary, is to use boost hash_combine (which is short enough that you can copy it into your code). If your integer values are already quite random across most of their bits, though, simply XORing them together would produce a great hash. If you're not sure, use hash_combine or a better hash (e.g. MURMUR32). The time taken to hash will depend on the time to traverse, and traversing an unordered_set typically involves a linked list traversal (which typically jumps around in memory pages and is CPU cache unfriendly). The best way to store the values for fast traversal is in contiguous memory - i.e. a std::vector<>, or std::array<> if the size is known at compile time.

The other thing you need to do is compare keys for equality: that also works fastest when elements in the key are contiguous in memory, and consistently ordered. Again, a sorted std::vector<> or std::array<> would be best.
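
The two requirements can be sketched with a sorted vector as the key. This is an illustration under assumptions, not the answerer's code: SortedKey, make_key, SortedKeyHash, VectorLookup and vector_lookup_demo are hypothetical names, and hash_combine is a local copy of the classic formula found in older Boost releases (newer releases mix differently).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Local copy of the classic hash_combine formula from older Boost releases.
inline void hash_combine(std::size_t& seed, int value)
{
    seed ^= std::hash<int>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical key type: a sorted, duplicate-free vector standing in for a set.
using SortedKey = std::vector<int>;

// Brings arbitrary input into that canonical form.
inline SortedKey make_key(std::vector<int> values)
{
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}

// Hashes the elements in their (sorted) storage order.
struct SortedKeyHash
{
    std::size_t operator()(const SortedKey& key) const
    {
        std::size_t seed = 0;
        for (int element : key)
            hash_combine(seed, element);
        return seed;
    }
};

// Equality uses std::vector's operator==, an element-by-element comparison
// over contiguous memory.
using VectorLookup = std::unordered_map<SortedKey, int, SortedKeyHash>;

// Small usage check: input order and duplicates do not matter after make_key.
inline int vector_lookup_demo()
{
    VectorLookup lookup;
    lookup[make_key({5, 1, 3, 1})] = 7;
    const SortedKey probe = make_key({3, 5, 1});
    assert(lookup.count(probe) == 1);
    return lookup.at(probe);
}
```

Sorting costs time when a key is built, so this layout pays off mainly when keys are created once and looked up many times.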

That said, if the sets for your keys are large, and you can compromise on a statistical guarantee of key equality, you could use e.g. a 256-bit hash and write the code as if equal hashes always mean equal keys. That's often not an acceptable risk, but if your hash is not collision prone and is 256 bits wide, a CPU could run flat out for millennia hashing distinct keys and still be unlikely to produce the same hash even once. I've seen even financial firms use this approach in their core in-house database products, as it can save so much time.

If you're tempted by that compromise, you'd want std::unordered_map<HashValue256, std::pair<int, std::vector<int>>>. To find the int associated with a set of integers, you'd hash them first, then do a lookup. It's easy to write a hash function that produces the same output for a set or a sorted vector<> or array<>, as you can present the elements to something like hash_combine in the same sorted order during traversal (i.e. just size_t seed = 0; for (auto& element : any_sorted_container) hash_combine(seed, element);). Storing the vector<int> means you can traverse the unordered_map later if you want to find all the key "sets". If you don't need to do that (e.g. you're only ever looking up the ints by keys known to the code at the time), and you're comfortable with the statistical improbability of a good hash colliding, you don't even need to store the keys/vectors: std::unordered_map<HashValue256, int>.
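
The shape of the digest-only variant might look like the sketch below. Everything in it is an assumption made for illustration: HashValue256 is modelled as four 64-bit lanes, and digest is only a PLACEHOLDER built from a splitmix64-style mixer, with none of the collision resistance the argument above relies on. Real code would substitute a vetted 256-bit hash. mix64, Hash256Hasher, DigestLookup and digest_lookup_demo are likewise hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical 256-bit value: four 64-bit lanes.
using HashValue256 = std::array<std::uint64_t, 4>;

// splitmix64-style mixer, used here only to build the placeholder digest.
inline std::uint64_t mix64(std::uint64_t x)
{
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// PLACEHOLDER digest: NOT a real 256-bit hash. Expects the values in sorted
// order so that equal sets always produce the same digest.
inline HashValue256 digest(const std::vector<int>& sorted_values)
{
    HashValue256 h{{1, 2, 3, 4}}; // arbitrary distinct lane seeds
    for (int v : sorted_values)
        for (std::size_t lane = 0; lane < h.size(); ++lane)
            h[lane] = mix64(h[lane] + static_cast<std::uint32_t>(v));
    return h;
}

// The table hasher just takes some bits of the already mixed digest.
struct Hash256Hasher
{
    std::size_t operator()(const HashValue256& h) const
    {
        return static_cast<std::size_t>(h[0]);
    }
};

// Only the digest is stored; the original integers are not kept.
using DigestLookup = std::unordered_map<HashValue256, int, Hash256Hasher>;

inline int digest_lookup_demo()
{
    DigestLookup lookup;
    lookup[digest({1, 2, 3})] = 99;
    const auto it = lookup.find(digest({1, 2, 3}));
    assert(it != lookup.end());
    return it->second;
}
```

With this layout, two different sets that happen to share a digest would silently share one entry, which is exactly the risk described above.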