tr1::hash for boost::thread::id?

Posted on 2024-07-17 08:53:30

I started to use the unordered_set class from the tr1 namespace to speed up access compared with the plain (tree-based) STL map. However, I wanted to store Boost thread IDs (boost::thread::id), and realized that the API of those identifiers is so opaque that there is no clean way to obtain a hash of one.

Surprisingly, boost implements parts of the tr1 (including hash and unordered_set), but it does not define a hash class that is able to hash a thread ID.

Looking at the documentation of boost::thread::id, I found that thread IDs can be output to a stream, so my solution for hashing was something like:

struct boost_thread_id_hash
{
    size_t operator()(boost::thread::id const& id) const
    {
        std::stringstream ostr;
        ostr << id;
        std::tr1::hash<std::string> h;
        return h(ostr.str());
    }
};

That is, serialize it and apply the hash to the resulting string. However, this seems to be less efficient than actually using the STL map<boost::thread::id>.

So, my questions: Have you found a better way of doing this? Is it a clear inconsistency in both boost and tr1 not to require the existence of a hash<boost::thread::id> class?

Thanks.

Comments (5)

树深时见影 2024-07-24 08:53:30

The overhead of stringifying thread::id (only to compute the string hash afterward) is, as you almost said yourself, astronomical compared to any performance benefits a tr1::unordered_map might confer vis-a-vis std::map. So the short answer would be: stick with std::map< thread::id, ... >

If you absolutely must use unordered containers, try to use native_handle_type instead of thread::id if possible, i.e. prefer tr1::unordered_map< thread::native_handle_type, ... >, invoking thread::native_handle() instead of thread::get_id() when inserting and finding.

DO NOT attempt anything like the following:

struct boost_thread_id_hash {
   // one and only member of boost::thread::id is boost::thread::id::thread_data
   //   of type boost::detail::thread_data_ptr;
   // boost::thread::id::operator==(const id&) compares boost::thread::id::thread_data's
   size_t operator()(boost::thread::id const& id) const {
      // reinterpret the id object as its (private) smart-pointer member
      const boost::detail::thread_data_ptr* pptdp =
        reinterpret_cast< const boost::detail::thread_data_ptr* >(&id);
      // hash the raw pointer held by that smart pointer
      std::tr1::hash< const void* > h;
      return h(pptdp->get());
   }
};

It could work, but is extremely brittle and an almost guaranteed timebomb. It assumes intimate knowledge of the inner workings of the thread::id implementation. It will get you cursed at by other developers. Don't do it if maintainability is of any concern! Even patching boost/thread/detail/thread.hpp to add size_t hash_value(const id& tid) as a friend of thread::id is "better". :)

笔落惊风雨 2024-07-24 08:53:30

The obvious question is: why would you actually want to use a hash?

I understand the issue with map / set for performance-critical code; indeed, those containers are not very cache-friendly because the items might be allocated at very different memory locations.

As KeithB suggested (I won't comment on using the binary representation, since nothing guarantees that 2 equal ids have the same binary representation after all...), using a sorted vector can speed up the code when there are very few items.

Sorted vectors / deques are much more cache-friendly; however, they suffer from O(N) complexity on insert/erase because of the copying involved. Once you reach a couple of hundred threads (never seen that many, by the way), it could hurt.

There is, however, a data structure that tries to combine the benefits of maps and sorted vectors: the B+Tree.

You can view it as a map in which each node contains more than one element (in sorted order). Only the leaf nodes hold the values.

To get some more performance you can:

  • Link the leaves linearly: i.e. the root caches a pointer to the first and last leaf and the leaves are themselves interconnected, so that linear traversal completely bypasses the internal nodes.
  • Cache the last accessed leaf in the root; after all, it's likely that it will also be the next one accessed.

The asymptotic performance is the same as for the map, because it's implemented as a balanced tree, but because the values are packed in groups, your code can become faster by a constant factor.

The real difficulty is tailoring the size of each "bucket"; you'll need some profiling for that, so it would be better if your implementation allowed some customization there (as it will depend on the architecture on which the code is executed).
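The leaf-linking and last-leaf caching ideas above can be sketched roughly as follows. This is only the leaf layer, not a B+Tree: internal nodes, splitting, and rebalancing are left out, the keys are plain ints, and the names Leaf, collectAll, and LeafCursor are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical leaf: a small sorted block of keys plus a link to its
// right sibling. Internal nodes and rebalancing are omitted entirely.
struct Leaf {
    std::vector<int> keys;  // sorted and contiguous, hence cache-friendly
    Leaf* next;             // next leaf in key order, nullptr for the last
};

// In-order traversal that follows the leaf links only.
inline std::vector<int> collectAll(const Leaf* first) {
    std::vector<int> out;
    for (const Leaf* leaf = first; leaf != nullptr; leaf = leaf->next)
        out.insert(out.end(), leaf->keys.begin(), leaf->keys.end());
    return out;
}

// Lookup that tries the most recently used leaf first. The fallback here
// walks the leaf chain; a real tree would descend from the root instead.
class LeafCursor {
public:
    explicit LeafCursor(const Leaf* first) : first_(first), last_(first) {}

    bool contains(int key) {
        if (last_ != nullptr && covers(*last_, key))
            return std::binary_search(last_->keys.begin(), last_->keys.end(), key);
        for (const Leaf* leaf = first_; leaf != nullptr; leaf = leaf->next) {
            if (covers(*leaf, key)) {
                last_ = leaf;  // remember this leaf for the next lookup
                return std::binary_search(leaf->keys.begin(), leaf->keys.end(), key);
            }
        }
        return false;  // key falls outside every leaf's range
    }

private:
    static bool covers(const Leaf& leaf, int key) {
        return !leaf.keys.empty() && leaf.keys.front() <= key && key <= leaf.keys.back();
    }

    const Leaf* first_;
    const Leaf* last_;
};
```

The number of keys per leaf is the "bucket" size mentioned above, and it is the parameter that would need profiling.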

疯狂的代价 2024-07-24 08:53:30

Why do you want to store these in a set? Unless you are doing something out of the ordinary, there will be a small number of threads. The overhead of maintaining a set is probably higher than just putting them in a vector and doing a linear search.

If searching will happen more frequently than adding and deleting, you can just use a sorted vector. There is a < operator defined for boost::thread::id, so you can sort the vector (or insert into the correct place) after each addition or deletion, and use lower_bound() to do a binary search. This is the same complexity as searching a set, and should have lower overhead for small amounts of data.
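The sorted-vector approach described above might look roughly like this. It is a sketch under assumptions: SortedIdTable and its member names are made up for illustration, and it is templated on the ID type so that any type with a < operator (which boost::thread::id is said above to provide) could serve as the key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical flat table kept sorted by key. Id only needs operator<.
template <typename Id, typename Value>
class SortedIdTable {
public:
    // Insert or overwrite. O(N) because later elements are shifted.
    void put(const Id& id, const Value& value) {
        auto it = lower(id);
        if (it != entries_.end() && !(id < it->first)) {
            it->second = value;  // key already present
        } else {
            entries_.insert(it, std::make_pair(id, value));
        }
    }

    // Binary search. Returns nullptr when the id is not stored.
    const Value* find(const Id& id) const {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), id, keyLess);
        if (it == entries_.end() || id < it->first) return nullptr;
        return &it->second;
    }

    // Removes the entry if present and reports whether it was there.
    bool erase(const Id& id) {
        auto it = lower(id);
        if (it == entries_.end() || id < it->first) return false;
        entries_.erase(it);
        return true;
    }

    std::size_t size() const { return entries_.size(); }

private:
    typedef std::pair<Id, Value> Entry;

    // Orders a stored entry against a bare key for std::lower_bound.
    static bool keyLess(const Entry& entry, const Id& id) { return entry.first < id; }

    typename std::vector<Entry>::iterator lower(const Id& id) {
        return std::lower_bound(entries_.begin(), entries_.end(), id, keyLess);
    }

    std::vector<Entry> entries_;
};
```

Inserting at the position found by lower_bound keeps the vector sorted at all times, so no separate sort step is needed after each addition.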

If you still need to do this, how about just treating it as sizeof(boost::thread::id) bytes and operating on those?

This example assumes that the size of boost::thread::id is a multiple of the size of an int, and that there is no packing, and no virtual functions. If that is not true, it will have to be modified, or will not work at all.

EDIT: I took a look at the boost::thread::id class, and it has a boost::shared_ptr<> as a member, so the code below is horribly broken. I think the only solution is to have the authors of boost::thread add a hash function. I'm leaving the example just in case it's useful in some other context.

boost::thread::id id;
// The next line doesn't do anything useful in this case.
const unsigned* data = reinterpret_cast<const unsigned*>(&id);
unsigned hash = 0;

// XOR the object's storage together, one unsigned-sized word at a time.
for (std::size_t i = 0; i < sizeof(boost::thread::id) / sizeof(unsigned); i++)
  hash ^= data[i];

(り薆情海 2024-07-24 08:53:30

I'm some years late in answering this question, but it showed up as the most relevant one when I was trying to use a boost::thread::id as a key in a std::unordered_map. Getting the native handle was a good suggestion in the accepted reply, except that it is not available for this_thread.

Instead, Boost has for some time had a hash_value for thread::id, so this worked fine for me:

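#include <cstddef>
#include <functional>
#include <boost/thread.hpp>

namespace boost {
  // declare the hash_value overload that the Boost.Thread library provides
  extern std::size_t hash_value(const thread::id &v);
}

namespace std {
  // forward std::hash to boost::hash_value so thread::id works as a key
  template<>
  struct hash<boost::thread::id> {
    std::size_t operator()(const boost::thread::id& v) const {
      return boost::hash_value(v);
    }
  };
}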

Of course, you need to link against the libboost_thread library.
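As a point of comparison, and assuming that moving from boost::thread to the C++11 std::thread is an option (which the question does not say), the standard library already specializes std::hash for std::thread::id, so no custom functor is needed there. A minimal sketch, in which ThreadNames is a hypothetical class:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>

// Hypothetical registry that maps thread IDs to human-readable names.
// It relies on the std::hash<std::thread::id> specialization from <thread>.
class ThreadNames {
public:
    // Records a name for the calling thread.
    void setCurrent(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        names_[std::this_thread::get_id()] = name;
    }

    // Returns the stored name, or an empty string for an unknown id.
    std::string get(std::thread::id id) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = names_.find(id);
        return it == names_.end() ? std::string() : it->second;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return names_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::thread::id, std::string> names_;
};
```

Unlike the native-handle approach, std::this_thread::get_id() is available from inside the thread itself, which covers the this_thread case raised above.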

他是夢罘是命 2024-07-24 08:53:30

You can create a class that maps between thread::id and something else (e.g. integers) that you can use as a hash. The only drawback is that you must ensure there is only one instance of the mapping object in the system.
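One way such a mapping class could look is sketched below. The name IdIndexRegistry and its interface are assumptions made for illustration. It is templated on the ID type and only requires a < operator, so it is not tied to any particular thread library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <utility>

// Hypothetical registry that hands out small consecutive integers for IDs.
// The integer can then be used as a hash value or as an array index.
template <typename Id>
class IdIndexRegistry {
public:
    // Returns the index already assigned to id, or assigns the next free one.
    std::size_t indexOf(const Id& id) {
        std::lock_guard<std::mutex> lock(mutex_);
        // insert() leaves the map unchanged when the id is already known
        auto result = indices_.insert(std::make_pair(id, indices_.size()));
        return result.first->second;
    }

    // Number of distinct IDs seen so far.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return indices_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<Id, std::size_t> indices_;
};
```

Note that each indexOf call still performs an ordered-map lookup, so the integer only pays off if it is cached after the first call. The single-instance requirement mentioned above would have to be enforced by the surrounding code.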
