Should I shuffle randomly before inserting into an STL set?

Posted on 2024-09-12 16:17:08

I need to insert 10 million strings into a C++ STL set. The strings are sorted. Will I have a pathological problem if I insert the strings in sorted order? Should I randomize first? Or will the G++ STL implementation automatically rebalance for me?


Comments (7)

一袭水袖舞倾城 2024-09-19 16:17:08

The set implementation typically uses a red-black tree, which will rebalance for you. However, insertion may or may not be faster if you randomise the data before inserting; the only way to be sure is to test with your set implementation and your specific data. Retrieval times will be the same either way.
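A measurement along those lines might be sketched as follows. The helper names `make_sorted_keys`, `shuffled_copy`, and `time_inserts_ms` are hypothetical, and the generated keys are stand-ins for real data, so the numbers only mean something once you run it on your own strings and standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: builds n distinct, fixed-width keys in ascending order.
std::vector<std::string> make_sorted_keys(std::size_t n) {
    std::vector<std::string> keys;
    keys.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::string digits = std::to_string(i);
        assert(digits.size() <= 12);
        // Left-pad with zeros so lexicographic order matches numeric order.
        keys.push_back(std::string(12 - digits.size(), '0') + digits);
    }
    return keys;
}

// Returns a shuffled copy of keys; the seed is fixed so runs are repeatable.
std::vector<std::string> shuffled_copy(std::vector<std::string> keys) {
    std::mt19937 rng(12345);
    std::shuffle(keys.begin(), keys.end(), rng);
    return keys;
}

// Inserts every key, in the order given, into a fresh std::set.
// Returns the elapsed milliseconds and writes the final set size to size_out.
double time_inserts_ms(const std::vector<std::string>& keys, std::size_t& size_out) {
    const auto start = std::chrono::steady_clock::now();
    std::set<std::string> s;
    for (const std::string& k : keys) {
        s.insert(k);
    }
    const auto stop = std::chrono::steady_clock::now();
    size_out = s.size();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Call `time_inserts_ms` once with the sorted keys and once with `shuffled_copy` of them, several times each with optimizations enabled, and compare the timings.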

溺孤伤于心 2024-09-19 16:17:08

The implementation will rebalance automatically. Given that you know the input is sorted, however, you can give it a bit of assistance: you can supply a "hint" when you do an insertion, and in this case supplying the iterator to the previously inserted item will be exactly the right hint for the next insertion. (With C++11 and later, the standard words the guarantee for a hint that points just past the insertion position, which for ascending input is end().) In this case, each insertion will have amortized constant complexity instead of the logarithmic complexity you'd otherwise expect.
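A minimal sketch of hinted insertion for ascending input, assuming C++11 or later, where the hint names the position just before which the element should go, so `end()` is the natural choice here. The function names `build_from_sorted` and `build_from_range` are hypothetical. The second function shows the iterator-range constructor, which the standard specifies as linear when the range is already sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Builds a set from keys that are already sorted ascending.
// The end() hint tells the tree that each new key belongs at the back,
// so the insert does not need to search down from the root.
std::set<std::string> build_from_sorted(const std::vector<std::string>& sorted_keys) {
    assert(std::is_sorted(sorted_keys.begin(), sorted_keys.end()));
    std::set<std::string> s;
    for (const std::string& k : sorted_keys) {
        s.insert(s.end(), k);  // hinted insert; duplicates are simply not added
    }
    return s;
}

// Alternative: let the iterator-range constructor do the work.
std::set<std::string> build_from_range(const std::vector<std::string>& sorted_keys) {
    return std::set<std::string>(sorted_keys.begin(), sorted_keys.end());
}
```

Both functions produce the same set; which one is faster on a given standard library is something to measure rather than assume.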

苏大泽ㄣ 2024-09-19 16:17:08

The only question I have: do you really need a set?

If the data is already sorted and you don't need to insert or delete elements after creation, a deque would be better:

  • you'll have the same big-O complexity using a binary search for retrieval
  • you'll get less memory overhead... and better cache locality

On binary_search: the standard only requires a ForwardIterator, but the search takes a logarithmic number of iterator steps only with random-access iterators, which deque provides.
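The deque idea can be sketched like this, assuming the keys never change after loading. `make_lookup` and `lookup_contains` are hypothetical names; `std::binary_search` is the standard algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Copies keys that are already sorted ascending into a deque used for lookups only.
std::deque<std::string> make_lookup(const std::vector<std::string>& sorted_keys) {
    // Binary search is only valid on sorted data, so check the precondition once here.
    assert(std::is_sorted(sorted_keys.begin(), sorted_keys.end()));
    return std::deque<std::string>(sorted_keys.begin(), sorted_keys.end());
}

// Membership test in O(log n) comparisons; deque iterators are random-access,
// so the number of iterator steps is logarithmic as well.
bool lookup_contains(const std::deque<std::string>& lookup, const std::string& key) {
    return std::binary_search(lookup.begin(), lookup.end(), key);
}
```

If you need the position of the match rather than a yes/no answer, `std::lower_bound` over the same range returns an iterator instead.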

澉约 2024-09-19 16:17:08

http://en.wikipedia.org/wiki/Standard_Template_Library

set: "Implemented using a self-balancing binary search tree."

留蓝 2024-09-19 16:17:08

g++'s libstdc++ uses red-black trees for sets and maps.

http://en.wikipedia.org/wiki/Red-black_tree

This is a self-balancing tree, and insertions are always O(log n). The C++ standard requires logarithmic insertion from every implementation, so in practice they are almost always red-black trees or something very similar.

So don't worry about the order you put the elements in.

分开我的手 2024-09-19 16:17:08

A very cheap and simple solution is to insert from both ends of your collection of strings. That is to say, first add "A", then "ZZZZZ", then "AA", then "ZZZZY", and so on until you meet in the middle. It doesn't require the hefty cost of shuffling, yet it is likely to sidestep pathological cases.
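The alternating order described here could be sketched as below, assuming the sorted strings sit in a vector. `build_from_both_ends` is a hypothetical name, and the sketch only illustrates the insertion order; whether it helps with a self-balancing std::set is something to measure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Inserts sorted keys alternately from the front and the back of the input,
// moving inward until the two positions meet in the middle.
std::set<std::string> build_from_both_ends(const std::vector<std::string>& sorted_keys) {
    assert(std::is_sorted(sorted_keys.begin(), sorted_keys.end()));
    std::set<std::string> s;
    std::size_t lo = 0;                   // next unused key from the front
    std::size_t hi = sorted_keys.size();  // one past the next unused key from the back
    while (lo < hi) {
        s.insert(sorted_keys[lo++]);      // smallest remaining key
        if (lo < hi) {
            s.insert(sorted_keys[--hi]);  // largest remaining key
        }
    }
    return s;
}
```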

镜花水月 2024-09-19 16:17:08

Maybe `unordered_set` could be an alternative.
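A minimal sketch of that alternative, assuming C++11 and that ordered iteration over the strings is not needed. `build_hash_set` is a hypothetical name; `reserve` is the standard member that sizes the bucket array up front.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Builds a hash set from the keys; the order of the input does not matter here.
std::unordered_set<std::string> build_hash_set(const std::vector<std::string>& keys) {
    std::unordered_set<std::string> s;
    s.reserve(keys.size());              // pre-size the buckets to avoid rehashing while filling
    s.insert(keys.begin(), keys.end());  // duplicates are silently skipped
    assert(s.size() <= keys.size());
    return s;
}
```

Lookups go through `find` or `count` as with std::set, but the elements are no longer kept in sorted order.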
