STL::Map - 遍历列表还是使用查找?

发布于 2024-07-09 19:29:37 字数 119 浏览 10 评论 0原文

假设我有一个方法需要从包含 100 个元素的映射中提取 8 个值。 您认为哪一个更好:

从头到尾走一次 for 循环,通过打开按键将元素拉出?

或者使用 find 8 次来获取这些值?

Say I have a method that needs to pull 8 values from a map with 100 elements in it. Which do you think would be preferable:

Walk in a for loop from begin to end once, pulling the elements out by switching on the key?

Or using find 8 times to get those values?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

半世蒼涼 2024-07-16 19:29:37

遍历列表将花费 O(n) 时间来找到随机元素。

Map 是平衡二叉树,因此查找的时间复杂度为 O(log n)。

因此,执行 8 会找到 8*log2(n) 的结果,并且遍历列表是 (n)。 列表越大,收益就越大,但在所有随机情况下,查找都会比迭代更快。

在非随机情况下,如果有理由认为您想要的项目在树中彼此靠近,或者靠近“开始”(左侧),那么步行/迭代会更快。 但这似乎不太可能。

Walking the list will take you O(n) time to find a random element.

Map is a balanced binary tree, so doing a find is O(log n).

Thus doing 8 finds results in 8*log2(n) and walking the list is (n). The larger the list, the larger the gains, but in all random cases doing finds will be faster than doing iterations.

In non-random cases if there is reason to thing the items you want are near each other in the tree, or near the "begining" (left side) then walking/iterating would be faster. But that seems unlikey.

血之狂魔 2024-07-16 19:29:37

虽然我会选择 find 选项,但人们过于强调渐近性能。

事实上,渐近性能对于可以接收相当大输入的算法来说是一个方便的指南,但即便如此,它也不是万无一失的。 对于任何合理的输入,渐近性能较差的算法很可能比其他算法更快。

有时,您的输入大小已知相当小(甚至是固定的)。 在这种情况下,渐近性能几乎毫无意义。

While I'd go with the find option, people put too much stress on asymptotic performance.

The fact is that asymptotic performance is a handy guide for algorithms that can receive reasonably large inputs, but even then it isn't foolproof. It's quite possible for an algorithm with worse asymptotic performance than another to be faster for any reasonable input.

And then there are times when your input size is known to be fairly small (or even fixed). In such cases asymptotic performance is almost meaningless.

动次打次papapa 2024-07-16 19:29:37

我会使用 find 8 次。 这将是更少(并且更明显)的代码。

您应该尽量不要根据较小的数字做出性能判断,因为(a)在这种规模下,无论哪种方式都不太可能成为性能瓶颈,并且(b)数字将来可能会发生变化 - 选择具有最佳渐近性的算法表现。

注意:您提到在键上“切换”...这可能适用于您的情况,但通常您无法在地图中切换键值。 允许这一点将使通过迭代在地图中搜索 M 项的代码变得更加复杂。

I would use find 8 times. It will be less (and more obvious) code.

You should try not make performance judgements based on small numbers since (a) it isn't likely to be a performance bottleneck either way at this size and (b) the numbers may change in the future -- choose the algorithm with the best asymptotic performance.

Note: you mention 'switching' on the key ... that may apply in your case, but in general you can't switch on the key value in a map. Allowing for this would make the code to searching for M items in a map by iteration even more complex.

不美如何 2024-07-16 19:29:37

8 发现是最好的,因为代码更简单、更清晰。

不过,考虑性能更有趣,所以我也会这样做。

正如 Artelius 在我写这个答案时所说的那样,忽略复杂性。 这不相关,因为您知道 n=100。 例如,插入排序的算法复杂度比快速排序差(至少在平均情况下),但在几乎所有实现中,对于小 n 来说,插入排序都比快速排序更快,因此快速排序在最后会突破到插入排序。 你的 n 也很小,所以限制为 n -> 无穷大并不重要。

由于这两个选项的代码都很容易编写,因此我建议对其进行分析。 这将(a)告诉您哪个更快,并且(b)证明两者都非常快,以至于您做什么并不重要(除非这是您的程序唯一做的事情,并且它做了很多)。 特别是当您谈论打开密钥时 - 如果密钥是整数类型,那么限制因素更有可能是内存缓存问题,而不是任何实际处理。

然而如果做不到这一点,通常比较搜索算法的方法是对比较进行计数,假设它们比遍历结构慢得多。 如果不出意外的话,每次比较都会访问内存并且是一个不可预测的分支,而这是 CPU 通常最不擅长的两件事。

如果您在开始之前对 8 个元素进行排序(大约需要 24 次比较)而不是您建议的切换,那么由于映射也是排序的,因此您只需在遍历的每个节点上进行一次比较,再加上每个项目一次比较您正在搜索(比较每一“侧”的一个节点。如果它们匹配,则增加两侧。如果不匹配,则增加具有较小元素的一侧)。 所以在最坏的情况下是 8+100,或者如果你在到达终点之前找到所有 8 个则更少。 但是,如果 8 个中的最后一个随机位于地图中,则它们的平均位置仍然约为 8/9。 因此称其为 8+88+24 = 120 次比较,其中 132 次为最坏情况。 最好的情况是 8(如果您要搜索的内容都在开头)+24(对于初始排序)= 32 次比较,或者如果您在排序上也很幸运或者您的 8 个搜索项目是由于某种原因已准备好排序。

红黑树(通常是映射)中节点的平均深度略高于 log2(n)。 在本例中将其称为 7,因为 2^7 是 128。因此,查找 8 个元素应该需要进行 56 次比较。 IIRC红黑树的平衡特性是最深节点不超过最浅节点深度的两倍。 所以最坏情况的深度是floor(2*log2(n)),称之为15,总共120,最好的是ceil(1/2 log2(n)),也就是4。这又是32次比较。

因此,假设比较是影响速度的唯一因素,则 8 次查找的速度将是线性遍历的 4 倍快和 4 倍慢之间,且平均值提高 2 倍。

不过,线性遍历可能会占用更多内存,因此可能会更慢。 但最终,对于 n=100,你所说的时间是毫秒,所以只需执行最简单的代码(可能是 8 个发现)。 我是否提到过,如果您确实想知道无法预测的速度,则只需对其进行分析即可?

8 finds is best, because the code's simpler and clearer.

Thinking about performance is more fun, though, so I'll do that too.

As Artelius said while I was writing this answer, ignore the complexity. It's not relevant because you know that n=100. For example, insertion sort has worse algorithmic complexity than quicksort (at least in the average case), but in almost all implementations, insertion sort is faster than quicksort for small n and so quicksorts break out to insertion sort towards the end. Your n is also small, so the limit as n -> infinity isn't what matters.

Since the code for both options is simple to write, I'd suggest profiling it. This will (a) tell you which is faster, and (b) prove that both are so fast that it doesn't matter which you do (unless it's the only thing your program does, and it does it a lot). Especially since you talk about switching on the key - if the key is an integral type then the limiting factor is more likely to be memory cache issues than any actual processing.

However failing that, usually the way to compare searching algorithms is to count the comparisons, on the assumption that they're much slower than traversing the structures. If nothing else, each comparison accesses memory and is an unpredictable branch, which are two things CPUs are often worst at.

If you sort your 8 elements before you start (which takes 24 comparisons or so) instead of the switch you propose, then because the map is also sorted, you only have to do one comparison at each node you traverse, plus one comparison per item you're searching for (compare one node from each "side". If they match increment both sides. If they don't match, increment the side with the smaller element). So that's 8+100 in the worst case, or less if you find all 8 before you get to the end. But the average position of the last of 8, if they're randomly located in the map, is still something like 8/9 of the way through. So call it 8+88+24 = 120 comparisons, with 132 as the worst case. The best case is 8 (if the things you're searching for are all at the start) +24 (for the initial sort) = 32 comparisons, or even better if you get lucky on the sort as well or your 8 search items are ready-sorted for some reason.

The average depth of a node in a Red-Black tree (which map usually is) is slightly over log2(n). Call it 7 in this case since 2^7 is 128. So finding 8 elements should take something like 56 comparisons. IIRC the balance characteristic of a Red-Black tree is that the deepest node is no more than twice the depth of the shallowest. So the worst case depth is floor(2*log2(n)), call it 15, for a total of 120, and the best is ceil(1/2 log2(n)), which is 4. That's 32 comparisons again.

So assuming that comparisons are the only thing that affects speed, the 8 finds will be somewhere between 4 times as fast, and 4 times as slow, as the linear traversal, with a 2x better average.

The linear traversal will probably touch more memory, though, so might be slower on that account. But ultimately for n=100 you're talking millisecond times, so just do whatever's the simplest code (probably the 8 finds). And did I mention that if you really want to know the speed you can't hope to predict, you just have to profile it?

勿挽旧人 2024-07-16 19:29:37

正如其他人所指出的,我可能只需要在地图上使用 find() 八次就可以了。 但根据您的需求,可以考虑另一种选择。 如果构造映射后映射中的项目不会发生太大变化,或者不需要将插入与查找交错,则可以尝试仅将键/值对保留在排序向量中。 如果这样做,那么您可以使用 lower_bound() 函数在对数时间内进行二分搜索。 这样做的好处是,如果您正在查找的键可以排序(并且您知道它们将始终存在),那么您可以使用从上一次查找返回的迭代器作为下一次查找的下限。 例如,

vector::iterator a = my_map.lower_bound( my_map.begin(), my_map.end(), "a" );
vector::iterator b = my_map.lower_bound( a + 1, my_map.end(), "b" );
vector::iterator c = my_map.lower_bound( b + 1, my_map.end(), "c" );
// ...

两种方法都有对数查找,但这可以帮助稍微减少常数。

As others have noted, I would probably just use find() on a map eight times and be done with it. But there's an alternative to consider depending on your needs. If the items in the map aren't going to change much after the map is constructed, or you don't need to interleave insertions with lookups, you might try just keeping the key/value pairs in a sorted vector. If you do this, then you can use the lower_bound() function to do a binary search in logarithmic time. This has the benefit that if the keys that you are looking for can be ordered (and you know that they'll always be present) then you can use the iterator returned from the previous lookup as the lower bound for the next. e.g.,

vector::iterator a = my_map.lower_bound( my_map.begin(), my_map.end(), "a" );
vector::iterator b = my_map.lower_bound( a + 1, my_map.end(), "b" );
vector::iterator c = my_map.lower_bound( b + 1, my_map.end(), "c" );
// ...

Both approaches have logarithmic lookup, but this can help reduce the constant somewhat.

一个人的旅程 2024-07-16 19:29:37

您应该使用 find 8 次。

将 switch 方法视为获取每个节点并比较它 8 次。 这是 800 次比较,并且您完全失去了被键入地图的所有好处,它还不如一个列表。

通过查找方法,您可以利用地图的优势来遍历列表。 我相信 std::maps 是作为二叉树实现的,这意味着搜索键只需要将键与树的深度进行比较,对于 100 个元素的二叉树来说,该深度为 8~。 现在,您只需 8*8 次比较(即 64 次比较)就可以找到所有 8 个元素。

You should use find 8 times.

Think of the switch approach as taking each node and comparing it 8 times. That's 800 comparisons, and you lose all benefit of the map being keyed at all, it might as well be a list.

With the find approach, you traverse the list using the benefits of the map. I believe std::maps are implemented as binary trees, meaning searching for a key will only require comparing your key down to the depth of the tree, which will be 8~ for a 100 element binary tree. Now, you can find all 8 elements with only 8*8 comparisons, or 64 comparisons.

莫多说 2024-07-16 19:29:37

下面是对它们的时间复杂度的分析(n 是映射中的项目计数),保证以对数或更好的时间复杂度进行 find 的查找:

8 * log2 n  for 8 times find
n for the iterate through all

第一个对于较小的数字来说较大(8 对于 n=例如 2),但在 43 岁左右,第一个会变得比第二个更好,并保持这种状态。 因此,您将需要使用第一种方法,因为它也更方便编码。

Here is the analysis for the time complexity of them (n is the item count in the map), which is guaranteed to do the lookup for find with logarithmic or better time complexity:

8 * log2 n  for 8 times find
n for the iterate through all

The first one is bigger for smaller numbers (8 for n=2 for example), but at around 43, the first one will become better than the second one and stays so. So, you will want to use the first method, given that it also is more convenient to code.

瑾夏年华 2024-07-16 19:29:37

如果这很重要,我会同时实现这两个功能并对其性能进行基准测试。

理论上是

8 * lg(100) >?< 100

其他考虑因素是这些数字中的任何一个是否会发生变化——它是否会超过 100 个元素? 您会进行超过 8 次搜索吗?

If it's that critical, I would implement both and benchmark the performance.

In theory it's whether

8 * lg(100) >?< 100

Other considerations are if either of those numbers will ever change -- will it ever be more than 100 elements; will you ever do more than 8 searches?

故事与诗 2024-07-16 19:29:37

让我们假设当它找到钥匙时“找到”保释金。

让我们进一步假设您明智地编码了“开关”,并且它在找到匹配项后停止检查。 我们还假设您不会费心编写代码以在找到所有 8 个之后就可以结束整个过程(这可能会很痛苦)。

“8 find”方法预计(平均:平均)执行 50 * 8 = 400 次比较。

“switch”方法预计(iow:平均)执行 (8 * 100) - 28 = 772 次比较。

所以从比较来看,8 个查找方法更好。 然而,比较的次数足够少,因此您最好选择更容易理解的选项。 不过,这也可能是 8 find 方法。

Let's assume "find" bails when it finds the key.

Let's further assume that you code the "switch" sensibly, and it quits checking after it finds a match. We will also assume you don't bother to code it to bail on the whole process once all 8 have been found (that would probably be a pain to code up).

The "8 find" approach can expect (iow: on average) to perform 50 * 8 = 400 comparisons.

The "switch" approach can expect (iow: on average) to perform (8 * 100) - 28 = 772 comparisons.

So in terms of comparisons, the 8 finds approach is better. However, the number of comparisons is small enough that you'd be better off just going with the option that is easier to understand. That will probably be the 8 find approach too though.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文