当前位置：文江博客话题详情

使用 std::map 和 std::string 键与 int 键的成本？

发布于 2024-08-13 05:16:56 字数 249 浏览 2 评论 0原文

我知道单个地图查询最多需要 log(N) 时间。不过我想知道，我见过很多使用字符串作为映射键的示例。例如，将 std::string 而不是 int 关联为映射的键的性能成本是多少？

std::map; someMap; 与 std::map; someMap;

谢谢！

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

云柯 2024-08-20 05:16:56

分析渐近性能算法正在研究必须执行的操作以及它们添加到方程中的成本。为此，您需要首先了解执行的操作是什么，然后评估其成本。

在平衡二叉树（恰好是映射）中搜索键需要 O( log N ) 复杂的操作。这些操作中的每一个都意味着比较密钥是否匹配，如果密钥不匹配，则遵循适当的指针（子级）。这意味着总成本与这两个操作的成本的 log N 倍成正比。跟随指针是一个常数时间操作O(1)，并且比较键取决于键。对于整数键，比较速度很快 O(1)。比较两个字符串是另一回事，它所花费的时间与所涉及的字符串的大小成正比O(L)（其中我故意使用L作为长度string 参数，而不是更常见的 N

当您将所有成本相加时，您会发现使用整数作为键，总成本为 O( log N )*( O( 1) + O(1) ) 相当于 O( log N ) (O(1) 隐藏在 的常量中。 >O 表示法默默隐藏，

如果您使用字符串作为键，则总成本为 O( log N )*( O(L) + O(1) )，其中常数时间操作。被成本更高的线性运算 O(L) 隐藏，并且可以转换为 O( L * log N ) 即在 a 中定位元素的成本。由字符串作为键的映射与存储在映射中的元素数量乘以用作键的字符串的平均长度的对数成正比。

请注意，大 O 表示法最适合用作确定算法如何进行的分析工具。当问题规模扩大时，它会表现出来，但它隐藏了许多对原始性能很重要的事实。

作为最简单的示例，如果将键从通用字符串更改为 1000 个字符的数组，则可以将该成本隐藏在从表示法中删除的常量中。比较 1000 个字符的数组是一个常量操作，只是需要相当多的时间。使用渐近符号，这只是一个 O( log N ) 运算，就像整数一样。

许多其他隐藏成本也会发生同样的情况，因为创建元素的成本通常被视为恒定时间操作，只是因为它不依赖于问题的参数（在每个元素中定位内存块的成本）分配不取决于您的数据集，而是取决于内存碎片，这超出了算法分析的范围，在 malloc 内获取锁以保证不是两个进程尝试返回同一内存块的成本取决于锁的争用取决于处理器、进程的数量以及它们执行的内存请求量......，同样超出了算法分析的范围）。当阅读大 O 表示法中的成本时，您必须意识到它的真正含义。

Analyzing algorithms for asymptotic performance is working on the operations that must be performed and the cost they add to the equation. For that you need to first know what are the performed operations and then evaluate its costs.

Searching for a key in a balanced binary tree (which maps happen to be) require O( log N ) complex operations. Each of those operations implies comparing the key for a match and following the appropriate pointer (child) if the key did not match. This means that the overall cost is proportional to log N times the cost of those two operations. Following pointers is a constant time operation O(1), and comparing keys depend on the key. For an integer key, comparisons are fast O(1). Comparing two strings is another story, it takes time proportional to the sizes of the strings involved O(L) (where I have used intentionally L as the length of string parameter instead of the more common N.

When you sum all the costs up you get that using integers as keys the total cost is O( log N )*( O(1) + O(1) ) that is equivalent to O( log N ). (O(1) gets hidden in the constant that the O notation silently hides.

If you use strings as keys, the total cost is O( log N )*( O(L) + O(1) ) where the constant time operation gets hidden by the more costly linear operation O(L) and can be converted into O( L * log N ). That is, the cost of locating an element in a map keyed by strings is proportional to the logarithm of the number of elements stored in the map times the average length of the strings used as keys.

Note that the big-O notation is most appropriate to use as an analysis tool to determine how the algorithm will behave when the size of the problem grows, but it hides many facts underneath that are important for raw performance.

As the simplest example, if you change the key from a generic string to an array of 1000 characters you can hide that cost within the constant dropped out of the notation. Comparing arrays of 1000 chars is a constant operation that just happens to take quite a bit of time. With the asymptotic notation that would just be a O( log N ) operation, as with integers.

The same happens with many other hidden costs, as the cost of creation of the elements that is usually considered as a constant time operation, just because it does not depend on the parameters to your problem (the cost of locating the block of memory in each allocation does not depend on your data set, but rather on memory fragmentation that is outside of the scope of the algorithm analysis, the cost of acquiring the lock inside malloc as to guarantee that not two processes try to return the same block of memory depends on the contention of the lock that depends itself number of processors, processes and how much memory requests they perform..., again out of the scope of the algorithm analysis). When reading costs in the big-O notation you must be conscious of what it really means.

回复收藏 0 原文

孤芳又自赏 2024-08-20 05:16:56

除了已经提到的比较字符串的时间复杂度之外，每次将项目添加到容器时，字符串键还会导致额外的内存分配。在某些情况下，例如高度并行的系统，全局分配器互斥体可能是性能问题的根源。

一般来说，您应该选择最适合您的情况的替代方案，并且仅根据实际性能测试进行优化。众所周知，很难判断什么会成为瓶颈。

回复收藏 0 原文

在梵高的星空下 2024-08-20 05:16:56

成本差异将与比较两个整数与比较两个字符串之间的成本差异相关。

比较两个字符串时，必须取消引用指针以获取第一个字符，然后比较它们。如果它们相同，则必须比较第二个字符，依此类推。如果您的字符串具有很长的公共前缀，这可能会稍微减慢该过程。不过，它不太可能像比较整数那么快。

回复收藏 0 原文

慕巷 2024-08-20 05:16:56

当然，代价是整数可以在真正的 O(1) 时间内进行比较，而字符串可以在 O(n) 时间内进行比较（n 是最大共享前缀）。此外，字符串的存储比整数的存储消耗更多的空间。
除了这些明显的差异之外，没有重大的性能成本。

回复收藏 0 原文

倾城泪 2024-08-20 05:16:56

首先，我怀疑在实际应用程序中，是否有字符串键或整数键会产生任何明显的差异。分析您的应用程序会告诉您它是否重要。

如果确实重要，您可以将密钥更改为如下所示（未经测试）：

class Key {
public:
    unsigned hash;
    std::string s;

    int cmp(const Key& other) {
        int diff = hash - other.hash;
        if (diff == 0)
            diff = strcmp(s, other.s);
        return diff;
}

现在您正在对两个字符串的哈希值进行 int 比较。如果哈希值不同，则字符串肯定会不同。如果哈希值相同，由于鸽洞原理，您仍然需要比较字符串。

First of all, I doubt that in a real application, whether you have string keys or int keys makes any noticeable difference. Profiling your application will tell you if it matters.

If it does matter, you could change your key to be something like this (untested):

class Key {
public:
    unsigned hash;
    std::string s;

    int cmp(const Key& other) {
        int diff = hash - other.hash;
        if (diff == 0)
            diff = strcmp(s, other.s);
        return diff;
}

Now you're doing an int comparison on the hashes of two strings. If the hashes are different, the strings are certainly different. If the hashes are the same, you still have to compare the strings because of the Pigeonhole Principle.

回复收藏 0 原文