Container for fast name lookup

Posted on 2024-07-14 03:43:07

I want to store strings and give each one a unique ID number (an index would be fine). I only need one copy of each string, and I need quick lookup. I check whether a string exists in the table often enough that I notice a performance hit. What's the best container to use for this, and how do I look up whether a string exists?

Comments (8)

无声情话 2024-07-21 03:43:08

I would suggest tr1::unordered_map. It is implemented as a hash map, so lookups have an expected complexity of O(1) and a worst case of O(n). There is also a Boost implementation if your compiler doesn't support TR1; compilers that support C++11 or later provide the same container as std::unordered_map in <unordered_map>.

#include <string>
#include <iostream>
#include <tr1/unordered_map>

using namespace std;

int main()
{
    // Map each string to its ID.
    tr1::unordered_map<string, int> table;

    table["One"] = 1;
    table["Two"] = 2;

    // find() returns end() when the key is absent, so this prints true, then false.
    cout << "find(\"One\") == " << boolalpha << (table.find("One") != table.end()) << endl;
    cout << "find(\"Three\") == " << boolalpha << (table.find("Three") != table.end()) << endl;

    return 0;
}
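To cover the "unique ID per string, one copy of each" part of the question, the hash map can be paired with a vector that maps IDs back to strings. The following is only a minimal sketch: the class name StringTable and its member functions are hypothetical, and it assumes a C++11 compiler, so it uses std::unordered_map rather than the TR1 version.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of any library): gives each distinct string
// a stable index and stores exactly one copy of it, as the map key.
class StringTable {
public:
    StringTable() = default;
    // Copying would leave names_ pointing into the other table's keys.
    StringTable(const StringTable&) = delete;
    StringTable& operator=(const StringTable&) = delete;

    // Returns the existing ID of s, or assigns the next free index.
    std::size_t intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;
        std::size_t id = names_.size();
        auto inserted = ids_.emplace(s, id).first;
        // Pointers to unordered_map elements stay valid across rehashing.
        names_.push_back(&inserted->first);
        return id;
    }

    // Expected O(1) existence check.
    bool contains(const std::string& s) const {
        return ids_.find(s) != ids_.end();
    }

    // Reverse lookup from ID to string; throws std::out_of_range for a bad ID.
    const std::string& name(std::size_t id) const { return *names_.at(id); }

    std::size_t size() const { return names_.size(); }

private:
    std::unordered_map<std::string, std::size_t> ids_;
    std::vector<const std::string*> names_;
};
```

Calling intern("One") twice returns the same index both times, and contains() is the existence check the question asks about. This sketch never erases entries, which is what keeps the indices stable.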
想念有你 2024-07-21 03:43:08

Try this:

[image not preserved] (source: adrinael.net)

月朦胧 2024-07-21 03:43:08

Try std::map.

━╋う一瞬間旳綻放 2024-07-21 03:43:08

First and foremost you must be able to quantify your options. You have also told us that the main usage pattern you're interested in is lookup, not insertion.

Let N be the number of strings you expect to have in the table, and let C be the average number of characters in a string in that table (or in the strings that are checked against it).

  1. In the case of a hash-based approach, each lookup costs:

    • O(C) to calculate the hash of the string you are about to look up
    • between O(1 x C) and O(N x C) to traverse the bucket selected by the hash, where 1..N is the number of entries in that bucket and the factor C covers comparing each entry's characters against the lookup key
    • total time: between O(2 x C) and O((N + 1) x C)
  2. In the case of a std::map-based approach (typically implemented as a red-black tree), each lookup costs:

    • total time: between O(1 x C) and O(log(N) x C), where O(log(N)) is the maximal tree traversal cost and O(C) is the time std::map's default less<> comparison takes to compare your lookup key against a node's key at each step of the traversal

For large N, and in the absence of a hash function that guarantees fewer than log(N) collisions, or if you just want to play it safe, you are better off with a tree-based (std::map) approach. If N is small, by all means use a hash-based approach (while still making sure that hash collisions are rare).
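Since the bounds above depend on N, C and the quality of the hash function, one way to quantify the options is to time both containers on your own data. The helpers below are a rough sketch under some assumptions: the names make_keys, build_table, count_hits and time_lookups are made up for illustration, the keys are synthetic, and std::unordered_map (C++11) stands in for the hash-based approach.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Builds n distinct keys "key0", "key1", ... (illustrative data only).
std::vector<std::string> make_keys(std::size_t n) {
    std::vector<std::string> keys;
    keys.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        keys.push_back("key" + std::to_string(i));
    }
    return keys;
}

// Fills any map-like container with key -> index pairs.
template <class Table>
Table build_table(const std::vector<std::string>& keys) {
    Table table;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        table.emplace(keys[i], static_cast<int>(i));
    }
    return table;
}

// Counts how many of the queries are present in the table.
template <class Table>
std::size_t count_hits(const Table& table,
                       const std::vector<std::string>& queries) {
    std::size_t hits = 0;
    for (const auto& q : queries) {
        if (table.find(q) != table.end()) ++hits;
    }
    return hits;
}

// Runs count_hits once and returns the elapsed wall-clock seconds.
// The hit count is written to *hits_out so the work cannot be skipped.
template <class Table>
double time_lookups(const Table& table,
                    const std::vector<std::string>& queries,
                    std::size_t* hits_out) {
    auto start = std::chrono::steady_clock::now();
    *hits_out = count_hits(table, queries);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Building one table with build_table<std::map<std::string, int>>(keys) and another with build_table<std::unordered_map<std::string, int>>(keys), then passing the same queries to time_lookups, gives two timings to compare. A single run is noisy, so repeat the measurement and use strings that resemble your real workload.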

Before making any decision, though, you should also check:

千寻… 2024-07-21 03:43:08

Are the strings to be searched available statically? If so, you might want to look at a perfect hashing function.
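To illustrate the idea: when the full set of strings is known up front, you can search for a hash function that sends every string to a distinct slot, so a lookup never has to walk a bucket. Dedicated generators such as GNU gperf do this far more carefully. The sketch below only brute-forces a seed, and the names seeded_hash and find_perfect_seed, as well as the choice of hash, are illustrative assumptions rather than a recommended implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative seeded hash: an FNV-1a style loop followed by a final
// bit-mixing step so that the seed influences every output bit.
std::uint32_t seeded_hash(const std::string& s, std::uint32_t seed) {
    std::uint32_t h = 2166136261u ^ seed;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

// Tries seeds 0..max_seed-1 until every key lands in a distinct slot of a
// table with slot_count entries. Returns true and stores the seed on success.
bool find_perfect_seed(const std::vector<std::string>& keys,
                       std::size_t slot_count,
                       std::uint32_t max_seed,
                       std::uint32_t* seed_out) {
    if (slot_count < keys.size()) return false;  // Cannot be collision-free.
    for (std::uint32_t seed = 0; seed < max_seed; ++seed) {
        std::vector<bool> used(slot_count, false);
        bool collision = false;
        for (const auto& key : keys) {
            std::size_t slot = seeded_hash(key, seed) % slot_count;
            if (used[slot]) {
                collision = true;
                break;
            }
            used[slot] = true;
        }
        if (!collision) {
            *seed_out = seed;
            return true;
        }
    }
    return false;
}
```

Once a seed is found, seeded_hash(s, seed) % slot_count serves as the string's ID. A lookup still has to compare s with the string stored in that slot, because a string outside the original set can hash to an occupied slot.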

请持续率性 2024-07-21 03:43:08

It sounds like an array would work just fine, with each string's ID being its index into the array. To check whether an ID exists, just make sure the index is within the bounds of the array and that its entry isn't NULL.

EDIT: If you sort the list, you can use a binary search, which gives fast O(log N) lookup.

EDIT: Also, if you want to search by string, you can always use a std::map<std::string, int> as well. This should give decent lookup speed.
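The sorted-list idea from the first edit could look roughly like this. It is a sketch only: make_sorted_table and find_id are hypothetical names, and it assumes the set of strings is built once, because inserting a string later shifts the indices of everything sorted after it.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sorts the names and drops duplicates, so each string's index is its ID.
std::vector<std::string> make_sorted_table(std::vector<std::string> names) {
    std::sort(names.begin(), names.end());
    names.erase(std::unique(names.begin(), names.end()), names.end());
    return names;
}

// Binary search: returns the index of s, or table.size() if s is absent.
std::size_t find_id(const std::vector<std::string>& table,
                    const std::string& s) {
    auto it = std::lower_bound(table.begin(), table.end(), s);
    if (it != table.end() && *it == s) {
        return static_cast<std::size_t>(it - table.begin());
    }
    return table.size();
}
```

A result equal to table.size() means the string is not in the table. Any other result is both the existence check and the ID.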

遥远的她 2024-07-21 03:43:08

The easiest option is to use std::map.

It works like this:

#include <iostream>
#include <map>
#include <string>
using namespace std;

...

   map<string, int> myContainer;
   myContainer["foo"] = 5; // map string "foo" to id 5
   // Now check if "foo" has been added to the container:
   map<string, int>::iterator it = myContainer.find("foo");
   if (it != myContainer.end())
   {
       // Yes! Reuse the iterator instead of looking "foo" up a second time.
       cout << "The ID of foo is " << it->second;
   }
   // Let's get "foo" out of it
   myContainer.erase("foo");