存储定义列表以实现最快查找的最佳方法

发布于 2024-09-07 21:23:06 字数 402 浏览 7 评论 0原文

我有某种字典文件,如下所示:

 UTM University of Tennessee at Martin
 UMD University of Maryland

它是一个 3 个字母的缩写词,后跟定义,以换行符分隔。 该文件总共有 9282 个定义。

我的问题是:

1)存储这些定义的最佳方式是什么?我应该将它们放在地图中、向量中、将它们存储在数组中、将它们放在 txt 文件中并扫描它以查找我需要的首字母缩略词吗?其他?速度是关键。 2)根据您的答案,我应该使用哪些函数来查找首字母缩略词,然后仅检索定义?

预先感谢您的帮助。

编辑/新的相关问题:如果我不希望我的应用程序依赖于外部txt文件,那么最好的方法是什么?

I have some sort of dictionary file that looks like this:

 UTM University of Tennessee at Martin
 UMD University of Maryland

It is a 3 letters acronym followed by the definition, separated by newlines.
In total the file has 9282 definitions.

My questions are:

1) What would be the best way to store this definitions? Should I place them in a map, in a vector, store them in an array, leave them in a txt file and scan it for the acronym i need ? Other? Speed is key here.
2) Depending on your answer, what functions should i be using in order to lookup the acronym and then to retrieve just the definition?

Thanks in advance for your help.

EDIT / New Related Question: If i didn't want my application to depend on an external txt file, what would be the best way to do it?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

草莓酥 2024-09-14 21:23:06

std::map 很简单,是基本 STL 的一部分。这可能是您最简单的选择。

如果速度确实非常重要,您可以对几个选项进行基准测试:

  • 使用哈希表(tr1::hash_map,或boost::unordered_map)用于 O(1) 查找(它需要哈希)。
  • 使用 std::map O(log n ) 查找
  • 创建一个包含 26^3 个元素的 vector (或 vector)(假设首字母缩略词都是字母 AZ),并转换首字母缩略词进入索引。

我猜想矢量选项(到目前为止)是最快的,但它也是最不明显、最难维护、最难扩展到更大的数据集的。

您可以将 const char *acronym; 转换为索引,如下所示:

const char *vector_of_names[26*26*26];

// Input 3-letter acronym, outputs the associated name.
const char *getName(const char* acronym) {
  return vector_of_names[
      ((acronyms[0]-'A') * 26*26) +
      ((acronyms[1]-'A') * 26) +
       (acronyms[2]-'A')];
}

std::map is easy and is part of basic STL. It's probably your simplest option.

If speed is really really important, you can benchmark a few options:

  • use a hash table (tr1::hash_map, or boost::unordered_map) for O(1) lookup (it requires a hash).
  • using std::map O(log n) lookups
  • create a vector<string> (or vector<const char*>) with 26^3 elements (assuming acronyms are all letters A-Z), and convert the acronym into an index.

I'd guess that the vector option would be (by far) the fastest, but it's also the least obvious, hardest to maintain, and hardest to scale to larger datasets.

You can turn const char *acronym; into an index with something like this:

const char *vector_of_names[26*26*26];

// Input 3-letter acronym, outputs the associated name.
const char *getName(const char* acronym) {
  return vector_of_names[
      ((acronyms[0]-'A') * 26*26) +
      ((acronyms[1]-'A') * 26) +
       (acronyms[2]-'A')];
}
寄人书 2024-09-14 21:23:06

最快的查找可能是提前构建的完美哈希表。与所提供的其他解决方案相比,这需要更多的编码,因此在尝试之前请确保您需要它。

The fastest possible lookup is probably a perfect hash table which is built ahead of time. This would require more coding than the other solutions presented, so make sure you need it before you try it.

素年丶 2024-09-14 21:23:06

如果速度很重要,那么哈希映射似乎是最佳选择。 Boost::Unordered 中有一个。否则 std::map 也可能有效。

您的其他选择似乎不太可能:将信息保存在文本文件中并在需要时进行扫描会非常慢(线性复杂性+磁盘访问)。未排序的向量可以实现更快的查找,但为什么呢?你想要一张地图,就用一张。

If speed is important a hash map seems like the best choice. There is one in Boost::Unordered. Otherwise a std::map is likely to work, too.

Your other options don't seem likely choices: Keeping the information in a text file and scanning when needed would be horribly slow (linear complexity + disk access). An unsorted vector would enable faster lookup, but why? You want a map, use one.

跨年 2024-09-14 21:23:06

您应该使用 std::map (#include

) 提供关联单向查找,您的键是缩写词,您的值是全名。

您可以使用 insert 放入元素,并使用 operator[] 访问它们。

有关详细信息,请参阅此参考

You should use std::map (#include <map>) to provide associative one-way lookup, with your key being the acronym and your value being the full name.

You can use insert to put your elements in, and operator[] to access them.

See this reference for more information.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文