数据结构的选择

发布于 2024-07-26 02:13:43 字数 104 浏览 6 评论 0 原文

我使用 C++,假设我想存储 40 个用户名,我将简单地使用一个数组。 但是,如果我想存储 40000 个用户名,就搜索速度而言这仍然是一个好主意吗? 我应该使用哪种数据结构来提高这个速度?

I use C++, say i want to store 40 usernames, I will simply use an array. However, if I want to store 40000 usernames is this still a good idea in terms of search speed? Which data structure should I use to improve this speed?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

稀香 2024-08-02 02:13:44

您需要指定插入和删除的要求。 是否需要在序列中的随机点删除和插入内容?

另外,为什么要求按顺序搜索? 您是否正在执行不适合哈希表查找的搜索?

目前我建议使用 dequelist。 通常,最好选择一个具有最简单的算法实现接口的容器,然后仅在性能不足且替代方案提供必要的加速时才更改选择。

向量有两个主要优点,没有每个对象的内存开销,尽管向量会过度分配以防止频繁复制,并且对象是连续存储的,因此顺序访问往往会很快。 这些也是它的缺点。 增长的向量需要重新分配和复制,并且从向量末端以外的任何位置插入和删除也需要复制。 对于具有大量对象或大型对象的向量,连续存储可能会产生问题,因为即使只有轻微的内存碎片,也很难满足连续存储要求。

列表不需要连续存储,但列表节点通常有两个指针的每个对象开销(在大多数实现中)。 这在非常小的对象列表中可能很重要(例如,在指针列表中,每个节点是数据项大小的 3 倍)。 不过,从列表中间插入和删除的成本非常低,并且列表节点一旦创建就不需要在内存中移动。

双端队列使用分块存储,因此它与向量类似,每个对象的开销较低,但不需要在整个容器上连续存储,因此不存在碎片内存空间的相同问题。 它通常是一个非常好的收藏选择,但常常被忽视。

You need to specify what the insertion and removal requirements are. Do things need to be removed and inserted at random points in the sequence?

Also, why the requirement to search sequentially? Are you doing searches that aren't suitable for a hash table lookup?

At the moment I'd suggest a deque or a list. Often it's best to choose a container with the interface that makes for the simplest implementation for your algorithm and then only change the choice if the performance is inadequate and an alternative provides the necessary speedup.

A vector has two principle advantages, there is no per-object memory overhead, although vectors will over-allocate to prevent frequent copying and objects are stored contiguously so sequential access tends to be fast. These are also its disadvantages. Growing vectors require reallocation and copying, and insertion and removal from anywhere other than the end of the vector also require copying. Contiguous storage can produce problems for vectors with large numbers of objects or large objects as the contiguous storage requirements can be hard to satisfy even with only mild memory fragmentation.

A list doesn't require contigous storage but list nodes usually have a per-object overhead of two pointers (in most implementation). This can be significant in list of very small objects (e.g. in a list of pointers, each node is 3x the size of the data item). Insertion and removal from the middle of a list is very cheap though and list nodes never need to me moved in memory once created.

A deque uses chunked storage, so it has a low per-object overhead similar to a vector, but doesn't require contiguous storage over the whole container so doesn't have the same problem with fragmented memory spaces. It is often a very good choice for collections and is often overlooked.

情绪 2024-08-02 02:13:44

根据经验,与 list 相比,更喜欢使用 vector,或者,饮食禁止,C 风格数组。

填充向量后,请确保使用 sort算法。 然后,您可以使用 findbinary_searchlower_bound。 (您不需要排序即可使用find。)

As a rule of thumb, prefer vector to list or, diety forbid, C-style array.

After the vector is filled, make sure it is properly ordered using the sort algorithm. You can then search for a particular record using either find, binary_search or lower_bound. (You don't need to sort to use find.)

迷路的信 2024-08-02 02:13:44

说真的,除非您处于资源受限的环境(嵌入式平台、电话或其他)。 使用 std::map,保存进行排序或搜索的努力,让容器来处理一切。 这可能是一个排序的树结构,可能是平衡的(例如红黑),这意味着您将获得良好的搜索性能。 除非数据的大小接近一两个指针的大小,否则您选择的任何数据结构的内存开销都是可以忽略不计的。 您的显卡可能有更多的内存,您将用完您正在考虑的数据。

正如其他人所说,没有什么充分的理由使用普通数组,如果您不想使用 map 使用 std::vectorstd:: list 取决于您是否需要插入/删除数据 (=>list) 或不需要 (=>vector)

还要考虑如果您确实需要内存中的所有数据,如何通过 sqlite。 或者甚至使用 sqlite 进行内存访问。 这完全取决于您需要如何处理数据。

Seriously unless you are in a resource constrained environment (embedded platform, phone, or other). Use a std::map, save the effort of doing sorting or searching and let the container take care of everything. This will possibly be a sorted tree structure, probably balance (e.g. Red-Black), which means you will get good searching performance. Unless the size of you data is close to the size of one or two pointers, the memory overhead of whatever data structure you pick is negligable. You Graphics Card probably has more memory that you are going to use up for the data you are think about.

As others said there is very little good reason to use vanilla array, if you don't want to use a map use std::vector or std::list depending on whether you need insert/delete data (=>list) or not (=>vector)

Also consider if you really need all that data in memory, how about putting it on disk via sqlite. Or even use sqlite for in memory access. It all depends on what you need to do with your data.

回首观望 2024-08-02 02:13:44

std::vector 和 std::list 似乎很适合这项任务。 如果您事先知道最大记录数,则可以使用数组。

std::vector and std::list seem good for this task. You can use an array if you know the maximum number of records beforehands.

仅此而已 2024-08-02 02:13:44

如果您只需要顺序搜索和存储,那么 list 是合适的容器。< br>

另外, vector 也不是一个糟糕的选择。

If you need only sequentially search and storage, then list is the proper container.

Also, vector wouldn't be a bad choice.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文