How to repeatedly insert elements into a sorted list quickly
I do not have formal CS training, so bear with me.
I need to do a simulation, which can be abstracted to the following (omitting the details):
We have a list of real numbers representing the times of events. In each step, we
- remove the first event, and
- as a result of "processing" it, a few other events may get inserted into the list at a strictly later time
and repeat this many times.
Questions
What data structure / algorithm can I use to implement this as efficiently as possible? I need to increase the number of events/numbers in the list significantly. The priority is to make this as fast as possible for a long list.
Since I'm doing this in C++, what data structures are already available in the STL or boost that will make it simple to implement this?
More details:
The number of events in the list is variable, but it's guaranteed to be between n and 2*n, where n is some simulation parameter. While the event times keep increasing, the time difference between the latest and earliest events is also guaranteed to be less than a constant T. Finally, I suspect that the density of events in time, while not constant, also has an upper and a lower bound (i.e. the events will never be strongly clustered around a single point in time).
Efforts so far:
As the title of the question says, I was thinking of using a sorted list of numbers. If I use a linked list for constant-time insertion, then I have trouble finding the position at which to insert new events in a fast (sublinear) way.
Right now I am using an approximation where I divide time into buckets and keep track of how many events there are in each bucket. I then process the buckets one by one as time "passes", always adding a new bucket at the end when removing one from the front, thus keeping the number of buckets constant. This is fast, but it is only an approximation.
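Roughly, the scheme looks like this (a minimal sketch; the class and member names are made up and the real event-processing logic is omitted):

    #include <cstddef>
    #include <deque>

    // Sketch of the bucketing approximation: only per-bucket event
    // counts are stored, so the exact event times inside a bucket are
    // lost -- that is where the approximation comes from.
    class EventBuckets {
    public:
        EventBuckets(double start, double width, std::size_t count)
            : front_time_(start), width_(width), counts_(count, 0) {}

        // Count an event in the bucket covering time t
        // (assumes t lies inside the current time window).
        void insert(double t) {
            counts_[static_cast<std::size_t>((t - front_time_) / width_)]++;
        }

        // Let one bucket of time "pass": drop the front bucket and
        // append an empty one, keeping the number of buckets constant.
        // Returns how many events the dropped bucket contained.
        std::size_t advance() {
            std::size_t n = counts_.front();
            counts_.pop_front();
            counts_.push_back(0);
            front_time_ += width_;
            return n;
        }

    private:
        double front_time_;              // left edge of the front bucket
        double width_;                   // width of one bucket
        std::deque<std::size_t> counts_; // event count per bucket
    };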
Answers (4)
A min-heap might suit your needs. There's an explanation here, and I think the STL provides priority_queue for you. Insertion time is O(log N), removal is O(log N).
It sounds like you need/want a priority queue. If memory serves, the priority queue adapter in the standard library is written to retrieve the largest items instead of the smallest, so you'll have to specify that it use std::greater for comparison.
Other than that, it provides just about exactly what you've asked for: the ability to quickly access/remove the smallest/largest item, and the ability to insert new items quickly. While it doesn't maintain all the items in order, it does maintain enough order that it can still find/remove the one smallest (or largest) item quickly.
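For example, something like this (a minimal sketch, assuming event times are plain doubles; the cutoff is just a made-up placeholder for the real processing step so the loop terminates):

    #include <functional>
    #include <queue>
    #include <vector>

    int main() {
        // std::priority_queue is a max-heap by default; std::greater
        // turns it into a min-heap, so top() is the earliest event.
        std::priority_queue<double, std::vector<double>,
                            std::greater<double>> events;
        events.push(1.0);
        events.push(0.5);
        events.push(2.0);

        while (!events.empty()) {
            double t = events.top(); // earliest event, O(1)
            events.pop();            // remove it, O(log N)

            // "Processing" may schedule new events at strictly later
            // times; the cutoff just keeps this sketch finite.
            if (t < 3.0)
                events.push(t + 1.0); // O(log N) insertion
        }
    }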
I would start with a basic priority queue, and see if that's fast enough.
If not, then you can look at writing something custom.
http://en.wikipedia.org/wiki/Priority_queue
A binary tree is always sorted and has faster access times than a linear list. Search, insert and delete times are all O(log(n)).
But it depends on whether the items have to be sorted all the time, or only after the process is finished. In the latter case a hash table is probably faster. At the end of the process you would then copy the items into an array or a list and sort it.
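This answer doesn't name an STL container, but std::multiset is the natural tree-based choice here, since it stays sorted and allows duplicate event times. A minimal sketch (the cutoff stands in for the real processing step):

    #include <set>

    int main() {
        // std::multiset is typically a red-black tree: always sorted,
        // duplicates allowed, O(log n) insert/erase/search.
        std::multiset<double> events = {1.0, 0.5, 2.0};

        while (!events.empty()) {
            auto it = events.begin();    // earliest event, O(1)
            double t = *it;
            events.erase(it);            // erase by iterator, amortized O(1)

            // Processing may insert events at later times, O(log n).
            if (t < 3.0)
                events.insert(t + 1.5);
        }
    }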