STL+订购套装 +没有重复项
我需要一组有序且不重复的值。 那么,什么是快速/最好的方法:
1 - 创建一个向量,对其进行排序并删除重复项? 2 - 使用一种“排序”向量(如果存在)?
哪一种效率更高?
I need to have an ordered set of values without duplicates.
So, what is the fast/best method :
1 - Create a vector, sort it and remove duplicates ?
2 - Use a kind of "sorted" vector (if it exists) ?
Which one can be the more efficient ?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
为什么不使用
std::set
?Why wouldn't you use a
std::set
?如果您要加载列表一次然后多次使用它,那么使用 std::vector 而不是 std::set 可能会在内存使用和迭代方面更有效。
如果您要不断添加和删除元素,那么您绝对应该使用 std::set。
对于一般用途,使用 std::set 因为它的工作量较少(构建向量需要您在完成附加所有元素后排序并删除重复项),除非您对低内存使用或其他一些方面的效率有特殊需要性能下降表明需要向量。
If you are going to load the list once then use it multiple times then using std::vector instead of std::set will probably be more efficient in memory usage and iterating through it.
If you are going to continually add and remove elements you should definitely use std::set.
For general purpose use std::set because it is less work (building the vector requires you to sort and remove duplicates after you have finished appending all the elements), unless you have a particular need for efficiency in low memory-use or some other performance hit that indicates vector is required.
使用 std::set。它是有序的,并且不允许重复。
唯一的缺点是您无法随机访问元素,尽管这没有指定为要求。
Use std::set. It's ordered, and it does not allow duplicates.
The only downside is that you don't get random access to the elements though this was not specified as a requirement.
效率将取决于您所拥有的插入/访问的比率(即您需要对向量进行排序的次数)。如果性能确实很重要,我建议您尝试这两种方法,并针对应用程序使用的真实情况使用最快的一种。
注意:std::set 不是排序向量,因为它在内存中不连续(它是一棵树)。
您想要的“排序向量”是
std::vector
上的堆。请参阅:http://stdcxx.apache.org/doc/stdlibug/14-7 .html。The efficiency will depend on the ratio of insertions/accesses you have (i.e., the number of times you'll need to sort your vector). If performance is really important there, I suggest that you try both approaches and use the fastest one for a real case of application usage.
Note:
std::set
is not a sorted vector because it is not contiguous in memory (it is a tree).The "sorted vector" you want is a heap over
std::vector
. See: http://stdcxx.apache.org/doc/stdlibug/14-7.html.总是有 Loki::AssocVector
否则你可以轻松地推出自己的:
std::vector
或std::deque
作为基础容器lower_bound
/upper_bound
/equal_range
和binary_search
查找对象的通用算法inplace_merge
非常有用但实际上,请使用
std ::设置
:)There is always Loki::AssocVector
Otherwise you can easily roll your own:
std::vector
orstd::deque
as the base containerlower_bound
/upper_bound
/equal_range
andbinary_search
generic algorithms to look up an objectinplace_merge
is great when you already know that the value is not presentBut really, use a
std::set
:)一般来说,如果我需要快速一次性,我会同时使用一组和列表,并执行类似this:
您不必在那里使用 lambda,只是在此示例中键入更容易。无论如何,输出:
这里的关键点是:
set::insert
返回一个有用的
对,其中第一个值是新迭代器(如果insert)或现有迭代器(如果没有),第二个值为 true(如果已插入)或 false(如果没有)。set::insert
不会使集合中的任何其他迭代器失效,无论是否发生插入。set
来快速避免重复,并使用list
来保持顺序。那么,实现是:
集合
(用于重复检查)和一个迭代器列表
(用于顺序保存)。list::iterator
确实需要两级解引用才能访问该值。优点和缺点是:
集合
的唯一性语义。您必须增强这一点并减少一些缺点,但代价是必须编写额外的代码:
将两个容器粘贴在
方法以及您想要的任何其他内容,以使其更容易携带。class
/struct
中insert.second
。class
/struct
支持任何值类型。如果您想使用其他类型的集或列表,您还可以模板化唯一且有序的容器类型。此外,您还可以在那里找到有序集实现(此处的一些其他答案提供了链接)。当我只是快速编码时,我会使用我在这里描述的那个;它非常简单,对我来说,这样做通常比获取现有的实现更快。
Generally, if I need a quick one-off, I'll use both a set and a list together, and do something like this:
You don't have to use the lambda there, it was just easier to type for this example. Anyways, that outputs:
The key points here are:
set::insert
returns a useful<iterator,bool>
pair, where the first value is the new iterator (if inserted) or the existing iterator (if not), and the second value is true (if inserted) or false (if not).set::insert
doesn't invalidate any other iterators in the set, whether an insertion happened or not.set
to quickly avoid duplicates, and alist
to preserve order.And the implementation, then, is:
set
for values (for dupe checks) and alist
of iterators (for order preservation).list<set::iterator>::iterator
does need two levels of dereferencing to access the value.The pros and cons are:
set
.Options you have to enhance this and reduce some of the cons, at the cost of having to write extra code:
class
/struct
with aninsert
method and whatever else you want, to make carrying it around a bit easier..second
from your insert function if you want the caller to be able to know if it was inserted or not (e.g. maybe you have to delete things if they aren't inserted or something).class
/struct
with support for whatever value type. You could also template the unique and ordered container types, if you want to use other types of sets or lists.Also, you can find ordered set implementations out there (some of the other answers here provide links). I use the one I described here when I'm just coding things quickly; it's simple enough that it's usually quicker for me to just do this than it is to go get an existing implementation.
这取决于你想要什么效率。如果您想要“快”的东西,请使用 std::set<> (正如其他人已经建议的那样)。
但是,如果您需要 chache 一致性或将事物保留在向量中(保证对齐内存)而不是集合(没有任何保证,如果我没记错的话,作为树实现),那么您必须直接将 std::vector 与一些标准算法假设您提供的容器已经排序(然后使检查速度更快),例如 std::二进制搜索()。
That depends on what efficiency you want. If you want something that's "just fast", use std::set<> (as others already suggested).
However, if you need chache coherency or to keep things in a vector (aligned memory guaranteed) instead of a set (nothing guaranteed, implemented as a tree if I remember correctly) then you'll have to ust directly std::vector combined with some standard algorithms that assume the container you provide is already sorted (then making the check faster), like std::binary_search().
插入集合需要 log(n)。而且排序是免费的。
插入向量(push_back)需要常数时间。对向量进行排序需要 n*log(n)。
但您仍然需要删除重复项。
如果一次性插入然后排序,也可以考虑向量。如果您经常插入,则设置是正确的。
Insert into a set takes log(n). And the sort is free.
Insert into a vector (push_back) takes constant time. Sorting a vector takes n*log(n).
But you still need to remove duplicates.
If you insert in one go and then sort, you can consider also vector. If you insert frequently set is the right one.
在您的 .h 或 .hpp 中尝试一下:
现在您可以使用您的 OrderedDataByTime 集!
Try this in your .h or .hpp:
Now you can use your OrderedDataByTime set !!
有序集基本上是 g++ 中基于策略的数据结构,它使唯一元素保持排序顺序。它接近 STL 中的 set 数据结构,在
log( 中执行操作n)
复杂度,并以 log(n) 复杂度执行两个附加操作。 :)order_of_key (n)
:小于n
的项目数find_by_order(n)
:n
-th集合中的元素(索引为 0)了解更多详细信息,关注这个链接
Ordered set is basically a policy based data structure in g++ that keeps the unique elements in sorted order. It's close to set data structure in STL that performs operations in
log(n)
complexity and performs two additional operations also in log(n) complexity. :)order_of_key (n)
: Number of items that smaller thann
find_by_order(n)
:n
-th element in a set (in 0 indexed)for more details, follow this link