STL+订购套装 +没有重复项

发布于 2024-10-07 23:35:50 字数 110 浏览 3 评论 0原文

我需要一组有序且不重复的值。 那么,什么是快速/最好的方法:

1 - 创建一个向量,对其进行排序并删除重复项? 2 - 使用一种“排序”向量(如果存在)?

哪一种效率更高?

I need to have an ordered set of values without duplicates.
So, what is the fast/best method :

1 - Create a vector, sort it and remove duplicates ?
2 - Use a kind of "sorted" vector (if it exists) ?

Which one can be the more efficient ?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(10

祁梦 2024-10-14 23:35:50

为什么不使用 std::set

Why wouldn't you use a std::set?

雪花飘飘的天空 2024-10-14 23:35:50

如果您要加载列表一次然后多次使用它,那么使用 std::vector 而不是 std::set 可能会在内存使用和迭代方面更有效。

如果您要不断添加和删除元素,那么您绝对应该使用 std::set。

对于一般用途,使用 std::set 因为它的工作量较少(构建向量需要您在完成附加所有元素后排序并删除重复项),除非您对低内存使用或其他一些方面的效率有特殊需要性能下降表明需要向量。

If you are going to load the list once then use it multiple times then using std::vector instead of std::set will probably be more efficient in memory usage and iterating through it.

If you are going to continually add and remove elements you should definitely use std::set.

For general purpose use std::set because it is less work (building the vector requires you to sort and remove duplicates after you have finished appending all the elements), unless you have a particular need for efficiency in low memory-use or some other performance hit that indicates vector is required.

╰ゝ天使的微笑 2024-10-14 23:35:50

使用 std::set。它是有序的,并且不允许重复。

唯一的缺点是您无法随机访问元素,尽管这没有指定为要求。

Use std::set. It's ordered, and it does not allow duplicates.

The only downside is that you don't get random access to the elements though this was not specified as a requirement.

香草可樂 2024-10-14 23:35:50

效率将取决于您所拥有的插入/访问的比率(即您需要对向量进行排序的次数)。如果性能确实很重要,我建议您尝试这两种方法,并针对应用程序使用的真实情况使用最快的一种。

注意:std::set 不是排序向量,因为它在内存中不连续(它是一棵树)。
您想要的“排序向量”是 std::vector 上的堆。请参阅:http://stdcxx.apache.org/doc/stdlibug/14-7 .html

The efficiency will depend on the ratio of insertions/accesses you have (i.e., the number of times you'll need to sort your vector). If performance is really important there, I suggest that you try both approaches and use the fastest one for a real case of application usage.

Note: std::set is not a sorted vector because it is not contiguous in memory (it is a tree).
The "sorted vector" you want is a heap over std::vector. See: http://stdcxx.apache.org/doc/stdlibug/14-7.html.

冬天的雪花 2024-10-14 23:35:50

总是有 Loki::AssocVector

否则你可以轻松地推出自己的:

  • 使用std::vectorstd::deque 作为基础容器
  • ,使用 lower_bound / upper_bound / equal_rangebinary_search 查找对象的通用算法
  • 当您已经知道该值不存在时,inplace_merge 非常有用

但实际上,请使用 std ::设置:)

There is always Loki::AssocVector

Otherwise you can easily roll your own:

  • use a std::vector or std::deque as the base container
  • use lower_bound / upper_bound / equal_range and binary_search generic algorithms to look up an object
  • also inplace_merge is great when you already know that the value is not present

But really, use a std::set :)

孤蝉 2024-10-14 23:35:50

一般来说,如果我需要快速一次性,我会同时使用一组列表,并执行类似this

#include <set>
#include <list>
#include <string>
#include <iostream>

using namespace std;

int main() {

    // set prevents dupes, list preserves order
    set<string> theset;
    list<set<string>::iterator> thelist;
    
    // insertion is like this:
    auto insert = [&] (const string &str) {
        auto inserted = theset.insert(str);
        if (inserted.second)
            thelist.push_back(inserted.first);
    };

    // then, for example:
    insert("zebra");       // first zebra
    insert("chair a");     // first chair a
    insert("desk");        // first desk
    insert("desk");
    insert("chair b");     // first chair b
    insert("chair a");
    insert("chair a");
    insert("table");       // first table
    insert("chair a");
    insert("xylophone");   // first xylophone
    insert("zebra");
    
    // access can be done like:
    for (auto istr : thelist)
        cout << *istr << endl;
    
}

您不必在那里使用 lambda,只是在此示例中键入更容易。无论如何,输出:

zebra
chair a
desk
chair b
table
xylophone

这里的关键点是:

  • set::insert 返回一个有用的 对,其中第一个值是新迭代器(如果insert)或现有迭代器(如果没有),第二个值为 true(如果已插入)或 false(如果没有)。
  • set::insert 不会使集合中的任何其他迭代器失效,无论是否发生插入。
  • 我们可以使用set来快速避免重复,并使用list来保持顺序。
  • 将集合的迭代器存储在列表中只是为了避免复制值。

那么,实现是:

  • 为值构造一个集合(用于重复检查)和一个迭代器列表(用于顺序保存)。
  • 插入时,始终尝试添加到集合中,但仅当它尚未在集合中时才添加到列表中(即,如果它不是重复的)。
  • 访问时,只需记住它是迭代器列表,而不是值列表,因此 list::iterator 确实需要两级解引用才能访问该值。

优点和缺点是:

  • 优点:易于实施。
  • 优点:有效。
  • 优点:不复制值。
  • 优点:使用普通旧集合的唯一性语义。
  • 缺点:可能会使访问语法复杂化。
  • 缺点:如果要将整个列表转换为值列表,则必须迭代整个列表一次。
  • 缺点:你必须拖着两个集装箱到处走。
  • 缺点:没有足够的花哨来拥有自己的透明容器界面,即它本身不是一个容器。

您必须增强这一点并减少一些缺点,但代价是必须编写额外的代码:

  • 使用 将两个容器粘贴在 class/struct 中insert 方法以及您想要的任何其他内容,以使其更容易携带。
  • 如果您希望调用者能够知道它是否已插入(例如,如果未插入或某些内容,您可能必须删除它们),请从插入函数返回 .second
  • 模板表示 class/struct 支持任何值类型。如果您想使用其他类型的集或列表,您还可以模板化唯一且有序的容器类型。
  • 如果您想要正确的 STL 容器的所有功能,请将整个事物包装在一致的 Container 接口中。

此外,您还可以在那里找到有序集实现(此处的一些其他答案提供了链接)。当我只是快速编码时,我会使用我在这里描述的那个;它非常简单,对我来说,这样做通常比获取现有的实现更快。

Generally, if I need a quick one-off, I'll use both a set and a list together, and do something like this:

#include <set>
#include <list>
#include <string>
#include <iostream>

using namespace std;

int main() {

    // set prevents dupes, list preserves order
    set<string> theset;
    list<set<string>::iterator> thelist;
    
    // insertion is like this:
    auto insert = [&] (const string &str) {
        auto inserted = theset.insert(str);
        if (inserted.second)
            thelist.push_back(inserted.first);
    };

    // then, for example:
    insert("zebra");       // first zebra
    insert("chair a");     // first chair a
    insert("desk");        // first desk
    insert("desk");
    insert("chair b");     // first chair b
    insert("chair a");
    insert("chair a");
    insert("table");       // first table
    insert("chair a");
    insert("xylophone");   // first xylophone
    insert("zebra");
    
    // access can be done like:
    for (auto istr : thelist)
        cout << *istr << endl;
    
}

You don't have to use the lambda there, it was just easier to type for this example. Anyways, that outputs:

zebra
chair a
desk
chair b
table
xylophone

The key points here are:

  • set::insert returns a useful <iterator,bool> pair, where the first value is the new iterator (if inserted) or the existing iterator (if not), and the second value is true (if inserted) or false (if not).
  • set::insert doesn't invalidate any other iterators in the set, whether an insertion happened or not.
  • We can use a set to quickly avoid duplicates, and a list to preserve order.
  • Store the set's iterators in the list just to avoid copying values.

And the implementation, then, is:

  • Construct a set for values (for dupe checks) and a list of iterators (for order preservation).
  • When inserting, always try to add to the set, but only add to the list if it wasn't already in the set (i.e. if it's not a dupe).
  • When accessing, just remember its a list of iterators, not a list of values, so a list<set::iterator>::iterator does need two levels of dereferencing to access the value.

The pros and cons are:

  • Pro: Easy to implement.
  • Pro: Works.
  • Pro: Doesn't copy values.
  • Pro: Uses uniqueness semantics of a plain old set.
  • Con: Can complicate access syntax.
  • Con: Must iterate over entire list once if you want to convert it to a list of values.
  • Con: You've got to lug two containers around.
  • Con: Isn't fancy enough to have its own transparent container interface, i.e. it is not, itself, a container.

Options you have to enhance this and reduce some of the cons, at the cost of having to write extra code:

  • Stick both containers in a class/struct with an insert method and whatever else you want, to make carrying it around a bit easier.
  • Return .second from your insert function if you want the caller to be able to know if it was inserted or not (e.g. maybe you have to delete things if they aren't inserted or something).
  • Template said class/struct with support for whatever value type. You could also template the unique and ordered container types, if you want to use other types of sets or lists.
  • Wrap the whole thing in a conformant Container interface, if you'd like all the features of a proper STL container.

Also, you can find ordered set implementations out there (some of the other answers here provide links). I use the one I described here when I'm just coding things quickly; it's simple enough that it's usually quicker for me to just do this than it is to go get an existing implementation.

时常饿 2024-10-14 23:35:50

这取决于你想要什么效率。如果您想要“快”的东西,请使用 std::set<> (正如其他人已经建议的那样)。

但是,如果您需要 chache 一致性或将事物保留在向量中(保证对齐内存)而不是集合(没有任何保证,如果我没记错的话,作为树实现),那么您必须直接将 std::vector 与一些标准算法假设您提供的容器已经排序(然后使检查速度更快),例如 std::二进制搜索()。

That depends on what efficiency you want. If you want something that's "just fast", use std::set<> (as others already suggested).

However, if you need chache coherency or to keep things in a vector (aligned memory guaranteed) instead of a set (nothing guaranteed, implemented as a tree if I remember correctly) then you'll have to ust directly std::vector combined with some standard algorithms that assume the container you provide is already sorted (then making the check faster), like std::binary_search().

智商已欠费 2024-10-14 23:35:50

插入集合需要 log(n)。而且排序是免费的。

插入向量(push_back)需要常数时间。对向量进行排序需要 n*log(n)。
但您仍然需要删除重复项。

如果一次性插入然后排序,也可以考虑向量。如果您经常插入,则设置是正确的。

Insert into a set takes log(n). And the sort is free.

Insert into a vector (push_back) takes constant time. Sorting a vector takes n*log(n).
But you still need to remove duplicates.

If you insert in one go and then sort, you can consider also vector. If you insert frequently set is the right one.

万劫不复 2024-10-14 23:35:50

在您的 .h 或 .hpp 中尝试一下:

struct TestWithTime
{
    TestWithTime(unsigned long long timeSecs) : m_timeSecs(timeSecs) {}

    unsigned long long m_timeSecs;
}

struct OrderedByTime
{
    bool operator() (const TestWithTime* first,  const TestWithTime* second) const
    {
        // Important: if the time is equal
        if (first->m_timeSecs == second->m_timeSecs)
        {
            // then compare the pointers
            return first < second;
        }
        return first->m_timeSecs < second->m_timeSecs;
    }
};

typedef std::set<TestWithTime*, OrderedByTime> OrderedDataByTime;

现在您可以使用您的 OrderedDataByTime 集!

Try this in your .h or .hpp:

struct TestWithTime
{
    TestWithTime(unsigned long long timeSecs) : m_timeSecs(timeSecs) {}

    unsigned long long m_timeSecs;
}

struct OrderedByTime
{
    bool operator() (const TestWithTime* first,  const TestWithTime* second) const
    {
        // Important: if the time is equal
        if (first->m_timeSecs == second->m_timeSecs)
        {
            // then compare the pointers
            return first < second;
        }
        return first->m_timeSecs < second->m_timeSecs;
    }
};

typedef std::set<TestWithTime*, OrderedByTime> OrderedDataByTime;

Now you can use your OrderedDataByTime set !!

终止放荡 2024-10-14 23:35:50

有序集基本上是 g++ 中基于策略的数据结构,它使唯一元素保持排序顺序。它接近 STL 中的 set 数据结构,在 log( 中执行操作n) 复杂度,并以 log(n) 复杂度执行两个附加操作。 :)

  • order_of_key (n) :小于 n 的项目数
  • find_by_order(n)n-th集合中的元素(索引为 0)

了解更多详细信息,关注这个链接

Ordered set is basically a policy based data structure in g++ that keeps the unique elements in sorted order. It's close to set data structure in STL that performs operations in log(n) complexity and performs two additional operations also in log(n) complexity. :)

  • order_of_key (n) : Number of items that smaller than n
  • find_by_order(n) : n-th element in a set (in 0 indexed)

for more details, follow this link

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文