STL + ordered set + no duplicates

Posted 2024-10-07 23:35:50

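I need an ordered set of values with no duplicates. What is the fastest/best way to do this:

1 - Create a vector, sort it, and remove the duplicates?
2 - Use some kind of "sorted" vector (if such a thing exists)?

Which one is more efficient?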

祁梦 2024-10-14 23:35:50

Why not use std::set?
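A minimal sketch of what std::set gives you: duplicates are silently ignored on insertion, and iteration yields the elements in ascending order (by operator< unless you supply another comparator). The function name sorted_unique is hypothetical, used only for illustration.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: build a std::set from the input, which drops
// duplicates and keeps the remaining values sorted, then copy them out.
std::vector<std::string> sorted_unique(const std::vector<std::string>& input) {
    std::set<std::string> s(input.begin(), input.end());
    return std::vector<std::string>(s.begin(), s.end());
}
```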

雪花飘飘的天空 2024-10-14 23:35:50

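If you are going to load the list once and then use it many times, a std::vector may be more efficient than a std::set, both in memory usage and in iteration speed.

If you are going to be adding and removing elements constantly, you should definitely use std::set.

For general use, go with std::set because it is less work (building a vector requires you to sort it and remove the duplicates after you have finished appending all the elements), unless you have a specific need for low memory usage, or some performance problem shows that a vector is needed.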
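The "load once, then sort and remove duplicates" approach described above can be sketched as follows, assuming all values are appended before the first lookup. The name sort_and_dedupe is hypothetical; the sort, unique, erase sequence is the standard idiom.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: sort once, then erase the duplicates.
// std::unique only collapses *adjacent* equal elements, which is why the
// sort must come first. It returns the new logical end of the range, and
// erase() trims the leftover tail.
template <typename T>
void sort_and_dedupe(std::vector<T>& v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
}
```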

╰ゝ天使的微笑 2024-10-14 23:35:50

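Use std::set. It is ordered, and it does not allow duplicates.

The only downside is that you cannot access the elements randomly, although that was not stated as a requirement.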
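To make the "no random access" point concrete: std::set iterators are bidirectional, so reaching the k-th element means stepping k times (linear), whereas a sorted vector would allow v[k] in constant time. This is a sketch; kth_smallest is a hypothetical name and assumes k < s.size().

```cpp
#include <cstddef>
#include <iterator>
#include <set>

// Hypothetical helper: return the k-th smallest element (0-based).
// std::next advances a bidirectional iterator one step at a time, so this
// is O(k), not O(1). Precondition (assumed): k < s.size().
int kth_smallest(const std::set<int>& s, std::size_t k) {
    return *std::next(s.begin(), static_cast<std::ptrdiff_t>(k));
}
```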

香草可樂 2024-10-14 23:35:50

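Efficiency will depend on your ratio of insertions to accesses (i.e. how often you need to re-sort the vector). If performance really matters, I suggest you try both approaches and use whichever is fastest for your application's real-world usage.

Note: std::set is not a sorted vector, because it is not contiguous in memory (it is a tree).
The "sorted vector" you want is a heap on top of a std::vector. See: http://stdcxx.apache.org/doc/stdlibug/14-7.html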
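One detail worth keeping in mind with the heap suggestion: std::make_heap only guarantees that the largest element is at the front; the rest of the vector is not fully sorted, and duplicates are kept. std::sort_heap turns a valid heap into an ascending range. A minimal sketch (heap_then_sort is a hypothetical name):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: arrange v as a max-heap, then sort it.
// After make_heap, v.front() is the maximum but the vector as a whole may
// be unsorted; sort_heap then produces ascending order. Neither step
// removes duplicates.
std::vector<int> heap_then_sort(std::vector<int> v) {
    std::make_heap(v.begin(), v.end());
    std::sort_heap(v.begin(), v.end());
    return v;
}
```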

冬天的雪花 2024-10-14 23:35:50

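There is always Loki::AssocVector.

Otherwise you can easily roll your own:

Use a std::vector or std::deque as the underlying container.
Use the generic algorithms lower_bound / upper_bound / equal_range and binary_search to look up objects.
inplace_merge is very useful when you already know the value is not present.

But really, use std::set :)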
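A minimal sketch of the "roll your own" option, assuming a std::vector as the underlying container. SortedVectorSet is a hypothetical name, not Loki's AssocVector API: lower_bound locates the insertion point in O(log n), the element is only inserted if an equivalent one is not already there, and the insert itself shifts the tail in O(n).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sorted-vector set: data_ is kept sorted and duplicate-free.
template <typename T>
class SortedVectorSet {
public:
    // Returns true if value was inserted, false if it was already present.
    bool insert(const T& value) {
        // First element that is not less than value.
        auto it = std::lower_bound(data_.begin(), data_.end(), value);
        if (it != data_.end() && !(value < *it)) {
            return false;  // an equivalent element is already stored
        }
        data_.insert(it, value);  // O(n): shifts the elements after it
        return true;
    }

    bool contains(const T& value) const {
        return std::binary_search(data_.begin(), data_.end(), value);
    }

    // Contiguous, sorted storage: usable directly for iteration or v[i].
    const std::vector<T>& values() const { return data_; }

private:
    std::vector<T> data_;
};
```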

孤蝉 2024-10-14 23:35:50

Generally, if I need a quick one-off, I'll use both a set and a list together, and do something like this:

#include <set>
#include <list>
#include <string>
#include <iostream>

using namespace std;

int main() {

    // set prevents dupes, list preserves order
    set<string> theset;
    list<set<string>::iterator> thelist;
    
    // insertion is like this:
    auto insert = [&] (const string &str) {
        auto inserted = theset.insert(str);
        if (inserted.second)
            thelist.push_back(inserted.first);
    };

    // then, for example:
    insert("zebra");       // first zebra
    insert("chair a");     // first chair a
    insert("desk");        // first desk
    insert("desk");
    insert("chair b");     // first chair b
    insert("chair a");
    insert("chair a");
    insert("table");       // first table
    insert("chair a");
    insert("xylophone");   // first xylophone
    insert("zebra");
    
    // access can be done like:
    for (auto istr : thelist)
        cout << *istr << endl;
    
}

You don't have to use the lambda there, it was just easier to type for this example. Anyways, that outputs:

zebra
chair a
desk
chair b
table
xylophone

The key points here are:

set::insert returns a useful <iterator,bool> pair, where the first value is the new iterator (if inserted) or the existing iterator (if not), and the second value is true (if inserted) or false (if not).
set::insert doesn't invalidate any other iterators in the set, whether an insertion happened or not.
We can use a set to quickly avoid duplicates, and a list to preserve order.
Store the set's iterators in the list just to avoid copying values.

And the implementation, then, is:

Construct a set for values (for dupe checks) and a list of iterators (for order preservation).
When inserting, always try to add to the set, but only add to the list if it wasn't already in the set (i.e. if it's not a dupe).
When accessing, just remember it's a list of iterators, not a list of values, so a list<set::iterator>::iterator does need two levels of dereferencing to access the value.

The pros and cons are:

Pro: Easy to implement.
Pro: Works.
Pro: Doesn't copy values.
Pro: Uses uniqueness semantics of a plain old set.
Con: Can complicate access syntax.
Con: Must iterate over entire list once if you want to convert it to a list of values.
Con: You've got to lug two containers around.
Con: Isn't fancy enough to have its own transparent container interface, i.e. it is not, itself, a container.

Options you have to enhance this and reduce some of the cons, at the cost of having to write extra code:

Stick both containers in a class/struct with an insert method and whatever else you want, to make carrying it around a bit easier.
Return .second from your insert function if you want the caller to be able to know if it was inserted or not (e.g. maybe you have to delete things if they aren't inserted or something).
Template said class/struct with support for whatever value type. You could also template the unique and ordered container types, if you want to use other types of sets or lists.
Wrap the whole thing in a conformant Container interface, if you'd like all the features of a proper STL container.
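A sketch of the first three enhancement options combined: both containers in one class template, with insert() returning .second. InsertionOrderedSet and values() are hypothetical names, and this stops well short of a conforming Container interface.

```cpp
#include <cstddef>
#include <list>
#include <set>
#include <string>
#include <vector>

// Hypothetical wrapper: the set rejects duplicates, the list of set
// iterators remembers first-insertion order without copying values.
template <typename T>
class InsertionOrderedSet {
public:
    // Returns true if value was new, false if it was a duplicate.
    bool insert(const T& value) {
        auto result = unique_.insert(value);  // pair<iterator, bool>
        if (result.second) {
            order_.push_back(result.first);
        }
        return result.second;
    }

    // One pass over the list to copy the values out in insertion order.
    std::vector<T> values() const {
        std::vector<T> out;
        out.reserve(order_.size());
        for (const auto& it : order_) {
            out.push_back(*it);  // *it: the set element the iterator points at
        }
        return out;
    }

    std::size_t size() const { return order_.size(); }

private:
    std::set<T> unique_;
    // set::insert does not invalidate existing iterators, so these stay valid.
    std::list<typename std::set<T>::iterator> order_;
};
```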

Also, you can find ordered set implementations out there (some of the other answers here provide links). I use the one I described here when I'm just coding things quickly; it's simple enough that it's usually quicker for me to just do this than it is to go get an existing implementation.

时常饿 2024-10-14 23:35:50

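It depends on what kind of efficiency you want. If you want something "fast", use std::set<> (as others have already suggested).

However, if you need cache locality, or want to keep things in a vector (guaranteed contiguous memory) rather than a set (no such guarantee; if I remember correctly, it is implemented as a tree), then you have to use std::vector directly, together with the standard algorithms that assume the container you pass in is already sorted (which makes lookups faster), such as std::binary_search().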
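A short sketch of lookups on an already sorted, duplicate-free vector. std::binary_search only answers yes/no; std::lower_bound also tells you where the element is, which gives you an index into contiguous storage. Both helper names are hypothetical, and both assume v is sorted ascending.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: O(log n) membership test. Precondition: v sorted.
bool contains_sorted(const std::vector<int>& v, int x) {
    return std::binary_search(v.begin(), v.end(), x);
}

// Hypothetical helper: index of x in v, or -1 if absent. Precondition: v sorted.
long index_of_sorted(const std::vector<int>& v, int x) {
    auto it = std::lower_bound(v.begin(), v.end(), x);
    if (it == v.end() || *it != x) {
        return -1;
    }
    return static_cast<long>(it - v.begin());
}
```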

智商已欠费 2024-10-14 23:35:50

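Inserting into a set takes O(log n) per element, and you get the ordering for free.

Inserting into a vector (push_back) takes amortized constant time. Sorting the vector takes O(n log n).
But you still need to remove the duplicates.

If you insert everything in one go and then sort, a vector is worth considering too. If you insert frequently, a set is the right choice.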
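Adding up the costs above for building a collection from n values (a rough tally that ignores constant factors) shows that both approaches are O(n log n) overall; the practical difference then comes from constants, memory layout, and how often you insert after the initial build:

$$
\begin{aligned}
\text{set: } & \sum_{k=1}^{n} O(\log k) = O(n \log n) \\
\text{vector: } & \underbrace{O(n)}_{\text{push\_back (amortized)}} + \underbrace{O(n \log n)}_{\text{sort}} + \underbrace{O(n)}_{\text{unique + erase}} = O(n \log n)
\end{aligned}
$$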

万劫不复 2024-10-14 23:35:50

Try this in your .h or .hpp:

#include <functional>
#include <set>

struct TestWithTime
{
    TestWithTime(unsigned long long timeSecs) : m_timeSecs(timeSecs) {}

    unsigned long long m_timeSecs;
};  // the semicolon after a struct definition is required

struct OrderedByTime
{
    bool operator() (const TestWithTime* first,  const TestWithTime* second) const
    {
        // Important: if the times are equal
        if (first->m_timeSecs == second->m_timeSecs)
        {
            // then compare the pointers; std::less gives a strict total
            // order even for unrelated pointers, unlike a raw <
            return std::less<const TestWithTime*>()(first, second);
        }
        return first->m_timeSecs < second->m_timeSecs;
    }
};

typedef std::set<TestWithTime*, OrderedByTime> OrderedDataByTime;

Now you can use your OrderedDataByTime set!
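A usage sketch. It repeats the answer's two definitions (with the semicolon after TestWithTime and a std::less pointer tie-break) so it compiles on its own. Note what "no duplicates" means with this comparator: the set rejects the same pointer twice, but two distinct objects with the same time are both kept, ordered by address. The set does not own the objects; here they are assumed to outlive it. times_in_order is a hypothetical helper.

```cpp
#include <functional>
#include <set>
#include <vector>

struct TestWithTime
{
    TestWithTime(unsigned long long timeSecs) : m_timeSecs(timeSecs) {}
    unsigned long long m_timeSecs;
};

struct OrderedByTime
{
    bool operator()(const TestWithTime* first, const TestWithTime* second) const
    {
        if (first->m_timeSecs == second->m_timeSecs)
        {
            // Tie-break on address so distinct objects never compare equal.
            return std::less<const TestWithTime*>()(first, second);
        }
        return first->m_timeSecs < second->m_timeSecs;
    }
};

typedef std::set<TestWithTime*, OrderedByTime> OrderedDataByTime;

// Hypothetical helper: the stored times, in the set's iteration order.
std::vector<unsigned long long> times_in_order(const OrderedDataByTime& data)
{
    std::vector<unsigned long long> out;
    for (const TestWithTime* t : data)
    {
        out.push_back(t->m_timeSecs);
    }
    return out;
}
```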
