How to remove duplicates from a vector based on an equality predicate?


I have a struct which roughly looks like this:

struct vec3
{
    int x;
    int y;
    int z;

    constexpr bool operator==(const vec3& v) const
    {
        return (x == v.x) && (y == v.y) && (z == v.z);
    }

    constexpr vec3 operator-() const
    {
        return {-x, -y, -z};
    }
};

I then generate a std::vector of vec3 with random values for each coordinate.
The function in which it is used requires that no pair of values {v1, v2} in that vector satisfies v1 == -v2. I obviously need that definition of operator== (the one in the snippet above) elsewhere in the code, otherwise the problem would be trivial.

I first attempted std::set and std::sort + std::unique, but could not find any way to write a struct fulfilling the named requirement Compare for that application (which is needed for both approaches).

How can I proceed?

Note:

This is somewhat different from Removing duplicates from a non-sortable vector, in which pointers are sorted, and also from C++ how to remove duplicates from vector of Class type?, in which the elements are sortable according to some criteria (I think).


Answers (2)

っ〆星空下的拥抱 2025-01-29 18:02:15


The function in which it is used requires that no pair of values {v1, v2} in that vector satisfies v1 == -v2

but could not find any way to write a struct fulfilling the named requirement Compare for that application (which is needed for both approaches).

It seems to me that you're trying to solve X, but this is the Y in your XY-problem.

It's fairly simple to implement an ordered comparator that satisfies the equality of -v == v. Simply compare absolute values:

#include <compare>  // for the defaulted operator<=>
#include <cstdlib>  // for std::abs
#include <set>      // for the std::set example below
#include <utility>  // for std::forward

struct vec3
{
    int x;
    int y;
    int z;

    // normal comparison that treats -x != x
    friend auto operator<=>(const vec3&, const vec3&) = default;
};

// declare in same namespace as vec3 for ADL
vec3 abs(const vec3& v) {
    return {std::abs(v.x), std::abs(v.y), std::abs(v.z)};
}


struct abs_less {
    template< class T, class U>
    constexpr auto operator()( T&& lhs, U&& rhs ) const
        -> decltype(std::forward<T>(lhs) < std::forward<U>(rhs))
    {
        using std::abs; // for integers
        return abs(lhs) < abs(rhs); // this implementation assumes normal comparison operator
        // you can implement logic directly here if that's not possible
    }
};

This comparator works with both std::set and std::sort + std::unique. Example with set:

std::set<vec3, abs_less> example;
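
For completeness, here is a minimal sketch of the std::sort + std::unique route, assuming the vec3, abs and abs_less definitions above are in scope (the helper name is mine). std::unique needs an equivalence predicate rather than an ordering, so it is derived from the same comparator:

#include <algorithm>
#include <vector>

void remove_opposite_duplicates(std::vector<vec3>& vec)
{
    // Sort by component-wise absolute value so that v and -v end up adjacent.
    std::sort(vec.begin(), vec.end(), abs_less{});

    // Two elements are equivalent when neither orders before the other under abs_less.
    const auto equivalent = [](const vec3& a, const vec3& b) {
        return !abs_less{}(a, b) && !abs_less{}(b, a);
    };
    vec.erase(std::unique(vec.begin(), vec.end(), equivalent), vec.end());
}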

Of course, you could overload operator< directly and use std::less, but I would generally recommend against non-defaulted operator overloads with unusual behaviour.

老子叫无熙 2025-01-29 18:02:15


I believe the simplest method would be to use std::unordered_set and exploit its second and third template parameters.

Method

  1. Define a hash function

The goal of this step is to provide a "pre-filtering" pass that eliminates obvious duplicates according to what "duplicate" means in this context (so here, v1 and -v1 should, for example, have the same hash).

That's something that should be done on a per-class basis. There is no way to come up with a generic, performant hash function, though non-performance-critical applications may use the first hasher below (but I would not really recommend that anymore).

a. The bad hasher

This is what I originally proposed, before taking the comments from @axnsan and @François Andrieux into account.

The simplest hasher I can think of looks like this:

struct bad_hasher
{
    std::size_t operator()(const value_type&) const
    {
        return 0;
    }
};

It makes all hashes collide and forces std::unordered_set to use KeyEqual to determine object equality.
So it does work, but that does not make it a good choice. @axnsan and @François Andrieux pointed out the following drawbacks in the comments below:

  • "it changes its performance characteristics to O(n^2) (it will have to iterate through all elements on each lookup) "(- @axnsan)
  • "[it converts] the set into a simple unsorted linked list. Every element will collide with every other element, it it looks like typical implementations use collision chaining". (- @François Andrieux)

In other words, this makes std::unordered_set become the same as a std::vector + std::remove_if.
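
For reference, a rough sketch of that std::vector + std::remove_if equivalent, assuming the vec3 from the question (with operator== and unary minus); each element is checked against every element kept so far, hence the quadratic behaviour. The helper and predicate names are mine:

#include <algorithm>
#include <vector>

void remove_duplicates_quadratic(std::vector<vec3>& vec)
{
    std::vector<vec3> kept;
    const auto is_duplicate = [&kept](const vec3& v)
    {
        // Linear scan over everything already kept, for every element.
        const bool seen = std::any_of(kept.begin(), kept.end(),
            [&v](const vec3& k) { return (k == v) || (-k == v); });
        if (!seen)
            kept.push_back(v);
        return seen;
    };
    vec.erase(std::remove_if(vec.begin(), vec.end(), is_duplicate), vec.end());
}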

b. The better hasher

@axnsan suggests the following hasher for this particular application:

struct better_hasher
{
    std::size_t operator()(const vec3& v) const
    {
        return static_cast<std::size_t>(std::abs(v.x) + std::abs(v.y) + std::abs(v.z));
    }
};

It fulfils the following requirements:

  • better_hasher(v) == better_hasher(-v).
  • v1 != v2 => better_hasher(v1) != better_hasher(v2) in most cases ((1, 0, 1) will collide with (1, 1, 0) for example)
  • not all hashes collide.
  • removes obvious duplicates.

That is probably somewhere near the best we can hope for in this configuration; the quick check below illustrates these properties.
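
As an illustration (assuming the vec3 struct from the question and better_hasher as defined above are in scope; the assertions only reflect these particular component values):

#include <cassert>

int main()
{
    better_hasher hash;
    assert(hash({1, 2, 3}) == hash({-1, -2, -3})); // opposites always collide, as intended
    assert(hash({1, 0, 1}) == hash({1, 1, 0}));    // ...but some non-duplicates collide too
    assert(hash({1, 0, 0}) != hash({2, 0, 0}));    // while most distinct values do not
}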

We then need to remove those "false positives" due to hash collisions.

  2. Define a key equality predicate

The goal here is to tell real duplicates apart from the false positives the hasher cannot distinguish (typically vectors such as (1, 0, 1) / (1, 1, 0), or collisions caused by overflow).

Declare a predicate struct which roughly looks like:

struct key_equal
{
    bool operator()(const value_type& a, const value_type& b) const
    {
        
        return (a == b) || <...>;
    }
};

Where <...> is whatever makes two values identical in the current context (so here, (a == b) || (-a == b), for example).

Note that this expects operator== to be implemented.

  3. Erase duplicates

Declare an std::unordered_set which removes duplicates:

const std::unordered_set<value_type, hasher, key_equal> s(container.begin(), container.end());
container.assign(s.begin(), s.end());

  4. (alt) Erase duplicates (and preserve the original order in the container)

Basically the same, but this checks whether an element can be inserted into the std::unordered_set, and removes it from the container if it cannot. Adapted from @yury's answer in What's the most efficient way to erase duplicates and sort a vector?.

std::unordered_set<value_type, hasher, key_equal> s;

const auto predicate = [&s](const value_type& value){return !s.insert(value).second;};

container.erase(std::remove_if(container.begin(), container.end(), predicate), container.end());

Generic (container-independent) templated function:

template<typename key_equal_t, typename hasher_t, bool order_conservative, typename container_t>
void remove_duplicates(container_t& container)
{
    using value_type = typename container_t::value_type;

    if constexpr(order_conservative)
    {
        std::unordered_set<value_type, hasher_t, key_equal_t> s;
        const auto predicate = [&s](const value_type& value){return !s.insert(value).second;};
        container.erase(std::remove_if(container.begin(), container.end(), predicate), container.end());
    }
    else
    {
        const std::unordered_set<value_type, hasher_t, key_equal_t> s(container.begin(), container.end());
        container.assign(s.begin(), s.end());
    }
}

It expects key_equal_t and hasher_t to be provided (plus a bool known at compile time indicating whether you care about elements being kept in the same order or not). I did not benchmark either of the two branches in this function, so I do not know whether one is significantly better than the other, though this answer seems to show that manual insertion may be faster.

Example in this use case:

/// "Predicate" indicating if two values are considered duplicates or not
struct key_equal
{
    bool operator()(const vec3& a, const vec3& b) const
    {
        // Remove identical vectors and their opposite
        return (a == b) || (-a == b);
    }
};

/// Hashes a vec3 by adding absolute values of its components.
struct hasher
{
    std::size_t operator()(const vec3& v) const
    {
        return static_cast<std::size_t>(std::abs(v.x) + std::abs(v.y) + std::abs(v.z));
    }
};

remove_duplicates<key_equal, hasher, order_conservative>(vec);

Test data:

vec3 v1{1, 1, 0};   // Keep
vec3 v2{0, 1, 0};   // Keep
vec3 v3{1, -1, 0};  // Keep
vec3 v4{-1, -1, 0}; // Remove
vec3 v5{1, 1, 0};   // Remove

std::vector vec{v1, v2, v3, v4, v5};

Test 1: non-order-conservative:

// ...<print vec values>
remove_duplicates<key_equal, hasher, false>(vec);
// ... <print vec values>

Output (before / after):

(1, 1, 0) (0, 1, 0) (1, -1, 0) (-1, -1, 0) (1, 1, 0)
(1, -1, 0) (0, 1, 0) (1, 1, 0) 

Test 2: order-conservative:

// ... <print vec values>
remove_duplicates<key_equal, hasher, true>(vec);
// ... <print vec values>

Output (before / after):

(1, 1, 0) (0, 1, 0) (1, -1, 0) (-1, -1, 0) (1, 1, 0) 
(1, 1, 0) (0, 1, 0) (1, -1, 0) 
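
For completeness, a minimal driver reproducing the order-conservative run above; the print helper is only illustrative and not part of the original answer (it assumes the vec3, key_equal, hasher and remove_duplicates definitions above, together with the <unordered_set> and <algorithm> headers they need):

#include <iostream>
#include <vector>

void print(const std::vector<vec3>& vec)
{
    for (const vec3& v : vec)
        std::cout << "(" << v.x << ", " << v.y << ", " << v.z << ") ";
    std::cout << '\n';
}

int main()
{
    std::vector<vec3> vec{{1, 1, 0}, {0, 1, 0}, {1, -1, 0}, {-1, -1, 0}, {1, 1, 0}};

    print(vec);                                        // before
    remove_duplicates<key_equal, hasher, true>(vec);   // order-conservative variant
    print(vec);                                        // after
}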