`std::set` 有什么问题吗?

发布于 2024-10-26 04:56:55 字数 2139 浏览 9 评论 0原文

另一个主题中,我试图解决这个问题。问题是从 std::string 中删除重复的字符。

std::string s= "saaangeetha";

由于顺序并不重要,所以我先对 s 进行排序,然后使用 std::unique 最后调整大小以获得 所需结果

aeghnst

这是正确的!


现在我想做同样的事情,但同时我希望字符的顺序保持不变。意思是,我想要这个输出:

sangeth

所以我写了 this

template<typename T>
struct is_repeated
{
    std::set<T>  unique;
    bool operator()(T c) { return !unique.insert(c).second; }
}; 
int main() {
    std::string s= "saaangeetha";
    s.erase(std::remove_if(s.begin(), s.end(), is_repeated<char>()), s.end()); 
    std::cout << s ;
}

给出了这个输出:

saangeth

也就是说, a 被重复,尽管其他重复消失了。代码有什么问题吗?

无论如何,我稍微更改了我的代码:(请参阅评论)

template<typename T>
struct is_repeated
{
    std::set<T> & unique;  //made reference!
    is_repeated(std::set<T> &s) : unique(s) {} //added line!
    bool operator()(T c) { return !unique.insert(c).second; }
}; 
int main() {
    std::string s= "saaangeetha";
    std::set<char> set; //added line!
    s.erase(std::remove_if(s.begin(),s.end(),is_repeated<char>(set)),s.end()); 
    std::cout << s ;
}

输出:

sangeth

问题消失了!

那么第一个解决方案有什么问题呢?

另外,如果我没有将成员变量设为unique引用类型,那么问题就不会出现去

std::setis_repeated 函子有什么问题?到底问题出在哪里呢?

我还注意到,如果将 is_repeated 函子复制到某处,那么它的每个成员也会被复制。我没有看到这里的问题!

In the other topic I was trying to solve this problem. The problem was to remove duplicate characters from a std::string.

std::string s= "saaangeetha";

Since the order was not important, so I sorted s first, and then used std::unique and finally resized it to get the desired result:

aeghnst

That is correct!


Now I want to do the same, but at the same time I want the order of characters intact. Means, I want this output:

sangeth

So I wrote this:

template<typename T>
struct is_repeated
{
    std::set<T>  unique;
    bool operator()(T c) { return !unique.insert(c).second; }
}; 
int main() {
    std::string s= "saaangeetha";
    s.erase(std::remove_if(s.begin(), s.end(), is_repeated<char>()), s.end()); 
    std::cout << s ;
}

Which gives this output:

saangeth

That is, a is repeated, though other repetitions gone. What is wrong with the code?

Anyway I change my code a bit: (see the comment)

template<typename T>
struct is_repeated
{
    std::set<T> & unique;  //made reference!
    is_repeated(std::set<T> &s) : unique(s) {} //added line!
    bool operator()(T c) { return !unique.insert(c).second; }
}; 
int main() {
    std::string s= "saaangeetha";
    std::set<char> set; //added line!
    s.erase(std::remove_if(s.begin(),s.end(),is_repeated<char>(set)),s.end()); 
    std::cout << s ;
}

Output:

sangeth

Problem gone!

So what is wrong with the first solution?

Also, if I don't make the member variable unique reference type, then the problem doesn't go.

What is wrong with std::set or is_repeated functor? Where exactly is the problem?

I also note that if the is_repeated functor is copied somewhere, then every member of it is also copied. I don't see the problem here!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

南烟 2024-11-02 04:57:16

函子类应该是纯函数并且没有自己的状态。请参阅 Scott Meyer 的 Effective STL 书中的第 39 项以获得对此的详细解释。但其要点是,您的函子类可能会在算法内被复制 1 次或多次。

Functor classes should be pure functions and have no state of their own. See item 39 in Scott Meyer's Effective STL book for a good explanation on this. But the gist of it is that your functor class may be copied 1 or more times inside the algorithm.

泅人 2024-11-02 04:57:16

其他答案是正确的,因为问题是您使用的函子不是可复制安全的。特别是,gcc (4.2) 附带的 STL 将 std::remove_if 实现为 std::find_if 的组合,以定位要删除的第一个元素,后跟 < code>std::remove_copy_if 来完成操作。

template <typename ForwardIterator, typename Predicate>
std::remove_if( ForwardIterator first, ForwardIterator end, Predicate pred ) {
   first = std::find_if( first, end, pred ); // [1]
   ForwardIterator i = it;
   return first == last? first 
          : std::remove_copy_if( ++i, end, fist, pred ); // [2]
}

[1] 中的副本意味着找到的第一个元素被添加到函子的副本中,这意味着第一个“a”将被遗忘。函子也在 [2] 中被复制,如果不是的话也没关系,因为该副本的原始函子是一个空函子。

The other answers are correct, in that the issue is that the functor that you are using is not copyable safe. In particular, the STL that comes with gcc (4.2) implements std::remove_if as a combination of std::find_if to locate the first element to delete followed by a std::remove_copy_if to complete the operation.

template <typename ForwardIterator, typename Predicate>
std::remove_if( ForwardIterator first, ForwardIterator end, Predicate pred ) {
   first = std::find_if( first, end, pred ); // [1]
   ForwardIterator i = it;
   return first == last? first 
          : std::remove_copy_if( ++i, end, fist, pred ); // [2]
}

The copy in [1] means that the first element found is added to the copy of the functor and that means that the first 'a' will be lost in oblivion. The functor is also copied in [2], and that would be fine if it were not because the original for that copy is an empty functor.

半夏半凉 2024-11-02 04:57:16

根据 remove_if 的实现,可以复制您的谓词。重构函子并使其无状态,或者使用 Boost.Ref 到“用于传递对函数模板(算法)的引用,这些函数模板(算法)通常会获取其参数的副本”,如下所示:

#include <set>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>

#include <boost/ref.hpp>
#include <boost/bind.hpp>

template<typename T>
struct is_repeated {
    std::set<T>  unique;
    bool operator()(T c) { return !unique.insert(c).second; }
}; 

int main() {
    std::string s= "saaangeetha";
    s.erase(std::remove_if(s.begin(), s.end(), boost::bind<bool>(boost::ref(is_repeated<char>()),_1)), s.end());
    std::cout << s;

    return 0;
}

Depending on the implementation of remove_if can make copies of your predicate. Either refactor your functor and make it stateless or use Boost.Ref to "for passing references to function templates (algorithms) that would usually take copies of their arguments", like so:

#include <set>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>

#include <boost/ref.hpp>
#include <boost/bind.hpp>

template<typename T>
struct is_repeated {
    std::set<T>  unique;
    bool operator()(T c) { return !unique.insert(c).second; }
}; 

int main() {
    std::string s= "saaangeetha";
    s.erase(std::remove_if(s.begin(), s.end(), boost::bind<bool>(boost::ref(is_repeated<char>()),_1)), s.end());
    std::cout << s;

    return 0;
}
断桥再见 2024-11-02 04:57:15

函子的设计方式应该使函子的副本与原始函子相同。也就是说,如果您复制一个函子,然后执行一系列操作,则无论您使用哪个函子,或者即使您交错两个函子,结果都应该是相同的。这使得 STL 实现能够灵活地复制函子并根据需要传递它们。

对于您的第一个函子,此声明不成立,因为如果我复制您的函子然后调用它,您对其存储集所做的更改不会反映在原始函子中,因此副本和原始函子的性能会有所不同。同样,如果您采用第二个函子并使其不通过引用存储其集合,则函子的两个副本的行为将不相同。

不过,函子的最终版本之所以有效,是因为集合是通过引用存储的这一事实意味着任何数量的 tue 函子副本都将彼此表现相同。

希望这有帮助!

Functors are supposed to be designed in a way where a copy of a functor is identical to the original functor. That is, if you make a copy of one functor and then perform a sequence of operations, the result should be the same no matter which functor you use, or even if you interleave the two functors. This gives the STL implementation the flexibility to copy functors and pass them around as it sees fit.

With your first functor, this claim does not hold because if I copy your functor and then call it, the changes you make to its stored set do not reflect in the original functor, so the copy and the original will perform differently. Similarly, if you take your second functor and make it not store its set by reference, the two copies of the functor will not behave identically.

The reason that your final version of the functor works, though, is because the fact that the set is stored by reference means that any number of copies of tue functor will behave identically to one another.

Hope this helps!

万劫不复 2024-11-02 04:57:15

在 GCC (libstdc++) 中remove_if< /code> 本质上实现为

    template<typename It, typename Pred>
    It remove_if(It first, It last, Pred predicate) {
      first = std::find_if(first, last, predicate);
    //                                  ^^^^^^^^^
      if (first == last)
         return first;
      else {
         It result = first;
         ++ result;
         for (; first != last; ++ first) {
           if (!predicate(*first)) {
    //          ^^^^^^^^^
              *result = std::move(*first);
              ++ result;
           }
         }
      }
    }

注意,您的谓词按值传递给 find_if,因此结构以及集合在 find_if 内进行修改code> 不会传播回调用者。

由于第一个重复项出现在:

  saaangeetha
//  ^

初始的“sa”将在 find_if 调用后保留。同时,谓词集为空(find_if 中的插入是本地的)。因此之后的循环将保留第三个a

   sa | angeth
// ^^   ^^^^^^
// ||   kept by the loop in remove_if
// ||
// kept by find_if

In GCC (libstdc++), remove_if is implemented essentially as

    template<typename It, typename Pred>
    It remove_if(It first, It last, Pred predicate) {
      first = std::find_if(first, last, predicate);
    //                                  ^^^^^^^^^
      if (first == last)
         return first;
      else {
         It result = first;
         ++ result;
         for (; first != last; ++ first) {
           if (!predicate(*first)) {
    //          ^^^^^^^^^
              *result = std::move(*first);
              ++ result;
           }
         }
      }
    }

Note that your predicate is passed by-value to find_if, so the struct, and therefore the set, modified inside find_if will not be propagated back to caller.

Since the first duplicate appears at:

  saaangeetha
//  ^

The initial "sa" will be kept after the find_if call. Meanwhile, the predicate's set is empty (the insertions within find_if are local). Therefore the loop afterwards will keep the 3rd a.

   sa | angeth
// ^^   ^^^^^^
// ||   kept by the loop in remove_if
// ||
// kept by find_if
自找没趣 2024-11-02 04:57:15

并不是真正的答案,但作为另一个需要考虑的有趣的花絮,这确实有效,即使它使用原始函子:

#include <set>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>

template<typename T>
struct is_repeated {
    std::set<T>  unique;
    bool operator()(T c) { return !unique.insert(c).second; }
}; 
int main() {
    std::string s= "saaangeetha";
    std::remove_copy_if(s.begin(), s.end(), 
                        std::ostream_iterator<char>(std::cout), 
                        is_repeated<char>());
    return 0;
}

编辑:我不认为它会影响此行为,但我还纠正了函子中的一个小错误(显然,operator() 应该采用 T 类型的参数,而不是 char)。

Not really an answer, but as another interesting tidbit to consider, this does work, even though it uses the original functor:

#include <set>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>

template<typename T>
struct is_repeated {
    std::set<T>  unique;
    bool operator()(T c) { return !unique.insert(c).second; }
}; 
int main() {
    std::string s= "saaangeetha";
    std::remove_copy_if(s.begin(), s.end(), 
                        std::ostream_iterator<char>(std::cout), 
                        is_repeated<char>());
    return 0;
}

Edit: I don't think it affects this behavior, but I've also corrected a minor slip in your functor (operator() should apparently take a parameter of type T, not char).

咋地 2024-11-02 04:57:15

我认为问题可能在于 is_repeated 函子被复制到 std::remove_if 实现内部的某个位置。如果是这种情况,则使用默认的复制构造函数,然后调用 std::set 复制构造函数。您最终会得到两个可能独立使用的 is_repeated 函子。然而,由于它们中的集合是不同的对象,因此它们看不到相互的变化。如果将字段 is_repeated::unique 转换为引用,则复制的函子仍使用原始集,这就是本例中您想要的。

I suppose the problem could lie in that the is_repeated functor is copied somewhere inside the implementation of std::remove_if. If that is the case, the default copy constructor is used and this in turn calls std::set copy constructor. You end up with two is_repeated functors possibly used independently. However as the sets in both of them are distinct objects, they don't see the mutual changes. If you turn the field is_repeated::unique to a reference, then the copied functor still uses the original set which is what you want in this case.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文