While there are already quite a few questions about copy constructors/assignment operators on SO, I did not find an answer that fits my problem.
I have a class like
class Foo
{
    // ...
private:
    std::vector<int> vec1;
    std::vector<int> vec2;
    boost::bimap<unsigned int, unsigned int> bimap;
    // And a couple more
};
Now it seems that there is some quite excessive copying going on (based on profile data). So my question is: how do I best tackle this?
Should I implement a custom copy constructor/assignment operator and use swap? Or should I define my own swap method and use that (where appropriate) instead of assignment?
As I am not a C++ expert, examples that show how to properly handle this situation are greatly appreciated.
UPDATE: It appears I was not terribly clear. Let me try to explain. The program is basically an on-the-fly breadth-first search program, and for each step taken I need to store metadata about the step (which is the Foo class). Now the problem is that there are (usually) exponentially many steps, so you can imagine that a large number of these objects needs to be stored. As far as I know, I always pass by (const) reference. Each time I calculate a successor from a node in the graph, I need to create and store ONE Foo object (however, some of the data members will be added to this Foo later on, during the processing of this successor).
My profile data shows roughly something like this (I don't have the actual numbers on this machine):
SearchStrategy::Search 13s
FooStore::Save 10s
So you can see I spend nearly as much time saving this metadata as I do searching through the graph. Oh, and FooStore saves Foo in a google::sparse_hash_map<long long, Foo, boost::hash<long long> >.
The compiler is g++ 4.4 or g++ 4.5 (I'm not at my dev machine, so I cannot check at the moment).
UPDATE 2: I assign some of the members to a Foo instance after construction, like
void SetVec1(const std::vector<int>& vec1) { this->vec1 = vec1; }
I guess that tomorrow I should change this to use the swap method, which should definitely improve this a bit.
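A minimal sketch of what such a swap-based setter might look like. The name TakeVec1 is hypothetical, and the sketch assumes the caller no longer needs its vector afterwards, because the caller's vector is left holding whatever the Foo held before.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Foo
{
public:
    // Copying setter from the question: copies every element.
    void SetVec1(const std::vector<int>& v) { vec1 = v; }

    // Hypothetical swap-based setter: exchanges the two buffers in
    // constant time. Afterwards `v` holds what vec1 held before the call.
    void TakeVec1(std::vector<int>& v) { vec1.swap(v); }

    std::size_t Vec1Size() const { return vec1.size(); }

private:
    std::vector<int> vec1;
};

// Usage: build the vector locally, then hand its buffer over to the Foo.
inline std::size_t BuildAndStore(Foo& foo)
{
    std::vector<int> tmp(1000, 42);
    foo.TakeVec1(tmp);                 // no element-wise copy
    assert(foo.Vec1Size() == 1000);
    return foo.Vec1Size();
}
```

The copying setter remains the right choice whenever the caller still needs its own vector after the call.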
I'm sorry if I'm not entirely clear about what semantics I'm trying to achieve, but the reason is that I am not quite sure.
Regards,
Morten
Everything depends on what copying this object means in your case.
If it's 1 (copying means duplicating the whole object, contents included), then this class seems correct. You're not very clear about which operations make lots of copies, so I'm assuming you are trying to copy the whole object.
If it's 2 (the copy should refer to the same data as the original), then you need to use something like shared_ptr to share the containers between the objects. Just using shared_ptr members instead of real objects will implicitly allow the buffers to be referred to by both objects (the copy and the copied).
That's the easier way (using boost::shared_ptr, or std::shared_ptr if you have a C++0x-enabled compiler providing it).
There are harder ways, but they will certainly become a problem later.
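A rough sketch of the shared_ptr approach, assuming a compiler that provides std::shared_ptr (boost::shared_ptr could be substituted). The class name SharedFoo and its member functions are hypothetical, made up for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical variant of Foo whose copies share one vector buffer.
class SharedFoo
{
public:
    SharedFoo() : vec1(std::make_shared<std::vector<int> >()) {}

    // The compiler-generated copy constructor and assignment operator copy
    // only the shared_ptr (a reference-count update), not the elements.

    void Add(int x) { vec1->push_back(x); }
    std::size_t Size() const { return vec1->size(); }
    bool SharesWith(const SharedFoo& other) const { return vec1 == other.vec1; }

private:
    std::shared_ptr<std::vector<int> > vec1;
};

inline void SharedDemo()
{
    SharedFoo a;
    a.Add(1);
    SharedFoo b = a;   // cheap: no element is copied
    b.Add(2);          // also visible through a, because the data is shared
    assert(a.Size() == 2);
    assert(a.SharesWith(b));
}
```

Note the change in semantics: a modification made through one copy is seen by all of them, so this only fits if that sharing is actually what copying is supposed to mean.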
Of course, and everyone says this, don't optimize prematurely. Don't bother with this until and unless you prove a) that your program goes too slowly, and b) it would go faster if you didn't copy so much data.
If your program design requires you to hold multiple simultaneous copies of the data, there is nothing you can do. You just have to bite the bullet and copy the data. No, implementing a custom copy constructor and custom assignment operator won't make it go faster.
If your program doesn't require multiple simultaneous copies of this data, then you do have a couple of tricks to reduce the number of copies you perform.
Instrument your copy methods. If it were me, the first thing I would do, even before trying to improve anything, is count the number of times my copy methods were invoked, for instance with static counter members that the copy constructor and the assignment operator increment.
Run your program with and without your improvements. Print out the values of those static members to see if your changes had any effect.
Avoid assignments in function calls by using references. If you pass objects of type Foo to functions, consider whether you can pass them by reference. If you don't change the passed copy, passing it by const reference is a no-brainer.
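The two suggestions above could be sketched together roughly as follows. The counter names and the helper functions ByValue and ByConstRef are assumptions made for illustration, and the bimap member is left out to keep the sketch self-contained.

```cpp
#include <cassert>
#include <vector>

class Foo
{
public:
    Foo() {}

    Foo(const Foo& other) : vec1(other.vec1), vec2(other.vec2)
    {
        ++copy_ctor_calls;     // count every copy construction
    }

    Foo& operator=(const Foo& other)
    {
        ++copy_assign_calls;   // count every copy assignment
        vec1 = other.vec1;
        vec2 = other.vec2;
        return *this;
    }

    // Hypothetical counters; print them when the program finishes.
    static unsigned long copy_ctor_calls;
    static unsigned long copy_assign_calls;

private:
    std::vector<int> vec1;
    std::vector<int> vec2;
};

unsigned long Foo::copy_ctor_calls = 0;
unsigned long Foo::copy_assign_calls = 0;

// Passing an existing Foo by value costs one copy construction per call...
inline void ByValue(Foo) {}

// ...while passing it by const reference costs none.
inline void ByConstRef(const Foo&) {}
```

Comparing the counter values before and after a change shows directly whether the change removed any copies.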
Avoid copies by using Foo::swap. If you use the copy methods (either explicitly or implicitly) a lot, consider whether the assigned-from item could give up its data rather than having it copied.
Of course, this only works if myFoo and oldFoo no longer need access to their data. And you have to implement Foo::swap.
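A minimal sketch of what Foo::swap and its use might look like. The names myFoo and oldFoo follow the text above. The Add and Size helpers are hypothetical, and only the two vector members are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Foo
{
public:
    void Add(int x) { vec1.push_back(x); }
    std::size_t Size() const { return vec1.size(); }

    // Member-wise swap: each vector swap exchanges internal pointers in
    // constant time, so no elements are copied.
    void swap(Foo& other)
    {
        vec1.swap(other.vec1);
        vec2.swap(other.vec2);
        // ...swap any remaining members as well.
    }

private:
    std::vector<int> vec1;
    std::vector<int> vec2;
};

inline void SwapDemo()
{
    Foo oldFoo;
    oldFoo.Add(1);
    oldFoo.Add(2);

    Foo myFoo;
    // Instead of `myFoo = oldFoo;` (a deep copy), take over oldFoo's data.
    myFoo.swap(oldFoo);

    assert(myFoo.Size() == 2);
    assert(oldFoo.Size() == 0);   // oldFoo now holds myFoo's former (empty) state
}
```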
Whatever you do, measure your program before and after your change. Measure the number of times your copy methods are invoked, and the total time improvement in your program.
Your class doesn't seem that bad, but you do not show how you use it.
If there is a lot of copying, then you need to pass objects of that class by reference (or, if possible, by const reference).
If that class has to be copied, then there is nothing you can do.
If it's really a problem, you might consider implementing the pimpl idiom. But I doubt it's a problem, though I'd have to see your use of the class to be sure.
Copying huge vectors is unlikely to be cheap. The most promising approach is to copy less often. While it's quite easy (maybe too easy) in C++ to invoke a copy unintentionally, there are ways to avoid needless copying.
Such techniques may leave only the copies that the algorithm requires.
Sometimes it's possible to avoid even some of those copies. For example, if you need two objects where the second one is a reversed copy of the first one, you can create a wrapper object that acts like the reversed copy but holds only a reference instead of storing an entire copy.
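The reversed-wrapper idea could be sketched as follows. ReversedView and FirstOfReversed are hypothetical names, and the sketch assumes read-only access and that the underlying vector outlives the view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read-only view that presents a vector in reverse order.
// It stores only a reference, so the vector must outlive the view.
class ReversedView
{
public:
    explicit ReversedView(const std::vector<int>& v) : data(v) {}

    std::size_t size() const { return data.size(); }

    // Element i of the view is element (size - 1 - i) of the original.
    int operator[](std::size_t i) const { return data[data.size() - 1 - i]; }

private:
    const std::vector<int>& data;
};

inline int FirstOfReversed(const std::vector<int>& v)
{
    assert(!v.empty());
    ReversedView r(v);   // constant time: nothing is copied
    return r[0];         // the last element of v
}
```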
The obvious way to reduce copying is to use something like a shared_ptr. With multithreading, however, this cure can be worse than the disease -- incrementing and decrementing reference counts needs to be done atomically, which can be quite expensive. Moreover, if you typically end up modifying the copies and need each copy to act unique (i.e., modifying a copy doesn't affect the original), you can end up with even worse performance, paying for the atomic increment/decrement for reference counting and still doing lots of copies anyway.
There are a couple of obvious ways to avoid that. One is to move unique objects instead of copying at all -- this is great if you can make it work. Another is to use non-atomic reference counting most of the time, and do deep copies only when moving data between threads.
There is no one answer that's universal and really clean, though.
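As a rough illustration of moving instead of copying, here is a sketch that assumes a compiler with C++11 move semantics (possibly newer than the ones mentioned in the question). FooStore::Save here is a hypothetical stand-in, with std::map used in place of the hash map from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Foo
{
    std::vector<int> vec1;
    std::vector<int> vec2;
    // The implicit move operations let each vector hand over its buffer.
};

// Hypothetical store; std::map stands in for the question's hash map.
class FooStore
{
public:
    // Takes over foo's buffers instead of copying them.
    void Save(long long key, Foo&& foo)
    {
        store[key] = std::move(foo);
    }

    std::size_t Vec1SizeAt(long long key) const
    {
        std::map<long long, Foo>::const_iterator it = store.find(key);
        return it == store.end() ? 0 : it->second.vec1.size();
    }

private:
    std::map<long long, Foo> store;
};

inline void MoveDemo()
{
    Foo foo;
    foo.vec1.assign(1000, 7);

    FooStore fs;
    fs.Save(1, std::move(foo));   // foo is left valid but unspecified

    assert(fs.Vec1SizeAt(1) == 1000);
}
```

This only helps when the source object is no longer needed after the call, the same condition that applies to the swap approach.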