Does the C++ pimpl idiom waste an instruction compared with the C style?
(Yes, I know that one machine instruction usually doesn't matter. I'm asking this question because I want to understand the pimpl idiom and use it in the best possible way, and because sometimes I do care about one machine instruction.)

In the sample code below, there are two classes, Thing and OtherThing. Users would include "thing.hh". Thing uses the pimpl idiom to hide its implementation. OtherThing uses a C style: non-member functions that return and take pointers. This style produces slightly better machine code. I'm wondering: is there a way to use C++ style, i.e. make the functions into member functions, and yet still save the machine instruction? I like this style because it doesn't pollute the namespace outside the class.

Note: I'm only looking at calling member functions (in this case, calc). I'm not looking at object allocation.

Below are the files, commands, and the machine code, on my Mac.

thing.hh:

    class ThingImpl;

    class Thing
    {
        ThingImpl *impl;
    public:
        Thing();
        int calc();
    };

    class OtherThing;
    OtherThing *make_other();
    int calc(OtherThing *);
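
thing.cc:

    #include "thing.hh"

    // Hidden implementation behind Thing (pimpl).
    struct ThingImpl
    {
        int x;
    };

    Thing::Thing()
    {
        impl = new ThingImpl;
        impl->x = 5;
    }

    int Thing::calc()
    {
        return impl->x + 1;   // two loads: this->impl, then impl->x
    }

    // C-style alternative: the caller holds the pointer directly.
    struct OtherThing
    {
        int x;
    };

    OtherThing *make_other()
    {
        OtherThing *t = new OtherThing;
        t->x = 5;
        return t;             // without this return, using the result is undefined behaviour
    }

    int calc(OtherThing *t)
    {
        return t->x + 1;      // one load: t->x
    }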

main.cc (just to test the code actually works...):

    #include "thing.hh"
    #include <cstdio>

    int main()
    {
        Thing *t = new Thing;
        printf("calc: %d\n", t->calc());
        OtherThing *t2 = make_other();
        printf("calc: %d\n", calc(t2));
    }

Makefile (recipe lines must start with a tab):

    all: main

    thing.o : thing.cc thing.hh
    	g++ -fomit-frame-pointer -O2 -c thing.cc

    main.o : main.cc thing.hh
    	g++ -fomit-frame-pointer -O2 -c main.cc

    main: main.o thing.o
    	g++ -O2 -o $@ $^

    clean:
    	rm *.o
    	rm main

Run make and then look at the machine code. On the Mac I use otool -tv thing.o | c++filt. On Linux I think it's objdump -d thing.o. Here is the relevant output:

    Thing::calc():
    0000000000000000    movq    (%rdi),%rax
    0000000000000003    movl    (%rax),%eax
    0000000000000005    incl    %eax
    0000000000000007    ret

    calc(OtherThing*):
    0000000000000010    movl    (%rdi),%eax
    0000000000000012    incl    %eax
    0000000000000014    ret

Notice the extra instruction because of the pointer indirection. The first function looks up two fields (impl, then x), while the second only needs to get x. What can be done?

Comments (5)

One instruction is rarely a thing to spend much time worrying over. Firstly, the compiler may cache the pImpl in a more complex use case, thus amortising the cost in a real-world scenario. Secondly, pipelined architectures make it almost impossible to predict the real cost in clock cycles. You'll get a much more realistic idea of the cost if you run these operations in a loop and time the difference.

Not too hard, just use the same technique inside your class. Any halfway decent optimizer will inline the trivial wrapper.

There's the nasty way, which is to replace the pointer to ThingImpl with a big-enough array of unsigned chars, then construct the ThingImpl in it with placement new, reach it through a reinterpret_cast, and destruct it explicitly.

Or you could just pass the Thing around by value, since it should be no larger than the pointer to the ThingImpl, though it may require a little more than that (reference counting of the ThingImpl would defeat the optimisation, so you need some way of flagging the 'owning' Thing, which might require extra space on some architectures).

I disagree about your usage: you are not comparing the same two things. In main, new Thing allocates twice (the second allocation is done by the constructor of Thing), while make_other allocates once.

You should allocate Thing on the stack. That would probably not change the double-dereference instruction, but it could change its cost (by removing a cache miss).

However, the main point is that Thing manages its memory on its own, so you can't forget to delete the actual memory, while you definitely can with the C-style method.

I would argue that automatic memory handling is worth an extra memory instruction, specifically because, as has been said, the dereferenced value will probably be cached if you access it more than once, so the extra cost amounts to almost nothing.

Correctness is more important than performance.
Let the compiler worry about it. It knows far more about what is actually faster or slower than we do. Especially on such a minute scale.
Having items in classes has far, far more benefits than just encapsulation. PIMPL's a great idea, if you've forgotten how to use the private keyword.