Overhead of calling tiny functions from a tight inner loop? [C++]

Posted 2024-08-27 09:41:39

Say you see a loop like this one:

for(int i=0;
    i<thing.getParent().getObjectModel().getElements(SOME_TYPE).count();
    ++i)
{
  thing.getData().insert(
    thing.getData().count(),
    thing.getParent().getObjectModel().getElements(SOME_TYPE)[i].getName()
    );
}

If this were Java, I'd probably not think twice. But in performance-critical sections of C++, it makes me want to tinker with it. However, I don't know whether the compiler is smart enough to make that tinkering futile.
This is a made-up example, but all it's doing is inserting strings into a container. Please don't assume any of these are STL types; think in general terms about the following:

Is the messy condition in the for loop going to be evaluated on every iteration, or only once?
If those get methods are simply returning references to member variables on the objects, will they be inlined away?
Would you expect custom [] operators to get optimized at all?

In other words, is it worth the time (in terms of performance only, not readability) to convert it to something like:

ElementContainer &source = 
   thing.getParent().getObjectModel().getElements(SOME_TYPE);
int num = source.count();
Store &destination = thing.getData();
for(int i=0;i<num;++i)
{
  destination.insert(destination.count(), source[i].getName());
}

Remember, this is a tight loop, called millions of times a second. What I wonder is whether all this will shave a couple of cycles per iteration, or something more substantial.

Yes, I know the quote about "premature optimisation", and I know that profiling is important. But this is a more general question about modern compilers, Visual Studio in particular.

反差帅 2024-09-03 09:41:39

The general way to answer such questions is to look at the generated assembly. With gcc, this means replacing the -c flag with -S.
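A minimal sketch of that approach, assuming a small test file of your own: the classes below (`Elements`, `Model`) and the two functions are hypothetical stand-ins for the question's types, not anything from the original code. Compiling the file to assembly lets you compare the two loops side by side.

```cpp
// inspect.cpp: hypothetical stand-ins for the question's classes.
// Emit assembly instead of an object file:
//   g++ -O2 -S inspect.cpp -o inspect.s
// (MSVC has a comparable listing option, /FA.)
#include <cstddef>

class Elements {
public:
    Elements(const int* data, std::size_t n) : data_(data), n_(n) {}
    std::size_t count() const { return n_; }
    const int& operator[](std::size_t i) const { return data_[i]; }
private:
    const int* data_;
    std::size_t n_;
};

class Model {
public:
    explicit Model(const Elements& e) : elements_(e) {}
    const Elements& getElements() const { return elements_; }
private:
    Elements elements_;
};

// Getter chain re-evaluated in the loop condition and in the body.
long sum_naive(const Model& m) {
    long total = 0;
    for (std::size_t i = 0; i < m.getElements().count(); ++i)
        total += m.getElements()[i];
    return total;
}

// Same loop with the getter chain and the count hoisted by hand.
long sum_hoisted(const Model& m) {
    const Elements& src = m.getElements();
    const std::size_t n = src.count();
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += src[i];
    return total;
}
```

If the bodies of `sum_naive` and `sum_hoisted` come out essentially the same in the listing, the manual hoisting bought nothing for that compiler and flag combination. Real code with non-trivial getters may behave differently.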

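My own rule is not to fight the compiler. If something should be inlined, I make sure the compiler has all the information it needs to perform the inlining, and (possibly) I nudge it with an explicit inline keyword.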
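One way to read "all the information it needs" is that the function body must be visible at the call site, which usually means defining it in the header. The sketch below assumes hypothetical `Store` and `Thing` classes; `std::vector` is used only to keep the example short, even though the question asks not to assume STL types.

```cpp
// thing.h (hypothetical header; Store and Thing are illustrative names)
#include <cstddef>
#include <string>
#include <vector>

class Store {
public:
    // Defined in the class body: implicitly inline and visible to every caller.
    std::size_t count() const { return items_.size(); }
    void insert(std::size_t pos, const std::string& s) {
        items_.insert(items_.begin() + static_cast<std::ptrdiff_t>(pos), s);
    }
    const std::string& operator[](std::size_t i) const { return items_[i]; }
private:
    std::vector<std::string> items_;
};

class Thing {
public:
    Store& getData();  // declared here, defined below with an explicit inline
private:
    Store data_;
};

// The explicit inline allows the definition to live in the header without
// violating the one-definition rule. Whether a given call is actually
// inlined remains the compiler's decision.
inline Store& Thing::getData() { return data_; }
```

A getter defined in a separate .cpp file is typically out of reach of the inliner unless whole-program optimisation is enabled (for example -flto in GCC or /GL with /LTCG in Visual Studio).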

Also, inlining saves a few opcodes but makes the code grow, which can be very bad for performance as far as the L1 cache is concerned.


挽梦忆笙歌 2024-09-03 09:41:39

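All the questions you are asking are compiler-specific, so the only sensible answer is "it depends". If it matters to you, you should (as always) look at the code emitted by the compiler and run some timing experiments. Make sure your code is compiled with all optimisations turned on: this can make a big difference for things like operator[](), which is often implemented as an inline function but which won't be inlined (in GCC at least) unless you turn on optimisation.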
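A timing experiment of that kind could look roughly like the sketch below. `IntArray`, `total_naive`, `total_hoisted` and `time_seconds` are hypothetical names made up for illustration, and the helper is a crude stopwatch rather than a proper benchmark harness.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical container with an inline count() and operator[].
class IntArray {
public:
    explicit IntArray(std::size_t n) : v_(n, 1) {}
    std::size_t count() const { return v_.size(); }
    int operator[](std::size_t i) const { return v_[i]; }
private:
    std::vector<int> v_;
};

// Loop condition calls count() on every iteration.
long total_naive(const IntArray& a) {
    long total = 0;
    for (std::size_t i = 0; i < a.count(); ++i)
        total += a[i];
    return total;
}

// Count hoisted into a local before the loop.
long total_hoisted(const IntArray& a) {
    const std::size_t n = a.count();
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}

// Runs f(a) `reps` times and returns the elapsed wall-clock seconds.
// The results are accumulated into `sink` so the work cannot be
// discarded as unused.
template <typename F>
double time_seconds(F f, const IntArray& a, int reps, long& sink) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        sink += f(a);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Build it once without optimisation and once with it (for example -O0 versus -O2 in GCC, or /Od versus /O2 in Visual Studio), call `time_seconds` for both functions on a large `IntArray`, and compare. Any gap between the two versions should be judged only on the optimised build.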
