Using return value optimization for a loop around get() and a function call?

Posted on 2024-11-03 12:23:59

Here is a fragment that gets data from a buffered source and sends it along to be processed. If the queue is empty, get() returns a null, and process() happily accepts a null and does nothing.
What is the most efficient way to code this?

something a; // any legal C++ return type...
aQueueOfSomethings g;

while (true) { 
    a=g.get();
    process(a);
}

There is no way to predict the values arriving via get(). They are what they are, and they need to be dequeued and passed on to process() as quickly as possible.

I don't see a lot of wasted effort here. If I skip the explicit local variable named 'a' and make the loop a one-liner:

    process(g.get());

the implicit return value of g.get() will still have space allocated, might involve a constructor call, etc, etc.

If the thing returned has any size or complexity, it would be better to have a pointer to it rather than a copy of it, and pass that pointer rather than a copy by value... So I'd prefer to have

something *a;

    g.get(a);
    process(a);

rather than

 something a;

    a=g.get();
    process(a);

I wrote a test case in C++ that tries the two-line and one-line versions, looping 100,000,000 times.

If a is an object with 4 integers and 2 floating-point numbers, and process() touches them all, the two-line solution is actually faster! If a is a single int, the one-line version is faster. If the object is complex but process() touches just one value, the one-line version is faster.

Most interesting to me: with the g++ compiler on Mac OS X 10.5.8, the -O first-level optimization switch results in identical, much faster operation for both the one-line and two-line versions.
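For anyone who wants to repeat the comparison, here is a minimal sketch of such a timing harness. It is not the original test: Something, FakeQueue and the sink accumulator are hypothetical stand-ins, and the real queue and process() will behave differently. It assumes C++11 or later. The checksum is there to confirm that both variants did the same work before their times are compared.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <iostream>

// Hypothetical payload: 4 integers and 2 floating-point values.
struct Something {
    int i0, i1, i2, i3;
    float f0, f1;
};

// Hypothetical stand-in for the real queue: builds a fresh value on each call.
struct FakeQueue {
    int next = 0;
    Something get() {
        ++next;
        return Something{next, next + 1, next + 2, next + 3, 0.5f, 1.5f};
    }
};

// Touches every field and accumulates into sink so the work has a visible result.
inline void process(const Something& s, std::int64_t& sink) {
    sink += s.i0 + s.i1 + s.i2 + s.i3 + static_cast<std::int64_t>(s.f0 + s.f1);
}

struct Result {
    std::int64_t checksum;
    double seconds;
};

// Two-line version: assign to an explicit local, then process it.
Result run_two_line(int iterations) {
    FakeQueue g;
    Something a{};
    std::int64_t sink = 0;
    auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < iterations; ++n) {
        a = g.get();
        process(a, sink);
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return {sink, elapsed.count()};
}

// One-line version: pass the temporary straight to process().
Result run_one_line(int iterations) {
    FakeQueue g;
    std::int64_t sink = 0;
    auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < iterations; ++n) {
        process(g.get(), sink);
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return {sink, elapsed.count()};
}

// Runs both variants, checks they did the same work, and prints the timings.
void compare(int iterations) {
    Result two = run_two_line(iterations);
    Result one = run_one_line(iterations);
    assert(two.checksum == one.checksum);
    std::cout << "two-line: " << two.seconds << " s, one-line: " << one.seconds << " s\n";
}
```

Calling compare(100000000) from main() and building once without and once with -O should show whether the gap between the two variants survives optimization on a given machine. The numbers will vary by compiler and hardware.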

Other than letting the compiler optimize, using a single line with no explicit intermediate variable, and passing by reference to avoid making copies, is there anything that would generally make it run faster? I feel like I'm missing something obvious.

Comments (4)

你对谁都笑 2024-11-10 12:23:59

I think this is a supreme case of useless optimization.

(You are taking something that buffers and want to bit-optimize it?)

Also, with optimization enabled the compiler will most likely compile both ways to exactly the same code, and (in most circumstances) it is completely entitled to do return value optimization and tail call optimization.

Combined with probable inlining of queue_class::get(), your issue seems to be completely moot.
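One way to check the point about copies on your own types is to count them. The following is a rough sketch, assuming C++11 or later. Something, QueueOfSomethings, put() and empty() are hypothetical names, and unlike the get() in the question this one requires a non-empty queue instead of returning a null. With a move-enabled type, neither the two-line nor the one-line form should make a copy: the return is either elided or turned into a move.

```cpp
#include <cassert>
#include <deque>
#include <utility>

// Hypothetical payload that counts how often it is copied.
struct Something {
    int value = 0;
    static int copies;  // copy constructions plus copy assignments

    Something() = default;
    explicit Something(int v) : value(v) {}
    Something(const Something& o) : value(o.value) { ++copies; }
    Something& operator=(const Something& o) {
        value = o.value;
        ++copies;
        return *this;
    }
    Something(Something&&) noexcept = default;
    Something& operator=(Something&&) noexcept = default;
};
int Something::copies = 0;

// Hypothetical queue; get() requires a non-empty queue in this sketch.
class QueueOfSomethings {
public:
    void put(Something s) { items_.push_back(std::move(s)); }
    bool empty() const { return items_.empty(); }
    Something get() {
        Something front = std::move(items_.front());
        items_.pop_front();
        return front;  // elided (NRVO) or moved, never copied
    }
private:
    std::deque<Something> items_;
};

// Takes a const reference, so a temporary from get() binds without a copy.
int process(const Something& s) { return s.value; }

// Runs both call styles and reports how many copies were made.
int copies_made_by_loop() {
    QueueOfSomethings g;
    for (int i = 1; i <= 3; ++i) g.put(Something(i));
    Something::copies = 0;

    int total = 0;
    Something a;
    a = g.get();                // two-line form: move assignment from the temporary
    total += process(a);
    while (!g.empty()) {
        total += process(g.get());  // one-line form: temporary bound to const&
    }
    assert(total == 6);
    return Something::copies;
}
```

If copies_made_by_loop() returns 0 for a type shaped like yours, there is no copy left for a pointer-based interface to save.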

番薯 2024-11-10 12:23:59

I believe you are trying to beat the compiler at its own job.

Have you experienced performance issues? If not, you might focus on producing readable code that you can maintain (which you seem to have already) rather than resorting to what could be premature optimization and cluttering the code with weird optimizations.

因为看清所以看轻 2024-11-10 12:23:59

The issue with this code is not in what you've done, but in that it has to spin - wasting CPU cycles that some other task your computer's performing might have used - even when there's no work to do.

If there are many programs that take this attitude (that they're king of the computer and will hog entire CPUs) then everything slows to an absolute crawl. It's a very drastic decision to let your code work like this.

If possible, change the entire model so that you get a callback/signal/event of some kind when there's more data available.
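As a rough illustration of that model, here is a sketch of a queue whose get() sleeps until data arrives instead of returning a null for the caller to spin on. It assumes C++11 threads. BlockingQueue, put(), close() and sum_through_queue() are hypothetical names, and the real buffered source may offer a different notification mechanism.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical queue whose get() blocks instead of returning a null.
template <typename T>
class BlockingQueue {
public:
    void put(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();  // wake one sleeping consumer
    }

    // Tells consumers that no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Sleeps until an item is available; returns false once closed and drained.
    bool get(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Demo: a producer thread feeds 1..n; the consumer sums them without spinning.
int sum_through_queue(int n) {
    assert(n >= 0);
    BlockingQueue<int> g;
    std::thread producer([&g, n] {
        for (int i = 1; i <= n; ++i) g.put(i);
        g.close();
    });

    int total = 0;
    int a = 0;
    while (g.get(a)) {
        total += a;  // stands in for process(a)
    }
    producer.join();
    return total;
}
```

While the queue is empty, the consumer thread is parked inside wait() and uses no CPU. The cost is a lock per item, so this trades some per-item latency for not hogging a core.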

千仐 2024-11-10 12:23:59

You're right that you should let the compiler optimise, but if you know that it is safe to do this:

while (true) { 
    a=g.get();
    b=g.get();
    c=g.get();
    d=g.get();
    process(a);
    process(b);
    process(c);
    process(d);
}

then it might make things faster.

Or, even more extreme, get a whole array of the return type (or pointers to it) ready, then loop over it processing them. If process() and get() both use a lot of code, then doing this could mean all the code can stay in immediate cache, instead of being fetched from a further cache each time the function is called.

The compiler can't make this optimisation because it probably doesn't know that it's safe to re-order function calls.
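The array variant could look something like the sketch below. get_batch() and drain_in_batches() are hypothetical names that the queue in the question does not have, and the loop here stops when the queue is empty rather than running forever. Whether it is actually faster depends on how large get() and process() are, so measure before adopting it.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical payload.
struct Something {
    int value;
};

// Hypothetical queue with a batch-oriented get.
class QueueOfSomethings {
public:
    void put(Something s) { items_.push_back(s); }

    // Dequeues up to max_items into out and returns how many were taken.
    std::size_t get_batch(std::vector<Something>& out, std::size_t max_items) {
        out.clear();
        while (out.size() < max_items && !items_.empty()) {
            out.push_back(items_.front());
            items_.pop_front();
        }
        return out.size();
    }

private:
    std::deque<Something> items_;
};

// Stand-in for the real work done on each item.
long process(const Something& s) { return s.value * 2L; }

// Alternates between a run of dequeues and a run of process() calls,
// so each function's code stays hot for a whole batch.
long drain_in_batches(QueueOfSomethings& g, std::size_t batch_size) {
    assert(batch_size > 0);
    std::vector<Something> batch;
    batch.reserve(batch_size);  // reuse one buffer across batches
    long total = 0;
    while (g.get_batch(batch, batch_size) > 0) {
        for (const Something& s : batch) {
            total += process(s);
        }
    }
    return total;
}
```

As with the four-at-a-time version, this is only valid if process() does not need to run before the next get(), for example if processing one item never affects what the queue returns next.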
