How to compare the performance of log() and FP division in C++?

Posted 2024-09-02 07:24:33

I’m using a log-based class in C++ to store very small floating-point values (the values would otherwise fall outside the range of double). As I’m performing a large number of multiplications, this has the added benefit of turning the multiplications into sums.
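A minimal sketch of what such a log-based class might look like, assuming all represented values are strictly positive and stored as their natural logarithm. The name LogDouble and its members are hypothetical; the question does not show the actual class.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical log-domain number: stores ln(value) instead of value, so a
// product of tiny positive numbers becomes a sum of logs and never underflows.
class LogDouble {
public:
    // Converting constructor from an ordinary positive double: one log() call.
    explicit LogDouble(double value) : log_(std::log(value)) { assert(value > 0.0); }

    // Build directly from a value that is already a logarithm: no log() call.
    static LogDouble fromLog(double logValue) {
        LogDouble r;
        r.log_ = logValue;
        return r;
    }

    // Log-domain times log-domain: just add the stored logs.
    LogDouble& operator*=(const LogDouble& rhs) {
        log_ += rhs.log_;
        return *this;
    }

    // Log-domain times plain double: convert the right-hand side with log(),
    // then add. This is the path described in the question.
    LogDouble& operator*=(double rhs) {
        assert(rhs > 0.0);
        log_ += std::log(rhs);
        return *this;
    }

    double logValue() const { return log_; }

private:
    LogDouble() : log_(0.0) {}  // used only by fromLog
    double log_;                // natural logarithm of the represented value
};
```

The operator*=(double) overload is the one whose cost the question is about: every plain double on the right-hand side pays for a log() before the addition.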

However, at a certain point in my algorithm, I need to divide a standard double value by an integer value and then do a *= to a log-based value. I have overloaded the *= operator for my log-based class: the right-hand side value is first converted to a log-based value by running log() and then added to the left-hand side value.
Thus the operations actually performed are floating-point division, log() and floating-point summation.

My question is whether it would be faster to first convert the denominator to a log-based value, which would replace the floating-point division with a floating-point subtraction, yielding the following chain of operations: log() twice, a floating-point subtraction, and a floating-point summation.
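The two chains can be written side by side as plain functions on the stored logarithm. This is a sketch with hypothetical names; it relies on the identity log(a/n) = log(a) - log(n) for a > 0 and n > 0. The two results agree mathematically but may differ in the last few bits, because each version rounds at different points.

```cpp
#include <cassert>
#include <cmath>

// Current chain: one FP division, one log(), one FP addition.
double updateDivideFirst(double logAcc, double a, int n) {
    assert(a > 0.0 && n > 0);
    return logAcc + std::log(a / n);  // n is converted to double before dividing
}

// Proposed chain: two log() calls, one FP subtraction, one FP addition.
double updateLogFirst(double logAcc, double a, int n) {
    assert(a > 0.0 && n > 0);
    return logAcc + (std::log(a) - std::log(static_cast<double>(n)));
}
```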

In the end, this boils down to whether floating-point division is faster or slower than log(). I suspect that a common answer would be that this is compiler and architecture dependent, so I’ll say that I use gcc 4.2 from Apple on darwin 10.3.0. Still, I hope to get an answer with a general remark on the speed of these two operators and/or an idea on how to measure the difference myself, as there might be more going on here, e.g. executing the constructors that do the type conversion etc.

Cheers!
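One way to measure the raw operations is a micro-benchmark that times many divisions and many log() calls over the same inputs. This is a rough sketch, assuming a C++11 compiler for &lt;chrono&gt; and lambdas (the gcc 4.2 mentioned above predates both; there, a platform timer such as mach_absolute_time() would stand in for std::chrono::steady_clock). The helper names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Runs op over every element of xs, reps times, and returns the average
// nanoseconds per call. The running sum is copied to a volatile so the
// optimizer cannot discard the work being timed.
template <typename Op>
double nsPerOp(const std::vector<double>& xs, int reps, Op op) {
    assert(!xs.empty() && reps > 0);
    double sink = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        for (std::size_t i = 0; i < xs.size(); ++i)
            sink += op(xs[i]);
    const auto stop = std::chrono::steady_clock::now();
    volatile double keep = sink;
    (void)keep;
    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return ns / (static_cast<double>(reps) * static_cast<double>(xs.size()));
}

// Cost of x / n, with n an int converted to double as in the question.
double benchDivide(const std::vector<double>& xs, int reps, int n) {
    return nsPerOp(xs, reps, [n](double x) { return x / n; });
}

// Cost of std::log(x); the inputs should be positive.
double benchLog(const std::vector<double>& xs, int reps) {
    return nsPerOp(xs, reps, [](double x) { return std::log(x); });
}
```

Compile with optimization and compare benchDivide(xs, 1000, 7) with benchLog(xs, 1000) for a realistic range of inputs. Depending on flags, the compiler may reorder or vectorize these loops differently from your real code, so timing the actual update loop is still the more representative test.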

Answers (4)

小镇女孩 2024-09-09 07:24:33

Do you divide by the same integer multiple times? If so, you can instead multiply by 1./yourInteger and only do the divide once. Where that is possible, it would be faster than either option.
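A minimal sketch of that suggestion, assuming the same integer divisor is applied to a batch of values (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Divides every element by n using a single division up front and one
// multiplication per element.
void scaleByReciprocal(std::vector<double>& xs, int n) {
    assert(n != 0);
    const double inv = 1.0 / n;  // the only division
    for (std::size_t i = 0; i < xs.size(); ++i)
        xs[i] *= inv;
}
```

For divisors that are not powers of two, x * (1.0 / n) can differ from x / n in the last bit because the reciprocal is itself rounded. That is why compilers do not make this substitution on their own unless allowed to relax IEEE semantics (for example with gcc's -freciprocal-math, which -ffast-math enables).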

As to your actual question, it's not only compiler and architecture dependent, but also micro-architecture and data dependent.

On your particular platform (Darwin/x86), for current hardware (i5/i7): ~24 cycles for divide (1), ~35 cycles for log() (2). However, because divide uses only a single instruction dispatch slot, the hardware's reorder engine can do other useful computation while the divide is in flight; log(), by contrast, is implemented in software, so there is less opportunity for the processor to hoist other computations into the latency of the logarithm. This means that in practice, divide will often be a good bit faster.
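A sketch of the latency-versus-throughput point, assuming a C++11 compiler. Both functions perform the same number of divisions, but in the first each division must wait for the previous quotient, while the second keeps four independent chains in flight that an out-of-order core can overlap. The function names are hypothetical, and the ratio of the two timings depends on the CPU and compiler flags (the compiler may even pack the four independent divisions into SIMD instructions).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// One dependency chain: every division needs the previous result.
double dependentChain(std::size_t iters, double d) {
    assert(d > 1.0);
    double x = 1.0;
    for (std::size_t i = 0; i < iters; ++i)
        x = x / d + 1.0;
    return x;
}

// Four independent chains doing the same total number of divisions
// (assumes iters is a multiple of 4).
double independentChains(std::size_t iters, double d) {
    assert(d > 1.0);
    double a = 1.0, b = 2.0, c = 3.0, e = 4.0;
    for (std::size_t i = 0; i < iters; i += 4) {
        a = a / d + 1.0;
        b = b / d + 1.0;
        c = c / d + 1.0;
        e = e / d + 1.0;
    }
    return a + b + c + e;
}

// Wall-clock nanoseconds for one call of f(iters, d); the result is stored
// in `result` so the work cannot be optimized away.
template <typename F>
double timeNs(F f, std::size_t iters, double d, double& result) {
    const auto start = std::chrono::steady_clock::now();
    result = f(iters, d);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count();
}
```

With d = 3, each recurrence x = x / 3 + 1 converges to 1.5, so the values stay finite however many iterations are run.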

1) From the Intel Optimization Manual

2) Measured by calling log() in a tight loop and using mach_absolute_time() to get wall time.

故人爱我别走 2024-09-09 07:24:33

On the x86 architecture, logarithms take significantly longer than divisions: 85 cycles (throughput) for FYL2X compared to 40 cycles for FDIV. I would be surprised if other architectures were much different. Go with the floating-point division.
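For context on that figure: FYL2X is the x87 instruction that computes y times log2(x), so a natural logarithm is obtained by first loading y = ln 2 (the FLDLN2 instruction) and then x. A libm log() need not use FYL2X at all (the first answer describes log() on this platform as a software routine), so the instruction timing is only a proxy for the cost of the library call.

$$
\mathtt{FYL2X}(x, y) = y \log_2 x, \qquad \ln x = (\ln 2)\,\log_2 x = \mathtt{FYL2X}(x, \ln 2)
$$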

旧人 2024-09-09 07:24:33

The main problem with division is that, although it is a single instruction on most modern CPUs, it typically has a high latency (31 cycles on PowerPC; not sure what it is on x86). Some of this latency can be hidden, though, if you have other non-dependent instructions that can be issued at the same time as the division. So the answer will depend somewhat on the instruction mix and dependencies in the loop that contains your divide (not to mention which CPU you are using).

Having said that, my gut feeling is that divide will be faster than a log function on most architectures.

梦太阳 2024-09-09 07:24:33

I'm pretty sure that computing a log, by whatever algorithm, is going to be rather more expensive than even an FP division.

Of course the only way to be sure is to code it up and measure the performance of the code. From your description it sounds like it shouldn't be too difficult to implement both versions and try it side-by-side.
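A side-by-side sketch in that spirit, assuming a C++11 compiler and reducing the log-based class to a plain double accumulator that holds the logarithm. The function and type names are hypothetical; both versions run the same number of updates on the same inputs, and the final accumulators are returned so they can be checked against each other.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

struct Timing {
    double nsPerUpdate;  // average wall-clock nanoseconds per update
    double finalLog;     // final log-domain accumulator value
};

// Applies update(acc, a, n) to every element of as, reps times, and times it.
template <typename Update>
Timing run(const std::vector<double>& as, int n, int reps, Update update) {
    assert(!as.empty() && reps > 0 && n > 0);
    double acc = 0.0;  // log-domain accumulator
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        for (std::size_t i = 0; i < as.size(); ++i)
            acc = update(acc, as[i], n);
    const auto stop = std::chrono::steady_clock::now();
    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    Timing t = { ns / (static_cast<double>(reps) * static_cast<double>(as.size())), acc };
    return t;
}

// Division, log(), addition.
Timing timeDivideFirst(const std::vector<double>& as, int n, int reps) {
    return run(as, n, reps, [](double acc, double a, int d) {
        return acc + std::log(a / d);
    });
}

// log() twice, subtraction, addition.
Timing timeLogFirst(const std::vector<double>& as, int n, int reps) {
    return run(as, n, reps, [](double acc, double a, int d) {
        return acc + (std::log(a) - std::log(static_cast<double>(d)));
    });
}
```

Because n is loop-invariant, an optimizer may hoist std::log(d) out of the loop in the log-first version (whether it can depends on flags such as -fno-math-errno), in which case you are timing the variant where log(n) is computed once. If the same denominator really is reused, that cached variant is worth measuring on purpose, for the same reason the first answer suggests computing 1./yourInteger once.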
