Detecting floating-point software emulation
I'm working on an application where runtime speed matters more than precision. The number crunching involves floating-point arithmetic, and I'm concerned that double and/or long double may be handled in software instead of natively on the processor (this is always true on a 32-bit arch, right?). I would like to conditionally compile using the highest precision that has hardware support, but I haven't found a quick and easy way to detect software emulation. I'm using g++ on GNU/Linux and I'm not concerned about portability. It's running on x86, so I'm assuming that float is always native.
Comments (4)
The floating-point unit (FPU) on modern x86 is natively double (in fact, its internal format is even wider than double), not float. The "32" in 32-bit describes the integer register width, not the floating-point width. This does not hold, however, if your code takes advantage of vectorized SSE instructions, which perform either 4 single-precision or 2 double-precision operations in parallel.
If it doesn't, the main speed hit from switching your app from float to double will be the increased memory bandwidth.
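As a rough illustration of the x87 versus SSE distinction above, the following sketch reports how the compiler is set up for the current target. It assumes g++ (or Clang) on x86 and relies on the predefined macros `__SSE_MATH__` and `__SSE2_MATH__` plus the standard `FLT_EVAL_METHOD` from `<cfloat>`. The function and constant names are made up for this example. The macros your own build defines can be listed with `g++ -dM -E -x c++ /dev/null`.

```cpp
#include <cfloat>   // FLT_EVAL_METHOD
#include <limits>
#include <string>

// Mantissa widths in bits as the compiler sees them for this target.
// On x86 GNU/Linux, long double is normally the 80-bit x87 format (64 bits).
constexpr int float_digits       = std::numeric_limits<float>::digits;
constexpr int double_digits      = std::numeric_limits<double>::digits;
constexpr int long_double_digits = std::numeric_limits<long double>::digits;

// Hypothetical helper: which unit handles scalar floating-point math,
// judged from GCC/Clang predefined macros (an assumption about the toolchain).
std::string scalar_fp_backend() {
#if defined(__SSE2_MATH__)
    return "SSE2 scalar math (float and double in SSE registers)";
#elif defined(__SSE_MATH__)
    return "SSE scalar math for float, x87 for double";
#elif defined(__i386__) || defined(__x86_64__)
    return "x87 FPU math";
#else
    return "non-x86 target or unknown compiler";
#endif
}

// Hypothetical helper: the precision intermediate results are evaluated in.
std::string describe_eval_method() {
    switch (FLT_EVAL_METHOD) {
        case 0:  return "each type evaluated in its own precision";
        case 1:  return "float and double evaluated as double";
        case 2:  return "everything evaluated as long double";
        default: return "indeterminate or implementation-defined";
    }
}
```

Printing `scalar_fp_backend()` and `describe_eval_method()` from a small test program built with the same flags as the application shows whether the build uses x87 or SSE math. Neither result indicates software emulation; both paths are hardware.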
No. Common CPUs have dedicated hardware for double (and in some cases long double as well). And honestly, if performance is a concern, then you should know your CPU. Hit the CPU manuals and figure out what the performance penalty for each datatype is.
Even on CPUs that lack "proper" double support, it still isn't emulated in software. The Cell CPU (of PlayStation 3 fame) simply passes a double twice through the FPU, so it's a lot costlier than a float computation, but it's not software emulation. You still have dedicated instructions for double processing. They're just less efficient than the equivalent float instructions.
Unless you target either 20-year-old CPUs or small, limited embedded processors, floating-point instructions will be handled in hardware, although not all architectures handle every datatype equally efficiently.
x86 does float, double, and more in hardware, and has done so for a long time. Many modern 32-bit programs assume SSE2 support, as it has been around for several years now and can be depended on to be present on a consumer chip.
On x86, the hardware typically uses 80 bits internally, which is more than enough for double.
Are you sure that performance is a real concern (based on profiling the code), or are you just guessing that double may not be supported?
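One way to answer the profiling question with numbers instead of guesses is a small timing harness like the sketch below. It is only an illustration: the function name `time_dot_product_ms`, the dot-product workload, and the fill values are invented for this example and are no substitute for profiling the real application.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Times a simple dot product over n elements of type T and returns the
// elapsed wall-clock time in milliseconds. The computed sum is handed back
// through result_out so the compiler cannot discard the loop.
template <typename T>
double time_dot_product_ms(std::size_t n, T& result_out) {
    assert(n > 0);
    std::vector<T> a(n, T(1.5)), b(n, T(2.0));

    const auto start = std::chrono::steady_clock::now();
    T sum = T(0);
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    const auto stop = std::chrono::steady_clock::now();

    result_out = sum;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling it as `time_dot_product_ms<float>(n, f)`, `time_dot_product_ms<double>(n, d)` and `time_dot_product_ms<long double>(n, ld)` with a large `n`, built with the same optimization flags as the application, gives a first impression of the relative cost of each type. With arrays that exceed the cache, any difference is likely to reflect memory bandwidth as much as arithmetic.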