Floating-point Double constants treated as Extended
In a 64-bit app, if I compare a floating-point constant to a value obtained from StrToFloat() for the same "value", I get different results. For example:
procedure foo;
const
  d1 = 0.6694716;          // untyped constant
  d3: double = 0.6694716;  // typed constant
var
  d2: double;
begin
  d2 := StrToFloat('0.6694716');  // same "value", parsed at run time
  if d1 = d2 then
    beep;
  if d1 = d3 then
    beep;
  if d2 = d3 then
    beep;
end;
d1 and d3 have a hex value of $B47339B4, while d2 has a hex value of $B47339B3. Although they compare as "equal", they are technically not the same. From what I can tell, the constants d1 and d3 are wrong. Perhaps the compiler uses the FPU and this is due to rounding?
Short of storing all my constants as strings and converting them at run time, does anyone know of a workaround for this?
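To see the bit-level difference described above, one option is to dump the raw 64-bit pattern of each value. The following is a minimal sketch, not the asker's code: the helper name DoubleToHex is hypothetical, and it assumes a console project, XE2-style unit scope names, and a locale whose decimal separator is '.' (StrToFloat without format settings uses the current locale).

```pascal
program DumpDoubleBits;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: returns the 64-bit pattern of a Double as 16 hex digits.
function DoubleToHex(const Value: Double): string;
var
  Bits: Int64;
begin
  // Copy the 8 bytes verbatim; no numeric conversion takes place here.
  Move(Value, Bits, SizeOf(Bits));
  Result := IntToHex(Bits, 16);
end;

const
  d1 = 0.6694716;          // untyped constant
  d3: Double = 0.6694716;  // typed constant
var
  d2: Double;
begin
  // Assumes '.' is the decimal separator of the current locale.
  d2 := StrToFloat('0.6694716');
  Writeln('d1 = $', DoubleToHex(d1));
  Writeln('d2 = $', DoubleToHex(d2));
  Writeln('d3 = $', DoubleToHex(d3));
end.
```

If the three printed patterns differ only in the last hex digit, the values are one unit in the last place apart, which matches the $...B4 versus $...B3 difference reported in the question.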
Comments (2)
Float literals in Delphi are Extended by default. In 64 bits, that shouldn't make any difference, but in 32 bits it does. My guess is that the parser still internally represents float literals as a 10-byte float (Extended), and then the 64-bit compiler "rounds it down" to 8 bytes (Double) when compiling.
If my hypothesis is right, there might be nothing that can be done to circumvent that.
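As a small illustration of why Extended matters here, the sketch below prints the size of the relevant types. It assumes a Windows target: Extended is a 10-byte type when compiling for 32-bit Windows and is 8 bytes, the same as Double, when compiling for 64-bit Windows. It does not prove the hypothesis about the parser; it only shows the size difference the hypothesis relies on.

```pascal
program ExtendedSize;

{$APPTYPE CONSOLE}

const
  Typed: Double = 0.6694716;  // typed constant, stored as an 8-byte Double

begin
  // 10 for 32-bit Windows targets, 8 for 64-bit Windows targets.
  Writeln('SizeOf(Extended) = ', SizeOf(Extended));
  Writeln('SizeOf(Double)   = ', SizeOf(Double));
  Writeln('SizeOf(Typed)    = ', SizeOf(Typed));
end.
```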
EDIT
Delphi does the following conversion
Disgusting as it is, I put the constants in as string values and then convert them to double in the unit initialization. That way we get double, not extended, math. This seems to work. I expect it will not work with 32-bit.
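The workaround described above might look roughly like the following. This is only a sketch under assumptions: the unit name MyConsts and the names Threshold and ThresholdText are hypothetical, and forcing '.' as the decimal separator is an addition of mine so that the conversion does not depend on the user's locale.

```pascal
unit MyConsts;  // hypothetical unit name

interface

var
  // Filled in once at startup; treat it as read-only afterwards.
  Threshold: Double;

implementation

uses
  System.SysUtils;

const
  // The "constant" is kept as text and parsed at run time.
  ThresholdText = '0.6694716';

var
  FS: TFormatSettings;

initialization
  // Assumption: always parse with '.' regardless of the current locale.
  FS := TFormatSettings.Create;
  FS.DecimalSeparator := '.';
  Threshold := StrToFloat(ThresholdText, FS);

end.
```

The trade-off is that Threshold is now a variable rather than a true constant, so it cannot be used in constant expressions and is only valid after the unit's initialization section has run.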