intel MacBook 和 M1 之间的 np.float32 浮点差异
我最近将我的 Intel MacBook Pro 13" 升级为配备 M1 Pro 的 MacBook Pro 14"。一直在努力让我的软件重新编译和工作。幸运的是,除了一些晦涩的 Fortran 代码和 Python 中的浮点问题外,没有什么大问题。关于 python/numpy 我有以下问题。
我有一个很大的代码库,但为了简单起见,我将使用这个简单的函数,将飞行高度转换为压力来显示问题。
def fl2pres(FL):
P0=101325
T0=288.15
T1=216.65
g=9.80665
R=287.0528742
GAMMA=0.0065
P11=P0*np.exp(-g/GAMMA/R*np.log(T0/T1))
h=FL*30.48
return np.where(h<=11000, \
P0*np.exp(-g/GAMMA/R*np.log((T0/(T0-GAMMA*h) ))),\
P11*np.exp(-g/R/T1*(h-11000)) )
当我在 M1 Pro 上运行代码时,我得到:
In [2]: fl2pres(np.float64([400, 200]))
Out[3]: array([18753.90334892, 46563.239766 ])
和;
In [3]: fl2pres(np.float32([400, 200]))
Out[3]: array([18753.90234375, 46563.25080916])
在我的旧款 Intel MacBook Pro 上执行同样的操作,我得到:
In [2]: fl2pres(np.float64([400, 200]))
Out[2]: array([18753.90334892, 46563.239766 ])
和;
In [3]: fl2pres(np.float32([400, 200]))
Out[3]: array([18753.904296888, 46563.24778944])
float64 计算匹配,但 float32 不匹配。我们在代码中大量使用 float32 来优化内存。我知道由于架构差异,可能会发生这种浮点错误,但想知道是否可以进行简单的修复,因为目前某些单元测试失败了。我可以将架构包含在这些测试中,但我希望有一个更简单的解决方案?
将所有输入转换为 float64 使我的单元测试通过,从而解决了这个问题,但由于我们有相当多的大型数组和数据帧,因此对内存的影响是不必要的。
两台笔记本电脑都运行通过 homebrew 安装的 python 3.9.10、pandas 1.4.1 和 numpy 1.22.3(安装用于映射加速和 blas)。
编辑 我更改了打印中间值的函数,以查看发生变化的位置:
def fl2pres(FL):
P0=101325
T0=288.15
T1=216.65
g=9.80665
R=287.0528742
GAMMA=0.0065
P11=P0*np.exp(-g/GAMMA/R*np.log(T0/T1))
h=FL*30.48
A = np.log((T0/(T0-GAMMA*h)))
B = np.exp(-g/GAMMA/R*A)
C = np.exp(-g/R/T1*(h-11000))
print(f"P11:{P11}, h:{h}, A:{A}, B:{B}, C:{C}")
return np.where(h<=11000, P0*B, P11*C)
使用与上述 float32 情况相同的输入运行此函数,我在 M1 Pro 上得到:
P11:22632.040591374975, h:[12192. 6096.], A:[0.32161594 0.14793371], B:[0.1844504 0.45954345], C:[0.82864394 2.16691503]
array([18753.90334892, 46563.239766 ])
在 Intel 上:
P11:22632.040591374975, h:[12192. 6096.], A:[0.32161596 0.14793368], B:[0.18445034 0.45954353], C:[0.828644 2.166915]
array([18753.90429688, 46563.24778944])
I have recently upgraded my Intel MacBook Pro 13" to a MacBook Pro 14" with M1 Pro. Been working hard on getting my software to compile and work again. No big issues fortunately, except for floating point problems in some obscure fortran code and in python. With regard to python/numpy I have the following question.
I have a large code base bur for simplicity will use this simple function that converts flight level to pressure to show the issue.
def fl2pres(FL):
P0=101325
T0=288.15
T1=216.65
g=9.80665
R=287.0528742
GAMMA=0.0065
P11=P0*np.exp(-g/GAMMA/R*np.log(T0/T1))
h=FL*30.48
return np.where(h<=11000, \
P0*np.exp(-g/GAMMA/R*np.log((T0/(T0-GAMMA*h) ))),\
P11*np.exp(-g/R/T1*(h-11000)) )
When I run the code on my M1 Pro, I get:
In [2]: fl2pres(np.float64([400, 200]))
Out[3]: array([18753.90334892, 46563.239766 ])
and;
In [3]: fl2pres(np.float32([400, 200]))
Out[3]: array([18753.90234375, 46563.25080916])
Doing the same on my older Intel MacBook Pro I get:
In [2]: fl2pres(np.float64([400, 200]))
Out[2]: array([18753.90334892, 46563.239766 ])
and;
In [3]: fl2pres(np.float32([400, 200]))
Out[3]: array([18753.904296888, 46563.24778944])
The float64 calculations match but the float32 do not. We use float32 quite a lot throughout our code for memory optimisation. I understand that due to architectural differences this sort of floating point errors can occur but was wondering whether a simple fix was possible as currently some unit-tests fail. I could include the architecture in these tests but am hoping for an easier solution?
Converting all inputs to float64 makes my unit-tests pass and hence fixes this issue but sine we have quite some large arrays and dataframes, the impact on memory is unwanted.
Both laptops run python 3.9.10 installed through homebrew, pandas 1.4.1 and numpy 1.22.3 (installed to map against accelerate and blas).
EDIT
I have changes the function to print intermediate values to see where changes occur:
def fl2pres(FL):
P0=101325
T0=288.15
T1=216.65
g=9.80665
R=287.0528742
GAMMA=0.0065
P11=P0*np.exp(-g/GAMMA/R*np.log(T0/T1))
h=FL*30.48
A = np.log((T0/(T0-GAMMA*h)))
B = np.exp(-g/GAMMA/R*A)
C = np.exp(-g/R/T1*(h-11000))
print(f"P11:{P11}, h:{h}, A:{A}, B:{B}, C:{C}")
return np.where(h<=11000, P0*B, P11*C)
Running this function with the same input as above for the float32 case, I get on M1 Pro:
P11:22632.040591374975, h:[12192. 6096.], A:[0.32161594 0.14793371], B:[0.1844504 0.45954345], C:[0.82864394 2.16691503]
array([18753.90334892, 46563.239766 ])
On Intel:
P11:22632.040591374975, h:[12192. 6096.], A:[0.32161596 0.14793368], B:[0.18445034 0.45954353], C:[0.828644 2.166915]
array([18753.90429688, 46563.24778944])
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
根据我在 numpy 的 GitHub 上创建的问题:
我还没有尝试使用
NPY_DISABLE_SVML=1
编译 numpy,我现在的计划是使用一个可以在我的所有平台上运行的 docker 容器,并使用单个“truth” “为了我的测试。As per the issue I created at numpy's GitHub:
I haven't tried compiling numpy using
NPY_DISABLE_SVML=1
and my plan now is to use a docker container that can run on all my platforms and use a single "truth" for my tests.