np.float32 floating-point differences between an Intel MacBook and an M1

Posted 2025-01-13 07:28:24


I have recently upgraded my Intel MacBook Pro 13" to a MacBook Pro 14" with an M1 Pro and have been working hard on getting my software to compile and work again. Fortunately there were no big issues, except for floating-point problems in some obscure Fortran code and in Python. With regard to Python/NumPy I have the following question.

I have a large code base, but for simplicity I will use this simple function, which converts flight level to pressure, to show the issue.

import numpy as np

def fl2pres(FL):
    """Convert flight level (hundreds of feet) to pressure in Pa (ISA atmosphere)."""
    P0 = 101325        # sea-level standard pressure [Pa]
    T0 = 288.15        # sea-level standard temperature [K]
    T1 = 216.65        # temperature at the 11 km tropopause [K]
    g = 9.80665        # standard gravity [m/s^2]
    R = 287.0528742    # specific gas constant of dry air [J/(kg K)]
    GAMMA = 0.0065     # tropospheric temperature lapse rate [K/m]
    P11 = P0 * np.exp(-g / GAMMA / R * np.log(T0 / T1))  # pressure at 11 km [Pa]

    h = FL * 30.48     # flight level (units of 100 ft) to metres

    return np.where(
        h <= 11000,
        P0 * np.exp(-g / GAMMA / R * np.log(T0 / (T0 - GAMMA * h))),
        P11 * np.exp(-g / R / T1 * (h - 11000)),
    )

When I run the code on my M1 Pro, I get:

In [2]: fl2pres(np.float64([400, 200]))
Out[2]: array([18753.90334892, 46563.239766  ])

and;

In [3]: fl2pres(np.float32([400, 200]))
Out[3]: array([18753.90234375, 46563.25080916])

Doing the same on my older Intel MacBook Pro I get:

In [2]: fl2pres(np.float64([400, 200]))
Out[2]: array([18753.90334892, 46563.239766  ])

and;

In [3]: fl2pres(np.float32([400, 200]))
Out[3]: array([18753.90429688, 46563.24778944])

The float64 results match, but the float32 results do not. We use float32 quite a lot throughout our code for memory optimisation. I understand that this sort of floating-point error can occur due to architectural differences, but I was wondering whether a simple fix is possible, as currently some unit tests fail. I could make these tests architecture-dependent, but I am hoping for an easier solution.
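One common way to make such tests architecture-independent is to compare against the reference with a small relative tolerance instead of exact equality. A minimal sketch (the helper name assert_close_f32 and the budget of 4 ULPs are my own choices; the reference values are the Intel printouts from above):

import numpy as np

def assert_close_f32(actual, desired, ulps=4):
    # Allow a few float32 ULPs of relative error so that a 1-2 ULP
    # difference in exp/log between platforms does not fail the test.
    eps = np.finfo(np.float32).eps  # one ULP at 1.0, ~1.19e-7
    np.testing.assert_allclose(actual, desired, rtol=ulps * eps, atol=0.0)

assert_close_f32(fl2pres(np.float32([400, 200])),
                 [18753.90429688, 46563.24778944])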

Converting all inputs to float64 makes my unit tests pass and hence fixes this issue, but since we have quite a few large arrays and dataframes, the impact on memory is unwanted.
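A possible middle ground, assuming the mismatch comes only from the float32 transcendental functions: keep the arrays in float32 in memory and upcast just for the duration of the call, reusing the fl2pres defined above (fl2pres_mixed is a hypothetical wrapper name). Since the float64 results agree on both machines, rounding them back to float32 should give bit-identical outputs at the cost of one temporary float64 copy:

import numpy as np

def fl2pres_mixed(FL):
    # Compute in float64, where both architectures agree, then round the
    # result back to float32; only a temporary copy uses double memory.
    return fl2pres(np.asarray(FL, dtype=np.float64)).astype(np.float32)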

Both laptops run Python 3.9.10 installed through Homebrew, pandas 1.4.1 and numpy 1.22.3 (built to link against Accelerate/BLAS).
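As a sanity check that the two installations differ only in the math backend, NumPy can print its build configuration:

import numpy as np

np.show_config()  # lists the BLAS/LAPACK implementation (e.g. Accelerate) each build links against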

EDIT
I changed the function to print intermediate values and see where the results diverge:

def fl2pres(FL):
    # Same constants as above, with the intermediate terms exposed for printing.
    P0 = 101325
    T0 = 288.15
    T1 = 216.65
    g = 9.80665
    R = 287.0528742
    GAMMA = 0.0065
    P11 = P0 * np.exp(-g / GAMMA / R * np.log(T0 / T1))

    h = FL * 30.48
    A = np.log(T0 / (T0 - GAMMA * h))      # tropospheric log term
    B = np.exp(-g / GAMMA / R * A)         # tropospheric pressure ratio
    C = np.exp(-g / R / T1 * (h - 11000))  # stratospheric pressure ratio
    print(f"P11:{P11}, h:{h}, A:{A}, B:{B}, C:{C}")
    return np.where(h <= 11000, P0 * B, P11 * C)

Running this function with the same float32 input as above, I get on the M1 Pro:

P11:22632.040591374975, h:[12192.  6096.], A:[0.32161594 0.14793371], B:[0.1844504  0.45954345], C:[0.82864394 2.16691503]
array([18753.90334892, 46563.239766  ])

On Intel:

P11:22632.040591374975, h:[12192.  6096.], A:[0.32161596 0.14793368], B:[0.18445034 0.45954353], C:[0.828644 2.166915]
array([18753.90429688, 46563.24778944])

Comments (1)

寄风 2025-01-20 07:28:24


As per the issue I created at numpy's GitHub:

the differences you are experiencing seem to be all within a single "ULP" (unit in the last place), maybe 2? For special math functions, like exp or sin, small errors are unfortunately expected and can be system dependent (both hardware and OS/math libraries).

One thing that might have a slightly larger effect could be the use of SVML, which NumPy enables on newer machines (i.e. only on the Intel one). That can be disabled at build time by setting NPY_DISABLE_SVML=1 as an environment variable, but I don't think you can disable its use without building NumPy. (However, right now, it may well be that the M1 machine is the less precise one, or that they are both roughly the same, just different.)

I haven't tried compiling numpy with NPY_DISABLE_SVML=1; my plan for now is to use a Docker container that can run on all my platforms and provides a single "truth" for my tests.
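To confirm the "within 1-2 ULP" claim on the concrete numbers, the float32 step count between the two machines' outputs can be measured directly. A minimal sketch, assuming positive finite values; the inputs are copied from the question's printouts:

import numpy as np

def ulp_distance_f32(a, b):
    # For positive finite float32 values, the int32 bit patterns are ordered
    # like the floats themselves, so their difference counts the number of
    # representable float32 steps (ULPs) between a and b.
    ai = np.asarray(a, dtype=np.float32).view(np.int32)
    bi = np.asarray(b, dtype=np.float32).view(np.int32)
    return np.abs(ai - bi)

m1 = [18753.90234375, 46563.25080916]     # M1 float32 output
intel = [18753.90429688, 46563.24778944]  # Intel float32 output
print(ulp_distance_f32(m1, intel))        # small counts confirm a last-bit effect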
