NumPy abnormally slow on Mac M1
I have conducted a simple speed test for my numpy:
import numpy as np
A = np.random.rand(1000, 1000)
B = np.random.rand(1000, 1000)
%timeit A.dot(B)
The result is:
30.3 ms ± 829 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
This result seems abnormally slow compared with what others typically see (less than 10 ms on average). I'm wondering what could possibly be the cause of such behavior.
My system is macOS Big Sur on an M1 chip. The Python version is 3.8.13 and the NumPy version is 1.22.4. NumPy was installed via
pip install "numpy==1.22.4"
The output of np.show_config() is:
openblas64__info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
runtime_library_dirs = ['/usr/local/lib']
blas_ilp64_opt_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
runtime_library_dirs = ['/usr/local/lib']
openblas64__lapack_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
runtime_library_dirs = ['/usr/local/lib']
lapack_ilp64_opt_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
runtime_library_dirs = ['/usr/local/lib']
Supported SIMD extensions in this NumPy install:
baseline = SSE,SSE2,SSE3
found = SSSE3,SSE41,POPCNT,SSE42
not found = AVX,F16C,FMA3,AVX2,AVX512F,AVX512CD,AVX512_KNL,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL
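Notably, every SIMD extension in the list above is an x86 feature (SSE/AVX families), and none of Apple silicon's ARM features appear. One plausible explanation, worth verifying, is that the interpreter itself is an x86_64 build running under Rosetta 2. A minimal check, using only the standard library:

```python
import platform

# "arm64" means Python runs natively on Apple silicon;
# "x86_64" means an Intel build running under Rosetta 2 translation,
# which would explain the SSE-only feature list and a slow BLAS.
print(platform.machine())
print(platform.python_version())
```

If this prints `x86_64` on an M1, the slowdown likely comes from the translated interpreter and its x86 OpenBLAS rather than from the benchmark code itself.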
Edit:
I did another test with this code snippet (from 1):
import time
import numpy as np
np.random.seed(42)
a = np.random.uniform(size=(300, 300))
runtimes = 10
timecosts = []
for _ in range(runtimes):
    s_time = time.time()
    for i in range(100):
        a += 1
        np.linalg.svd(a)
    timecosts.append(time.time() - s_time)
print(f'mean of {runtimes} runs: {np.mean(timecosts):.5f}s')
The result of my test is:
mean of 10 runs: 6.17438s
whereas the reference results on the website 1 are: (the chip is M1 Max)
+-----------------------------------+-----------------------+--------------------+
| Python installed by (run on)→ | Miniforge (native M1) | Anaconda (Rosetta) |
+----------------------+------------+------------+----------+----------+---------+
| Numpy installed by ↓ | Run from → | Terminal | PyCharm | Terminal | PyCharm |
+----------------------+------------+------------+----------+----------+---------+
| Apple Tensorflow | 4.19151 | 4.86248 | / | / |
+-----------------------------------+------------+----------+----------+---------+
| conda install numpy | 4.29386 | 4.98370 | 4.10029 | 4.99271 |
+-----------------------------------+------------+----------+----------+---------+
Based on these results, my timing is slower than any of the NumPy setups in the reference.
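As a side note, wall-clock benchmarks like the one above are slightly more robust with time.perf_counter (a monotonic, high-resolution clock) than with time.time. A sketch of the same SVD loop, with iteration counts scaled down so it finishes quickly (my assumption: the scaled-down numbers still expose the same relative slowdown):

```python
import time
import numpy as np

np.random.seed(42)
a = np.random.uniform(size=(300, 300))

runtimes = 3          # scaled down from 10 for a quick check
timecosts = []
for _ in range(runtimes):
    start = time.perf_counter()   # monotonic, high-resolution clock
    for _ in range(20):           # scaled down from 100
        a += 1
        np.linalg.svd(a)
    timecosts.append(time.perf_counter() - start)

print(f'mean of {runtimes} runs: {np.mean(timecosts):.5f}s')
```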
I've noticed similar slowdowns on M1, but I think the actual cause, at least on my machine, is not a fundamentally faulty NumPy installation but some problem with the benchmarks themselves. Consider the following example: computing
x = a.T @ a; eigh(x)
takes 2 ms, while eigh(a.T @ a) takes 400 ms. I think in the latter case there is some problem with %timeit. Maybe for some reason the computation gets routed to the "efficiency cores"? My tentative answer is that your first benchmark with %timeit is not reliable.
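To check whether this effect is specific to IPython's %timeit, the two variants can be timed outside IPython with the stdlib timeit module. A minimal sketch (the 2 ms / 400 ms figures quoted above are from one machine and will vary; `split` and `fused` are names I introduce here for illustration):

```python
import timeit
import numpy as np
from numpy.linalg import eigh

rng = np.random.default_rng(0)
a = rng.random((300, 300))

def split():
    # two statements: materialize the product, then decompose
    x = a.T @ a
    eigh(x)

def fused():
    # single expression, as in eigh(a.T @ a)
    eigh(a.T @ a)

t_split = timeit.timeit(split, number=10)
t_fused = timeit.timeit(fused, number=10)
print(f"split: {t_split:.4f}s  fused: {t_fused:.4f}s")
```

If the two timings agree here but diverge under %timeit, that points at the measurement harness rather than NumPy.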
is not reliable.如果您怀疑在TimeIt中存在问题,请尝试使用时间代替
Apple Silicon上的Numpy的更多信息,请阅读链接波纹管中的第一个答案。为了获得最佳性能,建议使用Apple的加速Veclib。如果您使用conda安装,请访问 @andrejhribernik的评论:
为什么M1 Max上的本地人比Old Intel i5上的Python慢得多?
If you suspect an issue in timeit, try using time instead.
For more information on NumPy on Apple silicon, please read the first answer in the link below. For optimal performance, it is advised to use Apple's accelerated vecLib. If you install using conda, then also check out @AndrejHribernik's comment:
Why Python native on M1 Max is greatly slower than Python on old Intel i5?
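Before switching installers, it can help to confirm which BLAS the current NumPy is actually linked against. A minimal sketch: on a well-configured Apple-silicon setup you would expect the configuration to name an arm64 OpenBLAS or Accelerate/vecLib rather than the generic x86 openblas64_ build shown in the question.

```python
import numpy as np

# Print the build/link configuration; the "libraries" and
# "library_dirs" entries show which BLAS/LAPACK NumPy uses.
np.show_config()
print("NumPy version:", np.__version__)
```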