Speeding up loops over NumPy arrays

Posted 2024-11-06 20:05:32

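In my code I have a for loop that iterates over a multidimensional numpy array and performs some operation on the sub-array obtained at each iteration. It looks like this: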

for sub in Arr:
  #do stuff using sub

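What is done with sub is fully vectorized, so that part should be efficient. The loop itself, however, runs roughly 10^5 times and is the bottleneck. Would I gain anything by offloading this part to C? I am somewhat reluctant to do so, because the loop body uses broadcasting, slicing, and smart-indexing tricks that would be tedious to write in plain C. I would also welcome thoughts and suggestions on how to handle broadcasting, slicing, and smart indexing when offloading computation to C.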



热风软妹 2024-11-13 20:05:33

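Have a look at scipy.weave. You can use scipy.weave.blitz to transparently translate your expression into C++ code and run it. It handles slicing automatically and eliminates temporary arrays, but you say the body of your for loop does not create temporaries, so your mileage may vary.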
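To make "temporaries" concrete, here is a minimal sketch in plain NumPy. It does not use weave at all; the function names and the expression a * b + c are made up for illustration. It only shows the kind of intermediate array that an expression compiler such as blitz is meant to avoid.

```python
import numpy as np

def combine_with_temporaries(a, b, c):
    # NumPy first evaluates a * b into a freshly allocated temporary array,
    # then allocates a second array to hold the final sum.
    return a * b + c

def combine_in_place(a, b, c, out):
    # Hypothetical hand-written alternative: reuse one preallocated buffer,
    # so no intermediate array is allocated inside the function.
    np.multiply(a, b, out=out)
    np.add(out, c, out=out)
    return out

rng = np.random.default_rng(0)
a, b, c = rng.random((3, 1000))   # three 1-D arrays of length 1000
buf = np.empty_like(a)

# Both versions compute the same values.
assert np.allclose(combine_with_temporaries(a, b, c),
                   combine_in_place(a, b, c, buf))
```

If the loop body already works this way, or the sub-arrays are small enough that allocation is not what dominates, removing temporaries alone is unlikely to help much.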

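However, if you want to replace the entire for loop with something more efficient, you can use scipy.weave.inline. The drawback is that you have to write C++ code. That should not be too hard, because you can use Blitz++ syntax, which is very close to numpy array expressions. Slicing is supported directly; broadcasting is not.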

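There are two workarounds:

1. Use the NumPy C API and its multidimensional iterators. They handle broadcasting transparently. However, you are then calling into the NumPy runtime, so there may be some overhead.
2. The other option, possibly the simpler one, is to express broadcasting in the usual matrix notation. A broadcast operation can be written as an outer product with a vector of all ones. The good thing is that Blitz++ does not actually create this temporary broadcast array in memory; it figures out how to fold it into an equivalent loop.

For the second option, see http://www.oonumerics.org/blitz/docs/blitz_3.html#SEC88 for index placeholders. As long as your array has fewer than 11 dimensions, you are fine. This link shows how to use them to form outer products: http://www.oonumerics.org/blitz/docs/blitz_3.html#SEC99 (search for "outer product" to jump to the relevant part of the document).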
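Both workarounds can be illustrated from Python before writing any C++. The sketch below is an assumption-laden illustration, not Blitz++ code: the function names are hypothetical, np.broadcast is used as the Python-level view of NumPy's broadcasting multi-iterator, and np.outer (unlike what is described for Blitz++ above) does allocate the intermediate arrays. It only demonstrates that the three formulations give the same result.

```python
import numpy as np

def broadcast_add(col, row):
    # Ordinary NumPy broadcasting: shapes (m, 1) + (1, n) -> (m, n).
    return col[:, None] + row[None, :]

def iterator_add(col, row):
    # Workaround 1, seen from Python: np.broadcast walks both operands
    # in lockstep and takes care of the broadcasting itself.
    it = np.broadcast(col[:, None], row[None, :])
    out = np.empty(it.shape)
    out.flat = [x + y for x, y in it]
    return out

def outer_product_add(col, row):
    # Workaround 2: write the broadcast as outer products with
    # all-ones vectors. NumPy materializes these arrays in memory.
    ones_m = np.ones(col.shape[0])
    ones_n = np.ones(row.shape[0])
    return np.outer(col, ones_n) + np.outer(ones_m, row)

col = np.array([1.0, 2.0, 3.0])
row = np.array([10.0, 20.0])

expected = broadcast_add(col, row)
assert np.array_equal(iterator_add(col, row), expected)
assert np.array_equal(outer_product_add(col, row), expected)
```

The outer-product form is the one that carries over to Blitz++ index placeholders, where each np.outer call would correspond to a product of expressions indexed by different placeholders.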