Reducing the runtime of a long for loop in Python
Another stupid question from my side ;) I have some issues with the following snippet, where len(x) = len(y) = 7,700,000:
from numpy import *

for k in range(len(x)):
    if x[k] == xmax:
        xind = -1
    else:
        xind = int(floor((x[k]-xmin)/xdelta))
    if y[k] == ymax:
        yind = -1
    else:
        yind = int(floor((y[k]-ymin)/ydelta))
    arr = append(arr, grid[xind, yind])
All variables are floats or integers except arr and grid: arr is a 1D array and grid is a 2D array.
My problem is that running through the loop takes a long time (several minutes). Can anyone explain why it takes so long? Does anyone have a suggestion? Even when I replace range() with arange(), I only save a few seconds.
Thanks.
1st EDIT
Sorry, I forgot to mention that I'm importing numpy.
2nd EDIT
I have some points in a 2D grid. Each cell of the grid has a value stored in it. I have to find out which cell each point falls into and apply that cell's value to a new array. That's my problem and my idea.
p.s.: look at the picture if you want to understand it better; the values of the cells are represented with different colors.
Comments (5)
How about something like the following?
Note: if you're using numpy, don't treat the arrays like Python lists if you want to do things efficiently.
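A minimal sketch of what such a vectorized version could look like, assuming x and y are numpy arrays and xmin, xmax, xdelta, ymin, ymax, ydelta and grid are defined as in the question:

import numpy as np

# compute all cell indices at once instead of one point per loop iteration
xind = np.floor((x - xmin) / xdelta).astype(int)
yind = np.floor((y - ymin) / ydelta).astype(int)

# points lying exactly on the upper boundary map to the last cell
xind[x == xmax] = -1
yind[y == ymax] = -1

# one fancy-indexing lookup replaces 7.7 million append() calls
arr = grid[xind, yind]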
There's also izip if you don't want to generate a giant extra list.
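A minimal sketch of what that could look like on Python 2 (on Python 3 the built-in zip is already lazy):

from itertools import izip  # Python 2; use the built-in zip on Python 3
from math import floor

for xk, yk in izip(x, y):   # iterate over value pairs, no index list needed
    xind = -1 if xk == xmax else int(floor((xk - xmin) / xdelta))
    yind = -1 if yk == ymax else int(floor((yk - ymin) / ydelta))
    # ... look up grid[xind, yind] and store it ...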
I cannot see an obvious problem besides the size of the data. Is your computer able to hold everything in memory? If not, you are probably "jumping around" in swapped memory, which will always be slow. If the complete data fits in memory, give psyco a try. It might speed up your calculation a lot.
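A minimal sketch of trying psyco, assuming Python 2 (psyco was never ported to Python 3 and is unmaintained today):

import psyco
psyco.full()   # ask psyco to JIT-compile every function it can

# psyco works best on code wrapped in a function, so the loop from the
# question would go inside one, e.g. a hypothetical fill_arr(x, y, grid)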
I suspect the problem might be in the way you're storing the results: the docs for append say it returns a copy of the array with the values appended, so it does not operate in place. This means you'll be deallocating and allocating a larger and larger array on every iteration. I suggest allocating an array of the correct size up-front, then populating it with data in each iteration, e.g.:
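A minimal sketch of that idea, assuming x, y, grid and the bounds/deltas from the question:

import numpy as np
from math import floor

arr = np.empty(len(x))          # allocate the result once, up-front
for k in range(len(x)):
    xind = -1 if x[k] == xmax else int(floor((x[k] - xmin) / xdelta))
    yind = -1 if y[k] == ymax else int(floor((y[k] - ymin) / ydelta))
    arr[k] = grid[xind, yind]   # write into the preallocated slot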
x's length is 7 million? I think that's why!
The iteration occurs 7 million times,
so you should probably use another kind of loop.
Is it really necessary to loop over it 7 million times?