Speeding up iteration over a Numpy array
I am working on performing image processing using Numpy, specifically a running standard deviation stretch. This reads in X number of columns, finds the standard deviation and mean, and performs a percentage linear stretch. It then iterates to the next "group" of columns and performs the same operations. The input image is a 1 GB, 32-bit, single-band raster which takes quite a long time to process (hours). Below is the code.
I realize that I have 3 nested for loops, which is presumably where the bottleneck is occurring. If I process the image in "boxes", that is to say loading a [500, 500] array and iterating through the image, processing time is quite short. Unfortunately, camera error requires that I iterate in extremely long strips (52,000 x 4) (y, x) to avoid banding.
Any suggestions on speeding this up would be appreciated:
def box(dataset, outdataset, sampleSize, n):
    quiet = 0
    sample = sampleSize
    #iterate over all of the bands
    for j in xrange(1, dataset.RasterCount + 1): #1 based counter
        band = dataset.GetRasterBand(j)
        NDV = band.GetNoDataValue()
        print "Processing band: " + str(j)

        #define the interval at which blocks are created
        intervalY = int(band.YSize/1)
        intervalX = int(band.XSize/2000) #to be changed to sampleSize when working

        #iterate through the rows
        scanBlockCounter = 0
        for i in xrange(0, band.YSize, intervalY):
            #If the next i is going to fail due to the edge of the image/array
            if i + (intervalY*2) < band.YSize:
                numberRows = intervalY
            else:
                numberRows = band.YSize - i

            for h in xrange(0, band.XSize, intervalX):
                if h + (intervalX*2) < band.XSize:
                    numberColumns = intervalX
                else:
                    numberColumns = band.XSize - h

                scanBlock = band.ReadAsArray(h, i, numberColumns, numberRows).astype(numpy.float)

                standardDeviation = numpy.std(scanBlock)
                mean = numpy.mean(scanBlock)
                newMin = mean - (standardDeviation * n)
                newMax = mean + (standardDeviation * n)

                outputBlock = ((scanBlock - newMin)/(newMax - newMin))*255
                outRaster = outdataset.GetRasterBand(j).WriteArray(outputBlock, h, i) #array, xOffset, yOffset

                scanBlockCounter = scanBlockCounter + 1
                #print str(scanBlockCounter) + ": " + str(scanBlock.shape) + str(h) + ", " + str(intervalX)
                if numberColumns == band.XSize - h:
                    break

                #update progress line
                if not quiet:
                    gdal.TermProgress_nocb( (float(h+1) / band.YSize) )
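For reference, the arithmetic inside the inner loop is already vectorized by NumPy; a self-contained sketch of the same stretch on a synthetic block (the random data is just a stand-in for what band.ReadAsArray would return) looks like:

```python
import numpy

# Synthetic stand-in for one scanBlock; real data comes from band.ReadAsArray(...)
numpy.random.seed(0)
scanBlock = numpy.random.rand(52, 4).astype(numpy.float64) * 1000.0

n = 2  # number of standard deviations for the stretch
mean = scanBlock.mean()
standardDeviation = scanBlock.std()
newMin = mean - standardDeviation * n
newMax = mean + standardDeviation * n

# Percentage linear stretch toward the 0-255 range; every operation is element-wise,
# so no explicit Python loop over pixels is needed
outputBlock = (scanBlock - newMin) / (newMax - newMin) * 255.0

print(outputBlock.shape)
```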
Here is an update:
Without using the profile module (I did not want to start wrapping small sections of the code into functions), I used a mix of print and exit statements to get a very rough idea of which lines were taking the most time. Luckily (and I do understand how lucky I was), one line was dragging everything down:
outRaster = outdataset.GetRasterBand(j).WriteArray(outputBlock,h,i)#array, xOffset, yOffset
It appears that GDAL is quite inefficient when opening the output file and writing out the array. With this in mind, I decided to append my modified "outputBlock" arrays to a Python list and then write out chunks. Here is the segment that I changed:
The outputBlock computation is unchanged; then, instead of writing each block immediately:

                #Add the array to a list
                outputArrayList.append(outputBlock)

                #Check the interval counter and if it is "time", write out the array
                if len(outputArrayList) >= (intervalX * writeSize) or finisher == 1:
                    #Convert the list to a numpy array; horizontally stack the arrays
                    stacked = numpy.hstack(outputArrayList)
                    #Write out the array
                    outRaster = outdataset.GetRasterBand(j).WriteArray(stacked, xOffset, i) #array, xOffset, yOffset
                    xOffset = xOffset + (intervalX*(intervalX * writeSize))
                    #Cleanup to conserve memory
                    outputArrayList = list()
                    stacked = None
                    finisher = 0
finisher is simply a flag that handles the edges. It took a bit of time to figure out how to build an array from the list: using numpy.array was creating a 3-d array (anyone care to explain why?), while WriteArray requires a 2-d array. Total processing time now varies from just under 2 minutes to 5 minutes. Any idea why this range of times might exist?
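For reference, the difference between the two stacking calls can be seen on toy data: numpy.array applied to a list of equally shaped 2-d arrays stacks them along a new leading axis, while numpy.hstack joins them along the existing column axis:

```python
import numpy

# Two toy 3x2 "blocks" standing in for consecutive outputBlock arrays
a = numpy.zeros((3, 2))
b = numpy.ones((3, 2))

stacked_3d = numpy.array([a, b])    # new leading axis: shape (2, 3, 2)
stacked_2d = numpy.hstack([a, b])   # joined column-wise: shape (3, 4)

print(stacked_3d.shape)
print(stacked_2d.shape)
```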
Many thanks to everyone who posted! The next step is to really get into Numpy and learn about vectorization for additional optimization.
3 Answers
One way to speed up operations over numpy data is to use vectorize. Essentially, vectorize takes a function f and creates a new function g that maps f over an array a. g is then called like so: g(a).
Without having the data you're working with available, I can't say for certain whether this will help, but perhaps you can rewrite the above as a set of functions that can be vectorized. Perhaps in this case you could vectorize over an array of indices into ReadAsArray(h, i, numberColumns, numberRows). Here's an example of the potential benefit: a 15x speedup!
Note also that numpy slicing handles the edges of ndarrays elegantly, so if you could get your ReadAsArray data into an ndarray you wouldn't have to do any edge-checking shenanigans.
Regarding your question about reshaping -- reshaping doesn't fundamentally alter the data at all. It just changes the "strides" by which numpy indexes the data. When you call the reshape method, the value returned is a new view into the data; the data isn't copied or altered at all, and neither is the old view with the old stride information.
Answered as requested.
If you are IO bound, you should chunk your reads/writes. Try dumping ~500 MB of data to an ndarray, process it all, write it out and then grab the next ~500 MB. Make sure to reuse the ndarray.
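A minimal sketch of that chunking pattern, using numpy.memmap on temporary files in place of the GDAL dataset (the file names, sizes, and the doubling step are all illustrative stand-ins for the real read/process/write):

```python
import numpy
import os
import tempfile

rows, cols = 1000, 64          # stand-in for the raster dimensions
chunk_rows = 250               # rows per chunk (~500 MB for the real data)

# Source "raster": a memory-mapped file filled with random data
src_path = os.path.join(tempfile.mkdtemp(), "raster.dat")
src = numpy.memmap(src_path, dtype=numpy.float32, mode="w+", shape=(rows, cols))
src[:] = numpy.random.rand(rows, cols)

# Destination "raster"
dst_path = os.path.join(tempfile.mkdtemp(), "out.dat")
dst = numpy.memmap(dst_path, dtype=numpy.float32, mode="w+", shape=(rows, cols))

buf = numpy.empty((chunk_rows, cols), dtype=numpy.float32)  # reused buffer
for start in range(0, rows, chunk_rows):
    stop = start + chunk_rows
    buf[:] = src[start:stop]   # read one chunk into the reused buffer
    buf *= 2.0                 # placeholder for the real processing
    dst[start:stop] = buf      # write the processed chunk back out

dst.flush()
```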
Without trying to completely understand exactly what you are doing, I notice that you aren't using any numpy slices or array broadcasting, both of which may speed up your code, or, at the very least, make it more readable. My apologies if these aren't germane to your problem.
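As a toy illustration of broadcasting (synthetic data, not the raster itself), a per-column statistic can be applied to a whole 2-d array without an explicit loop:

```python
import numpy

img = numpy.arange(12, dtype=numpy.float64).reshape(3, 4)

col_means = img.mean(axis=0)   # shape (4,): one mean per column
centered = img - col_means     # broadcast across all rows at once

print(centered.mean(axis=0))   # each column now has zero mean
```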