加快 Numpy 数组的迭代速度

发布于 2024-11-19 15:23:46 字数 3633 浏览 5 评论 0原文

我正在使用 Numpy 执行图像处理,特别是运行标准差拉伸。这会读取 X 列,找到标准值。并执行百分比线性拉伸。然后它迭代到下一个“组”列并执行相同的操作。输入图像是 1GB、32 位、单波段栅格,需要相当长的时间来处理(几个小时)。下面是代码。

我意识到我有 3 个嵌套的 for 循环,这可能是瓶颈发生的地方。如果我在“盒子”中处理图像,也就是说加载一个[500,500]的数组并迭代图像处理时间是相当短的。不幸的是,相机错误要求我迭代极长的条带 (52,000 x 4) (y,x) 以避免条带。

任何有关加快速度的建议将不胜感激:

def box(dataset, outdataset, sampleSize, n):

    quiet = 0
    sample = sampleSize
    #iterate over all of the bands
    for j in xrange(1, dataset.RasterCount + 1): #1 based counter

        band = dataset.GetRasterBand(j)
        NDV = band.GetNoDataValue()

        print "Processing band: " + str(j)       

        #define the interval at which blocks are created
        intervalY = int(band.YSize/1)    
        intervalX = int(band.XSize/2000) #to be changed to sampleSize when working

        #iterate through the rows
        scanBlockCounter = 0

        for i in xrange(0,band.YSize,intervalY):

            #If the next i is going to fail due to the edge of the image/array
            if i + (intervalY*2) < band.YSize:
                numberRows = intervalY
            else:
                numberRows = band.YSize - i

            for h in xrange(0,band.XSize, intervalX):

                if h + (intervalX*2) < band.XSize:
                    numberColumns = intervalX
                else:
                    numberColumns = band.XSize - h

                scanBlock = band.ReadAsArray(h,i,numberColumns, numberRows).astype(numpy.float)

                standardDeviation = numpy.std(scanBlock)
                mean = numpy.mean(scanBlock)

                newMin = mean - (standardDeviation * n)
                newMax = mean + (standardDeviation * n)

                outputBlock = ((scanBlock - newMin)/(newMax-newMin))*255
                outRaster = outdataset.GetRasterBand(j).WriteArray(outputBlock,h,i)#array, xOffset, yOffset


                scanBlockCounter = scanBlockCounter + 1
                #print str(scanBlockCounter) + ": " + str(scanBlock.shape) + str(h)+ ", " + str(intervalX)
                if numberColumns == band.XSize - h:
                    break

                #update progress line
                if not quiet:
                    gdal.TermProgress_nocb( (float(h+1) / band.YSize) )

这是更新: 在不使用配置文件模块的情况下,因为我不想开始将一小部分代码包装到函数中,所以我混合使用了 print 和 exit 语句来真正粗略地了解哪些行花费了最多时间。幸运的是(我确实明白我是多么幸运)一根线拖垮了一切。

    outRaster = outdataset.GetRasterBand(j).WriteArray(outputBlock,h,i)#array, xOffset, yOffset

看来 GDAL 在打开输出文件和写出数组时效率相当低。考虑到这一点,我决定将修改后的数组“outBlock”添加到 python 列表中,然后写出块。这是我更改的部分:

刚刚修改了outputBlock ...

         #Add the array to a list (tuple)
            outputArrayList.append(outputBlock)

            #Check the interval counter and if it is "time" write out the array
            if len(outputArrayList) >= (intervalX * writeSize) or finisher == 1:

                #Convert the tuple to a numpy array.  Here we horizontally stack the tuple of arrays.
                stacked = numpy.hstack(outputArrayList)

                #Write out the array
                outRaster = outdataset.GetRasterBand(j).WriteArray(stacked,xOffset,i)#array, xOffset, yOffset
                xOffset = xOffset + (intervalX*(intervalX * writeSize))

                #Cleanup to conserve memory
                outputArrayList = list()
                stacked = None
                finisher=0

Finisher只是一个处理边缘的标志。花了一些时间才弄清楚如何从列表中构建数组。其中,使用 numpy.array 创建一个 3 维数组(有人愿意解释为什么吗?),而写入数组需要一个 2 维数组。总处理时间现在从不到 2 分钟到 5 分钟不等。知道为什么时间范围可能存在吗?

非常感谢所有发帖的人!下一步是真正进入 Numpy 并了解矢量化以进行额外的优化。

I am working on performing image processing using Numpy, specifically a running standard deviation stretch. This reads in X number of columns, finds the Std. and performs a percentage linear stretch. It then iterates to the next "group" of columns and performs the same operations. The input image is a 1GB, 32-bit, single band raster which is taking quite a long time to process (hours). Below is the code.

I realize that I have 3 nested for loops which is, presumably where the bottleneck is occurring. If I process the image in "boxes", that is to say loading an array that is [500,500] and iterating through the image processing time is quite short. Unfortunately, camera error requires that I iterate in extremely long strips (52,000 x 4) (y,x) to avoid banding.

Any suggestions on speeding this up would be appreciated:

def box(dataset, outdataset, sampleSize, n):

    quiet = 0
    sample = sampleSize
    #iterate over all of the bands
    for j in xrange(1, dataset.RasterCount + 1): #1 based counter

        band = dataset.GetRasterBand(j)
        NDV = band.GetNoDataValue()

        print "Processing band: " + str(j)       

        #define the interval at which blocks are created
        intervalY = int(band.YSize/1)    
        intervalX = int(band.XSize/2000) #to be changed to sampleSize when working

        #iterate through the rows
        scanBlockCounter = 0

        for i in xrange(0,band.YSize,intervalY):

            #If the next i is going to fail due to the edge of the image/array
            if i + (intervalY*2) < band.YSize:
                numberRows = intervalY
            else:
                numberRows = band.YSize - i

            for h in xrange(0,band.XSize, intervalX):

                if h + (intervalX*2) < band.XSize:
                    numberColumns = intervalX
                else:
                    numberColumns = band.XSize - h

                scanBlock = band.ReadAsArray(h,i,numberColumns, numberRows).astype(numpy.float)

                standardDeviation = numpy.std(scanBlock)
                mean = numpy.mean(scanBlock)

                newMin = mean - (standardDeviation * n)
                newMax = mean + (standardDeviation * n)

                outputBlock = ((scanBlock - newMin)/(newMax-newMin))*255
                outRaster = outdataset.GetRasterBand(j).WriteArray(outputBlock,h,i)#array, xOffset, yOffset


                scanBlockCounter = scanBlockCounter + 1
                #print str(scanBlockCounter) + ": " + str(scanBlock.shape) + str(h)+ ", " + str(intervalX)
                if numberColumns == band.XSize - h:
                    break

                #update progress line
                if not quiet:
                    gdal.TermProgress_nocb( (float(h+1) / band.YSize) )

Here is an update:
Without using the profile module, as I did not want to start wrapping small sections of the code into functions I used a mix of print and exit statements to get a really rough idea about which lines were taking the most time. Luckily (and I do understand how lucky I was) one line was dragging everything down.

    outRaster = outdataset.GetRasterBand(j).WriteArray(outputBlock,h,i)#array, xOffset, yOffset

It appears that GDAL is quite inefficient when opening the output file and writing out the array. With this in mind I decided to add my modified arrays "outBlock" to a python list, then write out chunks. Here is the segment that I changed:

The outputBlock was just modified ...

         #Add the array to a list (tuple)
            outputArrayList.append(outputBlock)

            #Check the interval counter and if it is "time" write out the array
            if len(outputArrayList) >= (intervalX * writeSize) or finisher == 1:

                #Convert the tuple to a numpy array.  Here we horizontally stack the tuple of arrays.
                stacked = numpy.hstack(outputArrayList)

                #Write out the array
                outRaster = outdataset.GetRasterBand(j).WriteArray(stacked,xOffset,i)#array, xOffset, yOffset
                xOffset = xOffset + (intervalX*(intervalX * writeSize))

                #Cleanup to conserve memory
                outputArrayList = list()
                stacked = None
                finisher=0

Finisher is simply a flag that handles the edges. It took a bit of time to figure out how to build an array from the list. In that, using numpy.array was creating a 3-d array (anyone care to explain why?) and write array requires a 2d array. Total processing time is now varying from just under 2 minutes to 5 minutes. Any idea why the range of times might exist?

Many thanks to everyone who posted! The next step is to really get into Numpy and learn about vectorization for additional optimization.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

被你宠の有点坏 2024-11-26 15:23:46

加速 numpy 数据操作的一种方法是使用 vectorize。本质上,vectorize 接受一个函数 f 并创建一个新函数 g,该函数将 f 映射到数组 a 上。然后像这样调用gg(a)

>>> sqrt_vec = numpy.vectorize(lambda x: x ** 0.5)
>>> sqrt_vec(numpy.arange(10))
array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ])

如果没有您正在使用的数据,我不能肯定地说这是否会有帮助,但也许您可以将上面的内容重写为一组可以矢量化的函数。也许在这种情况下,您可以将索引数组矢量化为 ReadAsArray(h,i,numberColumns, numberRows)。以下是潜在优势的示例:

>>> print setup1
import numpy
sqrt_vec = numpy.vectorize(lambda x: x ** 0.5)
>>> print setup2
import numpy
def sqrt_vec(a):
    r = numpy.zeros(len(a))
    for i in xrange(len(a)):
        r[i] = a[i] ** 0.5
    return r
>>> timeit.timeit(stmt='a = sqrt_vec(numpy.arange(1000000))', setup=setup1, number=1)
0.30318188667297363
>>> timeit.timeit(stmt='a = sqrt_vec(numpy.arange(1000000))', setup=setup2, number=1)
4.5400981903076172

加速 15 倍!另请注意,numpy 切片可以优雅地处理 ndarray 的边缘:

>>> a = numpy.arange(25).reshape((5, 5))
>>> a[3:7, 3:7]
array([[18, 19],
       [23, 24]])

因此,如果您可以将 ReadAsArray 数据放入 ndarray 中,那么您就不需要进行任何边缘检查恶作剧。


关于你关于重塑的问题——重塑根本不会从根本上改变数据。它只是改变了 numpy 索引数据的“步幅”。当您调用 reshape 方法时,返回的值是数据的新视图;数据根本没有被复制或改变,带有旧步幅信息的旧视图也没有被复制或改变。

>>> a = numpy.arange(25)
>>> b = a.reshape((5, 5))
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])
>>> b
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
>>> a[5]
5
>>> b[1][0]
5
>>> a[5] = 4792
>>> b[1][0]
4792
>>> a.strides
(8,)
>>> b.strides
(40, 8)

One way to speed up operations over numpy data is to use vectorize. Essentially, vectorize takes a function f and creates a new function g that maps f over an array a. g is then called like so: g(a).

>>> sqrt_vec = numpy.vectorize(lambda x: x ** 0.5)
>>> sqrt_vec(numpy.arange(10))
array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ])

Without having the data you're working with available, I can't say for certain whether this will help, but perhaps you can rewrite the above as a set of functions that can be vectorized. Perhaps in this case you could vectorize over an array of indices into ReadAsArray(h,i,numberColumns, numberRows). Here's an example of the potential benefit:

>>> print setup1
import numpy
sqrt_vec = numpy.vectorize(lambda x: x ** 0.5)
>>> print setup2
import numpy
def sqrt_vec(a):
    r = numpy.zeros(len(a))
    for i in xrange(len(a)):
        r[i] = a[i] ** 0.5
    return r
>>> timeit.timeit(stmt='a = sqrt_vec(numpy.arange(1000000))', setup=setup1, number=1)
0.30318188667297363
>>> timeit.timeit(stmt='a = sqrt_vec(numpy.arange(1000000))', setup=setup2, number=1)
4.5400981903076172

A 15x speedup! Note also that numpy slicing handles the edges of ndarrays elegantly:

>>> a = numpy.arange(25).reshape((5, 5))
>>> a[3:7, 3:7]
array([[18, 19],
       [23, 24]])

So if you could get your ReadAsArray data into an ndarray you wouldn't have to do any edge-checking shenanigans.


Regarding your question about reshaping -- reshaping doesn't fundamentally alter the data at all. It just changes the "strides" by which numpy indices the data. When you call the reshape method, the value returned is a new view into the data; the data isn't copied or altered at all, nor is the old view with the old stride information.

>>> a = numpy.arange(25)
>>> b = a.reshape((5, 5))
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])
>>> b
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
>>> a[5]
5
>>> b[1][0]
5
>>> a[5] = 4792
>>> b[1][0]
4792
>>> a.strides
(8,)
>>> b.strides
(40, 8)
相对绾红妆 2024-11-26 15:23:46

按要求答复。

如果您受 IO 限制,则应该对读/写进行分块。尝试将约 500 MB 的数据转储到 ndarray,处理所有数据,将其写出,然后获取下一个约 500 MB 的数据。确保重用 ndarray。

Answered as requested.

If you are IO bound, you should chunk your reads/writes. Try dumping ~500 MB of data to an ndarray, process it all, write it out and then grab the next ~500 MB. Make sure to reuse the ndarray.

玩套路吗 2024-11-26 15:23:46

没有尝试完全理解您在做什么,我注意到您没有使用任何 numpy 切片数组广播,这两者都可以加快您的代码速度,或者,最起码,让它更具可读性。如果这些与您的问题无关,我深表歉意。

Without trying to completely understand exactly what you are doing, I notice that you aren't using any numpy slices or array broadcasting, both of which may speed up your code, or, at the very least, make it more readable. My apologies if these aren't germane to your problem.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文