Minimum distance algorithm using GDAL and Python
I'm trying to implement the minimum distance algorithm for image classification using GDAL and Python. After calculating the mean pixel value of each sample area and storing the means in a list of arrays ("sample_array"), I read the image into an array called "values". I loop through this array with the following code:
import math
import numpy as np

values = valBD.ReadAsArray()
# write class indices to a separate array so input pixels are never overwritten
classes = np.zeros_like(values)
# loop through pixel columns
for X in range(0, XSize):
    # loop through pixel lines
    for Y in range(0, YSize):
        # initialize variables; read the pixel once, before the sample loop
        minDist = float("inf")
        iPixelVal = values[Y, X]
        # get minimum distance
        for iSample in range(0, sample_count):
            # dist = calc_distance(values[jPixel, iPixel], sample_array[iSample])
            # computing minimum distance
            mean = sample_array[iSample]
            dist = math.sqrt((iPixelVal - mean) * (iPixelVal - mean))  # only for testing
            if dist < minDist:
                minDist = dist
                classes[Y, X] = iSample
classBD.WriteArray(classes, xoff=0, yoff=0)
This procedure takes a very long time for big images, so I would like to ask whether somebody knows a faster method. I don't know much about the access speed of different kinds of variables in Python. Or maybe someone knows a library I could use.
Thanks in advance,
Mario
Comments (2)
You should definitely be using NumPy. I work with some pretty large raster datasets and NumPy burns through them. On my machine, with the code below there's no noticeable delay for a 1000 x 1000 array. An explanation of how this works follows the code.
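The answer's own code block did not survive on this page. What follows is only a sketch reconstructed from the explanation, not the answerer's original code. It assumes SciPy's scipy.spatial.distance.cdist, a single-band 2-D image, and the sample values [1, 2, 3] mentioned in the explanation; the function name classify_min_distance and the random test image are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist


def classify_min_distance(values, samples):
    """Return, for each pixel, the index of the nearest sample mean.

    Hypothetical helper: `values` is a 2-D single-band image and
    `samples` is a 1-D sequence of class means.
    """
    # cdist expects 2-D inputs: one row per observation, one column per feature
    pixels = np.asarray(values, dtype=float).reshape(-1, 1)
    means = np.asarray(samples, dtype=float).reshape(-1, 1)
    # distances has shape (n_pixels, n_samples); default metric is Euclidean
    distances = cdist(pixels, means)
    # index of the closest sample for every pixel, reshaped back to the image
    return distances.argmin(axis=1).reshape(np.shape(values))


# Made-up 1000 x 1000 test image standing in for valBD.ReadAsArray()
rng = np.random.default_rng(0)
values = rng.integers(0, 5, size=(1000, 1000))
samples = [1, 2, 3]
classes = classify_min_distance(values, samples)
```

With real data, values would come from valBD.ReadAsArray() and classes would be passed to classBD.WriteArray(), as in the question.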
cdist() calculates the "distance" from each element in values to each of the elements in samples. This generates a 1,000,000 x 3 array, where each row n has the distance from pixel n in the original array to each of the sample values [1, 2, 3]. argmin(axis=1) gives you the index of the minimum value along each row, which is what you want. A quick reshape gives you the rectangular format you'd expect for an image.
Agree with Thomas K: use PIL, or write a C function and wrap it using e.g. ctypes, or at the very least use some NumPy matrix operations.
Or else run your existing code under PyPy (JIT-compiled code can be 100x faster on image code). Try PyPy and tell us what speedup you got.
Bottom line: never do pixel-wise work like this natively in CPython; the interpreter and memory-management overhead will kill you.
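As one possible reading of "NumPy matrix operations", here is a minimal sketch that replaces the per-pixel Python loop with broadcasting and works through the image in blocks of rows to keep memory use bounded. The function name classify_rows and the rows_per_block parameter are hypothetical, and the sketch assumes a single-band 2-D image with one scalar mean per class.

```python
import numpy as np


def classify_rows(values, sample_means, rows_per_block=256):
    """Assign each pixel the index of the nearest class mean (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    means = np.asarray(sample_means, dtype=float)
    classes = np.empty(values.shape, dtype=np.intp)
    for start in range(0, values.shape[0], rows_per_block):
        block = values[start:start + rows_per_block]
        # Broadcasting: (rows, cols, 1) - (n_samples,) -> (rows, cols, n_samples)
        dist = np.abs(block[:, :, np.newaxis] - means)
        # Nearest mean along the last axis gives the class index per pixel
        classes[start:start + rows_per_block] = dist.argmin(axis=2)
    return classes


# Small made-up image; real input would come from ReadAsArray()
demo = np.array([[0.0, 1.0], [2.4, 9.0]])
demo_classes = classify_rows(demo, [1, 2, 3], rows_per_block=1)
```

Processing in row blocks is a design choice, not a requirement: for images that fit comfortably in memory, a single block covering the whole array does the same work in one step.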