2D convolution in Python using numpy
I am trying to perform a 2D convolution in Python using numpy.
I have a 2D array, as follows, with kernel H_r for the rows and H_c for the columns:
data = np.zeros((nr, nc), dtype=np.float32)
# fill array with some data here, then convolve
for r in range(nr):
    data[r, :] = np.convolve(data[r, :], H_r, 'same')
for c in range(nc):
    data[:, c] = np.convolve(data[:, c], H_c, 'same')
data = data.astype(np.uint8)
It does not produce the output that I was expecting. Does this code look OK? I think the problem is with the casting from float32 to 8-bit. What's the best way to do this?
Thanks
Maybe it is not the most optimized solution, but this is an implementation I used before with the numpy library for Python:
I hope this code helps other people with the same doubt.
Regards.
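The code block for this answer did not survive extraction. As a stand-in, here is a minimal sketch in the same spirit, written as a separable two-pass 1D convolution in plain numpy (the function name is an assumption, not the answer's original code):

```python
import numpy as np

def separable_convolve(data, H_r, H_c):
    # convolve every row with H_r, then every column with H_c
    out = np.apply_along_axis(np.convolve, 1, data, H_r, 'same')
    out = np.apply_along_axis(np.convolve, 0, out, H_c, 'same')
    return out
```

For example, `separable_convolve(np.ones((5, 5)), np.ones(3), np.ones(3))` returns a 5x5 array whose interior values are 9.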
Edit [Jan 2019]
@Tashus' comment below is correct, and @dudemeister's answer is thus probably more on the mark. The function he suggested is also more efficient, by avoiding a direct 2D convolution and the number of operations that would entail.
Possible Problem
I believe you are doing two 1d convolutions, the first per rows and the second per columns, and replacing the results from the first with the results of the second.
Notice that numpy.convolve with the 'same' argument returns an array of equal shape to the largest one provided, so when you make the first convolution you have already populated the entire data array.
One good way to visualize your arrays during these steps is to use Hinton diagrams, so you can check which elements already have a value.
Possible Solution
You can try to add the results of the two convolutions (use data[:,c] += .. instead of data[:,c] = on the second for loop), if your convolution matrix is the result of using the one-dimensional H_r and H_c matrices like so:
Another way to do that would be to use scipy.signal.convolve2d with a 2d convolution array, which is probably what you wanted to do in the first place.
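As a sketch of that second option (the kernels below are made-up examples standing in for the asker's H_r and H_c), the separable pair can be combined into one 2D kernel with an outer product and handed to scipy.signal.convolve2d:

```python
import numpy as np
from scipy.signal import convolve2d

# example 1-D kernels standing in for the asker's H_r and H_c
H_r = np.array([1.0, 2.0, 1.0]) / 4.0   # row kernel
H_c = np.array([1.0, 2.0, 1.0]) / 4.0   # column kernel

kernel_2d = np.outer(H_c, H_r)          # combined 2-D kernel

data = np.ones((8, 8), dtype=np.float32)
result = convolve2d(data, kernel_2d, mode='same')
```

Since both example kernels sum to 1, the interior of `result` stays at 1.0; only the zero-padded borders are attenuated.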
Since you already have your kernel separated, you should simply use the sepfir2d function from scipy:
On the other hand, the code you have there looks all right ...
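A minimal sketch of that suggestion (the filters here are made-up examples): scipy.signal.sepfir2d takes the 2D input plus the row and column FIR filters, which must have odd length:

```python
import numpy as np
from scipy.signal import sepfir2d

H_r = np.array([1.0, 2.0, 1.0]) / 4.0   # example row filter (odd length)
H_c = np.array([1.0, 2.0, 1.0]) / 4.0   # example column filter (odd length)

data = np.ones((16, 16))
filtered = sepfir2d(data, H_r, H_c)     # same shape as the input
```

sepfir2d uses mirror-symmetric boundary handling, so with a constant input and unit-sum filters the output is constant as well.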
I checked out many implementations and found none that fit my purpose, which should be really simple. So here is a dead-simple implementation with a for loop:
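The answer's code block was lost in extraction; a dead-simple nested-loop version in that spirit (a 'valid'-mode convolution, with the kernel flipped as in a true convolution) might look like:

```python
import numpy as np

def conv2d_loop(image, kernel):
    # 'valid' 2-D convolution with explicit for loops; the kernel is
    # flipped, as in a true convolution
    k = np.flipud(np.fliplr(kernel))
    kh, kw = k.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out
```

Slow, but easy to verify against scipy.signal.convolve2d with mode='valid'.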
It might not be the most optimized solution either, but it is approximately ten times faster than the one proposed by @omotto and it only uses basic numpy functions (such as reshape, expand_dims, tile...) and no 'for' loops:
I tried to add a lot of comments to explain the method, but the global idea is to reshape the 3D input image to a 5D one of shape (output_image_height, kernel_height, output_image_width, kernel_width, output_image_channel) and then to apply the kernel directly using basic array multiplication. Of course, this method uses more memory (during execution the size of the image is multiplied by kernel_height*kernel_width), but it is faster.
To do this reshape step, I 'over-used' the indexing methods of numpy arrays, especially the possibility of giving a numpy array as indices into a numpy array.
This method could also be used to re-code the 2D convolution product in Pytorch or Tensorflow using the base math functions, but I have no doubt in saying that it would be slower than the existing nn.conv2d operator...
I really enjoyed coding this method using only basic numpy tools.
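The answer's code did not survive extraction, but the reshape-by-indexing idea it describes can be sketched as follows (the function name and details are assumptions): fancy indexing builds the 5D patch array, and einsum contracts it with the kernel.

```python
import numpy as np

def conv2d_indexed(image, kernel):
    # image: (H, W, C), kernel: (kh, kw); 'valid' cross-correlation
    H, W, C = image.shape
    kh, kw = kernel.shape
    oh, ow = H - kh + 1, W - kw + 1
    # index arrays: rows (oh, kh) and columns (ow, kw)
    i = np.arange(oh)[:, None] + np.arange(kh)[None, :]
    j = np.arange(ow)[:, None] + np.arange(kw)[None, :]
    # fancy indexing yields a 5-D patch array of shape
    # (out_height, kernel_height, out_width, kernel_width, channels)
    patches = image[i[:, :, None, None], j[None, None, :, :], :]
    # multiply by the kernel and sum over the two kernel axes
    return np.einsum('abcde,bd->ace', patches, kernel)
```

The memory cost mentioned above shows up in `patches`, which materializes every kernel-sized window of the image at once.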
Super simple and fast convolution using only basic numpy:
Run it:
Instead of sliding the kernel along the image and computing the transformation pixel by pixel, create a series of shifted versions of the image corresponding to each element in the kernel and apply the corresponding kernel value to each of the shifted image versions.
This is probably the fastest you can get using just basic numpy; the speed is already comparable to the C implementation of scipy's convolve2d and better than fftconvolve. The idea is similar to @Tatarize's. This example works only for one color component; for RGB, just repeat for each (or modify the algorithm accordingly).
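The code for this answer was stripped in extraction; the shifted-images idea can be sketched like this (names assumed), accumulating kernel_value × shifted_image once per kernel element:

```python
import numpy as np

def shift_convolve(image, kernel):
    # 'valid' cross-correlation built from shifted views of the image:
    # one cheap vectorized add per kernel element instead of a loop
    # over every pixel
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for di in range(kh):
        for dj in range(kw):
            out += kernel[di, dj] * image[di:di + oh, dj:dj + ow]
    return out
```

The loop runs kh*kw times regardless of image size, which is why this stays fast for small kernels.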
One of the most obvious approaches is to hard-code the kernel.
This example does a 3x3 box blur, completely unrolled. You can multiply the values where you have a different value and divide them by a different amount. But, if you honestly want the quickest and dirtiest method, this is it. I think it beats Guillaume Mougeot's method by a factor of about 5, and his method beats the others by a factor of 10.
It may take a few extra steps if you're doing something like a Gaussian blur, where you need to multiply some of the values.
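A sketch of such a fully unrolled 3x3 box blur (cropping the one-pixel border rather than padding):

```python
import numpy as np

def box_blur_3x3(img):
    # fully unrolled 3x3 box blur; each slice is the image shifted by
    # one of the nine kernel offsets, so there is no inner loop at all
    return (img[:-2, :-2] + img[:-2, 1:-1] + img[:-2, 2:] +
            img[1:-1, :-2] + img[1:-1, 1:-1] + img[1:-1, 2:] +
            img[2:,  :-2] + img[2:,  1:-1] + img[2:,  2:]) / 9.0
```

For a weighted kernel such as a Gaussian, each term would get its own multiplier and the divisor would change accordingly.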
I wrote this convolve_stride which uses numpy.lib.stride_tricks.as_strided. Moreover, it supports both strides and dilation. It is also compatible with tensors of order > 2.
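The answer's code was lost in extraction; here is a sketch of a convolve_stride in that spirit for the 2D case, built on as_strided with stride and dilation support (details assumed):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def convolve_stride(image, kernel, stride=1, dilation=1):
    # 2-D cross-correlation via a strided window view;
    # windows[i, j, k, l] == image[i*stride + k*dilation,
    #                              j*stride + l*dilation]
    kh, kw = kernel.shape
    H, W = image.shape
    eh = (kh - 1) * dilation + 1          # effective kernel height
    ew = (kw - 1) * dilation + 1          # effective kernel width
    oh = (H - eh) // stride + 1
    ow = (W - ew) // stride + 1
    s0, s1 = image.strides
    windows = as_strided(
        image,
        shape=(oh, ow, kh, kw),
        strides=(s0 * stride, s1 * stride, s0 * dilation, s1 * dilation),
    )
    return np.einsum('ijkl,kl->ij', windows, kernel)
```

as_strided creates a view, not a copy, so the window tensor costs no extra memory; einsum then does the contraction.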
Try to first round and then cast to uint8:
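For example (the values here are made up; clipping is added as an extra safeguard, since values outside 0-255 would otherwise wrap around during the uint8 cast):

```python
import numpy as np

data = np.array([12.6, 254.4, 255.7, -3.2], dtype=np.float32)

# round first, clip to the uint8 range, then cast
out = np.clip(np.round(data), 0, 255).astype(np.uint8)
# out is now [13, 254, 255, 0]
```

Casting directly with .astype(np.uint8) would instead truncate toward zero and wrap out-of-range values, which is likely the asker's bug.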
It can also take asymmetric images. In order to perform correlation (convolution in deep-learning lingo) on a batch of 2D matrices, one can iterate over all the channels and calculate the correlation of each channel slice with the respective filter slice.
For example: if the image is (28,28,3) and the filter size is (5,5,3), then take each of the 3 slices from the image channels, perform the cross-correlation using the custom function above, and stack the resulting matrices in the respective dimension of the output.
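The custom function referenced here did not survive extraction; as a stand-in, the same per-channel procedure can be sketched with scipy.signal.correlate2d, stacking each channel's result:

```python
import numpy as np
from scipy.signal import correlate2d

def correlate_channels(image, filt):
    # cross-correlate each image channel with the matching filter slice
    # and stack the per-channel results along the last axis
    return np.stack(
        [correlate2d(image[:, :, c], filt[:, :, c], mode='valid')
         for c in range(image.shape[2])],
        axis=-1,
    )
```

With a (28,28,3) image and a (5,5,3) filter this yields a (24,24,3) output, one correlation map per channel.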
This code is incorrect:
See the Nussbaumer transformation from multidimensional convolution to one-dimensional.