Speeding up 1D convolution in PyTorch
For my project I am using pytorch as a linear algebra backend. For the performance-critical part of my code, I need to do 1D convolutions of 2 small (length between 2 and 9) vectors (1D tensors) a very large number of times. My code allows for batch-processing of inputs, so I can stack a couple of input vectors to create matrices that can then all be convolved at the same time. Since torch.conv1d does not allow convolving along a single dimension of a 2D input, I had to write my own convolution function, called convolve. This new function, however, consists of a double for-loop and is therefore very slow.

Question: how can I make the convolve function perform faster through better code design, while keeping it able to deal with batched inputs (= 2D tensors)?

Partial answer: somehow avoid the double for-loop.

Below are three Jupyter notebook cells that recreate a minimal example. Note that you need line_profiler and the %%writefile magic command to make this work!
%%writefile SO_CONVOLVE_QUESTION.py
import torch

def conv1d(a, v):
    # Full convolution of two 1D tensors via torch.conv1d.
    # The kernel is flipped because torch.conv1d computes cross-correlation;
    # padding = len(v) - 1 yields the full output of length len(a) + len(v) - 1.
    padding = v.shape[-1] - 1
    return torch.conv1d(
        input=a.view(1, 1, -1), weight=v.flip(0).view(1, 1, -1), padding=padding, stride=1
    ).squeeze()

def convolve(a, v):
    # Batched convolution: row i of the output is the full convolution of
    # row i of `a` with row i of `v`; 1D inputs are promoted to one-row matrices.
    if a.ndim == 1:
        a = a.view(1, -1)
        v = v.view(1, -1)
    nrows, vcols = v.shape
    acols = a.shape[1]
    expanded = a.view((nrows, acols, 1)) * v.view((nrows, 1, vcols))
    noutdim = vcols + acols - 1
    out = torch.zeros((nrows, noutdim))
    for i in range(acols):
        for j in range(vcols):
            out[:, i+j] += expanded[:, i, j]
    return out.squeeze()

x = torch.randn(5)
y = torch.randn(7)
I write the code to SO_CONVOLVE_QUESTION.py because that is required by line_profiler, and because the file serves as the setup for timeit.timeit.
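The setup itself is not shown in the post; presumably it is along these lines (the import and the setup string are assumptions, implied by the text above):

import timeit

# Assumed setup string: pull convolve, conv1d, x and y into scope
# for the timed statements below.
setup = 'from SO_CONVOLVE_QUESTION import *'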
Now we can evaluate the output and performance of the code above on non-batched input (x, y) and on batched input (x_batch, y_batch):
from SO_CONVOLVE_QUESTION import *

# Without batch processing
res1 = conv1d(x, y)
res = convolve(x, y)
print(torch.allclose(res1, res))  # True

# With batch processing, NB first dimension!
x_batch = torch.randn(5, 5)
y_batch = torch.randn(5, 7)

results = []
for i in range(5):
    results.append(conv1d(x_batch[i, :], y_batch[i, :]))
res1 = torch.stack(results)
res = convolve(x_batch, y_batch)
print(torch.allclose(res1, res))  # True

print(timeit.timeit('convolve(x, y)', setup=setup, number=10000))  # 4.83391789999996
print(timeit.timeit('conv1d(x, y)', setup=setup, number=10000))  # 0.2799923000000035
In the block above you can see that performing the convolution 5 times using the conv1d function produces the same result as convolve on batched inputs. We can also see that convolve (= 4.8 s) is much slower than conv1d (= 0.28 s). Below we assess the slow part of the convolve function WITHOUT batch processing, using line_profiler:
%load_ext line_profiler
%lprun -f convolve convolve(x, y) # evaluated without batch-processing!
Output:
Timer unit: 1e-07 s
Total time: 0.0010383 s
File: C:\python_projects\pysumo\SO_CONVOLVE_QUESTION.py
Function: convolve at line 9
Line # Hits Time Per Hit % Time Line Contents
==============================================================
9 def convolve(a, v):
10 1 68.0 68.0 0.7 if a.ndim == 1:
11 1 271.0 271.0 2.6 a = a.view(1, -1)
12 1 44.0 44.0 0.4 v = v.view(1, -1)
13
14 1 28.0 28.0 0.3 nrows, vcols = v.shape
15 1 12.0 12.0 0.1 acols = a.shape[1]
16
17 1 4337.0 4337.0 41.8 expanded = a.view((nrows, acols, 1)) * v.view((nrows, 1, vcols))
18 1 12.0 12.0 0.1 noutdim = vcols + acols - 1
19 1 127.0 127.0 1.2 out = torch.zeros((nrows, noutdim))
20 6 32.0 5.3 0.3 for i in range(acols):
21 40 209.0 5.2 2.0 for j in range(vcols):
22 35 5194.0 148.4 50.0 out[:, i+j] += expanded[:, i, j]
23 1 49.0 49.0 0.5 return out.squeeze()
Obviously the double for-loop and the line creating the expanded tensor are the slowest. Are these parts avoidable with better code design?
Answers (2)
Pytorch has a batch-processing tool called torch.nn.functional, and there you have a conv1d function (obviously 2D as well, and much much more). We will use conv1d.

Suppose you want to convolve 100 vectors given in v1 with one other vector given in v2. v1 has dimensions (minibatch, in_channels, width), and you need 1 channel by default. In addition, v2 has dimensions (out_channels, in_channels/groups, kW). You are using 1 channel and therefore 1 group, so v1 and v2 are shaped accordingly. Now we can simply compute the necessary padding and run the convolution, as in the sketch below. I did not time it, but it should be considerably faster than your initial double for-loop.
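The answer's code blocks did not survive the copy; a minimal sketch of what it describes, assuming torch.nn.functional.conv1d and the shapes stated above (the names v1 and v2 come from the answer, the lengths 5 and 7 are arbitrary):

import torch
import torch.nn.functional as F

v1 = torch.randn(100, 1, 5)  # (minibatch, in_channels, width): 100 vectors, 1 channel
v2 = torch.randn(1, 1, 7)    # (out_channels, in_channels/groups, kW): one kernel

# padding = kW - 1 gives the "full" convolution; the kernel is flipped
# because conv1d actually computes cross-correlation.
padding = v2.shape[-1] - 1
out = F.conv1d(v1, v2.flip(-1), padding=padding)  # shape (100, 1, 5 + 7 - 1)

Note that this convolves every row of v1 with the same kernel v2; the question's batched case, which pairs row i of x_batch with row i of y_batch, needs the grouped variant shown in the next answer.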
Turns out that there is a way to do it without for-loops, by grouping the inputs along a dimension:
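The code block is missing from this copy; a minimal sketch of the grouped approach, assuming the same full-convolution semantics as the question's conv1d (the function body is a reconstruction, not the original):

import torch

def convolve(a, v):
    # Promote 1D inputs to one-row batches, as in the original convolve.
    if a.ndim == 1:
        a = a.view(1, -1)
        v = v.view(1, -1)
    nrows = v.shape[0]
    padding = v.shape[-1] - 1  # "full" convolution output length
    return torch.conv1d(
        input=a.view(1, nrows, -1),           # one sample, nrows channels
        weight=v.flip(-1).view(nrows, 1, -1), # one kernel per group
        padding=padding,
        groups=nrows,                          # convolve channel i with kernel i only
    ).squeeze()

# Matches stacking 5 separate conv1d calls from the question:
x_batch = torch.randn(5, 5)
y_batch = torch.randn(5, 7)
print(convolve(x_batch, y_batch).shape)  # torch.Size([5, 11])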