Why is this broadcast operation slower than a nested for-loop?
I have a function that checks whether every element in an array is greater than a lower bound and less than an upper bound.
In the following code, I'm curious to know why bounds_error (a nested for-loop) is faster than bounds_error2 (a vectorized, broadcast operation), and what one can do to make this function run faster.
using BenchmarkTools

function bounds_error(x, xl)
    num_x_rows = size(x, 1)
    num_dim = size(xl, 1)
    for i in 1:num_x_rows
        for j in 1:num_dim
            if (x[i, j] < xl[j, 1] || x[i, j] > xl[j, 2])
                return true
            end
        end
    end
    return false
end

function bounds_error2(x, xl)
    for row in eachrow(x)
        xlt = transpose(xl)
        if any(row .< xlt[1, :]) == true || any(row .> xlt[2, :])
            return true
        end
    end
    return false
end

# number of rows in xl (or xlimits) will always be equal to number of columns in x
xl = [-5.0 5.0
      -5.0 5.0
      -5.0 5.0]
x = [1.0 2.0 3.0;
     4.0 5.0 6.0]

@btime bounds_error(x, xl)  # ~20.645 ns (0 allocations: 0 bytes); true
@btime bounds_error2(x, xl) # ~347.870 ns (12 allocations: 704 bytes); true
Comments (1)
The main reason for this difference is memory allocation (0 vs. 12 here). Currently, slices in Julia create a copy, so xlt[1,:] and xlt[2,:] each allocate memory. To remedy this problem, you should use @views.
The second issue is that the element-wise comparisons row .< xlt[1,:] and row .> xlt[2,:] each create a temporary Boolean array. To avoid allocating a temporary array, you should pass the comparison to any as a predicate, as in any(t->t[1]<t[2], zip(row,xl1)), so that the comparison is done one element at a time, like a loop.
After applying these tips, the performance difference on my machine is now only about 2 ns, which is the cost of the convenience of eachrow, zip, etc. compared with manual loops.
Note that for the first function, you can use axes() to loop over the first or second dimension conveniently. And when benchmarking any Julia code with BenchmarkTools.jl, don't forget to interpolate ($) all variable names passed to the function, to avoid working on global variables.
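The tips above could be combined roughly as follows. This is only a sketch, not the answerer's exact code: the function names bounds_error3 and bounds_error_axes are hypothetical, and the views lo and hi are assumed to play the role of the otherwise undefined xl1 in the answer. It takes views of the bound columns instead of copying slices, uses the predicate form of any over zip so no temporary Boolean array is built, loops with axes() in the manual version, and interpolates the arguments with $ when benchmarking.

```julia
using BenchmarkTools

# Hypothetical non-allocating variant of bounds_error2.
function bounds_error3(x, xl)
    # Views of the lower- and upper-bound columns; no copies are made.
    lo = @view xl[:, 1]
    hi = @view xl[:, 2]
    for row in eachrow(x)
        # Predicate form of `any`: compares one pair at a time,
        # so no temporary Boolean array is created.
        if any(t -> t[1] < t[2], zip(row, lo)) || any(t -> t[1] > t[2], zip(row, hi))
            return true
        end
    end
    return false
end

# Hypothetical rewrite of the nested loop using axes() instead of size().
function bounds_error_axes(x, xl)
    for i in axes(x, 1), j in axes(x, 2)
        if x[i, j] < xl[j, 1] || x[i, j] > xl[j, 2]
            return true
        end
    end
    return false
end

# Same data as in the question.
xl = [-5.0 5.0; -5.0 5.0; -5.0 5.0]
x = [1.0 2.0 3.0; 4.0 5.0 6.0]

# Interpolate the globals with $ so the benchmark measures the function itself.
@btime bounds_error3($x, $xl)
@btime bounds_error_axes($x, $xl)
```

Both functions should return true for the data in the question, because x[2, 3] = 6.0 exceeds the upper bound of 5.0. Actual timings will vary by machine and Julia version.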