Why is this broadcast operation slower than a nested for-loop?

Posted 2025-02-11 00:27:46

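I have a function that checks whether every element in an array is greater than a lower bound and less than an upper bound.

In the following code, I'm curious why bounds_error (a nested for-loop) is faster than bounds_error2 (a vectorized, broadcast operation), and what one can do to make this function run faster.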

using BenchmarkTools

function bounds_error(x, xl)
    num_x_rows = size(x,1)
    num_dim = size(xl, 1)
    for i in 1:num_x_rows
        for j in 1:num_dim
            if (x[i, j] < xl[j,1] || x[i,j] > xl[j,2])
                return true
            end
        end
    end
    return false
end

function bounds_error2(x, xl)
    for row in eachrow(x)
        xlt = transpose(xl)
        if any(row .< xlt[1, :]) == true || any(row .> xlt[2, :])
            return true
        end
    end
    return false
end

#number of rows in xl (or xlimits) will always be equal to number of columns in x

xl =  [     -5.0  5.0
            -5.0  5.0
            -5.0  5.0]

x = [1.0 2.0 3.0; 
     4.0 5.0 6.0]

@btime bounds_error(x, xl) #~20.645 ns (0 allocations: 0 bytes); true

@btime bounds_error2(x, xl) #~347.870 ns (12 allocations: 704 bytes); true

各自安好 2025-02-18 00:27:47

The main reason for this difference is memory allocation (0 vs. 12 allocations here).

#  20.645 ns (0 allocations: 0 bytes)
# 347.870 ns (12 allocations: 704 bytes)

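Currently, slicing an array in Julia creates a copy, so xlt[1,:] and xlt[2,:] each allocate memory. To remedy this, use @views. The second issue is that the element-wise comparisons row .< xlt[1,:] and row .> xlt[2,:] each create a temporary Boolean array. To avoid allocating that temporary, pass a predicate to any over a lazy iterator, e.g. any(t->t[1]<t[2], zip(row,xl1)), so that the comparison is done one element at a time, as in a loop.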
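The two allocation sources described above can be seen in isolation with a small sketch. The variable names col_copy, col_view and mask are illustrative only and do not appear in the original code.

```julia
xl = [-5.0 5.0; -5.0 5.0; -5.0 5.0]
x  = [1.0 2.0 3.0; 4.0 5.0 6.0]

# Indexing with a colon copies the data: writing to the slice leaves xl untouched.
col_copy = xl[:, 1]
col_copy[1] = -100.0
println(xl[1, 1])

# A view aliases the parent array instead of copying it.
col_view = @view xl[:, 2]
println(parent(col_view) === xl)

# Broadcasting a comparison materializes a temporary Boolean array ...
row = @view x[2, :]
mask = row .> col_view
println(typeof(mask))

# ... whereas a generator is evaluated lazily, one element at a time,
# and `any` stops at the first element for which the comparison holds.
println(any(r > u for (r, u) in zip(row, col_view)))
```

The generator form does the same comparison as the broadcast form but never builds the intermediate mask.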

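After applying these tips, the performance difference on my machine is only about 2 ns, which is the price of the convenience of eachrow, zip, etc. compared with manual loops.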

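Note that for the first function you can use axes() to loop over the first or second dimension conveniently. Also, when benchmarking any Julia code with BenchmarkTools.jl, don't forget to interpolate ($) all variables passed to the function, to avoid working on global variables.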

function bounds_error(x, xl)
    for i in axes(x,1)
        for j in axes(xl, 1)
            if (x[i, j] < xl[j,1] || x[i,j] > xl[j,2])
                return true
            end
        end
    end
    return false
end

@views function bounds_error2(x, xl)
    xl1, xl2 = xl[:,1], xl[:,2]
    for row in eachrow(x)
        if any(t->t[1]<t[2], zip(row,xl1)) || any(t->t[1]>t[2], zip(row,xl2))
            return true
        end
    end
    return false
end

# number of rows in xl (or xlimits) will always be equal to number of columns in x
xl = [-5.0  5.0
      -5.0  5.0
      -5.0  5.0]

x = [1.0 2.0 3.0; 
     4.0 5.0 6.0]

@btime bounds_error($x, $xl)  #  8.100 ns (0 allocations: 0 bytes)
@btime bounds_error2($x, $xl) # 10.800 ns (0 allocations: 0 bytes)
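As a further sketch along the same lines, both bounds can be checked in a single pass by zipping the row with the lower and upper limits together. The name bounds_error3 is hypothetical and not part of the answer above, and this variant has not been benchmarked here, so no timing is claimed for it.

```julia
# Hypothetical variant: one pass per row instead of two.
function bounds_error3(x, xl)
    lo = @view xl[:, 1]   # lower limits, one per column of x
    hi = @view xl[:, 2]   # upper limits, one per column of x
    for row in eachrow(x)
        # zip lazily yields (value, lower, upper) tuples; nothing is materialized
        if any(((v, l, h),) -> v < l || v > h, zip(row, lo, hi))
            return true
        end
    end
    return false
end

xl = [-5.0 5.0; -5.0 5.0; -5.0 5.0]

# 6.0 exceeds the upper limit 5.0, so an out-of-bounds element is reported.
println(bounds_error3([1.0 2.0 3.0; 4.0 5.0 6.0], xl))

# The comparisons are strict, so values exactly on a limit count as in bounds.
println(bounds_error3([1.0 2.0 3.0; 4.0 5.0 -5.0], xl))
```

As with the versions above, it can be timed with @btime bounds_error3($x, $xl), interpolating the arguments.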
