Is there any way to reduce garbage collection and/or dynamic dispatch for Julia Flux gradient calls?

Posted 2025-01-10 20:09:11 · 728 characters · 5 views · 0 comments

Does anyone know of a way to reduce the amount of GC and/or dynamic dispatch in a gradient call in Flux? I've tried using FastClosures.jl, as well as wrapping the loss in a callable struct to avoid closures and the resulting Core.Box allocations, but nothing seems to make an appreciable difference.

MWE:

using Flux

# Gradient of the MSE loss with respect to the model's implicit parameters.
function get_grad(dnn, x, y)
    p = params(dnn)
    g = gradient(p) do
        Flux.mse(dnn(x), y)
    end
    return g
end

const DNN = Dense(3, 2)
const X = rand(Float32, 3)
const Y = rand(Float32, 2)

@profiler for _ in 1:10_000; get_grad(DNN, X, Y); end

Profile (orange - dynamic dispatch, red - garbage collection, dynamic dispatch)

Zygote\src\compiler\interface.jl
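
For reference, the callable-struct attempt mentioned above might look roughly like the sketch below. The names `MSELoss` and `get_grad_struct` are hypothetical and not part of Flux, and the sketch assumes a Flux version that still supports implicit parameters via `Flux.params`. It only shows the shape of the attempt and is not claimed to reduce GC or dynamic dispatch.

```julia
using Flux

# Hypothetical callable struct: it carries the model and data as typed
# fields, so the loss is not a closure capturing local variables.
struct MSELoss{M,A,B}
    dnn::M
    x::A
    y::B
end

# Calling the struct with no arguments evaluates the loss.
(l::MSELoss)() = Flux.mse(l.dnn(l.x), l.y)

# Same gradient call as get_grad, with the struct in place of the do-block.
function get_grad_struct(dnn, x, y)
    p = Flux.params(dnn)
    return gradient(MSELoss(dnn, x, y), p)
end
```

Comparing `@allocated get_grad(DNN, X, Y)` with `@allocated get_grad_struct(DNN, X, Y)` after a warm-up call is one rough way to check whether the struct changes the allocation count.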
