How does EfficientNet width scaling affect the model's FLOPS?
I'm re-building EfficientNet from Tan et al. (2019), and there's something I don't understand about the width scaling factor. In the paper they maximize model accuracy under given resource constraints as:

Based on the above equation, the FLOPS should increase by d·w·r².
Nonetheless, in the next section:
So the FLOPS increase by (α·β²·γ²)^φ, which is equal to d·w²·r².
Am I getting this right?
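For reference, the constraint in the paper can be checked numerically. With the grid-searched coefficients reported by Tan & Le (2019), α = 1.2, β = 1.1, γ = 1.15, the product α·β²·γ² comes out close to 2, so the total FLOPS grow by roughly 2^φ as the compound coefficient φ increases:

```python
# Coefficients from the EfficientNet paper (Tan & Le, 2019),
# chosen under the constraint alpha * beta^2 * gamma^2 ~= 2.
alpha, beta, gamma = 1.2, 1.1, 1.15

flops_factor = alpha * beta**2 * gamma**2
print(flops_factor)  # ~1.92, i.e. close to 2

# FLOPS growth for a few compound coefficients phi:
for phi in range(4):
    print(phi, flops_factor ** phi)
```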
A convolution operation costs V(input) × V(filter) FLOPs. So consider an input of shape (Hx, Wx, Cx) and the i-th filter of shape (Hf(i), Wf(i)). If all the filters are widened by w, then for i == 1 the FLOPs increase by a factor of w. But the input to each subsequent layer is also widened by w, e.g. X(1) = (Hx, Wx, Cx·w). So from i == 2 to the end of the convolutions, the FLOPs increase by w for the input channels and by another w for the filter channels, which leads to the w² in the equation.
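This can be verified with a toy FLOPs counter. The sketch below (assumed layer shapes, not the actual EfficientNet configuration) counts multiply-accumulate FLOPs for a stack of 3×3 convolutions, widens every layer's channels by w, and shows the total FLOPs grow by almost exactly w². The ratio is slightly below w² only because the first layer's input channels (the image itself) are not scaled:

```python
# Minimal sketch with made-up channel counts; spatial size is held fixed
# (no striding), which is enough to illustrate the scaling behavior.

def conv_flops(h, w_sp, c_in, c_out, k=3):
    # FLOPs of one conv layer: output positions x filter volume x out channels
    return h * w_sp * c_in * c_out * k * k

def stack_flops(channels, h=32, w_sp=32):
    # channels = [C0, C1, ..., Cn]; layer i maps C(i-1) -> C(i) channels
    return sum(conv_flops(h, w_sp, channels[i], channels[i + 1])
               for i in range(len(channels) - 1))

base = [3, 16, 32, 64]  # 3 input channels (the image) stay fixed
w = 2.0
scaled = [base[0]] + [int(c * w) for c in base[1:]]  # widen every layer by w

ratio = stack_flops(scaled) / stack_flops(base)
print(ratio)  # ~3.96, just under w^2 = 4
```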