What is the purpose of residual connections in neural networks?

Posted on 2025-02-07 09:36:32

I've recently been learning about self-attention transformers and the "Attention is All You Need" paper. When describing the architecture of the neural network used in the paper, one breakdown of the paper included this explanation for residual connections:

"Residual layer connections are used (of course) in both encoder and decoder blocks"
(origin: https://www.kaggle.com/code/residentmario/transformer-architecture-self-attention/notebook)

This was, unfortunately, not obvious to me. What is the purpose of residual connections, and why should this be standard practice?
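For concreteness, here is my rough Python sketch (my own reading of the quote, not code from the paper or the linked notebook) of where those residual connections sit in a transformer: each attention or feed-forward sublayer is wrapped so that its input is added back to its output before layer normalization.

```python
import torch
import torch.nn as nn

class ResidualSublayer(nn.Module):
    """Sketch of the "Add & Norm" pattern around each transformer sublayer."""
    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer          # e.g. multi-head attention or a feed-forward net
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection: the sublayer's output is *added* to its
        # input, so the block computes LayerNorm(x + Sublayer(x)).
        return self.norm(x + self.sublayer(x))
```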

Comments (1)

楠木可依 2025-02-14 09:36:32

There is nothing "obvious" about skip connections; they are something we learned the hard way as a community. The basic premise is that, in the feed-forward-layer parametrisation of a neural network, it is surprisingly hard to learn the identity function. Skip connections make this special function (f(x) = x) extremely easy to learn, which improves training stability and overall performance across a wide range of applications, at essentially no extra computational cost. You are essentially giving the network an easy way of not using a convoluted, complex part of the computation when it does not need it, which lets us use big, complex architectures without an in-depth understanding of the dynamics of the problem (which are beyond our current understanding of the mathematics!).
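A minimal sketch of this point (assuming PyTorch; the class and names here are hypothetical, purely for illustration): with the skip connection, the block recovers the identity simply by driving the inner layers' output towards zero, whereas without it the same layers would have to reproduce x exactly through two weight matrices and a ReLU.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A plain feed-forward block with a skip connection added around it."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Without the "+ x", the block would have to learn f(x) = x through
        # two weight matrices and a ReLU, which is surprisingly hard.
        # With it, the identity is obtained by pushing self.net's output
        # towards zero, e.g. by shrinking its weights.
        return x + self.net(x)
```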

You can look at older papers, such as the one on highway networks, which showed how such connections allow training of very deep models that would otherwise be too ill-conditioned to train.
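For reference, a rough sketch of the highway-network idea (Srivastava et al., 2015): a learned gate interpolates between transforming the input and carrying it through unchanged. The exact nonlinearities and gate initialization in the paper differ from this simplified version; a plain residual connection can be seen as the gate-free special case.

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """Simplified highway layer: y = t * H(x) + (1 - t) * x."""
    def __init__(self, dim: int):
        super().__init__()
        self.transform = nn.Linear(dim, dim)  # H(x): the usual nonlinear transform
        self.gate = nn.Linear(dim, dim)       # T(x): learned carry/transform gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.transform(x))
        t = torch.sigmoid(self.gate(x))
        # t interpolates between transforming the input (t -> 1) and
        # carrying it through unchanged (t -> 0), so very deep stacks
        # can still pass information (and gradients) straight through.
        return t * h + (1.0 - t) * x
```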
