Are Reverse Mode AD and Forward Mode AD functionally the same?

Posted 2025-01-09 07:53:48

Hi, I am an old hand at programming but very new to Julia, so the answer may be obvious.

Forward Mode AD is often compared with the forward pass of a neural net, and Reverse Mode AD with back propagation, and clearly you cannot replace back propagation with a forward pass.
Forward and Reverse Mode AD both compute the gradient. But are they the same function, or, ignoring efficiency, is Reverse Mode doing something that Forward Mode is not? Alternatively, are there applications using Reverse Mode where, ignoring efficiency, Forward Mode could not be used?
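
For example (a minimal sketch, assuming the ForwardDiff.jl and Zygote.jl packages as stand-ins for forward and reverse mode, and a toy function f of my own), both modes seem to produce the same numbers:

```julia
# Sketch only: ForwardDiff.jl and Zygote.jl are assumed here as example
# forward-mode and reverse-mode implementations; f is a made-up test function.
using ForwardDiff, Zygote

f(x) = sum(abs2, x) / 2              # f: R^n -> R, with gradient ∇f(x) = x

x = randn(5)

g_fwd = ForwardDiff.gradient(f, x)   # forward mode
g_rev = Zygote.gradient(f, x)[1]     # reverse mode (returns one gradient per argument)

g_fwd ≈ g_rev ≈ x                    # same gradient, up to floating point
```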

The reason for asking this is that the paper
https://richarde.dev/papers/2022/ad/higher-order-ad.pdf Provably Correct, Asymptotically Efficient, Higher-Order Reverse-Mode Automatic Differentiation
defines Reverse Mode AD as a correct and efficient way to compute the gradient when there is a large number of inputs. Yet their definition computes the same function as Forward Mode AD.

Any correction of my understanding much appreciated.

Clarification:

Let Forward Algorithm mean an algorithm that takes dx and computes df(x), and Reverse Algorithm mean an algorithm that takes df(x) and computes a possible dx.

The easiest Automatic Differentiation algorithm to understand is a Forward Algorithm called Forward Mode Automatic Differentiation, but it is not efficient when there is a large vector of inputs (x). Hence Reverse Algorithms were invented that are efficient with a large vector of inputs; these were called Reverse Mode Automatic Differentiation. These Reverse Algorithms appear much harder both to understand and to implement.
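
To make the cost difference concrete, here is a rough sketch of what I mean (again assuming ForwardDiff.jl and Zygote.jl and the same toy f, none of which come from the paper): the Forward Algorithm needs one directional-derivative pass per basis vector, so n passes for n inputs, while the Reverse Algorithm recovers the whole gradient from a single pullback seeded with 1.

```julia
# Sketch only: the packages and f are assumptions for illustration.
using ForwardDiff, Zygote
using LinearAlgebra: I

f(x) = sum(abs2, x) / 2
x = randn(4)
n = length(x)

# Forward Algorithm: one directional derivative J*e_i per one-hot vector e_i,
# i.e. n separate passes to assemble the gradient.
basis = Matrix{Float64}(I, n, n)
jvp(v) = ForwardDiff.derivative(t -> f(x .+ t .* v), 0.0)
g_fwd = [jvp(basis[:, i]) for i in 1:n]

# Reverse Algorithm: a single pass; the pullback applies J' to the seed 1.
_, back = Zygote.pullback(f, x)
g_rev = back(1.0)[1]

g_fwd ≈ g_rev                        # same gradient, n passes vs. one
```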

In the paper cited, a Haskell forward algorithm for computing Automatic Differentiation is given that is efficient with a large vector of inputs. Because of that efficiency it is, not unreasonably, called Reverse Mode Automatic Differentiation. On the assumption both that they are correct and that their algorithm can be implemented for Julia programs, does this mean that Reverse Algorithms are no longer useful? Or are there some use cases (not Automatic Differentiation) for which Reverse Algorithms are still needed?

Comments (1)

葮薆情 2025-01-16 07:53:48

It depends on what you mean by "computes the same function". Using it the right way, the same numbers (up to numerics) will fall out at the end -- the gradient. The operations involved are different, though.

AD always evaluates a certain linear operator at a given point. In the case of forward mode, the operator is the Jacobian J, and for backward mode, it's its adjoint J'. But as you have the operator only in algorithmic form, not in matrix form, you cannot just "transpose it" to get one from the other. Instead, you can evaluate both, but in different bases, to recover the same gradient.

In essence, it's a question of matrix multiplication: do you compute Jv for all one-hot vectors v, or dJ' for the single scalar seed d = 1 to get the whole gradient at once? The inefficiency of forward mode comes from the fact that for R^n -> R functions, the size of this basis (the number of one-hot vectors) scales with n.
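
A toy illustration of that point in Julia, with an explicit made-up Jacobian standing in for the operator you normally only have in algorithmic form:

```julia
# Toy numbers: J is an invented 1×3 Jacobian of some f: R^3 -> R at a point.
using LinearAlgebra: I

J = [2.0 3.0 5.0]

# "Forward mode": one product J*e per one-hot vector e -> n products.
g_fwd = [only(J * e) for e in eachcol(Matrix{Float64}(I, 3, 3))]

# "Reverse mode": one product of the adjoint J' with the scalar seed d = 1.
g_rev = vec(J' * 1.0)

g_fwd ≈ g_rev                        # both recover the gradient [2, 3, 5]
```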

So: they are not the same function but adjoints of each other, and you can recover either result from the other.

Sorry if that's a bit dense, but anything more would become a lecture introducing AD. SO is probably not the right place for that. You should take the time to find some introductory material, though.
