ARM NEON: Which instruction pairs have to wait for write back?

Posted on 2024-12-19 05:19:02

In the ARM NEON documentation, it says:

[...] some pairs of instructions might have to wait until the value is written back to the register file.

I haven't come across a list that defines the instruction pairs that can use forwarded results and the instruction pairs that have to wait for write back.

Does anyone know of a table or documentation that lists these pairs?

Comments (3)

岛歌少女 2024-12-26 05:19:02

Broadly speaking, what you would reasonably expect to forward, forwards. vmul.f32 forwards to vadd.f32 and the like.

I don't believe that the exact forwarding paths are precisely documented anywhere in the manner you're looking for. I haven't found them, anyway. If you do find them, be sure to let us know where. It is, of course, not too hard to determine for any given pair of instructions whether or not forwarding occurs, but that's not a general solution. Sorry.

北音执念 2024-12-26 05:19:02

Does anyone know of a table or documentation that lists these pairs?

There are far too many such pairs (over 9000) to list them all.
For example:

VADD.F32 q0,q0,q1
VMUL.F32 q3,q0,q2

The first instruction writes back its result in the 4th cycle, while the second instruction needs it (q0) as a source in the 2nd cycle. Because the source is not ready yet, there is a stall (a pipeline "hole") between these two instructions.

To calculate these stalls, you can use the following online tool:
http://pulsar.webshaker.net/ccc/result.php?lng=us
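
As a rough illustration of the arithmetic in the VADD.F32 / VMUL.F32 example, here is a minimal Python sketch. The function name stall_cycles and its parameters are made up for this illustration. It assumes a simplified model in which the result becomes usable once the producer has finished its result cycle, and the consumer, issued immediately afterwards, reads its operand at the start of its source cycle. The real pipeline has more rules than this, so treat it as a toy model rather than a replacement for the tool linked above.

```python
def stall_cycles(result_cycle: int, source_cycle: int) -> int:
    """Bubble cycles between a producer and a dependent consumer issued right after it.

    Simplifying assumption (not taken from ARM documentation): the result is
    usable after the producer completes `result_cycle`, and the consumer reads
    the operand when it enters `source_cycle`. With back-to-back issue the
    consumer runs one cycle behind the producer, so the gap is the difference
    between the two cycle numbers, never less than zero.
    """
    return max(0, result_cycle - source_cycle)


# Numbers from the example: VADD.F32 produces q0 in cycle 4,
# VMUL.F32 needs q0 as a source in cycle 2.
print(stall_cycles(result_cycle=4, source_cycle=2))
```

Under this model, independent instructions placed between the two dependent ones would fill the hole instead of letting the pipeline idle.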

影子的影子 2024-12-26 05:19:02

Integer multiply accumulates.

The section at the end of http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344k/ch16s06s03.html is helpful:

If a multiply-accumulate follows a multiply or another
multiply-accumulate, and depends on the result of that first
instruction, then if the dependency between both instructions are of
the same type and size, the processor uses a special multiplier
accumulator forwarding. This special forwarding means the multiply
instructions can issue back-to-back because the result of the first
instruction in N5 is forwarded to the accumulator of the second
instruction in N4. If the size and type of the instructions do not
match, then Dd or Qd is required in N3. This applies to combinations
of the multiply-accumulate instructions VMLA, VMLS, VQDMLA, and
VQDMLS, and the multiply instructions VMUL and VQDMUL

Don't assume that floating-point multiply-accumulates work the same way. I haven't used floating-point NEON instructions for anything performance-critical, so I can't offer any experience here, but make sure you read and understand the note at the end of http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344k/BCGDCECC.html
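
The integer rule quoted above can be written down as a small check. This is only a sketch of the quoted paragraph, not something taken from ARM's tools: the helper names are hypothetical, "same type and size" is interpreted here as an identical data-type suffix (for example .S16 on both instructions), and the second instruction is assumed to use the first one's result as its accumulator. Floating-point suffixes are rejected outright because the quoted rule should not be assumed to cover them.

```python
MULTIPLY = {"VMUL", "VQDMUL"}
MULTIPLY_ACCUMULATE = {"VMLA", "VMLS", "VQDMLA", "VQDMLS"}


def split_mnemonic(mnemonic: str) -> tuple[str, str]:
    """Split 'VMLA.S16' into ('VMLA', 'S16'); the suffix is '' if absent."""
    op, _, dtype = mnemonic.upper().partition(".")
    return op, dtype


def uses_accumulator_forwarding(first: str, second: str) -> bool:
    """True if the quoted rule allows `second` to issue right after `first`.

    Assumes `second` depends on the result of `first` through its accumulator
    operand. Only integer data types are modelled.
    """
    first_op, first_type = split_mnemonic(first)
    second_op, second_type = split_mnemonic(second)
    if first_type.startswith("F") or second_type.startswith("F"):
        raise ValueError("floating-point types are not covered by this sketch")
    return (
        second_op in MULTIPLY_ACCUMULATE
        and first_op in MULTIPLY | MULTIPLY_ACCUMULATE
        and first_type != ""
        and first_type == second_type
    )


print(uses_accumulator_forwarding("VMUL.S16", "VMLA.S16"))
print(uses_accumulator_forwarding("VMLA.S16", "VMLA.S32"))
```

When the check is false, the quoted text says the accumulator (Dd or Qd) is required in N3, so the dependent instruction cannot issue back-to-back.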
