ARM NEON: Which instruction pairs have to wait for write-back?
The ARM NEON documentation says:
[...] some pairs of instructions might have to wait until the value is written back to the register file.
I haven't come across a list that defines the instruction pairs that can use forwarded results and the instruction pairs that have to wait for write back.
Does anyone know of a table or documentation that lists these pairs?
Answers (3)
Broadly speaking, whatever you would reasonably expect to forward does forward: vmul.f32 forwards to vadd.f32, and so on.
I don't believe the exact forwarding paths are documented anywhere in the form you're looking for; at least, I haven't found them. If you do find them, please let us know where. It is, of course, not too hard to determine whether forwarding occurs for any given pair of instructions, but that's not a general solution. Sorry.
There are over 9000 such pairs, so they can't all be listed.
For example:
The first instruction writes back its result in the 4th cycle, while the second instruction requires it (q0) as a source in the 2nd cycle. Since the source is not ready yet, there is a stall (a pipeline "hole") between these two instructions.
To calculate these stalls, you can use the following online tool:
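The arithmetic behind that stall can be sketched as a small Python function. This is a simplified model under stated assumptions, not ARM's documented rule: a strictly in-order, single-issue pipeline; the consumer issues one cycle after the producer; the result is usable from the cycle after the producer's result stage ends; and the source is needed when the consumer enters its source stage. The name `stall_cycles` is hypothetical. Real cores (dual issue, separate ARM and NEON pipelines, the exact stage conventions in the TRM) may count differently. For a pair that must wait for write-back, you would pass the write-back stage as `result_stage` instead of the forwarding stage.

```python
def stall_cycles(result_stage: int, source_stage: int) -> int:
    """Stall cycles between a producer and a consumer issued back to back.

    Simplified in-order model (an assumption, not ARM's documented rule):
    - stages are numbered from 1, and an instruction issued in cycle t
      is in stage k during cycle t + k - 1;
    - the producer's result is usable from the cycle after result_stage ends;
    - the consumer needs the value when it enters source_stage.
    """
    if result_stage < 1 or source_stage < 1:
        raise ValueError("stage numbers start at 1")
    producer_issue = 0
    consumer_issue = producer_issue + 1           # back-to-back issue
    value_ready = producer_issue + result_stage   # first cycle the value is usable
    value_needed = consumer_issue + source_stage - 1
    return max(0, value_ready - value_needed)


# The pair from the answer: result in the 4th cycle, source needed in the 2nd.
print(stall_cycles(4, 2))  # 2 under this model
# Hypothetical: same pair, but waiting for a write-back in stage 6 instead.
print(stall_cycles(6, 2))  # 4 under this model
```

The only difference between a forwarded pair and a write-back pair in this model is which stage number you feed in as `result_stage`.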
http://pulsar.webshaker.net/ccc/result.php?lng=us
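The linked tool does this bookkeeping for whole instruction sequences. As a rough illustration of the idea only (not how that tool works internally), a toy in-order, single-issue model can walk a straight-line sequence, record when each register's value becomes usable, and delay each instruction until all of its sources are ready. The `Insn` type, the `schedule` function, and every stage number below are hypothetical and made up for illustration; they are not real Cortex-A8 timings.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    text: str           # assembly text, for display only
    dest: str           # destination register
    srcs: tuple         # source registers
    result_stage: int   # result usable after this stage ends (assumed value)
    source_stage: int   # sources needed when entering this stage (assumed value)


def schedule(seq):
    """Return (issue_cycle, stall_cycles) for each instruction in seq.

    Toy in-order, single-issue model: an instruction issued in cycle t is
    in stage k during cycle t + k - 1. Only read-after-write dependencies
    are tracked; structural hazards and dual issue are ignored.
    """
    ready = {}        # register -> first cycle its latest value is usable
    next_issue = 0    # earliest cycle the next instruction could issue
    result = []
    for insn in seq:
        issue = next_issue
        for reg in insn.srcs:
            if reg in ready:
                # Need issue + source_stage - 1 >= ready[reg].
                issue = max(issue, ready[reg] - insn.source_stage + 1)
        result.append((issue, issue - next_issue))
        ready[insn.dest] = issue + insn.result_stage
        next_issue = issue + 1
    return result


# Illustrative stage numbers only, not real Cortex-A8 timings.
dependent = [
    Insn("vmul.f32 q0, q1, q2", "q0", ("q1", "q2"), 4, 2),
    Insn("vadd.f32 q3, q0, q4", "q3", ("q0", "q4"), 4, 2),
]
interleaved = [
    Insn("vmul.f32 q0, q1, q2", "q0", ("q1", "q2"), 4, 2),
    Insn("vmul.f32 q5, q6, q7", "q5", ("q6", "q7"), 4, 2),
    Insn("vadd.f32 q3, q0, q4", "q3", ("q0", "q4"), 4, 2),
]

for name, seq in (("dependent", dependent), ("interleaved", interleaved)):
    print(name)
    for insn, (cycle, stall) in zip(seq, schedule(seq)):
        print(f"  cycle {cycle}: {insn.text} (stall {stall})")
```

In this model the dependent pair stalls for 2 cycles, and moving an independent multiply between the two instructions fills one of those cycles, leaving a 1-cycle stall.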
Integer multiply-accumulates.
The section at the end of http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344k/ch16s06s03.html is helpful:
Don't assume that floating-point multiply-accumulates work the same way. I haven't used floating-point NEON instructions for anything performance-critical, so I can't offer any experience here, but make sure you read and understand the note at the end of http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344k/BCGDCECC.html
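To illustrate why it can matter which operand a multiply-accumulate chain depends on, here is a toy Python model. It assumes, purely for illustration, that the accumulator operand is read at a later pipeline stage than the multiplicands; the function name and all stage numbers are hypothetical, so check the TRM sections linked above for the actual behaviour on your core.

```python
def last_mac_issue_cycle(n, result_stage, mul_src_stage, acc_src_stage,
                         via_accumulator):
    """Issue cycle of the last of n dependent multiply-accumulates.

    Toy model (all stage numbers are assumptions): single issue, in order,
    the first MAC issues in cycle 0, and each MAC depends on the previous
    one's result either through its accumulator operand (read when entering
    acc_src_stage) or through a multiplicand (read when entering
    mul_src_stage). A dependent MAC issued right after its producer stalls
    for result_stage - src_stage cycles, if that is positive.
    """
    if n < 1:
        raise ValueError("need at least one instruction")
    src_stage = acc_src_stage if via_accumulator else mul_src_stage
    stall = max(0, result_stage - src_stage)
    return (n - 1) * (1 + stall)


# Hypothetical stages: result after stage 6, multiplicands needed at
# stage 2, accumulator not needed until stage 5.
print(last_mac_issue_cycle(4, 6, 2, 5, via_accumulator=True))   # 6
print(last_mac_issue_cycle(4, 6, 2, 5, via_accumulator=False))  # 15
```

Under these made-up numbers, four MACs chained through the accumulator issue much sooner than the same chain routed through a multiplicand, which is the kind of difference the linked integer multiply-accumulate section is worth reading for.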