什么是冗余与非冗余数字格式?
我无法理解该 FPGA 电路中使用的算法。它处理冗余与非冗余数字格式。我已经看到了非冗余格式的一些数学(形式)定义,但我无法真正理解它。
描述该算法的论文摘录:
图 3 显示了可扩展蒙哥马利乘法器的框图。内核包含 p 个 w 位 PE,总共 wp 个位单元。 Z 以进位保存冗余形式存储。如果 PE p 在 PE1 完成 Z^(e-1) 之前完成 Z^0,则结果必须排队,直到 PE1 再次可用。 [5] 中的设计以冗余形式对结果进行排队,每个条目需要 2w 位。对于较大的 n 队列会消耗大量区域,因此我们建议将 Z 转换为非冗余形式以节省一半的队列空间,如图 4 所示。在第一个周期中,< em>Z初始化为0。当不需要排队时,直接绕过进位保存冗余Z'以避免进位传播加法器的延迟。非冗余的Z结果也是系统的输出。
以及图表:
和这是“改进的”PE 框图。这显示了“改进的”PE 框图 -“改进的”与一些不相关的方面有关。
我没有“未改进”的 FIFO 的图片,但我认为它只是直接普通 FIFO。我不明白的是,FIFO 的 CPA 和 3 输入 MUX 是否以某种方式在格式之间进行转换?
了解冗余与非冗余格式(在具体示例中)是第一步,了解该电路如何实现它是第二步。
I'm having trouble understanding the algorithm being used in this FPGA circuit. It deals with redundant versus non-redundant number format. I have seen some mathematical (formal) definitions of non-redundant format but I just can't really grasp it.
Excerpt from this paper describing the algorithm:
Figure 3 shows a block diagram of the scalable Montgomery multiplier. The kernel contains p w-bit PEs for a total of wp bit cells. Z is stored in carry-save redundant form. If PE p completes Z^0 before PE1 has finished Z^(e-1), the result must be queued until PE1 becomes available again. The design in [5] queues the results in redundant form, requiring 2w bits per entry. For large n the queue consumes significant area, so we propose converting Z to nonredundant form to save half the queue space, as shown in Figure 4. On the first cycle, Z is initialized to 0. When no queuing is needed, the carry-save redundant Z' is bypassed directly to avoid the latency of the carry-propagate adder. The nonredundant Z result is also an output of the system.
And the diagrams:
And here is the "improved" PE block diagram. This shows the 'improved' PE block diagram - 'improved' has to do with some unrelated aspects.
I don't have a picture of the 'not improved' FIFO but I think it is just a straight normal FIFO. What I don't understand is, does the FIFO's CPA and 3 input MUX somehow convert between formats?
Understanding redundant versus non-redundant formats (in concrete examples) is the first step, understanding how this circuit achieves it would be step 2..
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
一些背景知识和对 users.ece.utexas.edu/~adnan/vlsi-05-backup/lec12Datapath.ppt 的了解表明:
进行正确的二进制添加相对较慢和/或占用空间,因为正确传播进位所需的时间。
如果按位并行工作,则可以获取三个二进制数,将每个数中同一位置的位相加,并生成两个二进制数。
幻灯片 27 指出 0001 + 0111 + 1101 = 1011 + 0101(0)。
由于乘法器需要进行大量加法,因此您将加法器树构建为 3 个数字减少到 2 个数字的集合,最终得到两个数字作为输出 abcde....z
和ABCDE...Z0。这是冗余形式的输出,真正的答案实际上是 abcde...z + ABCDE...Z0
A bit of background and a look at users.ece.utexas.edu/~adnan/vlsi-05-backup/lec12Datapath.ppt suggests the following:
Doing a proper binary add is relatively slow and/or area-consuming, because of the time that it takes to propagate the carries properly.
If you work bit-wise in parallel you can take three binary numbers, sum the bits at the same location in each number, and produce two binary numbers.
Slide 27 points out that 0001 + 0111 + 1101 = 1011 + 0101(0).
Since a multiplier needs to do a LOT of additions, you build the adder tree as a collection of reductions of 3 numbers to 2 numbers, eventually ending up with two numbers as output, abcde....z
and ABCDE...Z0. This is your output in redundant form, and the true answer is in fact abcde...z + ABCDE...Z0