How do I calculate the propagation delay through a series of combinational circuits using Verilog and an FPGA?
I'm new to FPGAs and HDL, but I'm trying to learn and can't figure this out. How can I calculate or estimate the propagation delay through several levels of combinational logic? Can I only determine this empirically, or can I figure it out at design time? In this situation I'm using an FPGA to implement a parity setting and checking circuit. The circuit would look like a tree network of XOR gates like the example pictures, except I intend to XOR 16 registers, so there will be more levels of XOR operations. I would like to be able to calculate the propagation delay through each "level" of XOR logic so I can determine what fraction of a clock cycle, or how many nanoseconds, the entire parity checking and setting operation will take. Hope I'm making sense.
Thanks a lot for the help.
Comments (4)
You need "The Knowledge" as I explain here in "The Art of High Performance FPGA Design". http://www.fpgacpu.org/log/aug02.html#art "You have to ... crank up your tools and design some test circuits, and then open up the timing analyzer and the FPGA editor and pour over what came out, what the latencies (logic and routing) tend to be, etc."
After you do that for a while, you will look at this kind of question, and just know (or have a pretty good idea).
In this case, for example, I know that in an FPGA a 16-input XOR will be built out of a tree of 4- or 6-input lookup tables (4-LUTs or 6-LUTs) two levels deep, and it cannot be implemented in a circuit only one LUT deep. Therefore the minimum delay for such a circuit in a pipelined implementation is going to be (in Xilinx timing nomenclature):
tCKO -- clock-to-output delay of any of the 16 flip-flops
tILO -- delay through the first-level LUTs
tAS -- delay through the second-level LUT + flip-flop setup time, assuming both are implemented in the same slice
and for Virtex-6 speed -1 I would expect this to be ~1.5 ns.
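The two-level claim and the tCKO + tILO + tAS budget above can be sanity-checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, the tree is assumed to be balanced, and the delay numbers in the usage lines are round placeholders, not datasheet values. Routing delay is passed in as a single lump because it is only known after place-and-route.

```python
import math


def lut_levels(num_inputs: int, lut_size: int) -> int:
    """Levels in a balanced LUT tree that reduces num_inputs signals to one.

    Each level groups up to lut_size signals into one LUT output.
    """
    if num_inputs < 1 or lut_size < 2:
        raise ValueError("need num_inputs >= 1 and lut_size >= 2")
    levels = 0
    signals = num_inputs
    while signals > 1:
        signals = math.ceil(signals / lut_size)
        levels += 1
    return levels


def path_delay_ps(levels: int, t_cko_ps: int, t_ilo_ps: int,
                  t_as_ps: int, routing_ps: int) -> int:
    """Register-to-register delay for a LUT tree between two flip-flops.

    The last LUT level is covered by tAS (LUT + setup in the same slice),
    so only levels - 1 LUTs are charged at tILO.
    """
    if levels < 1:
        raise ValueError("need at least one LUT level")
    return t_cko_ps + (levels - 1) * t_ilo_ps + t_as_ps + routing_ps


# A 16-input XOR needs two levels with either 4-LUTs or 6-LUTs.
print(lut_levels(16, 4), lut_levels(16, 6))

# Placeholder delays (NOT datasheet values), all in picoseconds.
print(path_delay_ps(levels=2, t_cko_ps=100, t_ilo_ps=100,
                    t_as_ps=100, routing_ps=200))
```

To use it for real, substitute the tCKO, tILO and tAS figures from your device's switching characteristics and a routing estimate taken from a timing report of a similar test circuit.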
As others have said, the component switching delay data is in the data sheets for your device in question, but the net routing delays are not. Indeed, in time, you may even start to remember the key delays and develop a sense for how many FPGA primitives like LUTs you can use and still make a particular clock period / clock frequency target.
Anyway I just tried this with some throwaway Verilog I coded up:
and a simple UCF file:
and the total delay in my device was about 1.4 ns. Try it for yourself and see!
Here is one path from the static timing analyzer output:
As you can see, the logic delays from the datasheets are only about 480 ps whereas the net routing delays are about 770 ps, and clock skew etc. adds a bit more, for a total under 1.3 ns. This is actually faster than the component switching limit / Fmax on the global clock tree of 700 MHz / 1.43 ns...
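The arithmetic behind those figures is easy to reproduce. The sketch below assumes the routing figure is meant to be 770 ps (the only reading consistent with a total under 1.3 ns) and uses helper names invented for this example; skew and clock uncertainty are left out because the post does not give exact values for them.

```python
def period_ns_to_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def routing_share(logic_ps: int, routing_ps: int) -> float:
    """Fraction of the data-path delay that is spent in routing."""
    return routing_ps / (logic_ps + routing_ps)


LOGIC_PS = 480    # logic delay quoted in the post
ROUTING_PS = 770  # routing delay, read as picoseconds (assumption)

data_path_ns = (LOGIC_PS + ROUTING_PS) / 1000.0

print(f"data path: {data_path_ns:.2f} ns")
print(f"routing share: {routing_share(LOGIC_PS, ROUTING_PS):.1%}")
print(f"1.43 ns period = {period_ns_to_mhz(1.43):.1f} MHz")
```

The point it illustrates is the one made above: well over half of the path is routing, which is exactly the part the datasheet cannot tell you.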
So in summary, as you try some test circuits and try tuning them, you will gain experience that helps you estimate how fast your circuit will run when implemented in FPGA primitives like LUTs.
And if it really matters, there is no substitute for implementing the design through synthesis, place-and-route, and static timing analysis. Don't forget to add timing constraints to give the tools something to target, and then experiment with lowering the min clock period iteratively until you converge on a min period.
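One way to organize that iteration is a simple bisection over the period constraint. The sketch below is only an illustration: `meets_timing` is a hypothetical callback standing in for "update the constraint, rerun synthesis, place-and-route and static timing analysis, and report whether timing was met", and it assumes results are monotonic in the period, which real tools do not strictly guarantee.

```python
from typing import Callable


def find_min_period(meets_timing: Callable[[float], bool],
                    lo_ns: float, hi_ns: float,
                    tol_ns: float = 0.01) -> float:
    """Bisect for the smallest period (ns) at which the design meets timing.

    lo_ns is a period assumed to fail; hi_ns must pass.
    Returns a passing period within tol_ns of the pass/fail boundary.
    """
    if not meets_timing(hi_ns):
        raise ValueError("design fails timing even at hi_ns")
    while hi_ns - lo_ns > tol_ns:
        mid_ns = (lo_ns + hi_ns) / 2.0
        if meets_timing(mid_ns):
            hi_ns = mid_ns   # passed: try a tighter constraint
        else:
            lo_ns = mid_ns   # failed: relax the constraint
    return hi_ns


def fake_tool_run(period_ns: float) -> bool:
    """Stand-in for a real tool run: pretends the design closes at 1.3 ns."""
    return period_ns >= 1.3


print(f"{find_min_period(fake_tool_run, lo_ns=0.5, hi_ns=4.0):.2f} ns")
```

In practice each call is a full tool run, so a coarse tolerance and a sensible starting bracket save a lot of time, and it is worth rerunning the final constraint once to confirm it is reproducible.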
Happy hacking!
You can estimate the propagation delays through several stages of logic only if you have timing models which provide delays as a function of temperature, supply voltage and manufacturing process variation for all of your components. In the IC world, this is done automatically using static timing analysis tools. I'm not sure about FPGA design methodologies.
As Oli Charlesworth mentions, the overall delay also depends on interconnect wire delays. Other factors are: input drive strength and output load.
Theoretically it is possible to get the propagation delays in an FPGA without coding, but it is not going to be easy.
The easiest way to do so is to create a very simple project with the IO signals you need, write the code in VHDL, Verilog or even using schematic capture, synthesize and route the design, and then look into the report file generated by the tool to see the actual delays.
To understand some of the parameters in the report file, you can look into the "DC and Switching Characteristics" document provided by all FPGA companies. For example, for Spartan 6 family devices from Xilinx, it is: http://www.xilinx.com/support/documentation/data_sheets/ds162.pdf
Hope this helps,
/Farhad
It's the kind of thing you get a feel for as you do more coding on a particular platform, but it's part of the art of being a good RTL engineer.
As you write your code, put it through both simulation and synthesis. Make sure you understand the timing paths that the synthesis tool reports, and have a good mental image of the logic you're describing. If you find yourself hugely out with respect to timing, then you need to rethink your design, but do this early. There's nothing worse than spending time on a design, getting it working and passing all its tests, just to find out it's not fast enough.
Then you change your target FPGA or technology library, and you have to readjust all your expectations.