How do I calculate the propagation delay through a series of combinational circuits using Verilog and an FPGA?

Posted 2024-12-27 04:50:02

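I am new to FPGAs and HDLs. I am trying to learn, but I can't figure this out: how can I calculate or estimate the propagation delay through several levels of combinational logic? Can I only determine it empirically, or can I work it out at design time? In this case I am using an FPGA to implement a parity set and check circuit. The circuit looks like a tree network of XOR gates, as in the example picture, except that I intend to XOR 16 registers, so there will be more levels of XOR operations. I would like to calculate the propagation delay through each "level" of XOR logic so that I can determine what fraction of a clock cycle, or how many nanoseconds, the whole parity check and set operation will take. I hope this makes sense.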

[Image: parity set network]

Thank you very much for your help.

ι不睡觉的鱼゛ 2025-01-03 04:50:02

You need "The Knowledge", as I explain in "The Art of High Performance FPGA Design": http://www.fpgacpu.org/log/aug02.html#art "You have to ... crank up your tools and design some test circuits, and then open up the timing analyzer and the FPGA editor and pore over what came out, what the latencies (logic and routing) tend to be, etc."

After you do that for a while, you will look at this kind of question, and just know (or have a pretty good idea).

In this case, for example, I know that in an FPGA a 16-input XOR will be built out of a tree of 4- or 6-input lookup tables (4-LUTs or 6-LUTs) two deep, and that it cannot be implemented in a circuit only one LUT deep. Therefore the minimum delay for such a circuit in a pipelined implementation is going to be (in Xilinx timing nomenclature):

tCKO -- clock-to-output delay of any of the 16 flip-flops
tILO -- delay through the first-level LUTs
tAS -- delay through the second-level LUT + flip-flop setup time, assuming both are implemented in the same slice
plus net routing delays

and for Virtex-6 speed -1 I would expect this to be ~1.5 ns.
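The "two LUTs deep" figure above can be checked with a little counting. The following is a minimal Python sketch, not something from the original answer: the function name `lut_levels` is made up for illustration, and it assumes the XOR is built purely as a tree of k-input LUTs, ignoring any dedicated carry or mux resources a real device might use instead.

```python
def lut_levels(num_inputs: int, lut_size: int) -> int:
    """Depth of a tree of lut_size-input LUTs that reduces num_inputs signals to one.

    Assumption: every LUT absorbs up to lut_size signals and produces one,
    which is valid for an associative reduction such as XOR.
    """
    if num_inputs < 1 or lut_size < 2:
        raise ValueError("need at least 1 input and LUTs with at least 2 inputs")
    levels = 0
    signals = num_inputs
    while signals > 1:
        # Ceiling division: number of LUTs needed at this level.
        signals = -(-signals // lut_size)
        levels += 1
    return levels


if __name__ == "__main__":
    for k in (4, 6):
        print(f"16-input XOR with {k}-LUTs: {lut_levels(16, k)} levels")
```

For 16 inputs this gives two levels with either 4-LUTs (16 -> 4 -> 1) or 6-LUTs (16 -> 3 -> 1), and one level is only possible when the input count does not exceed the LUT size.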

As others have said, the component switching delay data is in the data sheets for your device in question, but the net routing delays are not. Indeed, in time, you may even start to remember the key delays and develop a sense for how many FPGA primitives like LUTs you can use and still make a particular clock period / clock frequency target.

Anyway I just tried this with some throwaway Verilog I coded up:

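module t(clk, i, o);
  input clk;
  input [15:0] i;
  output reg o;

  reg [15:0] d;  // input register, so the timed path runs flip-flop to flip-flop
  always @(posedge clk) begin
    d <= i;      // stage 1: register the 16 inputs
    o <= ^d;     // stage 2: register the XOR reduction (parity) of d
  end
endmodule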

and a simple UCF file:

net clk period = 1.5 ns;

and the total delay in my device was about 1.4 ns. Try it for yourself and see!

Here is one path from the static timing analyzer output:

Paths for end point o (SLICE_X3Y68.A5), 6 paths
--------------------------------------------------------------------------------
Slack (setup path):     0.198ns (requirement - (data path - clock path skew + uncertainty))
  Source:               d_13 (FF)
  Destination:          o (FF)
  Requirement:          1.500ns
  Data Path Delay:      1.248ns (Levels of Logic = 2)
  Clock Path Skew:      -0.019ns (0.089 - 0.108)
  Source Clock:         clk_BUFGP rising at 0.000ns
  Destination Clock:    clk_BUFGP rising at 1.500ns
  Clock Uncertainty:    0.035ns

  Clock Uncertainty:          0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.070ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.000ns
    Phase Error (PE):           0.000ns

  Maximum Data Path at Slow Process Corner: d_13 to o
    Location             Delay type         Delay(ns)  Physical Resource
                                                       Logical Resource(s)
    -------------------------------------------------  -------------------
    SLICE_X3Y67.BQ       Tcko                  0.337   d<15>
                                                       d_13
    SLICE_X2Y68.A2       net (fanout=1)        0.590   d<13>
    SLICE_X2Y68.A        Tilo                  0.068   d<11>
                                                       d[15]_reduce_xor_21_xo<0>1
    SLICE_X3Y68.A5       net (fanout=1)        0.180   d[15]_reduce_xor_21_xo<0>
    SLICE_X3Y68.CLK      Tas                   0.073   d<10>
                                                       d[15]_reduce_xor_21_xo<0>3
                                                       o
    -------------------------------------------------  ---------------------------
    Total                                      1.248ns (0.478ns logic, 0.770ns route)
                                                       (38.3% logic, 61.7% route)

As you can see, the logic delays from the datasheets are only about 480 ps, whereas the net routing delays are 770 ps, and clock skew etc. adds a bit more, for a total of about 1.3 ns. This is actually faster than the component switching limit / Fmax on the global clock tree of 700 MHz / 1.43 ns...
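To make the arithmetic in the report explicit, here is a small Python sketch that adds up the path by hand. The delay values are copied from the report above; the names (`PATH`, `summarize`, the dictionary keys) are illustrative only, and the period it derives applies to this one path, not to the whole design.

```python
# Values copied from the timing report above, all in nanoseconds.
PATH = [
    ("Tcko", "logic", 0.337),            # clock-to-output of flip-flop d_13
    ("net d<13>", "route", 0.590),
    ("Tilo", "logic", 0.068),            # first-level LUT
    ("net reduce_xor", "route", 0.180),
    ("Tas", "logic", 0.073),             # second-level LUT plus flip-flop setup
]
REQUIREMENT_NS = 1.500
CLOCK_SKEW_NS = -0.019
CLOCK_UNCERTAINTY_NS = 0.035


def summarize(path, requirement, skew, uncertainty):
    """Apply the report's formula: slack = requirement - (data path - skew + uncertainty)."""
    logic = sum(delay for _, kind, delay in path if kind == "logic")
    route = sum(delay for _, kind, delay in path if kind == "route")
    data_path = logic + route
    min_period = data_path - skew + uncertainty
    slack = requirement - min_period
    return {
        "logic_ns": round(logic, 3),
        "route_ns": round(route, 3),
        "data_path_ns": round(data_path, 3),
        "min_period_ns": round(min_period, 3),
        "slack_ns": round(slack, 3),
    }


report = summarize(PATH, REQUIREMENT_NS, CLOCK_SKEW_NS, CLOCK_UNCERTAINTY_NS)
for name, value in report.items():
    print(f"{name}: {value}")
```

With these inputs the sketch reproduces the report's 0.478 ns of logic, 0.770 ns of routing, 1.248 ns data path and 0.198 ns slack, and it shows that about 1.302 ns is the tightest period this particular path could meet.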

So in summary, as you try some test circuits and try tuning them, you will gain the experience that helps you estimate how fast your circuit will run when implemented in FPGA primitives like LUTs.

And if it really matters, there is no substitute for implementing the design through synthesis, place-and-route, and static timing analysis. Don't forget to add timing constraints to give the tools something to target, and then experiment with lowering the minimum clock period iteratively until you converge on a minimum period.
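One way to organize that iteration is a bisection between a period that fails and a period that passes. The Python sketch below is only an illustration under stated assumptions: the answer just says "iteratively", so bisection is my choice, and `meets_timing` is a hypothetical callback standing in for a full run of synthesis, place-and-route and static timing analysis with the given period constraint. Real tool runs are not perfectly monotonic in the constraint, so treat the result as a starting point rather than a guarantee.

```python
from typing import Callable


def find_min_period(
    meets_timing: Callable[[float], bool],
    fail_ns: float,
    pass_ns: float,
    tol_ns: float = 0.01,
) -> float:
    """Bisect between a period assumed to fail and one known to pass.

    Returns a passing period within tol_ns of the failing bound.
    """
    if not meets_timing(pass_ns):
        raise ValueError("pass_ns must be a period that meets timing")
    lo, hi = fail_ns, pass_ns
    while hi - lo > tol_ns:
        mid = (lo + hi) / 2.0
        if meets_timing(mid):
            hi = mid   # constraint met: try a tighter period
        else:
            lo = mid   # constraint missed: relax
    return hi


# Stand-in for the tool flow (assumption): pretend the design closes timing
# at any period of 1.302 ns or more.
def fake_tool_run(period_ns: float) -> bool:
    return period_ns >= 1.302


best = find_min_period(fake_tool_run, fail_ns=0.0, pass_ns=2.0)
print(f"smallest passing period found: {best:.3f} ns")
```

In practice each call to `meets_timing` costs a full implementation run, so a coarse tolerance keeps the number of runs small.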

Happy hacking!

嗫嚅 2025-01-03 04:50:02

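You can estimate the propagation delay through multiple logic stages only if you have the delays of all the components as a function of temperature, supply voltage, and manufacturing process variation. In the IC world this is done automatically with static timing analysis tools. I am not sure about the FPGA design methodology.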

As Oli Charlesworth mentioned, the overall delay also depends on the interconnect wire delay. Other factors include input drive strength and output load.

许仙没带伞 2025-01-03 04:50:02

In theory you can work out the propagation delay in an FPGA without writing any code, but it is not easy.

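The simplest approach is to create a very simple project with the IO signals you need, write the code in VHDL, Verilog, or even schematic capture, synthesize and place-and-route the design, and then look at the report files the tool generates to see the actual delays.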

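To understand some of the parameters in the report files, look at the "DC and Switching Characteristics" document that each FPGA vendor provides. For example, for Xilinx's Spartan-6 family of devices it is: http://www.xilinx.com/support/documentation/data_sheets/ds162.pdf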

Hope this helps,
/Farhad
