Using non-blocking assignments for sequential execution (not sequential logic)

Posted on 2025-01-10 10:57:38

I am working on a Verilog design in which I use SRAM inside an FSM. I need to synthesize it later because I want to fabricate the IC. I have fully working code that uses reg registers, where I use blocking assignments for concurrent operation. Since there is no clock in this system, it works fine. Now I want to replace these registers with SRAM-based memory, which brings a clock into the system. My first thought is to use non-blocking assignments and change the sensitivity list from always @(*) to always @(negedge clk).

In the code snippet below, I want to read five values from the SRAM (SR4). To do this, I use a counter (wait_var) that counts up to 5. The extra counter ensures that the code enters the counting branch on the first clock edge and reads the five values from the SRAM on the subsequent clock edges. This technique works for simple logic such as this.

    S_INIT_MEM: begin
                    // ******Off-Chip (External) Controller will write the data into SR4. Once data is written, init_data will be raised to 1.******
                    if (init_data == 1'b0) begin
                        CEN4            <= CEN;
                        WEN4            <= WEN;
                        RETN4           <= RETN;
                        EMA4            <= EMA;
                        A4              <= A_in;
                        D4              <= D_in;
                    end
                    else begin
                        CEN4            <= 1'b0;    //SR4 is enabled
                        EMA4            <= 3'b0;    //EMA set to 0
                        WEN4            <= 1'b1;    //SR4 set to read mode
                        RETN4           <= 1'b1;    //SR4 RETN is turned ON
                        A4              <= 8'b0000_0000;
                        if (wait_var < 6) begin
                            if (A4 == 8'b0000_0000 ) begin
                                NUM_DIMENSIONS <= Q4;
                                A4 <= 8'b0000_0001;
                            end
                            if (A4 == 8'b0000_0001 ) begin
                                NUM_PARTICLES <= Q4;
                                A4 <= 8'b0000_0010;
                            end
                            if (A4 == 8'b0000_0010 ) begin
                                n_gd_iterations <= Q4;
                                A4 <= 8'b0000_0011;
                            end
                            if (A4 == 8'b0000_0011 ) begin
                                iterations  <= Q4;
                                A4 <= 8'b0000_0100;
                            end
                            if (A4 == 8'b0000_0100 ) begin
                                threshold_val   <= Q4;
                                A4 <= 8'b0000_0101;
                            end
                            wait_var <= wait_var + 1;
                        end
                        //Variables have been read from SR4
                        if(wait_var == 6) begin
                            CEN4 <= 1'b1;
                            next_state <= S_INIT_PRNG;
                            wait_var <= 0;
                        end
                        else begin
                            next_state <= S_INIT_MEM;
                        end
                    end
                end

However, when I need to write more complex logic in a similar fashion, the counter-based delay method gets too complicated. For example, say I want to read data from one SRAM (SR1) and write it to another SRAM (SR3).

                    CEN1 = 1'b0;
                    A1 = ((particle_count-1)*NUM_DIMENSIONS) + (dimension_count-1);
                    if (CEN1 == 1'b0) begin
                        CEN3 = 1'b0;
                        WEN3 = 1'b0;
                        A3 = ((particle_count-1)*NUM_DIMENSIONS) + (dimension_count-1);
                        if(WEN3 == 1'b0) begin
                            D3 = Q1;
                            WEN3 = 1'b1;
                            CEN3 = 1'b1;
                        end
                        CEN1 = 1'b1;
                    end

I know this still uses blocking assignments and that I need to convert them to non-blocking assignments. But if I do that without manually introducing a one-clock-cycle delay using a counter, it will not work as desired. Is there a simpler way to get around this?

Any help would be highly appreciated.

往事随风而去 2025-01-17 10:57:38

The main point is that non-blocking assignments are a simulation-only construct: they give the simulator a way to match hardware behavior. If you use them incorrectly, you may end up with simulation races and mismatches with the hardware, in which case your verification effort is wasted.

There is a set of common practices used in the industry to handle this situation. One is to use non-blocking assignments for the outputs of all sequential devices. This avoids races and makes sure that flops and latches pass data along the same way in simulation as they do in real hardware.
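
To illustrate the kind of race this practice avoids, here is a minimal sketch (the module name `racy_flops` is made up for this example). Two always blocks are triggered by the same clock edge and use blocking assignments, so the value seen by the second block depends on which block the simulator happens to run first:

```verilog
// Illustrative only: a simulation race caused by blocking assignments.
module racy_flops (
    input  wire clk,
    input  wire in1,
    output reg  out1,
    output reg  out2
);
    // Both blocks wake up on the same edge; their execution order is
    // not defined by the language.
    always @(posedge clk)
        out1 = in1;

    always @(posedge clk)
        out2 = out1;   // may read the old or the new out1, depending on order
endmodule
```

Replacing both `=` with `<=` removes the race, because every right-hand side is then sampled before any left-hand side is updated.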

Hence, the one-cycle delay "caused" by non-blocking assignments is a myth: the delay comes from the flops themselves. If you chain flops so that the second one captures the output of the first, the data moves from one flop to the next on every clock cycle:

        clk ------v----------------v
        in1 -> [flop1] -> out1 -> [flop2] -> out2
clk 1    1                  1                  0 
clk 3    1                  1                  1
clk 4    0                  0                  1
clk 5    0                  0                  0

In the example above, data propagates from out1 to out2 on the next clock cycle. This can be expressed in Verilog as

   always @(posedge clk)
      out1 <= in1;
 
   always @(posedge clk)
      out2 <= out1;

Or you can combine them into a single block:

   always @(posedge clk) begin
        out1 <= in1;
        out2 <= out1;
   end
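
To watch the data move through the two flops in simulation, something like the following could be used. It is only a sketch: the module names `two_flops` and `two_flops_tb`, the 10-time-unit clock period, and the stimulus are all assumptions made for this example.

```verilog
// Two chained flops, written as in the combined block above.
module two_flops (
    input  wire clk,
    input  wire in1,
    output reg  out1,
    output reg  out2
);
    always @(posedge clk) begin
        out1 <= in1;   // samples in1 at the clock edge
        out2 <= out1;  // samples the OLD out1, not the value assigned above
    end
endmodule

// Hypothetical testbench (simulation only, not synthesizable).
module two_flops_tb;
    reg  clk = 1'b0;
    reg  in1 = 1'b0;
    wire out1, out2;

    two_flops dut (.clk(clk), .in1(in1), .out1(out1), .out2(out2));

    always #5 clk = ~clk;   // assumed clock period of 10 time units

    initial begin
        $monitor("t=%0t in1=%b out1=%b out2=%b", $time, in1, out1, out2);
        // The stimulus changes on the falling edge so that it is stable
        // when the flops sample it on the rising edge.
        @(negedge clk) in1 = 1'b1;
        repeat (2) @(negedge clk);
        in1 = 1'b0;
        repeat (3) @(negedge clk);
        $finish;
    end
endmodule
```

In the printed trace, out2 should follow out1 one rising edge later; out1 and out2 start as x because the flops have no reset in this sketch.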

So your design task is to cleanly separate sequential logic from combinational logic, and therefore to keep blocking and non-blocking assignments in separate blocks.
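
Applied to the SR1-to-SR3 copy from the question, that separation might look like the sketch below. It rests on assumptions not stated in the question: the SRAMs are synchronous with active-low CEN/WEN, Q1 becomes valid one cycle after the address is sampled, data and address are 8 bits wide, and `addr` is held stable while a copy is in progress. The module name, `rst_n`, `start`, `addr`, and `done` are hypothetical. Check the real SRAM datasheet for the actual timing.

```verilog
// Sketch: copy one word from SR1 to SR3 using explicit FSM states
// instead of a wait counter.
module sram_copy_fsm (
    input  wire       clk,
    input  wire       rst_n,   // hypothetical synchronous active-low reset
    input  wire       start,   // hypothetical request to copy one word
    input  wire [7:0] addr,    // address computed elsewhere, held stable
    input  wire [7:0] Q1,      // SR1 read data (width assumed)
    output reg        CEN1,
    output reg  [7:0] A1,
    output reg        CEN3,
    output reg        WEN3,
    output reg  [7:0] A3,
    output reg  [7:0] D3,
    output reg        done
);
    localparam S_IDLE  = 2'd0,
               S_READ  = 2'd1,  // read request presented to SR1
               S_WRITE = 2'd2;  // Q1 valid, write request presented to SR3

    reg [1:0] state, next_state;

    // Sequential part: only the state register, non-blocking assignment.
    always @(posedge clk) begin
        if (!rst_n) state <= S_IDLE;
        else        state <= next_state;
    end

    // Combinational part: blocking assignments, defaults first so that
    // no latches are inferred.
    always @(*) begin
        next_state = state;
        CEN1 = 1'b1;  A1 = addr;
        CEN3 = 1'b1;  WEN3 = 1'b1;  A3 = addr;  D3 = Q1;
        done = 1'b0;
        case (state)
            S_IDLE: begin
                if (start) next_state = S_READ;
            end
            S_READ: begin
                CEN1       = 1'b0;      // SR1 samples A1 at the next edge
                next_state = S_WRITE;
            end
            S_WRITE: begin
                CEN3       = 1'b0;      // SR3 samples A3/D3 at the next edge
                WEN3       = 1'b0;      // write mode
                done       = 1'b1;
                next_state = S_IDLE;
            end
            default: next_state = S_IDLE;
        endcase
    end
endmodule
```

Each step that has to wait for the memory gets its own state, so the one-cycle waits are expressed by the state transitions rather than by a counter. If the SRAM has a longer read latency, extra wait states would be needed between S_READ and S_WRITE.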

There are cases where blocking assignments can, and must, be used inside sequential blocks, as mentioned in the comments: when you use temporary variables to simplify expressions inside a sequential block, provided those variables are never used anywhere else.
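
A minimal sketch of that exception, using the address expression from the question. The module name `addr_reg`, the temporary `row_base`, and all port widths are assumptions for this example, and the counts are assumed to be 1-based as in the question:

```verilog
// Sketch: a temporary variable with a blocking assignment inside a
// clocked block. row_base is never read outside this always block.
module addr_reg (
    input  wire       clk,
    input  wire [3:0] particle_count,
    input  wire [3:0] dimension_count,
    input  wire [3:0] num_dimensions,
    output reg  [7:0] A1
);
    reg [7:0] row_base;  // temporary, local to the block below

    always @(posedge clk) begin
        // Blocking: the new value is visible on the very next line.
        row_base = (particle_count - 4'd1) * num_dimensions;
        // Non-blocking: A1 is the registered output of the block.
        A1 <= row_base + (dimension_count - 4'd1);
    end
endmodule
```

Because row_base is always written before it is read within the same edge, it behaves like a named wire for part of the expression, and only A1 is visible to the rest of the design.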

Other than that, never mix blocking and non-blocking assignments in a single always block.

Also, the use of 'negedge' is usually discouraged by synthesis methodologies. Avoid it unless your methodology does not care.

Look around for more information and examples of blocking and non-blocking assignments and how they are used.
