Complex floating-point sequential logic in Verilog

Posted on 2024-09-08 13:35:39

I'm trying to write a synthesizable 3D rasterizer in Verilog/SystemVerilog. The rasterizer right now is not really a 3D rasterizer: it just receives six 32-bit floats for vertex position (vertA_pos_x, vertA_pos_y, vertB_pos_x, vertB_pos_y, vertC_pos_x, vertC_pos_y) and nine 8-bit integers for vertex coloring (vertA_color_r, vertA_color_g, vertA_color_b, vertB_color_r, vertB_color_g, vertB_color_b, vertC_color_r, vertC_color_g, vertC_color_b).

Positions range from 0.0f to 1.0f: 0.0f is the top/left side of the screen, 0.5f the middle of it and 1.0f the bottom/right side.

The raster work would be to, first, count how many raster lines are required. Given that the framebuffer height is 240 pixels, vertex A is the top vertex, B is the bottom-left one, C is the bottom-right one and X is the bottommost vertex (either B or C; this has to be calculated), the number of raster lines is given by (vertX_pos_y - vertA_pos_y) / 240.

This part of the rasterization process is complex enough to expose my doubts, so I'll stop explaining how I would proceed here.

Now what I want to know is how to implement such "complex" logic in Verilog (it is "complex" because it is sequential and takes more than one clock cycle, which is not exactly the most pleasant kind of thing to design with a hardware description language).

I am using Altera's Quartus and so I'm mainly interested in Altera solutions.

The floating-point operation megafunctions that come with Quartus all require more than one clock cycle to finish, so, to implement "simple" calculations like (vertX_pos_y - vertA_pos_y) / 240, I'm assuming a fairly boring-to-write and error-prone state machine is necessary. My biggest expectation is that someone will tell me I don't need that, but if that's not the case, I still would like to know how people generally design things like these.

Also note that I'm very new to Verilog and hardware design in general, so I'm sorry if I say something stupid. Ideas?

Comments (1)

绮烟 2024-09-15 13:35:39

Have you heard of pipelining? This is how datapaths are often constructed.

To give an example, say you wanted to do (a*b) + c, where x*y takes 3 clock cycles and x+y takes 1 clock cycle. Pipelining simply means inserting banks of registers to line up the delays. In the example, the input c is delayed to match up with the latency of the multiply. So overall, the operation will have a latency of 3 + 1 = 4 clock cycles.
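A minimal sketch of that (a*b) + c example in plain Verilog follows. It is only an illustration under stated assumptions: integer arithmetic stands in for the floating-point cores, the 3-cycle multiplier is modelled by registering the product three times, and the module name `mul_add_pipe` and its ports are made up for this example. The point is the `c_d1`..`c_d3` delay line that keeps c aligned with its product.

```verilog
// Sketch: (a*b) + c with a 3-cycle multiply and a 1-cycle add.
// Assumption: integer math stands in for the vendor floating-point cores.
module mul_add_pipe #(
    parameter WIDTH = 16
) (
    input  wire               clk,
    input  wire [WIDTH-1:0]   a,
    input  wire [WIDTH-1:0]   b,
    input  wire [2*WIDTH-1:0] c,
    output reg  [2*WIDTH-1:0] result  // valid 4 clock edges after a, b, c are presented
);
    // Stand-in for a multiplier with 3 cycles of latency:
    // the product is computed once and then registered three times.
    reg [2*WIDTH-1:0] mul_s1, mul_s2, mul_s3;

    // Delay line for c, matching the multiplier latency so that
    // each c reaches the adder together with its own a*b.
    reg [2*WIDTH-1:0] c_d1, c_d2, c_d3;

    always @(posedge clk) begin
        // Multiply path (3 cycles).
        mul_s1 <= a * b;
        mul_s2 <= mul_s1;
        mul_s3 <= mul_s2;

        // Matching delay on c (3 cycles).
        c_d1 <= c;
        c_d2 <= c_d1;
        c_d3 <= c_d2;

        // Add stage (1 cycle). The sum is truncated to 2*WIDTH bits.
        result <= mul_s3 + c_d3;
    end
endmodule
```

New inputs can be applied on every clock edge; each result simply appears 4 edges later. With a real floating-point core in place of the stand-in multiply, the depth of the c delay line would have to equal whatever latency that core is configured for.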

Now, if you need to do lots of calculations, the pipeline delays can be 'legoed' together so that you don't need state machine logic to schedule your math operations. It will mean that you'll have to wait a few cycles to get your answer (i.e. latency), which is unavoidable really in synchronous designs.
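One common way to know when an answer has come out of such a pipeline, without a state machine, is to send a 1-bit valid flag through a shift register as deep as the total latency. The sketch below is an assumption added for illustration, not something the answer spells out; the module name `valid_delay` and the `LATENCY` parameter are hypothetical.

```verilog
// Sketch: delay a valid flag by LATENCY clock cycles so it lines up
// with the output of a datapath that has the same total latency.
module valid_delay #(
    parameter LATENCY = 4        // e.g. 3 (multiply) + 1 (add); must be >= 1
) (
    input  wire clk,
    input  wire rst,             // synchronous reset, active high
    input  wire in_valid,        // high when the datapath inputs are valid
    output wire out_valid        // high when the datapath output is valid
);
    reg [LATENCY-1:0] valid_sr;

    always @(posedge clk) begin
        if (rst)
            valid_sr <= {LATENCY{1'b0}};
        else
            // Shift left by one and insert the new flag at bit 0.
            valid_sr <= (valid_sr << 1) | in_valid;
    end

    // The oldest flag in the shift register marks a finished result.
    assign out_valid = valid_sr[LATENCY-1];
endmodule
```

Chaining several pipelined operations then only means adding up their latencies and setting `LATENCY` to the sum; the data and the flag travel side by side.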
