为什么 Bresenham 直线算法比 Naive 算法更有效

发布于 2024-12-23 06:07:56 字数 645 浏览 3 评论 0原文

在我的图形课程中,我们学习了 Naive 线光栅化算法,然后是 Bresenham 的线绘制算法。我们被告知计算机是整数机器,这就是为什么我们应该使用后者。

  1. 如果我们假设没有软件级别的优化,对于具有 mmx 和其他指令集的现代 CPU 来说也是如此吗?正如我查看过 Intel 的 64-ia-32-architectures-optimization-manual.pdf 一样,与 mmx 的 int 相比,加减乘法的延迟与 float 相同或更好。

  2. 如果算法在 GPU 中执行,这会重要吗?已检查NVIDIA CUDA 编程指南 1.0 (pdf),第 41 页,int 和 float 的时钟周期相同。

  3. 将 float 转换为 int 的效率低下是什么? load-hit-storestall 对我们来说是一个真正的问题吗?

  4. 向上/向下舍入数字的函数的效率如何? (我们可以想到c++ stl中的实现)

  5. 内循环中使用的乘法

For my graphics course we were taught the Naive line rasterizing algorithm then Bresenham's line drawing algorithm. We were told computers are integer machines which is why we should use the latter.

  1. If we assume no optimization on software level is this true for modern cpus with mmx and other instruction sets ? As I have looked at Intel’s 64-ia-32-architectures-optimization-manual.pdf and the latency for addition subtraction multiplication is the same or better for float compared to int for mmx.

  2. If the algorithm is executed in the gpu should this matter ? As having checked NVIDIA CUDA Programming Guide 1.0 (pdf), page 41, the clock cycles for int and float are the same.

  3. What is the de-efficiency of casting float to int? is load-hit-store stall a real issue for us?

  4. How efficient are functions which round up/down numbers? (we can think of the implementation in c++ stl)

  5. Is the efficiency gained from Bresenham algorithm due to addition instead of the multiplication used in the inner loop?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

挽清梦 2024-12-30 06:07:56

将计算机称为整数机器有点误导,但这种观点大多是正确的。据我所知,CPU 使用整数寄存器来生成要读取和写入的内存地址。将线条绘制保留在整数寄存器中意味着您可以避免从其他寄存器复制到整数寄存器以生成线条绘制期间写入像素的内存地址的开销。

至于您的具体问题:

  1. 由于您需要使用通用寄存器来访问内存,因此使用 SSE 或 FPU 来计算内存偏移量(指针)仍然会产生将数据从这些寄存器传输到通用寄存器的开销。所以这取决于从一个寄存器集转移到另一个寄存器集的开销是否大于使用特定指令集的性能。
  2. GPU 往往有统一的寄存器集,因此它应该没有那么重要。
  3. 将 float 转换为 int 本身并不昂贵。开销来自于将数据从一个寄存器组传输到另一组寄存器。通常这必须通过内存来完成,如果您的 CPU 有加载命中存储惩罚,则此传输是其中的一个重要来源。
  4. 向上或向下舍入的性能取决于 CPU 和编译器。在缓慢的一端,MSVC 曾经使用一个函数舍入到零,这会与 FPU 控制字混淆。在快速端,您可以使用特殊的 CPU 指令来直接处理舍入。
  5. Bresenham 的画线算法很快,因为它减少了确定在直线上画点的位置的过程,从简单的 y= m*x + b 公式减少为加法加分支(并且可以通过很好地消除分支)了解无分支整数技术)。 Brensenham 画线算法的运行切片版本甚至可以更快,因为它直接确定具有相同组件的像素的“运行”而不是迭代。

It's a little misleading to call computers integer machines, but the sentiment is mostly true. Since as far as I know, CPUs use integer registers to generate memory addresses to read from and write to. Keeping line drawing in integer registers means you avoid the overhead of copying from other registers into integer registers to generate the memory address to write pixels to during line drawing.

As for your specific questions:

  1. Since you need to use the general purpose registers to access memory, using SSE or the FPU to calculate memory offsets (pointers) will still have the overhead of transferring data from those registers to the general purpose ones. So it depends on whether the overhead of transferring from one register set to another is greater than the performance of using a particular instruction set.
  2. GPUs tend to have a unified register set, so it shouldn't matter nearly as much.
  3. Casting a float to an int in and of itself is not expensive. The overhead comes from transferring data from one register set to another. Usually that has to be done through memory, and if your CPU has load-hit-store penalties, this transfer is a big source of them.
  4. The performance of rounding up or down depends on the CPU and the compiler. On the slow end, MSVC used to use a function to round to zero which mucked with the FPU control word. On the fast end you have special CPU instructions that handle rounding directly.
  5. Bresenham's line drawing algorithm is fast because it reduces determining where to draw points on a line from the naive y= m*x + b formula to an addition plus a branch (and the branch can be eliminated through well know branchless integer techniques). The run-slice version of Brensenham's line drawing algorithm can be even faster as it determines "runs" of pixels with the same component directly rather than iterating.
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文