Why is drawing a line less than 1.5 pixels thick twice as slow as drawing a line 10 pixels thick?
I'm just playing around with FireMonkey to see if graphical painting is any faster than GDI or Graphics32 (my library of choice at the moment).
To see how fast it is, I've performed some tests, but I ran into some odd behaviour:
Drawing thin lines (<1.5 pixels wide) seems to be extremely slow compared to thicker lines:
[Graph: CPU ticks to paint 1000 lines (vertical axis) against line thickness (horizontal axis)]
The results are quite stable; drawing always becomes much faster once line thickness is more than 1 pixel wide.
In other libraries there seem to be fast algorithms for single lines, and thick lines are slower because a polygon is created first, so why is FireMonkey the other way around?
I mostly need single-pixel lines, so should I paint lines in a different way maybe?
The tests were run with this code:
// Draw random lines and copy the results to the clipboard, to paste into Excel.
// TStopwatch requires System.Diagnostics in the uses clause.
procedure TForm5.PaintBox1Paint(Sender: TObject; Canvas: TCanvas);
var
  i, iWidth: Integer;
  p1, p2: TPointF;
  sw: TStopwatch;
const
  cLineCount = 1000;
begin
  Memo1.Lines.Clear;
  // test 1000 different widths, from thickness 0.01 to 10
  for iWidth := 1 to 1000 do
  begin
    Caption := IntToStr(iWidth);
    Canvas.BeginScene;
    Canvas.Clear(claLightgray);
    Canvas.Stroke.Kind := TBrushKind.bkSolid;
    Canvas.Stroke.Color := $55000000;
    Canvas.StrokeThickness := iWidth / 100;
    // timing starts after the scene is set up and includes EndScene
    sw := TStopwatch.StartNew;
    // draw 1000 random lines
    for i := 1 to cLineCount do
    begin
      p1 := TPointF.Create(Random * Canvas.Width, Random * Canvas.Height);
      p2 := TPointF.Create(Random * Canvas.Width, Random * Canvas.Height);
      Canvas.DrawLine(p1, p2, 0.5);
    end;
    Canvas.EndScene;
    sw.Stop;
    // one row per width: thickness <TAB> average ticks per line
    Memo1.Lines.Add(Format('%f'#9'%d', [Canvas.StrokeThickness, Round(sw.ElapsedTicks / cLineCount)]));
  end;
  Clipboard.AsText := Memo1.Text;
end;
Update
@Steve Wellens:
Indeed, vertical lines and horizontal lines are a lot faster.
There's actually a difference between horizontal ones and vertical ones:
[Graph: diagonal lines in blue, horizontal lines in green, vertical lines in red]
With vertical lines, there's a sharp jump at a thickness of 1 pixel: lines less than 1 pixel wide are much slower. With diagonal lines there's a gradual slope between 1.0 and 1.5.
The strange thing is that there's hardly any difference between painting a horizontal line of 1 pixel and painting one of 20 pixels. I guess this is where hardware acceleration starts making a difference?
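For anyone wanting to repeat the per-orientation test, here is a minimal sketch of how the endpoints could be generated. It is not the code used for the graph above: `TLineKind`, `TPt` and `RandomLine` are hypothetical names made up for this example, and the 640 x 480 area is an arbitrary assumption.

```pascal
program OrientedLines;
{$APPTYPE CONSOLE}

type
  // Hypothetical helper types for this sketch, not part of FireMonkey.
  TLineKind = (lkAnyDirection, lkHorizontal, lkVertical);
  TPt = record
    X, Y: Single;
  end;

// Fills P1 and P2 with random endpoints inside a W x H area.
// For lkHorizontal / lkVertical the second point is snapped to the
// same row / column as the first one.
procedure RandomLine(Kind: TLineKind; W, H: Single; out P1, P2: TPt);
begin
  P1.X := Random * W;
  P1.Y := Random * H;
  P2.X := Random * W;
  P2.Y := Random * H;
  case Kind of
    lkHorizontal: P2.Y := P1.Y;
    lkVertical:   P2.X := P1.X;
  end;
end;

var
  A, B: TPt;
begin
  Randomize;
  RandomLine(lkHorizontal, 640, 480, A, B);
  WriteLn('horizontal: ', A.Y = B.Y);
  RandomLine(lkVertical, 640, 480, A, B);
  WriteLn('vertical:   ', A.X = B.X);
end.
```

In the paint handler the two points would then be passed on with something like `Canvas.DrawLine(PointF(A.X, A.Y), PointF(B.X, B.Y), 0.5)`, with one timing run per orientation.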
Comments (2)
Summary: Antialiasing subpixel thickness lines is hard work and requires a number of dirty tricks to output what we intuitively expect to see.
The extra effort you're seeing is almost certainly due to antialiasing. When the line thickness is less than one pixel and the line doesn't sit squarely at the center of a row of device pixels, every pixel drawn for the line will be a partial brightness pixel. To make sure that those partial values are bright enough so that the line doesn't disappear, more work is required.
Since video signals operate on a horizontal sweep (think CRT, not LCD), graphics operations traditionally focus on processing things one horizontal scanline at a time.
Here's my guess:
To solve certain sticky problems, rasterizers sometimes "nudge" lines so that more of their virtual pixels align with device pixels. If a 0.25-pixel-thick horizontal line is exactly halfway between device scanlines A and B, that line may completely disappear because it doesn't register strongly enough to light up any pixels in scanline A or B. So, the rasterizer might nudge the line "down" a tiny bit in virtual coordinates so that it aligns with the scanline B device pixels and produces a nice strongly lit horizontal line.
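To put rough numbers on that guess, here is a small sketch that computes how much of each pixel row a thin horizontal line covers. It assumes plain area coverage and that pixel row r spans y in [r, r + 1]; `RowCoverageOf` is a made-up helper, and nothing here is taken from an actual rasterizer.

```pascal
program RowCoverage;
{$APPTYPE CONSOLE}

// Fraction of pixel row Row (assumed to span [Row, Row + 1]) covered by a
// horizontal line centred at CenterY with the given Thickness.
function RowCoverageOf(Row: Integer; CenterY, Thickness: Double): Double;
var
  Top, Bottom: Double;
begin
  Top := CenterY - Thickness / 2;
  Bottom := CenterY + Thickness / 2;
  // clip the line's vertical extent to the pixel row
  if Top < Row then
    Top := Row;
  if Bottom > Row + 1 then
    Bottom := Row + 1;
  if Bottom > Top then
    Result := Bottom - Top
  else
    Result := 0;
end;

begin
  // 0.25 px line centred exactly on the boundary between rows 0 and 1
  WriteLn(RowCoverageOf(0, 1.0, 0.25):0:3);  // 0.125
  WriteLn(RowCoverageOf(1, 1.0, 0.25):0:3);  // 0.125
  // the same line nudged down to the centre of row 1
  WriteLn(RowCoverageOf(0, 1.5, 0.25):0:3);  // 0.000
  WriteLn(RowCoverageOf(1, 1.5, 0.25):0:3);  // 0.250
end.
```

Under these assumptions the un-nudged line lights two rows at 12.5% each, while the nudged one lights a single row at 25%, which is the stronger result the nudge is after.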
The same can be done for vertical lines, but probably isn't if your graphics card/driver is hyperfocused on horizontal scanline operations (as many are).
So, in this scenario, a horizontal line would render very fast because there's no antialiasing to be performed at all, and it can all be done in one scanline.
A vertical line would require antialiasing analysis for every horizontal scanline that crosses the line. The rasterizer may have a special case for vertical lines to only consider the left and right pixels to calculate antialiasing values.
A diagonal line has no shortcuts. It has jaggies everywhere, so there is plenty of antialiasing work to do throughout. The antialias calculation must consider (subsample) a whole matrix of points (at least 4, probably 8) around the target point to decide how much of a partial value to give the device pixel. The matrix can be simplified or eliminated entirely for vertical or horizontal lines, but not for diagonals.
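The subsampling idea can be sketched like this: test an N x N grid of points inside one device pixel and count how many fall within half the line thickness of the line. This is only an illustration of the general technique, under the assumption of an infinite straight line through two distinct points; `SampledCoverage` is a hypothetical helper, and I am not claiming Direct2D or any other rasterizer works this way.

```pascal
program Supersample;
{$APPTYPE CONSOLE}

// Estimates the coverage of pixel (PX, PY), assumed to span
// [PX, PX + 1] x [PY, PY + 1], by an infinite line through (X0, Y0) and
// (X1, Y1) with the given Thickness, using an N x N grid of sample points.
// The two points must be distinct.
function SampledCoverage(PX, PY: Integer;
  X0, Y0, X1, Y1, Thickness: Double; N: Integer): Double;
var
  I, J, Hits: Integer;
  SX, SY, DX, DY, Len, Dist: Double;
begin
  DX := X1 - X0;
  DY := Y1 - Y0;
  Len := Sqrt(DX * DX + DY * DY);
  Hits := 0;
  for J := 0 to N - 1 do
    for I := 0 to N - 1 do
    begin
      // sample point at the centre of sub-cell (I, J)
      SX := PX + (I + 0.5) / N;
      SY := PY + (J + 0.5) / N;
      // perpendicular distance from the sample to the line
      Dist := Abs((SX - X0) * DY - (SY - Y0) * DX) / Len;
      if Dist <= Thickness / 2 then
        Inc(Hits);
    end;
  Result := Hits / (N * N);
end;

begin
  // horizontal line through the pixel centre, 1 px thick
  WriteLn(SampledCoverage(0, 0, 0, 0.5, 1, 0.5, 1.0, 4):0:3);  // 1.000
  // same line, 0.5 px thick
  WriteLn(SampledCoverage(0, 0, 0, 0.5, 1, 0.5, 0.5, 4):0:3);  // 0.500
  // 45 degree diagonal through the pixel, 0.5 px thick
  WriteLn(SampledCoverage(0, 0, 0, 0, 1, 1, 0.5, 4):0:3);      // 0.625
end.
```

For a horizontal or vertical line every column (or row) of samples gives the same answer, so the grid collapses to one dimension; for the diagonal every sample has to be tested individually.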
There is an additional item that is really only a concern for sub-pixel thickness lines: how do we keep the line from disappearing entirely or showing noticeable gaps where it does not cross the center of a device pixel? It is likely that after the antialias values are calculated on a scanline, if there is no clear "signal" or sufficiently lit device pixel caused by the virtual line, the rasterizer has to go back and "try harder" or apply some boosting heuristics to get a stronger signal-to-floor ratio, so that the device pixels representing the virtual line are tangible and continuous.
Two adjacent device pixels at 40% brightness is ok. If the only rasterizer output for the scanline is two adjacent pixels at 5%, the eye will perceive a gap in the line. Not ok.
When the line is more than 1.5 device pixels in thickness, you will always have at least one well lit device pixel on every scanline and don't need to go back and try harder.
Why is 1.5 the magic number for line thickness? Ask Pythagoras. If your device pixel is 1 unit in width and height, then the length of the diagonal of the square device pixel is sqrt(1^2 + 1^2) = sqrt(2) = 1.41ish. When your line thickness is greater than the length of the diagonal of a device pixel, you should always have at least one "well lit" pixel in the scanline output no matter what the angle of the line.
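The same argument can be written out a bit more generally. This is a sketch under two assumptions of mine: device pixels are unit squares, and we look at a pixel whose center lies on the line's center line. Here θ is the angle of the line and t its thickness.

```latex
% Extent of a unit pixel measured across a line at angle \theta
w(\theta) = |\cos\theta| + |\sin\theta|, \qquad 1 \le w(\theta) \le \sqrt{2}

% A pixel centred on the line's centre line is fully covered when
t \ge w(\theta)

% Axis-aligned lines:  w(0^\circ) = w(90^\circ) = 1
% 45-degree diagonal:  w(45^\circ) = \sqrt{2} \approx 1.414
```

If that is what is going on, the threshold would be 1 pixel for horizontal and vertical lines and would move up towards 1.41 as the angle approaches 45 degrees, which might explain why the question's update shows a sharp step at 1 pixel for vertical lines and a gradual slope between 1.0 and 1.5 for diagonal ones.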
That's my theory, anyway.
In Graphics32, Bresenham's line algorithm is used to speed up lines that are drawn with a 1px width, and that should definitely be fast. FireMonkey does not have its own native rasterizer; instead, it delegates painting operations to other APIs (on Windows, it will delegate to either Direct2D or GDI+).
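For reference, a minimal sketch of the textbook integer form of Bresenham's algorithm. This is not the Graphics32 source; `BresenhamLine`, `TPlotProc` and `CountPixel` are names made up for this example.

```pascal
program BresenhamSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TPlotProc = procedure(X, Y: Integer);

var
  PixelCount: Integer = 0;

// Example callback: prints each visited pixel and counts it.
procedure CountPixel(X, Y: Integer);
begin
  WriteLn('(', X, ',', Y, ')');
  Inc(PixelCount);
end;

// Visits every pixel on the 1 px line from (X0, Y0) to (X1, Y1) using only
// integer additions and comparisons (all-octant error-accumulation form).
procedure BresenhamLine(X0, Y0, X1, Y1: Integer; Plot: TPlotProc);
var
  DX, DY, SX, SY, Err, E2: Integer;
begin
  DX := Abs(X1 - X0);
  DY := -Abs(Y1 - Y0);
  if X0 < X1 then SX := 1 else SX := -1;
  if Y0 < Y1 then SY := 1 else SY := -1;
  Err := DX + DY;
  while True do
  begin
    Plot(X0, Y0);
    if (X0 = X1) and (Y0 = Y1) then
      Break;
    E2 := 2 * Err;
    if E2 >= DY then
    begin
      Err := Err + DY;  // step in X
      X0 := X0 + SX;
    end;
    if E2 <= DX then
    begin
      Err := Err + DX;  // step in Y
      Y0 := Y0 + SY;
    end;
  end;
end;

begin
  BresenhamLine(0, 0, 3, 1, CountPixel);  // visits (0,0) (1,0) (2,1) (3,1)
  WriteLn('pixels: ', PixelCount);        // pixels: 4
end.
```

There is no antialiasing and no polygon setup here, just one plotted pixel per step, which is why this kind of special case tends to be fast for 1px lines.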
What you are observing is in fact the performance of the Direct2D rasterizer and I can confirm that I've made similar observations previously (I've benchmarked many different rasterizers.) Here's a post that talks specifically about the performance of the Direct2D rasterizer (btw, it's not a general rule that thin lines are drawn slower, especially not in my own rasterizer):
http://www.graphics32.org/news/newsgroups.php?article_id=10249
As you can see from the graph, Direct2D has very good performance for ellipses and thick lines, but much worse performance in the other benchmarks (where my own rasterizer is faster).
I implemented a new FireMonkey backend (a new TCanvas descendant) that relies on my own rasterizer engine, VPR. It should be faster than Direct2D for thin lines and for text (even though it's using polygonal rasterization techniques). There may still be some caveats that need to be addressed in order to make it work 100% seamlessly as a FireMonkey backend. More info here:
http://graphics32.org/news/newsgroups.php?article_id=11565