Pixel-modifying code runs fast in the main app, runs really slowly in a Delphi 6 DirectShow filter, and has other problems
I have a Delphi 6 application that sends bitmaps to a DirectShow DLL in real time, 25 frames a second. The DirectShow DLL is my code too and is also written in Delphi 6, using the DSPACK DirectShow component suite. I have a simple block of code that, if a certain flag is set, goes through each pixel in the bitmap and modifies the brightness and contrast of the image; otherwise the bitmap is pushed out of the DirectShow DLL unmodified (it is a push source video filter). The code used to be in the main application, and I then moved it into the DirectShow DLL. When it was in the main application it ran fine, and I could see the changes in the bitmap as expected. However, now that the code resides in the DirectShow DLL it has the following problems:

1. When the code block below is active, the DirectShow DLL is really slow. I have a quad-core i5 and it's really slow. I can also see a big spike in CPU consumption. In contrast, the very same code running in the main application ran fine on an old single-core P4. It did hit the CPU noticeably on that old machine, but the video was smooth and there were no problems. The images are only 352 x 288 pixels in size.
2. I don't see the expected changes to the visible bitmap. I can trace the code in the DirectShow DLL and see the numerical values of each pixel properly altered by the code, but the viewable image in the Graph Edit ActiveMovie window looks completely unchanged.

If I deactivate the code, which I can do in real time, the ActiveMovie window shows video that is as smooth as glass, perfectly rendered with the CPU barely touched. If I reactivate the code, the video becomes really choppy, probably showing only 1 to 2 frames a second with a long delay before the first frame is shown, and the CPU spikes. It does not max out, but it rises a lot more than I would expect.

I tried compiling the DirectShow DLL with everything on, including range checking, overflow checking, etc., and there were no warnings or errors at run time. I then tried compiling for fastest speed and it still had the exact same problems listed above. Something is really wrong and I can't figure out what. Note that I do lock the canvas before modifying the bitmap and unlock it after I'm done. If it weren't for the "everything on" compilation run noted above, I'd say it felt like an FPU exception was being raised and silently swallowed with every pixel computation, but as I said, no errors or exceptions are occurring.

UPDATE: I am putting this here so that the solution, which is embedded in one of Roman R's comments, is plainly visible. The problem was that I was not setting the PixelFormat property to pf24bit before accessing the ScanLine property. As Roman suggested, not doing this must make the TBitmap code create a temporary copy of the bitmap. As soon as I added the line of code below, the problems went away, both the changes not being visible and the soft page faults. It's an insidious problem because the only thing affected is the pointer you get from the ScanLine property, since (assumption) it points to a temporary copy of the bitmap. That must be why the subsequent TextOut() call still worked: it operated on the original copy of the bitmap.
    clip.PixelFormat := pf24bit; // The missing code line that fixes the problem.
Here's the code block I've been referring to:
    function IntToByte(i: Integer): Byte;
    begin
      if i > 255 then
        Result := 255
      else if i < 0 then
        Result := 0
      else
        Result := i;
    end;

    // ---------------------------------------------------------------

    procedure brightnessTurboBoost(var clip: TBitmap; rangeExpansionPowerOf2: integer; shiftValue: Byte);
    var
      p0: PByte;
      x,y: Integer;
    begin
      if (rangeExpansionPowerOf2 = 0) and (shiftValue = 0) then
        exit; // These parameter settings will not change the pixel values.

      for y := 0 to clip.Height-1 do
      begin
        p0 := clip.scanline[y];

        // Can't just do the whole buffer as a big block of bytes since the
        // individual scan lines may be padded for CPU alignment.
        for x := 0 to (clip.Width - 1) * 3 do
        begin
          if rangeExpansionPowerOf2 >= 1 then
            p0^ := IntToByte((p0^ shl rangeExpansionPowerOf2) + shiftValue)
          else
            p0^ := IntToByte(p0^ + shiftValue);

          Inc(p0);
        end;
      end;
    end;
Comments (1)
There are a few things to say about this code snippet.

1. First of all, you are using the Scanline property of the TBitmap class. I have not been dealing with Delphi for many years, so I might be wrong about this, but I am under the impression that Scanline is not actually a thin accessor, is it? It might be internally hiding things which can dramatically affect performance, such as "if he wants to access the bits of the image, then we have to first convert it to a DIB before returning pointers". So a thing that looks so simple might turn out to be a killer.
2. "if rangeExpansionPowerOf2 >= 1 then" in the inner loop body? You don't really want to make this comparison on every iteration. Either make two separate functions, or duplicate the whole loop in two versions for zero and non-zero rangeExpansionPowerOf2, and do this if only once.
3. "for ... to (clip.Width - 1) * 3 do": I am not really sure that Delphi optimizes the upper boundary evaluation so that it happens only once. You might be doing that multiplication three times for every pixel, while you could do it only once for the whole image.
4. For top performance, IntToByte would definitely be implemented in MMX to avoid ifs and process multiple bytes at once.

Still, as you say that the images are only 352x288, I would suspect that #1 is ruining the performance.
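The suggestions about the in-loop comparison and about IntToByte could be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's or the answerer's actual code: it swaps in a 256-entry lookup table in place of the MMX routine the answer mentions, and the names TByteLookup, ClampToByte, BuildBrightnessTable and ApplyTableToRow are hypothetical. As far as I know, Delphi's for statement evaluates its bounds once before the loop starts, so the multiplication in the loop bound is probably not repeated per iteration anyway.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  // Hypothetical helper type: one output byte for every possible input byte.
  TByteLookup = array[Byte] of Byte;

// Clamp an integer into 0..255 (same behaviour as IntToByte in the question).
function ClampToByte(i: Integer): Byte;
begin
  if i > 255 then
    Result := 255
  else if i < 0 then
    Result := 0
  else
    Result := i;
end;

// Precompute the adjusted value for each possible byte. The test on
// rangeExpansionPowerOf2 and the clamping now run 256 times per call
// instead of once per byte of image data.
procedure BuildBrightnessTable(rangeExpansionPowerOf2: Integer;
  shiftValue: Byte; var table: TByteLookup);
var
  v: Integer;
begin
  for v := 0 to 255 do
    if rangeExpansionPowerOf2 >= 1 then
      table[v] := ClampToByte((v shl rangeExpansionPowerOf2) + shiftValue)
    else
      table[v] := ClampToByte(v + shiftValue);
end;

// Apply the table to one scan line of byteCount bytes starting at row.
// Rows are handled one at a time because scan lines may be padded.
procedure ApplyTableToRow(row: PByte; byteCount: Integer;
  const table: TByteLookup);
var
  x: Integer;
begin
  for x := 1 to byteCount do
  begin
    row^ := table[row^];
    Inc(row);
  end;
end;
```

With a TBitmap, the table would be built once per frame, and for each y the row pointer would come from clip.ScanLine[y], after setting clip.PixelFormat := pf24bit as described in the question's UPDATE. The byte count for a 24-bit row would be clip.Width * 3; note that the question's bound of (clip.Width - 1) * 3 appears to leave the last two bytes of each row untouched.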