Can this Delphi 6 bitmap modification code be sped up with SIMD or some other approach?

Posted 2024-12-23 06:43:23

I have a Delphi 6 application that modifies bitmaps in real time. Currently I am using the code shown below to do quickie brightness boost and contrast changes. If the operation were just an addition or just a multiplication, I could see how SIMD could be used, but since both an addition and a multiplication are involved, and since there is also the Trunc() operation to restrict it to the range of a Byte, I'm not sure if SIMD could be used here. Here are my questions:

  1. Can SIMD be used with this code and do you know of a good code sample I could work from? What kind of a speed boost could I expect?
  2. Would the (potential) padding of the scan lines be a problem?
  3. Any general optimization tips on speeding up the code?

// A fast version of this function would be to only allow range reductions
//  as a power of 2 and then use shr operations instead of divisions.
procedure doBrightnessAndContrast(var clip: TBitmap; compressionRatio: Double; shiftValue: Byte);
var
  p0: PByte;
  x, y: Integer;
begin
  for y := 0 to clip.Height - 1 do
  begin
    p0 := clip.ScanLine[y];

    // Can't just do the whole buffer as a big block of bytes since the
    //  individual scan lines may be padded for CPU alignment.
    // Assumes pf24bit: three bytes per pixel, stored in B, G, R order.
    for x := 0 to clip.Width - 1 do
    begin
      // Blue
      p0^ := IntToByte(Trunc(p0^ * compressionRatio) + shiftValue);
      Inc(p0);
      // Green
      p0^ := IntToByte(Trunc(p0^ * compressionRatio) + shiftValue);
      Inc(p0);
      // Red
      p0^ := IntToByte(Trunc(p0^ * compressionRatio) + shiftValue);
      Inc(p0);
    end;
  end;
end;

Comments (1)

单身狗的梦 2024-12-30 06:43:23

Sure, SSE or MMX is possible.

In your case, however, you may get almost the same speed improvement if you precompute a 256-entry table using your equations.

Then replace all computations with a simple table lookup. My best guess is that on modern processors this will give nearly the same speed as MMX/SSE.
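
To make the table idea concrete, here is a minimal sketch that should compile in Delphi 6. It is an illustration under stated assumptions, not a tuned implementation. The names TByteLut, TRowBytes, PRowBytes, ClampToByte, BuildLut and ApplyLutToRow are all made up for this example, and ClampToByte is only a guess at what the question's IntToByte does (clamp to 0..255), since its source is not shown.

```pascal
type
  TByteLut = array[Byte] of Byte;
  // Open-ended byte array, used only to index a scanline through a pointer.
  TRowBytes = array[0..MaxInt - 1] of Byte;
  PRowBytes = ^TRowBytes;

// Hypothetical stand-in for the question's IntToByte helper (its source is
// not shown); assumed here to clamp an integer into 0..255.
function ClampToByte(v: Integer): Byte;
begin
  if v < 0 then
    Result := 0
  else if v > 255 then
    Result := 255
  else
    Result := Byte(v);
end;

// Evaluate the per-channel formula once for each possible input byte.
procedure BuildLut(var lut: TByteLut; compressionRatio: Double; shiftValue: Byte);
var
  i: Integer;
begin
  for i := 0 to 255 do
    lut[i] := ClampToByte(Trunc(i * compressionRatio) + shiftValue);
end;

// Remap byteCount bytes starting at row; the inner loop has no floating
// point, no Trunc and no clamping, just one table read per byte.
procedure ApplyLutToRow(row: PRowBytes; byteCount: Integer; const lut: TByteLut);
var
  i: Integer;
begin
  for i := 0 to byteCount - 1 do
    row^[i] := lut[row^[i]];
end;
```

To use it, call BuildLut(lut, compressionRatio, shiftValue) once per parameter change, then call ApplyLutToRow(clip.ScanLine[y], clip.Width * 3, lut) for each y from 0 to clip.Height - 1. This assumes the bitmap is pf24bit, as the original loop does. Because each row is fetched through ScanLine and only Width * 3 bytes are touched, any padding at the end of a scan line is never read or written, so padding is not a problem here.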
