Can this Delphi 6 bitmap modification code be sped up with SIMD or some other approach?
I have a Delphi 6 application that modifies bitmaps in real time. Currently I am using the code shown below to do quickie brightness boost and contrast changes. If the operation were just an addition or just a multiplication, I could see how SIMD could be used, but since both an addition and a multiplication are involved, and since there is also the Trunc() operation to restrict it to the range of a Byte, I'm not sure if SIMD could be used here. Here are my questions:
- Can SIMD be used with this code and do you know of a good code sample I could work from? What kind of a speed boost could I expect?
- Would the (potential) padding of the scan lines be a problem?
- Any general optimization tips on speeding up the code?
// A fast version of this function would be to only allow range reductions
// as a power of 2 and then use shr operations instead of divisions.
// Assumes a 24-bit bitmap (clip.PixelFormat = pf24bit), i.e. 3 bytes per pixel.
// IntToByte is a helper not shown here; presumably it clamps to 0..255.
procedure doBrightnessAndContrast(var clip: TBitmap; compressionRatio: Double; shiftValue: Byte);
var
  p0: PByte;
  x, y: Integer;
begin
  for y := 0 to clip.Height - 1 do
  begin
    p0 := clip.ScanLine[y];
    // Can't just do the whole buffer as a big block of bytes since the
    // individual scan lines may be padded for CPU alignment.
    for x := 0 to clip.Width - 1 do
    begin
      // Blue (24-bit DIB rows store each pixel as B, G, R)
      p0^ := IntToByte(Trunc(p0^ * compressionRatio) + shiftValue);
      Inc(p0);
      // Green
      p0^ := IntToByte(Trunc(p0^ * compressionRatio) + shiftValue);
      Inc(p0);
      // Red
      p0^ := IntToByte(Trunc(p0^ * compressionRatio) + shiftValue);
      Inc(p0);
    end;
  end;
end;
Comments (1)
Sure, SSE or MMX is possible.
In your case, however, you may get almost the same speed improvement by precomputing a 256-entry table from your equation. Each channel value is a Byte, and the result depends only on that value, so there are just 256 possible outputs.
Then replace all the per-pixel computation with a simple table lookup. My best guess is that on modern processors this will be nearly as fast as MMX/SSE.
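A minimal sketch of the table-lookup idea, not a tuned implementation. The names `TByteLut`, `BuildLut` and `ApplyLut` are made up for illustration, and the clamping to 0..255 is an assumption about what the question's `IntToByte` helper does.

```pascal
type
  // One output byte for every possible input byte.
  TByteLut = array[Byte] of Byte;

// Evaluate the question's formula once per possible channel value.
// Assumption: results outside 0..255 are clamped, as IntToByte presumably does.
procedure BuildLut(compressionRatio: Double; shiftValue: Byte; var lut: TByteLut);
var
  i: Integer;
  v: Int64;
begin
  for i := 0 to 255 do
  begin
    v := Trunc(i * compressionRatio) + shiftValue;
    if v < 0 then
      v := 0
    else if v > 255 then
      v := 255;
    lut[i] := Byte(v);
  end;
end;

// Replace byteCount consecutive bytes starting at p with their table entries.
// No floating-point work is left in the per-pixel loop.
procedure ApplyLut(p: PByte; byteCount: Integer; const lut: TByteLut);
var
  i: Integer;
begin
  for i := 1 to byteCount do
  begin
    p^ := lut[p^];
    Inc(p);
  end;
end;
```

To use it on the question's bitmap, one would call `BuildLut` once and then, for each row `y`, call `ApplyLut(clip.ScanLine[y], clip.Width * 3, lut)`, assuming `clip.PixelFormat = pf24bit`. Going row by row and touching only `Width * 3` bytes means any padding at the end of a scan line is never read or written.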