Efficient conversion of a single-precision array to a double-precision array in Delphi 2010

Posted 2024-10-14 19:55:30

I need to implement a wrapper layer between a high level application and a low level sub-system using slightly different typing:

The application produces an array of single-precision vectors:

unit unApplication;
type

TVector = record
  x, y, z : single;
end;

TVectorArray = array of TVector;

function someFunc(): TVectorArray;
[...]

while the subsystem expects an array of double-precision vectors. I also implemented an implicit conversion from TVector to TVectorD:

unit unSubSystem;
type

TVectorD = record
  x, y, z : double;
  // Lets a TVector be assigned directly to a TVectorD
  class operator Implicit(value : TVector): TVectorD; inline;
end;

TVectorDArray = array of TVectorD;

procedure otherFunc(points: TVectorDArray);

implementation

class operator TVectorD.Implicit(value : TVector): TVectorD;
begin
  result.x := value.x;
  result.y := value.y;
  result.z := value.z;
end;

What I am currently doing is like this:

uses unApplication, unSubSystem,...
procedure ConvertValues;
var
  singleVecArr : TVectorArray;
  doubleVecArr : TVectorDArray;
  i : Integer;
begin
  singleVecArr := someFunc;
  SetLength(doubleVecArr, Length(singleVecArr));
  // Each assignment goes through TVectorD's Implicit operator
  for i := 0 to Length(singleVecArr) - 1 do
    doubleVecArr[i] := singleVecArr[i];
end;

Is there a more efficient way to perform these kinds of conversion?


Comments (3)

痴骨ら 2024-10-21 19:55:31

First of all I would say that you should not attempt any optimisation without first timing. In this case I don't mean timing alternative algorithms, I mean timing the code in question and assessing what proportion of the total time is spent there.

My instincts tell me that the code you show will run for a tiny proportion of the overall time and so optimising it will yield no discernible benefits. I think if you do anything meaningful with each element of this array then that must be true since the cost of converting from single to double will be small compared to floating point operations.

Finally, if this code does turn out to be a bottleneck, you should consider not converting at all. My assumption is that you are using standard Delphi floating point operations, which map to the x87 FPU. All such operations happen on the x87 floating point stack, and values are converted on load to 64-bit or, more commonly, 80-bit precision. I don't think loading a single would be any slower than loading a double; in fact it may even be faster, because less memory has to be read.
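As a rough illustration of the "time it first" advice, here is a minimal sketch that measures only the conversion step. The array size N, the helper ConvertArray and the test data are assumptions made for the example, not part of the original code. Now and MilliSecondsBetween have coarse resolution, so a real measurement would need a large array or several repetitions.

```pascal
program TimeConversion;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, DateUtils;

type
  TVector = record
    x, y, z: Single;
  end;
  TVectorD = record
    x, y, z: Double;
  end;
  TVectorArray = array of TVector;
  TVectorDArray = array of TVectorD;

// Plain element-wise conversion, equivalent to the loop in the question.
procedure ConvertArray(const Src: TVectorArray; var Dst: TVectorDArray);
var
  i: Integer;
begin
  SetLength(Dst, Length(Src));
  for i := 0 to High(Src) do
  begin
    Dst[i].x := Src[i].x;
    Dst[i].y := Src[i].y;
    Dst[i].z := Src[i].z;
  end;
end;

const
  N = 1000000;  // hypothetical array size, chosen only for the example

var
  Src: TVectorArray;
  Dst: TVectorDArray;
  i: Integer;
  T0: TDateTime;
begin
  // Fill the source with arbitrary test data.
  SetLength(Src, N);
  for i := 0 to N - 1 do
  begin
    Src[i].x := i;
    Src[i].y := 2 * i;
    Src[i].z := 0.5;
  end;

  // Time the conversion alone, then compare it with the total run time
  // of the real application before deciding whether to optimise it.
  T0 := Now;
  ConvertArray(Src, Dst);
  WriteLn('Conversion took about ', MilliSecondsBetween(Now, T0), ' ms');
end.
```

If the figure printed here is a small fraction of what the subsystem call takes, the conversion loop is probably not worth optimising.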

半岛未凉 2024-10-21 19:55:31

Assuming that the conversion really is the bottleneck, one way to speed it up may be to use SSE2 instructions instead of the FPU, provided that instruction set can be assumed to be present on every computer on which this code will run.

For instance, the following would convert one single Vector into one double Vector:

procedure SingleToDoubleVector (var S: TVector; var D: TVectorD);
// @S in EAX
// @D in EDX
// Note: movups loads 16 bytes, 4 more than SizeOf(TVector).
asm
  movups    xmm0, [eax]     ;// Load S (plus the 4 bytes that follow it) into xmm0
  movhlps   xmm1,  xmm0     ;// Copy high 2 singles of xmm0 into the low half of xmm1
  cvtps2pd  xmm2,  xmm0     ;// Convert low two singles of xmm0 into doubles in xmm2
  cvtss2sd  xmm3,  xmm1     ;// Convert lowest single in xmm1 into a double in xmm3
  movupd   [edx],  xmm2     ;// Store the two doubles in xmm2 to D (.X and .Y)
  movsd    [edx+16],xmm3    ;// Store one double from xmm3 to D.Z
end;

I am not saying that this bit of code is the most efficient way to do it and there are many caveats with using assembly code in general and this code in particular. Note that this code makes assumptions about the alignment of the fields in your records. (It does not make assumptions regarding the alignment of the record as a whole.)
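The layout assumptions mentioned above can be checked at run time. The following is only a sketch: the program name and the idea of asserting the offsets are additions for illustration, and the record declarations simply repeat the ones from the question.

```pascal
program CheckLayout;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

type
  TVector = record
    x, y, z: Single;
  end;
  TVectorD = record
    x, y, z: Double;
  end;

var
  S: TVector;
  D: TVectorD;
begin
  // Three contiguous singles: x and y form the low pair, z sits at offset 8.
  Assert(SizeOf(TVector) = 12);
  Assert(PAnsiChar(@S.z) - PAnsiChar(@S) = 8);

  // Three contiguous doubles: x and y are stored at offset 0, z at offset 16.
  Assert(SizeOf(TVectorD) = 24);
  Assert(PAnsiChar(@D.z) - PAnsiChar(@D) = 16);

  WriteLn('Record layout matches the assumptions of the assembly routine');
end.
```

Because movups loads 16 bytes while a TVector occupies 12, the routine also reads the 4 bytes that follow S. Inside an array that is the next element, but for the last element it is memory beyond the array data, which is one more reason to treat the routine with care.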

Also, for best results, you would control the alignment of your array/record elements in memory and write the entire conversion loop in assembly, to reduce overheads. Whether this is what you want/can do is another question.

傾城如夢未必闌珊 2024-10-21 19:55:31

If modifying the source to produce doubles rather than singles is not possible, you can try threading the conversion. Divide the array into two or four equal-sized chunks (depending on processor count) and have each thread convert one chunk. At best this approaches a two- or fourfold speedup; thread start-up cost and memory bandwidth will eat into that, especially for small arrays.

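Also, is the 'length' call evaluated on each iteration? If so, store it in a local variable first. (In a Delphi for loop the bounds are evaluated once before the loop starts, so this matters only for while and repeat loops.)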
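A minimal sketch of the chunked approach, assuming four worker threads. TConvertThread, ThreadCount, N and the test data are hypothetical names and values introduced for this example. The destination array is sized once before any thread starts, and each thread writes only to its own index range, so no locking is used.

```pascal
program ThreadedConvert;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes;

type
  TVector = record
    x, y, z: Single;
  end;
  TVectorD = record
    x, y, z: Double;
  end;
  TVectorArray = array of TVector;
  TVectorDArray = array of TVectorD;

  // Hypothetical worker: converts elements First..Last of Src into Dst.
  TConvertThread = class(TThread)
  private
    FSrc: TVectorArray;
    FDst: TVectorDArray;
    FFirst, FLast: Integer;
  protected
    procedure Execute; override;
  public
    constructor Create(const Src: TVectorArray; const Dst: TVectorDArray;
      First, Last: Integer);
  end;

constructor TConvertThread.Create(const Src: TVectorArray;
  const Dst: TVectorDArray; First, Last: Integer);
begin
  inherited Create(True);  // created suspended; started explicitly below
  FSrc := Src;             // dynamic arrays are references, nothing is copied
  FDst := Dst;
  FFirst := First;
  FLast := Last;
end;

procedure TConvertThread.Execute;
var
  i: Integer;
begin
  for i := FFirst to FLast do
  begin
    FDst[i].x := FSrc[i].x;
    FDst[i].y := FSrc[i].y;
    FDst[i].z := FSrc[i].z;
  end;
end;

const
  N = 100000;       // hypothetical array size
  ThreadCount = 4;  // assumption: one thread per core on a four-core machine

var
  Src: TVectorArray;
  Dst: TVectorDArray;
  Workers: array[0..ThreadCount - 1] of TConvertThread;
  t, i, First, Last, Chunk: Integer;
begin
  SetLength(Src, N);
  for i := 0 to N - 1 do
  begin
    Src[i].x := i;
    Src[i].y := -i;
    Src[i].z := 0.25;
  end;
  SetLength(Dst, N);  // allocate once, before any thread touches it

  Chunk := (N + ThreadCount - 1) div ThreadCount;
  for t := 0 to ThreadCount - 1 do
  begin
    First := t * Chunk;
    Last := First + Chunk - 1;
    if Last > N - 1 then
      Last := N - 1;
    Workers[t] := TConvertThread.Create(Src, Dst, First, Last);
    Workers[t].Start;
  end;

  // Wait for every chunk to finish before using Dst.
  for t := 0 to ThreadCount - 1 do
  begin
    Workers[t].WaitFor;
    Workers[t].Free;
  end;
  WriteLn('Converted ', Length(Dst), ' vectors using ', ThreadCount, ' threads');
end.
```

Creating threads has a fixed cost, so for short arrays this version may well be slower than the plain loop; timing both is the only way to know.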
