Inefficient inline assembler in Delphi using SSE2

Posted on 2024-12-12 00:25:07

I have a simple floating-point operation that is always executed twice, so I tried to translate it to SSE, but it fails. The high-level language is Delphi, and since it doesn't support intrinsic functions, I have to write the whole thing in assembler.
Basically I just have parameter loading/storing and some multiplications and additions:

Procedure TLP1Poly2.Process(Const _a1, _b1, _OldIn1, _OldIn2, _OldOut1, _OldOut2: Double; Var Sample1, Sample2: Double);
Asm
  MOVLPD  XMM4, _a1
  MOVHPD  XMM4, _a1
  MOVLPD  XMM3, _b1
  MOVHPD  XMM3, _b1
  //
  MOVLPD  XMM0, [Sample1]
  MOVHPD  XMM0, [Sample2]
  MULPD   XMM0, XMM4
  //
  MOVLPD  XMM1, _OldIn1
  MOVHPD  XMM1, _OldIn2
  MULPD   XMM1, XMM4
  //
  MOVLPD  XMM2, _OldOut1
  MOVHPD  XMM2, _OldOut2
  MULPD   XMM2, XMM3
  //
  ADDPD   XMM0, XMM1
  ADDPD   XMM0, XMM2
  //
  MOVLPD  [Sample1], XMM0
  MOVHPD  [Sample2], XMM0
  //
  // which stands for twice this:
  // Sample:= Sample*a1 + oldinp*a1 + oldout*b1;
  // 
End;

But this procedure doesn't work. If I 'nop' everything between the loading and the storing of Sample1/Sample2, it's fine, but otherwise my filter is silent. What basic thing am I missing about SSE here?

Addendum:

Old class:

constructor TLP1.create;
begin
  oldfreq := -1;
end;

procedure TLp1.process(inp, Frq, SR: single);
begin
  if Frq <> oldfreq then
  begin
    a := 2 * SR;
    t := Frq * _ppi;
    n := 1 / (a + t);
    b1 := (a - t) * n;
    a1 := t * n;
    oldfreq := Frq;
  end;
  outlp := (inp + _kd) * a1 + oldinp * a1 + oldout * b1;
  oldout := outlp;
  oldinp := inp;
end;

New class:

Procedure TLP2Poly2.SetSamplerate(Const Value: Single);
Begin
  If Value = FSamplerate Then Exit;
  FSamplerate := Value;
  UpdateCoefficients;
End;

Procedure TLP2Poly2.SetFrequency(Const Value: Single);
Begin
  If Value = FFrequency Then Exit;
  FFrequency := Value;
  UpdateCoefficients;
End;

Procedure TLP2Poly2.UpdateCoefficients;
Var
  a,t,n: Single;
Begin
  a := 2 * FSamplerate;
  t := FFrequency * 2 * pi;
  n := 1 / (a + t);
  b1 := (a - t) * n;
  a1 := t * n;
End;

Procedure TLP2Poly2.Process(Var Sample1, Sample2: Double);
Var
  o1, o2: Double;
Begin
  o1 := Sample1;
  o2 := Sample2;
  IntProcess( a1, b1, OldIn1, OldIn2, OldOut1, OldOut2, Sample1, Sample2);
  OldOut1 := Sample1;
  OldOut2 := Sample2;
  OldIn1  := o1;
  OldIn2  := o2;
End;

Procedure TLP2Poly2.IntProcess(Const _a1, _b1, _OldIn1, _OldIn2, _OldOut1, _OldOut2: Double; Var Sample1, Sample2: Double);
Asm
  MOVLPD  XMM4, _a1
  MOVHPD  XMM4, _a1
  MOVLPD  XMM3, _b1
  MOVHPD  XMM3, _b1
  //
  MOVLPD  XMM0, [Sample1]
  MOVHPD  XMM0, [Sample2]
  MULPD   XMM0, XMM4
  //
  MOVLPD  XMM1, _OldIn1
  MOVHPD  XMM1, _OldIn2
  MULPD   XMM1, XMM4
  //
  MOVLPD  XMM2, _OldOut1
  MOVHPD  XMM2, _OldOut2
  MULPD   XMM2, XMM3
  //
  ADDPD   XMM0, XMM1
  ADDPD   XMM0, XMM2
  //
  MOVLPD  [Sample1], XMM0
  MOVHPD  [Sample2], XMM0
End;

Comments (2)

裂开嘴轻声笑有多痛 2024-12-19 00:25:07

When writing assembler for Delphi, especially in 64-bit mode, you should always be aware of how parameters are passed. I never use the names of the first 4 parameters, as these are in registers anyway. I use these registers directly.

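Note that in 64-bit mode _a1, _b1, _OldIn1 and _OldIn2 would be passed in XMM0 to XMM3 respectively for a standalone routine (for a method, the hidden Self parameter takes the first slot, which shifts each of these by one register), so the first part of your code overwrites some of these registers. For instance, loading XMM3 with _b1 would overwrite _OldIn2. The same happens with XMM2, which holds _OldIn1.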

It would make sense to rearrange your register usage so you don't have to go through memory as an intermediate step.

IOW, try something like (untested):

asm
        // Note: MOVDDUP is an SSE3 instruction; UNPCKLPD XMMn,XMMn is the SSE2 equivalent
        MOVDDUP XMM0,XMM0
        MOVDDUP XMM1,XMM1

        MOVLPD  XMM4,[Sample1]
        MOVHPD  XMM4,[Sample2]
        MULPD   XMM4,XMM0

        // etc...
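
The idea of working on registers directly can be sketched as a complete routine. This is an untested illustration, not the answerer's code, and it changes the interface: `PackedStep` and `TPair` are hypothetical names, the routine is assumed to be a standalone procedure (no hidden Self) compiled for Win64, and all values are passed through four pointers so that they arrive in RCX, RDX, R8 and R9. It uses only the SSE2 instructions already present in the question.

```pascal
program PackedStepSketch;
{$APPTYPE CONSOLE}

type
  TPair = array[0..1] of Double;

// Hypothetical helper (not from the question). ASSUMPTIONS: Win64 target,
// standalone procedure (no hidden Self), so the four pointers arrive in
// RCX, RDX, R8 and R9. Coef points at (a1, b1); the other three each
// point at two consecutive Doubles, one per channel.
procedure PackedStep(Coef, OldIn, OldOut, Samples: PDouble);
asm
  MOVLPD  XMM4, QWORD PTR [RCX]       // XMM4 = (a1, a1)
  MOVHPD  XMM4, QWORD PTR [RCX]
  MOVLPD  XMM3, QWORD PTR [RCX + 8]   // XMM3 = (b1, b1)
  MOVHPD  XMM3, QWORD PTR [RCX + 8]

  MOVLPD  XMM0, QWORD PTR [R9]        // XMM0 = (Sample1, Sample2) * a1
  MOVHPD  XMM0, QWORD PTR [R9 + 8]
  MULPD   XMM0, XMM4

  MOVLPD  XMM1, QWORD PTR [RDX]       // XMM1 = (OldIn1, OldIn2) * a1
  MOVHPD  XMM1, QWORD PTR [RDX + 8]
  MULPD   XMM1, XMM4

  MOVLPD  XMM2, QWORD PTR [R8]        // XMM2 = (OldOut1, OldOut2) * b1
  MOVHPD  XMM2, QWORD PTR [R8 + 8]
  MULPD   XMM2, XMM3

  ADDPD   XMM0, XMM1                  // sum the three products
  ADDPD   XMM0, XMM2

  MOVLPD  QWORD PTR [R9], XMM0        // write both channels back
  MOVHPD  QWORD PTR [R9 + 8], XMM0
end;

var
  Coef, OldIn, OldOut, Samples: TPair;
begin
  Coef[0] := 0.5;   Coef[1] := 0.25;    // a1, b1
  OldIn[0] := 1.0;  OldIn[1] := 2.0;
  OldOut[0] := 4.0; OldOut[1] := 8.0;
  Samples[0] := 2.0; Samples[1] := 4.0;
  PackedStep(@Coef[0], @OldIn[0], @OldOut[0], @Samples[0]);
  // The scalar formula Sample*a1 + OldIn*a1 + OldOut*b1 gives 2.5 and 5.0
  // for these inputs; compare against what is printed here.
  WriteLn(Samples[0]:0:4, ' ', Samples[1]:0:4);
end.
```

Passing pointers keeps every argument in an integer register, which avoids the question of which XMM register holds which Double. Only XMM0 to XMM4 are touched, and these are volatile under the Win64 calling convention, so nothing needs to be saved.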

祁梦 2024-12-19 00:25:07

In Delphi there's a debugger pane ("FPU") which shows the SSE registers. So if you feed your filter some non-zero values you should be able to find where the silent output comes from.
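
When feeding the filter non-zero values as suggested, it can help to know what the result should be. A minimal sketch of a plain-Pascal reference for one call follows. `RefProcess` is a hypothetical name that mirrors the parameter list of the question's routine, and the input values are arbitrary test data.

```pascal
program ReferenceStep;
{$APPTYPE CONSOLE}

// Plain-Pascal version of the packed step from the question:
// Sample := Sample*a1 + OldIn*a1 + OldOut*b1, applied to two channels.
// RefProcess is a hypothetical helper used only for comparison.
procedure RefProcess(const a1, b1, OldIn1, OldIn2, OldOut1, OldOut2: Double;
  var Sample1, Sample2: Double);
begin
  Sample1 := Sample1 * a1 + OldIn1 * a1 + OldOut1 * b1;
  Sample2 := Sample2 * a1 + OldIn2 * a1 + OldOut2 * b1;
end;

var
  S1, S2: Double;
begin
  S1 := 2.0;
  S2 := 4.0;
  // a1 = 0.5, b1 = 0.25, OldIn = (1, 2), OldOut = (4, 8)
  RefProcess(0.5, 0.25, 1.0, 2.0, 4.0, 8.0, S1, S2);
  WriteLn(S1:0:4, ' ', S2:0:4);  // prints 2.5000 5.0000
end.
```

Calling the assembler routine with the same eight values and stepping through it with the FPU pane open should show at which instruction the register contents first differ from these numbers.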
