使用 GNAT 优化 Ada 95 中浮点数数组的数学运算

发布于 2024-08-26 06:18:50 字数 1460 浏览 10 评论 0原文

考虑下面的代码。该代码应该以固定速率(一秒一批)处理数据,它是整个系统的一部分,不会占用太多时间。

当运行超过 100 批每 1 秒的数据时,程序需要 35 秒(或 35%),在循环中执行此函数。测试循环专门使用 Ada.RealTime 进行计时。数据是预先生成的,因此大部分执行时间都在这个循环中。

如何改进代码以将处理时间降至最低?

该代码将在 Intel Pentium-M(带有 SSE2 的 P3)上运行。

package FF is new Ada.Numerics.Generic_Elementary_Functions(Float);

N : constant Integer := 820;
type A is array(1 .. N) of Float;
type A3 is array(1 .. 3) of A;

procedure F(state  : in out A3;
            result :    out A3;
            l      : in     A;
            r      : in     A) is
   s : Float;
   t : Float;
begin
   for i in 1 .. N loop
      t := l(i) + r(i);
      t := t / 2.0;
      state(1)(i) := t;
      state(2)(i) := t * 0.25 + state(2)(i) * 0.75;
      state(3)(i) := t * 1.0 /64.0 + state(2)(i) * 63.0 /64.0;
      for r in 1 .. 3 loop
         s := state(r)(i);
         t := FF."**"(s, 6.0) + 14.0;
         if t > MAX then
            t := MAX;
         elsif t < MIN then
            t := MIN;
         end if;
         result(r)(i) := FF.Log(t, 2.0);
      end loop;
   end loop;
end;

用于测试的伪代码

create two arrays of 80 random A3 arrays, called ls and rs;
init the state and result A3 array
record the realtime time now, called last
for i in 1 .. 100 loop
   for j in 1 .. 80 loop
      F(state, result, ls(j), rs(j));
   end loop;
end loop;
record the realtime time now, called curr
output the duration between curr and last

Consider the bellow code. This code is supposed to be processing data at a fixed rate, in one second batches, It is part of an overal system and can't take up too much time.

When running over 100 lots of 1 seconds worth of data the program takes 35 seconds (or 35%), executing this function in a loop. The test loop is timed specifically with Ada.RealTime. The data is pregenerated so the majority of the execution time is definatetly in this loop.

How do I improce the code to get the processing time down to a minimum?

The code will be running on an Intel Pentium-M which is a P3 with SSE2.

package FF is new Ada.Numerics.Generic_Elementary_Functions(Float);

N : constant Integer := 820;
type A is array(1 .. N) of Float;
type A3 is array(1 .. 3) of A;

procedure F(state  : in out A3;
            result :    out A3;
            l      : in     A;
            r      : in     A) is
   s : Float;
   t : Float;
begin
   for i in 1 .. N loop
      t := l(i) + r(i);
      t := t / 2.0;
      state(1)(i) := t;
      state(2)(i) := t * 0.25 + state(2)(i) * 0.75;
      state(3)(i) := t * 1.0 /64.0 + state(2)(i) * 63.0 /64.0;
      for r in 1 .. 3 loop
         s := state(r)(i);
         t := FF."**"(s, 6.0) + 14.0;
         if t > MAX then
            t := MAX;
         elsif t < MIN then
            t := MIN;
         end if;
         result(r)(i) := FF.Log(t, 2.0);
      end loop;
   end loop;
end;

psuedocode for testing

create two arrays of 80 random A3 arrays, called ls and rs;
init the state and result A3 array
record the realtime time now, called last
for i in 1 .. 100 loop
   for j in 1 .. 80 loop
      F(state, result, ls(j), rs(j));
   end loop;
end loop;
record the realtime time now, called curr
output the duration between curr and last

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

乜一 2024-09-02 06:18:50

我会在这里支持Marc C(他通常知道他的东西)。我之前曾使用过 Gnat 的 gprof 。设置起来可能很困难,但效果却很出色。如果您愿意,您可以使用它来获取上面代码的每一行使用的运行时间百分比。

我可以提出一些建议(例如预先计算 63.0/64.0),但一个好的优化器应该已经完成​​了其中的大部分建议。您需要弄清楚在特别消耗 CPU 的线路中它没有做什么,并加快速度。

查看代码,我猜分析器会向您显示求幂和日志操作占用了大部分时间。如果你能找到一种方法来预先计算一些可能有帮助的东西。不过,这有点超前了。 个人资料

I'll back up Marc C here (he generally knows his stuff). I've used gprof with Gnat before. It can be tough to setup, but it works like a champ. If you like, you can use it to get a % of your runtime used by every line of code above.

I could make some suggestions (like precalculating 63.0/64.0) but a good optimizer should already be doing most of them. You need to figure out what it is not doing in the particularly CPU-consuming lines, and speed up that.

Looking over the code, I'm guessing the profiler will show you that the exponentiation and log operations are chewing up the lion's share of the time. If you could find a way to precalcuate some of that stuff that might help. That's getting ahead of myself though. Profile!

枕头说它不想醒 2024-09-02 06:18:50

首先让我尝试纠正我的答案:

那应该是 FF."****"(s, 6.0) 和 s ** 6,(不是 FF."*"(s, 6.0) 和 s * 6)我的回答。 [这很奇怪..编辑器仍在尝试从我的文本中删除*。]

我刚刚检查了gad Marc C指向的源代码..,确实如此
做**6!

我只想补充一点,我预计会有一些改进
自己做 s**6,使用 s2 := s*s, 和 s_to_the_6 := s2 * s2 * s2;
——乔纳森

First let me try to correct my answer:

That should be FF."****"(s, 6.0), and s ** 6, (not FF."*"(s, 6.0) and s * 6) in my answer. [That's odd .. the editor is still trying to remove *'s from my text.]

I just checked the source code Marc C pointed to .. by gad, it does
do s ** 6!

I'll add only that I expect that some improvement would come from
doing s**6 yourself, using s2 := s*s, and s_to_the_6 := s2 * s2 * s2;
-- Jonathan

野侃 2024-09-02 06:18:50

更换可能会更快
t := FF."**"(s, 6.0) + 14.0;

t := s ** 6 + 14.0;
浮点求幂可能是用
log 和 exp 。 ——乔纳森

It might be faster to replace the
t := FF."**"(s, 6.0) + 14.0;
with
t := s ** 6 + 14.0;
The floating point exponentiation is probably done with
log's and exp's. -- Jonathan

抱猫软卧 2024-09-02 06:18:50

以下代码改进将运行时间缩短至 8 秒。

然后,添加命令行选项 -O3 和 -mtune=pentium-m 和 -msse2 将运行时间降至 0.8 秒。

我的怀疑是:

  • Log(x, y) 被实现为 Log(x) / Log(y) ,如果 y 是常数,那么它会很昂贵。 (7sec)
  • Power(x, y) 作为循环实现;条件和跳转的成本很高。 (20秒)

程序“**”可能是...

r := x; for i in 2 .. y loop; r := r * x; end loop; return r;

修改后的功能

package FF is new Ada.Numerics.Generic_Elementary_Functions(Float); 

N : constant Integer := 820; 
type A is array(1 .. N) of Float; 
type A3 is array(1 .. 3) of A; 

procedure F(state  : in out A3; 
            result :    out A3; 
            l      : in     A; 
            r      : in     A) is 
   -- Keep the Log of 2 so it is not recalculated
   ONE_ON_LOG_TWO : constant Float := 1 / FF.Log(2.0); 
   M1 : constant Float := 1.0 / 64.0;
   M2 : constant Float := 63.0 / 64.0;
   s : Float; 
   t : Float; 
begin 
   for i in 1 .. N loop 
      t := l(i) + r(i); 
      -- Multiply Not Divide
      t := t * 0.5; 
      state(1)(i) := t; 
      state(2)(i) := t * 0.25 + state(2)(i) * 0.75; 
      state(3)(i) := t * M1 + state(2)(i) * M2; 
      for r in 1 .. 3 loop 
         s := state(r)(i);
         -- Since we know the power hared code the multiply.
         t := s * s * s * s * s * s + 14.0; 
         if t > MAX then 
            t := MAX; 
         elsif t < MIN then 
            t := MIN; 
         end if; 
         -- Don't use Log(x,y) in a loop when y is constant. '
         result(r)(i) := FF.Log(t) * ONE_ON_LOG_TWO; 
      end loop; 
   end loop; 
end; 

The following code improvements got the run-time down to 8 seconds.

Then, adding the command line options -O3 and -mtune=pentium-m and -msse2 got that runtime down to 0.8 seconds.

My suspicions are:

  • Log(x, y) is implemented as Log(x) / Log(y) which is expencsive if y is constant. (7sec)
  • Power(x, y) is implemented as a loop; and conditionals and jumps are expensive. (20sec)

procedure "**" may be ...

r := x; for i in 2 .. y loop; r := r * x; end loop; return r;

Revised Function

package FF is new Ada.Numerics.Generic_Elementary_Functions(Float); 

N : constant Integer := 820; 
type A is array(1 .. N) of Float; 
type A3 is array(1 .. 3) of A; 

procedure F(state  : in out A3; 
            result :    out A3; 
            l      : in     A; 
            r      : in     A) is 
   -- Keep the Log of 2 so it is not recalculated
   ONE_ON_LOG_TWO : constant Float := 1 / FF.Log(2.0); 
   M1 : constant Float := 1.0 / 64.0;
   M2 : constant Float := 63.0 / 64.0;
   s : Float; 
   t : Float; 
begin 
   for i in 1 .. N loop 
      t := l(i) + r(i); 
      -- Multiply Not Divide
      t := t * 0.5; 
      state(1)(i) := t; 
      state(2)(i) := t * 0.25 + state(2)(i) * 0.75; 
      state(3)(i) := t * M1 + state(2)(i) * M2; 
      for r in 1 .. 3 loop 
         s := state(r)(i);
         -- Since we know the power hared code the multiply.
         t := s * s * s * s * s * s + 14.0; 
         if t > MAX then 
            t := MAX; 
         elsif t < MIN then 
            t := MIN; 
         end if; 
         -- Don't use Log(x,y) in a loop when y is constant. '
         result(r)(i) := FF.Log(t) * ONE_ON_LOG_TWO; 
      end loop; 
   end loop; 
end; 
苏辞 2024-09-02 06:18:50

您可能想考虑使

类型 A3 为 Float 的 array(1 .. N, 1 .. 3) ;

这样,外循环中的每个操作都将访问相邻的内存位置
并且您应该从缓存中获得更好的支持:

  state(i)(1) := t; 
  state(i)(2) := t * 0.25 + state(i)(2) * 0.75; 
  state(i)(3) := t * M1 + state(i)(2) * M2; 

使用重命名为

  cur_state : array (1..3) of Float renames state(i);

和 随后

  cur_state := (t, t * 0.25 + cur_state(2) * 0.75, t * M1 + cur_state(2) * M2)

可能会给编译器一些更多的优化提示。

You might want to look into making

type A3 is array(1 .. N, 1 .. 3) of Float;

That way, each operation in the outer loop will access neighbouring memory locations
and you should get much better support from the cache:

  state(i)(1) := t; 
  state(i)(2) := t * 0.25 + state(i)(2) * 0.75; 
  state(i)(3) := t * M1 + state(i)(2) * M2; 

Using a rename as

  cur_state : array (1..3) of Float renames state(i);

and subsequently

  cur_state := (t, t * 0.25 + cur_state(2) * 0.75, t * M1 + cur_state(2) * M2)

might give the compiler some more hints for optimisations.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文