使用 GNAT 优化 Ada 95 中浮点数数组的数学运算

发布于 2024-08-26 06:18:50 字数 1460 浏览 10 评论 0原文

考虑下面的代码。该代码应该以固定速率（一秒一批）处理数据，它是整个系统的一部分，不会占用太多时间。

当运行超过 100 批每 1 秒的数据时，程序需要 35 秒（或 35%），在循环中执行此函数。测试循环专门使用 Ada.RealTime 进行计时。数据是预先生成的，因此大部分执行时间都在这个循环中。

如何改进代码以将处理时间降至最低？

该代码将在 Intel Pentium-M（带有 SSE2 的 P3）上运行。

package FF is new Ada.Numerics.Generic_Elementary_Functions(Float);

N : constant Integer := 820;
type A is array(1 .. N) of Float;
type A3 is array(1 .. 3) of A;

procedure F(state  : in out A3;
            result :    out A3;
            l      : in     A;
            r      : in     A) is
   s : Float;
   t : Float;
begin
   for i in 1 .. N loop
      t := l(i) + r(i);
      t := t / 2.0;
      state(1)(i) := t;
      state(2)(i) := t * 0.25 + state(2)(i) * 0.75;
      state(3)(i) := t * 1.0 /64.0 + state(2)(i) * 63.0 /64.0;
      for r in 1 .. 3 loop
         s := state(r)(i);
         t := FF."**"(s, 6.0) + 14.0;
         if t > MAX then
            t := MAX;
         elsif t < MIN then
            t := MIN;
         end if;
         result(r)(i) := FF.Log(t, 2.0);
      end loop;
   end loop;
end;

用于测试的伪代码

create two arrays of 80 random A3 arrays, called ls and rs;
init the state and result A3 array
record the realtime time now, called last
for i in 1 .. 100 loop
   for j in 1 .. 80 loop
      F(state, result, ls(j), rs(j));
   end loop;
end loop;
record the realtime time now, called curr
output the duration between curr and last

原文

Consider the bellow code. This code is supposed to be processing data at a fixed rate, in one second batches, It is part of an overal system and can't take up too much time.

When running over 100 lots of 1 seconds worth of data the program takes 35 seconds (or 35%), executing this function in a loop. The test loop is timed specifically with Ada.RealTime. The data is pregenerated so the majority of the execution time is definatetly in this loop.

How do I improce the code to get the processing time down to a minimum?

The code will be running on an Intel Pentium-M which is a P3 with SSE2.

package FF is new Ada.Numerics.Generic_Elementary_Functions(Float);

N : constant Integer := 820;
type A is array(1 .. N) of Float;
type A3 is array(1 .. 3) of A;

procedure F(state  : in out A3;
            result :    out A3;
            l      : in     A;
            r      : in     A) is
   s : Float;
   t : Float;
begin
   for i in 1 .. N loop
      t := l(i) + r(i);
      t := t / 2.0;
      state(1)(i) := t;
      state(2)(i) := t * 0.25 + state(2)(i) * 0.75;
      state(3)(i) := t * 1.0 /64.0 + state(2)(i) * 63.0 /64.0;
      for r in 1 .. 3 loop
         s := state(r)(i);
         t := FF."**"(s, 6.0) + 14.0;
         if t > MAX then
            t := MAX;
         elsif t < MIN then
            t := MIN;
         end if;
         result(r)(i) := FF.Log(t, 2.0);
      end loop;
   end loop;
end;

psuedocode for testing

create two arrays of 80 random A3 arrays, called ls and rs;
init the state and result A3 array
record the realtime time now, called last
for i in 1 .. 100 loop
   for j in 1 .. 80 loop
      F(state, result, ls(j), rs(j));
   end loop;
end loop;
record the realtime time now, called curr
output the duration between curr and last

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

乜一 2024-09-02 06:18:50

我会在这里支持Marc C（他通常知道他的东西）。我之前曾使用过 Gnat 的 gprof 。设置起来可能很困难，但效果却很出色。如果您愿意，您可以使用它来获取上面代码的每一行使用的运行时间百分比。

我可以提出一些建议（例如预先计算 63.0/64.0），但一个好的优化器应该已经完成了其中的大部分建议。您需要弄清楚在特别消耗 CPU 的线路中它没有做什么，并加快速度。

查看代码，我猜分析器会向您显示求幂和日志操作占用了大部分时间。如果你能找到一种方法来预先计算一些可能有帮助的东西。不过，这有点超前了。 个人资料！

回复收藏 0 原文

枕头说它不想醒 2024-09-02 06:18:50

首先让我尝试纠正我的答案：

那应该是 FF."****"(s, 6.0) 和 s ** 6，（不是 FF."*"(s, 6.0) 和 s * 6）我的回答。 [这很奇怪..编辑器仍在尝试从我的文本中删除*。]

我刚刚检查了gad Marc C指向的源代码..，确实如此
做**6！

我只想补充一点，我预计会有一些改进
自己做 s**6，使用 s2 := s*s, 和 s_to_the_6 := s2 * s2 * s2;
——乔纳森

回复收藏 0 原文

野侃 2024-09-02 06:18:50

更换可能会更快
t := FF."**"(s, 6.0) + 14.0;
和
t := s ** 6 + 14.0;
浮点求幂可能是用
log 和 exp 。 ——乔纳森

回复收藏 0 原文

抱猫软卧 2024-09-02 06:18:50

以下代码改进将运行时间缩短至 8 秒。

然后，添加命令行选项 -O3 和 -mtune=pentium-m 和 -msse2 将运行时间降至 0.8 秒。

我的怀疑是：

Log(x, y) 被实现为 Log(x) / Log(y) ，如果 y 是常数，那么它会很昂贵。 (7sec)
Power(x, y) 作为循环实现；条件和跳转的成本很高。（20秒）

程序“**”可能是...

r := x; for i in 2 .. y loop; r := r * x; end loop; return r;

修改后的功能

package FF is new Ada.Numerics.Generic_Elementary_Functions(Float); 

N : constant Integer := 820; 
type A is array(1 .. N) of Float; 
type A3 is array(1 .. 3) of A; 

procedure F(state  : in out A3; 
            result :    out A3; 
            l      : in     A; 
            r      : in     A) is 
   -- Keep the Log of 2 so it is not recalculated
   ONE_ON_LOG_TWO : constant Float := 1 / FF.Log(2.0); 
   M1 : constant Float := 1.0 / 64.0;
   M2 : constant Float := 63.0 / 64.0;
   s : Float; 
   t : Float; 
begin 
   for i in 1 .. N loop 
      t := l(i) + r(i); 
      -- Multiply Not Divide
      t := t * 0.5; 
      state(1)(i) := t; 
      state(2)(i) := t * 0.25 + state(2)(i) * 0.75; 
      state(3)(i) := t * M1 + state(2)(i) * M2; 
      for r in 1 .. 3 loop 
         s := state(r)(i);
         -- Since we know the power hared code the multiply.
         t := s * s * s * s * s * s + 14.0; 
         if t > MAX then 
            t := MAX; 
         elsif t < MIN then 
            t := MIN; 
         end if; 
         -- Don't use Log(x,y) in a loop when y is constant. '
         result(r)(i) := FF.Log(t) * ONE_ON_LOG_TWO; 
      end loop; 
   end loop; 
end;

The following code improvements got the run-time down to 8 seconds.

Then, adding the command line options -O3 and -mtune=pentium-m and -msse2 got that runtime down to 0.8 seconds.

My suspicions are:

Log(x, y) is implemented as Log(x) / Log(y) which is expencsive if y is constant. (7sec)
Power(x, y) is implemented as a loop; and conditionals and jumps are expensive. (20sec)

procedure "**" may be ...

r := x; for i in 2 .. y loop; r := r * x; end loop; return r;

Revised Function

package FF is new Ada.Numerics.Generic_Elementary_Functions(Float); 

N : constant Integer := 820; 
type A is array(1 .. N) of Float; 
type A3 is array(1 .. 3) of A; 

procedure F(state  : in out A3; 
            result :    out A3; 
            l      : in     A; 
            r      : in     A) is 
   -- Keep the Log of 2 so it is not recalculated
   ONE_ON_LOG_TWO : constant Float := 1 / FF.Log(2.0); 
   M1 : constant Float := 1.0 / 64.0;
   M2 : constant Float := 63.0 / 64.0;
   s : Float; 
   t : Float; 
begin 
   for i in 1 .. N loop 
      t := l(i) + r(i); 
      -- Multiply Not Divide
      t := t * 0.5; 
      state(1)(i) := t; 
      state(2)(i) := t * 0.25 + state(2)(i) * 0.75; 
      state(3)(i) := t * M1 + state(2)(i) * M2; 
      for r in 1 .. 3 loop 
         s := state(r)(i);
         -- Since we know the power hared code the multiply.
         t := s * s * s * s * s * s + 14.0; 
         if t > MAX then 
            t := MAX; 
         elsif t < MIN then 
            t := MIN; 
         end if; 
         -- Don't use Log(x,y) in a loop when y is constant. '
         result(r)(i) := FF.Log(t) * ONE_ON_LOG_TWO; 
      end loop; 
   end loop; 
end;

回复收藏 0 原文

苏辞 2024-09-02 06:18:50

您可能想考虑使

类型 A3 为 Float 的 array(1 .. N, 1 .. 3) ；

这样，外循环中的每个操作都将访问相邻的内存位置
并且您应该从缓存中获得更好的支持：

  state(i)(1) := t; 
  state(i)(2) := t * 0.25 + state(i)(2) * 0.75; 
  state(i)(3) := t * M1 + state(i)(2) * M2;

使用重命名为

  cur_state : array (1..3) of Float renames state(i);

和随后

  cur_state := (t, t * 0.25 + cur_state(2) * 0.75, t * M1 + cur_state(2) * M2)

可能会给编译器一些更多的优化提示。

You might want to look into making

type A3 is array(1 .. N, 1 .. 3) of Float;

That way, each operation in the outer loop will access neighbouring memory locations
and you should get much better support from the cache:

  state(i)(1) := t; 
  state(i)(2) := t * 0.25 + state(i)(2) * 0.75; 
  state(i)(3) := t * M1 + state(i)(2) * M2;

Using a rename as

  cur_state : array (1..3) of Float renames state(i);

and subsequently

  cur_state := (t, t * 0.25 + cur_state(2) * 0.75, t * M1 + cur_state(2) * M2)

might give the compiler some more hints for optimisations.

回复收藏 0 原文

~没有更多了~

关于作者

辞别

暂无简介

文章

544 人气

关注发私信

友情链接

文江博客

使用 GNAT 优化 Ada 95 中浮点数数组的数学运算

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（5）

关于作者

相关话题

热门标签

推荐作者

Promise

qq_lbRlsh

待＂谢繁草

yy2010hell

漫无边际

傲娇萝莉攻

友情链接

使用 GNAT 优化 Ada 95 中浮点数数组的数学运算

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（5）

关于作者

相关话题

热门标签

推荐作者

Promise

qq_lbRlsh

待＂谢繁草

yy2010hell

漫无边际

傲娇萝莉攻

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。