使用 GNAT 优化 Ada 95 中浮点数数组的数学运算
考虑下面的代码。该代码应该以固定速率(一秒一批)处理数据,它是整个系统的一部分,不会占用太多时间。
当运行超过 100 批每 1 秒的数据时,程序需要 35 秒(或 35%),在循环中执行此函数。测试循环专门使用 Ada.RealTime 进行计时。数据是预先生成的,因此大部分执行时间都在这个循环中。
如何改进代码以将处理时间降至最低?
该代码将在 Intel Pentium-M(带有 SSE2 的 P3)上运行。
package FF is new Ada.Numerics.Generic_Elementary_Functions(Float);
N : constant Integer := 820;
type A is array(1 .. N) of Float;
type A3 is array(1 .. 3) of A;
procedure F(state : in out A3;
result : out A3;
l : in A;
r : in A) is
s : Float;
t : Float;
begin
for i in 1 .. N loop
t := l(i) + r(i);
t := t / 2.0;
state(1)(i) := t;
state(2)(i) := t * 0.25 + state(2)(i) * 0.75;
state(3)(i) := t * 1.0 /64.0 + state(2)(i) * 63.0 /64.0;
for r in 1 .. 3 loop
s := state(r)(i);
t := FF."**"(s, 6.0) + 14.0;
if t > MAX then
t := MAX;
elsif t < MIN then
t := MIN;
end if;
result(r)(i) := FF.Log(t, 2.0);
end loop;
end loop;
end;
用于测试的伪代码
create two arrays of 80 random A3 arrays, called ls and rs;
init the state and result A3 array
record the realtime time now, called last
for i in 1 .. 100 loop
for j in 1 .. 80 loop
F(state, result, ls(j), rs(j));
end loop;
end loop;
record the realtime time now, called curr
output the duration between curr and last
Consider the bellow code. This code is supposed to be processing data at a fixed rate, in one second batches, It is part of an overal system and can't take up too much time.
When running over 100 lots of 1 seconds worth of data the program takes 35 seconds (or 35%), executing this function in a loop. The test loop is timed specifically with Ada.RealTime. The data is pregenerated so the majority of the execution time is definatetly in this loop.
How do I improce the code to get the processing time down to a minimum?
The code will be running on an Intel Pentium-M which is a P3 with SSE2.
package FF is new Ada.Numerics.Generic_Elementary_Functions(Float);
N : constant Integer := 820;
type A is array(1 .. N) of Float;
type A3 is array(1 .. 3) of A;
procedure F(state : in out A3;
result : out A3;
l : in A;
r : in A) is
s : Float;
t : Float;
begin
for i in 1 .. N loop
t := l(i) + r(i);
t := t / 2.0;
state(1)(i) := t;
state(2)(i) := t * 0.25 + state(2)(i) * 0.75;
state(3)(i) := t * 1.0 /64.0 + state(2)(i) * 63.0 /64.0;
for r in 1 .. 3 loop
s := state(r)(i);
t := FF."**"(s, 6.0) + 14.0;
if t > MAX then
t := MAX;
elsif t < MIN then
t := MIN;
end if;
result(r)(i) := FF.Log(t, 2.0);
end loop;
end loop;
end;
psuedocode for testing
create two arrays of 80 random A3 arrays, called ls and rs;
init the state and result A3 array
record the realtime time now, called last
for i in 1 .. 100 loop
for j in 1 .. 80 loop
F(state, result, ls(j), rs(j));
end loop;
end loop;
record the realtime time now, called curr
output the duration between curr and last
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
我会在这里支持Marc C(他通常知道他的东西)。我之前曾使用过 Gnat 的 gprof 。设置起来可能很困难,但效果却很出色。如果您愿意,您可以使用它来获取上面代码的每一行使用的运行时间百分比。
我可以提出一些建议(例如预先计算 63.0/64.0),但一个好的优化器应该已经完成了其中的大部分建议。您需要弄清楚在特别消耗 CPU 的线路中它没有做什么,并加快速度。
查看代码,我猜分析器会向您显示求幂和日志操作占用了大部分时间。如果你能找到一种方法来预先计算一些可能有帮助的东西。不过,这有点超前了。 个人资料!
I'll back up Marc C here (he generally knows his stuff). I've used gprof with Gnat before. It can be tough to setup, but it works like a champ. If you like, you can use it to get a % of your runtime used by every line of code above.
I could make some suggestions (like precalculating 63.0/64.0) but a good optimizer should already be doing most of them. You need to figure out what it is not doing in the particularly CPU-consuming lines, and speed up that.
Looking over the code, I'm guessing the profiler will show you that the exponentiation and log operations are chewing up the lion's share of the time. If you could find a way to precalcuate some of that stuff that might help. That's getting ahead of myself though. Profile!
首先让我尝试纠正我的答案:
那应该是 FF."****"(s, 6.0) 和 s ** 6,(不是 FF."*"(s, 6.0) 和 s * 6)我的回答。 [这很奇怪..编辑器仍在尝试从我的文本中删除*。]
我刚刚检查了gad Marc C指向的源代码..,确实如此
做**6!
我只想补充一点,我预计会有一些改进
自己做 s**6,使用 s2 := s*s, 和 s_to_the_6 := s2 * s2 * s2;
——乔纳森
First let me try to correct my answer:
That should be FF."****"(s, 6.0), and s ** 6, (not FF."*"(s, 6.0) and s * 6) in my answer. [That's odd .. the editor is still trying to remove *'s from my text.]
I just checked the source code Marc C pointed to .. by gad, it does
do s ** 6!
I'll add only that I expect that some improvement would come from
doing s**6 yourself, using s2 := s*s, and s_to_the_6 := s2 * s2 * s2;
-- Jonathan
更换可能会更快
t := FF."**"(s, 6.0) + 14.0;
和
t := s ** 6 + 14.0;
浮点求幂可能是用
log 和 exp 。 ——乔纳森
It might be faster to replace the
t := FF."**"(s, 6.0) + 14.0;
with
t := s ** 6 + 14.0;
The floating point exponentiation is probably done with
log's and exp's. -- Jonathan
以下代码改进将运行时间缩短至 8 秒。
然后,添加命令行选项 -O3 和 -mtune=pentium-m 和 -msse2 将运行时间降至 0.8 秒。
我的怀疑是:
程序“**”可能是...
修改后的功能
The following code improvements got the run-time down to 8 seconds.
Then, adding the command line options -O3 and -mtune=pentium-m and -msse2 got that runtime down to 0.8 seconds.
My suspicions are:
procedure "**" may be ...
Revised Function
您可能想考虑使
类型 A3 为 Float 的 array(1 .. N, 1 .. 3) ;
这样,外循环中的每个操作都将访问相邻的内存位置
并且您应该从缓存中获得更好的支持:
使用重命名为
和 随后
可能会给编译器一些更多的优化提示。
You might want to look into making
type A3 is array(1 .. N, 1 .. 3) of Float;
That way, each operation in the outer loop will access neighbouring memory locations
and you should get much better support from the cache:
Using a rename as
and subsequently
might give the compiler some more hints for optimisations.