Inline functions still show up in .prof file

Posted on 2024-10-11 18:01:09

I'm trying to figure out how to optimize some code. Here it is:


{-# OPTIONS_GHC -funbox-strict-fields #-}

data Vec3 a = Vec3  !a !a !a

vx :: Vec3 a -> a
vx (Vec3 x _ _) = x
{-# SPECIALIZE INLINE vx :: Vec3 Double -> Double #-}

vy :: Vec3 a -> a
vy (Vec3 _ y _) = y
{-# SPECIALIZE INLINE vy :: Vec3 Double -> Double #-}

vz :: Vec3 a -> a
vz (Vec3 _ _ z) = z
{-# SPECIALIZE INLINE vz :: Vec3 Double -> Double #-}


dot :: (Num a) => Vec3 a -> Vec3 a -> a
dot u v = (vx u * vx v) + (vy u * vy v) + (vz u * vz v)
{-# SPECIALIZE INLINE dot :: Vec3 Double -> Vec3 Double -> Double #-}


type Vec3D = Vec3 Double

-- just make a bunch of vecs to measure performance

n = 1000000 :: Double

v1s = [Vec3 x y z | (x, y, z) <- zip3 [1 .. n] [2 .. n + 1] [3 .. n + 2]]
      :: [Vec3D]

v2s = [Vec3 x y z | (x, y, z) <- zip3 [3 .. n + 2] [2 .. n + 1] [1 .. n]]
      :: [Vec3D]


dots = zipWith dot v1s v2s :: [Double]
theMax = maximum dots :: Double
main :: IO ()
main = putStrLn $ "theMax: " ++ show theMax

When I compile with GHC 6.12.1 (Ubuntu Linux on an i486 machine) using

ghc --make -O2 Vec.hs -prof -auto-all -fforce-recomp

and run

Vec +RTS -p

the summary in the Vec.prof file looks like this:


COST CENTRE                    MODULE               %time %alloc

v2s                            Main                  30.9   36.5
v1s                            Main                  27.9   31.3
dots                           Main                  27.2   27.0
CAF                            GHC.Float              4.4    5.2
vy                             Main                   3.7    0.0
vx                             Main                   2.9    0.0
theMax                         Main                   2.2    0.0

I see that the functions vx and vy take a significant portion of the time (6.6% between them).

Why is that? I thought that the SPECIALIZE INLINE pragma would make
those functions go away.
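
One possible explanation, offered as a hedged sketch rather than a confirmed fix: vx, vy and vz are polymorphic but not overloaded (they have no class constraint), so a SPECIALIZE pragma has no dictionary to remove for them and GHC may simply ignore it. A plain INLINE pragma is the one aimed at inlining, and according to the flag description quoted in the second answer, -auto-all does not attach automatic cost centers to functions marked INLINE. The variant below assumes that reading, uses INLINE on the selectors, and keeps SPECIALIZE only on dot, which does take a Num dictionary. Whether the selectors then vanish from the profile has not been verified here.

```haskell
{-# OPTIONS_GHC -funbox-strict-fields #-}

data Vec3 a = Vec3 !a !a !a

-- The selectors are polymorphic but not overloaded (no class constraint),
-- so there is no dictionary for SPECIALIZE to eliminate. INLINE asks GHC
-- to inline them at their call sites instead.
vx, vy, vz :: Vec3 a -> a
vx (Vec3 x _ _) = x
vy (Vec3 _ y _) = y
vz (Vec3 _ _ z) = z
{-# INLINE vx #-}
{-# INLINE vy #-}
{-# INLINE vz #-}

-- dot does take a Num dictionary, so SPECIALIZE is meaningful here:
-- it requests a Double-only copy that needs no dictionary argument.
dot :: Num a => Vec3 a -> Vec3 a -> a
dot u v = vx u * vx v + vy u * vy v + vz u * vz v
{-# SPECIALIZE dot :: Vec3 Double -> Vec3 Double -> Double #-}

type Vec3D = Vec3 Double

n :: Double
n = 1000000

-- Same test data as in the question.
v1s, v2s :: [Vec3D]
v1s = [Vec3 x y z | (x, y, z) <- zip3 [1 .. n] [2 .. n + 1] [3 .. n + 2]]
v2s = [Vec3 x y z | (x, y, z) <- zip3 [3 .. n + 2] [2 .. n + 1] [1 .. n]]

theMax :: Double
theMax = maximum (zipWith dot v1s v2s)

main :: IO ()
main = putStrLn $ "theMax: " ++ show theMax
```

It can be built and profiled with the same ghc command line as in the question.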

When I use a non-polymorphic type instead,

data Vec3D = Vec3D {vx, vy, vz :: !Double} deriving Show

the functions vx, vy and vz do not show up as cost centers.
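
For comparison, the monomorphic version described above could look like the following complete program. Only the data declaration comes from the question; the rest is an assumption that mirrors the original test harness, with dot written directly at type Double and no pragmas. With record syntax, vx, vy and vz are field selectors generated by GHC rather than functions written out in the module.

```haskell
{-# OPTIONS_GHC -funbox-strict-fields #-}

-- Monomorphic record from the question: GHC generates the field selectors
-- vx, vy and vz, each of type Vec3D -> Double.
data Vec3D = Vec3D {vx, vy, vz :: !Double} deriving Show

-- With a concrete field type there is no Num dictionary to pass,
-- so no SPECIALIZE pragma is needed.
dot :: Vec3D -> Vec3D -> Double
dot u v = vx u * vx v + vy u * vy v + vz u * vz v

n :: Double
n = 1000000

v1s, v2s :: [Vec3D]
v1s = [Vec3D x y z | (x, y, z) <- zip3 [1 .. n] [2 .. n + 1] [3 .. n + 2]]
v2s = [Vec3D x y z | (x, y, z) <- zip3 [3 .. n + 2] [2 .. n + 1] [1 .. n]]

theMax :: Double
theMax = maximum (zipWith dot v1s v2s)

main :: IO ()
main = putStrLn $ "theMax: " ++ show theMax
```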

懒猫 2024-10-18 18:01:09

I suspect this is a side-effect of using -auto-all, which inhibits many optimizations GHC would normally perform, including inlining. I also suspect that the difference in your non-polymorphic version is actually due to vx, vy and vz being defined via record syntax rather than to polymorphism (but I could be wrong about this).

Instead of using -auto-all, try either adding an export list to the module and compiling with -auto, or setting cost centers manually with SCC pragmas. I usually use SCC pragmas anyway, because I often want to put them on let-bound functions, which -auto-all won't do.
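
A minimal sketch of both suggestions together, assuming a single-module program: the explicit export list limits -auto to main, and SCC pragmas name the where-bound values by hand. The cost-center labels are arbitrary names chosen for illustration, and dot pattern-matches on the constructor directly, a simplification that is not taken from the thread. Without -prof the SCC pragmas are ignored, so the program also builds normally.

```haskell
{-# OPTIONS_GHC -funbox-strict-fields #-}

-- Explicit export list: with -auto, only exported top-level functions
-- (here just main) receive automatic cost centers.
module Main (main) where

data Vec3 a = Vec3 !a !a !a

-- Matches the constructor directly instead of calling selector functions.
dot :: Num a => Vec3 a -> Vec3 a -> a
dot (Vec3 ux uy uz) (Vec3 wx wy wz) = ux * wx + uy * wy + uz * wz
{-# SPECIALIZE dot :: Vec3 Double -> Vec3 Double -> Double #-}

n :: Double
n = 1000000

theMax :: Double
theMax = {-# SCC "theMax" #-} maximum dots
  where
    -- Hand-placed cost centers on local bindings; the quoted labels are
    -- what would appear in the .prof file when built with -prof.
    v1s, v2s :: [Vec3 Double]
    v1s = {-# SCC "v1s" #-} [Vec3 x y z | (x, y, z) <- zip3 [1 .. n] [2 .. n + 1] [3 .. n + 2]]
    v2s = {-# SCC "v2s" #-} [Vec3 x y z | (x, y, z) <- zip3 [3 .. n + 2] [2 .. n + 1] [1 .. n]]
    dots = {-# SCC "dots" #-} zipWith dot v1s v2s

main :: IO ()
main = putStrLn $ "theMax: " ++ show theMax
```

Built with, for example, ghc --make -O2 -prof -auto Vec.hs, the profile would be expected to list the hand-named centers. On GHC 7.4 and later, -auto-all and -auto were replaced by -fprof-auto and -fprof-auto-exported.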

寒江雪… 2024-10-18 18:01:09

I couldn't figure out how to comment on the replies, so I'm posting my comments as this answer.

First, thanks for your answers.

FUZxxl: I tried -ddump-core and got an error message saying that -ddump-core was an unrecognized flag. Perhaps you meant -ddump-simpl, which the book Real World Haskell recommends, but I'm afraid I don't know how to read the output. I looked in the output file for "vx" etc., but never saw them. I guess I should learn how to read Core. Are there any good guides for that?
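
To keep a Core dump short enough to read, it can help to compile a tiny module on its own. The one below is a hypothetical example, not Tad's test case: the OPTIONS_GHC pragma enables -ddump-simpl for this module only, so compiling it with ghc -O2 prints the simplified Core during compilation. If calls to vx no longer appear in the Core for sumX, that suggests they were inlined, which would be consistent with the first answer's view that the profile entries come from cost centers added by -auto-all rather than from real calls.

```haskell
{-# OPTIONS_GHC -ddump-simpl #-}
-- Compile with something like: ghc -O2 DumpCore.hs  (file name is hypothetical)
-- The pragma above makes GHC print the simplified Core for this module
-- while compiling; no profiling flags are involved.
module Main (main) where

data Vec3 a = Vec3 !a !a !a

vx :: Vec3 a -> a
vx (Vec3 x _ _) = x

-- Search the dumped Core of sumX for "vx" to see whether the call survived.
sumX :: [Vec3 Double] -> Double
sumX = sum . map vx

main :: IO ()
main = print (sumX [Vec3 1 2 3, Vec3 4 5 6])
```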

John: According to GHC's flag reference documentation, if I'm reading it correctly, both -auto and -auto-all are supposed to add _scc_s to functions not marked INLINE. To see whether -auto would work for me, I created another test case in which the Vec3 code was in a separate file/module, with Vec3(Vec3), vx, vy, vz and dot exported. I imported this module into a Main.hs file. Compiling these with -auto, I still saw vx, vy and vz in the .prof file.

Re: your comment that the difference could be due to record syntax instead of polymorphism, I believe that the difference is more likely due to polymorphism, because when I defined

data Vec3 a = Vec3 {vx, vy, vz :: !a}

vx, vy and vz still showed up in the .prof file.

Tad