Inlined functions still show up in the .prof file
I'm trying to figure out how to optimize some code. Here it is:
{-# OPTIONS_GHC -funbox-strict-fields #-}
data Vec3 a = Vec3 !a !a !a
vx :: Vec3 a -> a
vx (Vec3 x _ _) = x
{-# SPECIALIZE INLINE vx :: Vec3 Double -> Double #-}
vy :: Vec3 a -> a
vy (Vec3 _ y _) = y
{-# SPECIALIZE INLINE vy :: Vec3 Double -> Double #-}
vz :: Vec3 a -> a
vz (Vec3 _ _ z) = z
{-# SPECIALIZE INLINE vz :: Vec3 Double -> Double #-}
dot :: (Num a) => Vec3 a -> Vec3 a -> a
dot u v = (vx u * vx v) + (vy u * vy v) + (vz u * vz v)
{-# SPECIALIZE INLINE dot :: Vec3 Double -> Vec3 Double -> Double #-}
type Vec3D = Vec3 Double
-- just make a bunch of vecs to measure performance
n = 1000000 :: Double
v1s = [Vec3 x y z | (x, y, z) <- zip3 [1 .. n] [2 .. n + 1] [3 .. n + 2]]
    :: [Vec3D]
v2s = [Vec3 x y z | (x, y, z) <- zip3 [3 .. n + 2] [2 .. n + 1] [1 .. n]]
    :: [Vec3D]
dots = zipWith dot v1s v2s :: [Double]
theMax = maximum dots :: Double
main :: IO ()
main = putStrLn $ "theMax: " ++ show theMax
When I compile with ghc 6.12.1 (ubuntu linux on an i486 machine)
ghc --make -O2 Vec.hs -prof -auto-all -fforce-recomp
and run
Vec +RTS -p
Looking at the Vec.prof file,
COST CENTRE  MODULE     %time  %alloc
v2s          Main        30.9    36.5
v1s          Main        27.9    31.3
dots         Main        27.2    27.0
CAF          GHC.Float    4.4     5.2
vy           Main         3.7     0.0
vx           Main         2.9     0.0
theMax       Main         2.2     0.0
I see that the functions vx and vy take a significant portion of the time.
Why is that? I thought that the SPECIALIZE INLINE pragma would make
those functions go away.
When using a non-polymorphic type,
data Vec3D = Vec3D {vx, vy, vz :: !Double} deriving Show
the functions vx, vy, vz do not show up as cost centres.
I suspect this is a side-effect of using -auto-all, which inhibits many optimizations GHC would normally perform, including inlining. I suspect the difference in your non-polymorphic version is actually due to vx, vy, and vz being defined via record syntax rather than because of polymorphism (but I could be wrong about this).
Instead of using -auto-all, try either adding an export list to the module and compiling with "-auto", or manually setting cost centers via SCC pragmas. I usually use SCC pragmas anyway because I often want to set them on let-bound functions, which -auto-all won't do.
I could not figure out how to comment on the replies, so I'm putting my comments in this answer.
First, thanks for your answers.
FUZxxl: I tried -ddump-core, and got an error message that -ddump-core was an unrecognized flag. Perhaps you meant -ddump-simpl, which the book Real World Haskell recommended using, but I'm afraid I don't know how to read the output. I looked in the output file for "vx", etc, but never saw them. I guess I should learn how to read core. Are there any good guides for that?
John: According to GHC's flag reference documentation, if I'm reading it correctly, both -auto and -auto-all are supposed to add _scc_s to functions not marked INLINE. To see if -auto would work for me, I created another test case in which the Vec3 code was in a separate file/module, with Vec3(Vec3), vx, vy, vz, and dot exported. I imported this module into a Main.hs file. Compiling these with -auto, I still saw vx, vy, vz in the .prof file.
Re: your comment that the difference could be due to record syntax instead of polymorphism, I believe that the difference is more likely due to polymorphism, because when I defined
vx, vy and vz still showed up in the .prof file.
Tad