How is profiling support implemented in the GHC runtime?
I did not find much documentation in the commentary. Are there any good blog posts or similar on this?
Comments (2)
The best source for information on the profiling framework might still be the original paper by Patrick Sansom and Simon Peyton Jones. Additional details can be found in Sansom's PhD thesis as well as the later paper adding a formal specification. Simon Marlow also spoke about a few recent changes in the GHC Status Update at Haskell Implementors' Workshop 2011.
The idea behind cost-centre profiling is to annotate the expression tree with "cost centre" nodes. For example, with -auto-all the compiler adds an SCC annotation to every top-level function, so the body of a function such as fib ends up wrapped in a cost centre (called "foo" in this example). At runtime, when entering fib, the program looks at the current "cost centre stack" and adds "foo" to the top. This is reversed once evaluation exits the scope of the SCC annotation again. A bit of magic ensures that if, say, n happens to be a lazy value and the program needs to execute its code, the cost centre appropriate for that code is restored where necessary.
This infrastructure is then used for both time and space profiling:
A timer will check the cost-centre stack periodically. Every time a certain cost-centre stack is found, this counts as a "tick". In the end, the RTS will estimate the amount of time per cost-centre stack from the count of its ticks, giving you a time profile.
Every time an object is allocated, the program saves a pointer to the cost-centre stack that was current at that point in time. This enables the garbage collector to provide statistics on how many bytes were resident, broken down by allocation site.
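The bookkeeping described above can be illustrated with a small toy model. This is only a sketch of the idea, not how the RTS is actually written: the Event type, the Profile record and the function names are all made up for illustration, and the byte counts here are plain allocation totals, whereas the real heap profiler counts live (resident) objects during garbage collection.

```haskell
module Main where

import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- A cost-centre stack, innermost cost centre first (toy representation).
type CCS = [String]

-- Events a hypothetical instrumented program might produce.
data Event
  = Enter String  -- evaluation enters an SCC annotation
  | Exit          -- evaluation leaves the innermost SCC annotation
  | Tick          -- the profiling timer fires
  | Alloc Int     -- an object of the given size in bytes is allocated

data Profile = Profile
  { current :: CCS              -- the stack that is current right now
  , ticks   :: Map.Map CCS Int  -- timer ticks seen per stack
  , bytes   :: Map.Map CCS Int  -- bytes allocated per stack
  }

emptyProfile :: Profile
emptyProfile = Profile ["MAIN"] Map.empty Map.empty

-- Apply one event: push or pop the stack, or charge the current stack.
step :: Profile -> Event -> Profile
step p (Enter cc) = p { current = cc : current p }
step p Exit       = p { current = drop 1 (current p) }
step p Tick       = p { ticks = Map.insertWith (+) (current p) 1 (ticks p) }
step p (Alloc n)  = p { bytes = Map.insertWith (+) (current p) n (bytes p) }

runProfile :: [Event] -> Profile
runProfile = foldl' step emptyProfile

-- A made-up event trace: "bar" is entered while "foo" is active.
example :: [Event]
example =
  [ Tick
  , Enter "foo", Tick, Alloc 16
  , Enter "bar", Tick, Tick, Alloc 32
  , Exit, Tick
  , Exit, Tick
  ]

main :: IO ()
main = do
  let p = runProfile example
  mapM_ print (Map.toList (ticks p))
  mapM_ print (Map.toList (bytes p))
```

Each tick and each allocation is charged to whatever stack is current at that moment, which is why the final report is broken down by whole stacks rather than by individual cost centres.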
As requested in the comment, a few words on optimization: for obvious reasons the framework cannot allow optimizations that move non-constant costs from one cost centre to another, which forces the optimizer to be quite pessimistic at times. For example, in the above example the current release of GHC will not be able to unbox the return value, meaning that each recursive call does an unnecessary heap allocation.
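For reference, here is a sketch of what an annotated fib might look like when the annotation is written by hand. The exact definition is an assumption (a plain recursive Fibonacci); only the names fib and "foo" come from the discussion above. Whether a given GHC version unboxes the result can be checked by compiling with -ddump-simpl and reading the Core output.

```haskell
module Main where

-- Hand-written equivalent of an automatically inserted annotation.
-- The cost centre is named "foo" to match the text; the function body
-- itself is an assumed, ordinary recursive Fibonacci.
fib :: Int -> Int
fib n = {-# SCC "foo" #-}
  case n of
    0 -> 0
    1 -> 1
    _ -> fib (n - 1) + fib (n - 2)

-- Build with profiling:   ghc -prof -rtsopts Fib.hs
-- Time profile:           ./Fib +RTS -p    (writes Fib.prof)
-- Heap profile by stack:  ./Fib +RTS -hc   (writes Fib.hp)
main :: IO ()
main = print (fib 25)
```

Without -prof the SCC pragma is ignored, so the same source file can be built both ways to compare the generated code.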
As a rule of thumb, one should not count on any code transformations happening across an SCC annotation. When in doubt, it is better to annotate a function sufficiently high in the call stack, so that the performance-critical bits do not get annotated at all.
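A minimal sketch of that rule of thumb, assuming a hypothetical caller named fibTable: the hot recursive function carries no annotation, and a single manual SCC sits one level above it. This assumes the program is built with -prof but without automatic annotation, so that only the manual cost centre exists.

```haskell
module Main where

-- Hot inner loop: deliberately left without an SCC annotation, so the
-- optimiser is free to transform it.
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

-- Hypothetical caller. The annotation sits here, above the hot loop,
-- so the work done by fib is charged to the "fibTable" cost centre.
fibTable :: Int -> [(Int, Int)]
fibTable limit = {-# SCC "fibTable" #-} [ (i, fib i) | i <- [0 .. limit] ]

-- Build with:  ghc -prof -rtsopts Table.hs   (no automatic annotation)
-- Run with:    ./Table +RTS -p
main :: IO ()
main = mapM_ print (fibTable 20)
```

The profile is coarser this way, since fib no longer shows up as its own entry, but the inner loop is compiled much as it would be without profiling.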
You might find this paper by Jones, Marlow & Singh useful, depending on what you want to accomplish. It includes practices for profiling GHC programs in a parallel context and contains some case studies that you might find useful.