关于 GHC 实施的好的介绍性文字吗?
在 Haskell 中编程时(尤其是在解决 Project Euler 问题时,次优解决方案往往会对 CPU 或内存需求造成压力),我经常困惑为什么程序会这样运行。我查看配置文件,尝试引入一些严格性,选择另一种数据结构,...但大多数情况下都是在黑暗中摸索,因为我缺乏良好的直觉。
另外,虽然我知道 Lisp、Prolog 和命令式语言通常是如何实现的,但我不知道如何实现惰性语言。我也有点好奇。
因此我想更多地了解从程序源到执行模型的整个链条。
我想知道的事情:
应用了哪些典型优化?
当有多个评估候选者时,执行顺序是什么(虽然我知道它是由所需的输出驱动的,但首先评估 A 然后评估 B,或者首先评估 B 以检测到您不这样做之间仍然可能存在很大的性能差异根本不需要 A)
如何表示 thunk?
栈和堆是如何使用的?
什么是 CAF? (分析有时表明热点存在,但我不知道)
When programming in Haskell (and especially when solving Project Euler problems, where suboptimal solutions tend to stress the CPU or memory needs) I'm often puzzled why the program behaves the way it is. I look at profiles, try to introduce some strictness, chose another data structure, ... but mostly it's groping in the dark, because I lack a good intuition.
Also, while I know how Lisp, Prolog and imperative languages are typically implemented, I have no idea about implementing a lazy language. I'm a bit curious too.
Hence I would like to know more about the whole chain from program source to execution model.
Things I wonder about:
what typical optimizations are applied?
what is the execution order when there are multiple candidates for evaluation (while I know it's driven from the needed outputs, there may still be big performance differences between first evaluating A and then B, or evaluating B first to detect that you don't need A at all)
how are thunks represented?
how are the stack and the heap used?
what is a CAF? (profiling indicates sometimes that the hotspot is there, but I have no clue)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
有关 GHC 系统的架构和方法的大部分技术信息都在他们的 wiki 中。我将链接到关键部分以及人们可能不知道的一些相关论文。
应用了哪些典型优化?
这方面的关键论文是:Haskell 的基于转换的优化器,
SL Peyton Jones 和 A Santos,1998,描述了 GHC 使用的模型,该模型应用类似 Haskell 的核心语言的类型保留转换(重构)来改善时间和内存使用。这个过程称为“简化”。
Haskell 编译器中完成的典型操作包括:
有时:
上述论文是开始理解大多数优化的关键地方。一些更简单的内容在早期的书中给出,实现函数式语言,Simon Peyton Jones 和 David Lester。
当有多个候选进行评估时,执行顺序是什么
假设您使用的是单处理器,那么答案是“编译器根据启发法静态选择的某种顺序,以及该计划”。如果您通过 Spark 使用推测性评估,那么“一些不确定的、无序的执行模式”。
一般来说,要查看执行顺序,请查看核心,例如 ghc-core工具。 Core 简介位于 RWH 关于优化的章节中。
如何表示 thunk?
thunk 表示为带有代码指针的堆分配数据。
请参阅 堆对象的布局。
具体来说,请参阅thunk 的表示方式。
如何使用堆栈和堆?
由Spineless Tagless G-machine 的设计,特别是自该论文发布以来进行了许多修改。概括地说,执行模型:(
要深入了解堆栈使用模型,请参阅"Push/输入与评估/应用”。
什么是 CAF?
“恒定应用形式”。例如,程序中为程序执行的生命周期分配的顶级常量。由于它们是静态分配的,因此它们必须是 由垃圾收集器专门处理。
参考资料和进一步阅读:
The majority of the technical information about the architecture and approach of the GHC system is in their wiki. I'll link to the key pieces, and some related papers that people may not know about.
What typical optimizations are applied?
The key paper on this is: A transformation-based optimiser for Haskell,
SL Peyton Jones and A Santos, 1998, which describes the model GHC uses of applying type-preserving transformations (refactorings) of a core Haskell-like language to improve time and memory use. This process is called "simplification".
Typical things that are done in a Haskell compiler include:
And sometimes:
The above-mentioned paper is the key place to start to understand most of these optimizations. Some of the simpler ones are given in the earlier book, Implementing Functional Languages, Simon Peyton Jones and David Lester.
What is the execution order when there are multiple candidates for evaluation
Assuming you're on a uni-processor, then the answer is "some order that the compiler picks statically based on heuristics, and the demand pattern of the program". If you're using speculative evaluation via sparks, then "some non-deterministic, out-of-order execution pattern".
In general, to see what the execution order is, look at the core, with, e.g. the ghc-core tool. An introduction to Core is in the RWH chapter on optimizations.
How are thunks represented?
Thunks are represented as heap-allocated data with a code pointer.
See the layout of heap objects.
Specifically, see how thunks are represented.
How are the stack and the heap used?
As determined by the design of the Spineless Tagless G-machine, specifically, with many modifications since that paper was released. Broadly, the execution model:
To deeply understand the stack use model, see "Push/Enter versus Eval/Apply".
What is a CAF?
A "Constant Applicative Form". E.g. a top level constant in your program allocated for the lifetime of your program's execution. Since they're allocated statically, they have to be treated specially by the garbage collector.
References and further reading:
这可能不是您想要的介绍性文本,但是 Edward Yang 有一系列博客文章正在讨论 Haskell 堆、thunk 的实现方式等。
它很有趣,无论是插图还是凭借对于 Haskell 新手来说,无需深入研究太多细节即可解释事物。该系列涵盖了您的许多问题:
在更技术的层面上,有许多论文(与其他内容一起)涵盖了您想知道的部分内容:
This is probably not what you had in mind in terms of an introductory text, but Edward Yang has an ongoing series of blog posts discussing the Haskell heap, how thunks are implemented, etc.
It's entertaining, both with the illustrations and also by virtue of explicating things without delving into too much detail for someone new to Haskell. The series covers many of your questions:
On a more technical level, there are a number of papers that cover (in concert with other things), parts of what you're wanting to know.: