关于 GHC 实施的好的介绍性文字吗?

发布于 2024-11-08 06:37:52 字数 515 浏览 2 评论 0原文

在 Haskell 中编程时(尤其是在解决 Project Euler 问题时,次优解决方案往往会对 CPU 或内存需求造成压力),我经常困惑为什么程序会这样运行。我查看配置文件,尝试引入一些严格性,选择另一种数据结构,...但大多数情况下都是在黑暗中摸索,因为我缺乏良好的直觉。

另外,虽然我知道 Lisp、Prolog 和命令式语言通常是如何实现的,但我不知道如何实现惰性语言。我也有点好奇。

因此我想更多地了解从程序源到执行模型的整个链条。

我想知道的事情:

  • 应用了哪些典型优化?

  • 当有多个评估候选者时,执行顺序是什么(虽然我知道它是由所需的输出驱动的,但首先评估 A 然后评估 B,或者首先评估 B 以检测到您不这样做之间仍然可能存在很大的性能差异根本不需要 A)

  • 如何表示 thunk?

  • 栈和堆是如何使用的?

  • 什么是 CAF? (分析有时表明热点存在,但我不知道)

When programming in Haskell (and especially when solving Project Euler problems, where suboptimal solutions tend to stress the CPU or memory needs) I'm often puzzled why the program behaves the way it is. I look at profiles, try to introduce some strictness, chose another data structure, ... but mostly it's groping in the dark, because I lack a good intuition.

Also, while I know how Lisp, Prolog and imperative languages are typically implemented, I have no idea about implementing a lazy language. I'm a bit curious too.

Hence I would like to know more about the whole chain from program source to execution model.

Things I wonder about:

  • what typical optimizations are applied?

  • what is the execution order when there are multiple candidates for evaluation (while I know it's driven from the needed outputs, there may still be big performance differences between first evaluating A and then B, or evaluating B first to detect that you don't need A at all)

  • how are thunks represented?

  • how are the stack and the heap used?

  • what is a CAF? (profiling indicates sometimes that the hotspot is there, but I have no clue)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

十级心震 2024-11-15 06:37:52

有关 GHC 系统的架构和方法的大部分技术信息都在他们的 wiki 中。我将链接到关键部分以及人们可能不知道的一些相关论文。

应用了哪些典型优化?

这方面的关键论文是:Haskell 的基于转换的优化器
SL Peyton Jones 和 A Santos,1998,描述了 GHC 使用的模型,该模型应用类似 Haskell 的核心语言的类型保留转换(重构)来改善时间和内存使用。这个过程称为“简化”。

Haskell 编译器中完成的典型操作包括:

  • 内联;
  • 贝塔减少;
  • 死代码消除;
  • 条件转换:案例分析、案例消除。
  • 拆箱;
  • 建筑产品退货;
  • 全面懒惰改造;
  • 专业化;
  • 埃塔扩张;
  • 拉姆达提升;
  • 严格分析。

有时:

  • 静态参数转换;
  • 构建/折叠或流融合;
  • 公共子表达式消除;
  • 构造函数专业化。

上述论文是开始理解大多数优化的关键地方。一些更简单的内容在早期的书中给出,实现函数式语言,Simon Peyton Jones 和 David Lester。

当有多个候选进行评估时,执行顺序是什么

假设您使用的是单处理器,那么答案是“编译器根据启发法静态选择的某种顺序,以及该计划”。如果您通过 Spark 使用推测性评估,那么“一些不确定的、无序的执行模式”。

一般来说,要查看执行顺序,请查看核心,例如 ghc-core工具。 Core 简介位于 RWH 关于优化的章节中。

如何表示 thunk?

thunk 表示为带有代码指针的堆分配数据。

堆对象

请参阅 堆对象的布局
具体来说,请参阅thunk 的表示方式

如何使用堆栈和堆?

Spineless Tagless G-machine 的设计,特别是自该论文发布以来进行了许多修改。概括地说,执行模型:(

  • 装箱的)对象在全局堆上分配;
  • 每个线程对象都有一个堆栈,由具有与堆对象的布局相同;
  • 当你进行函数调用时,你将值压入堆栈并跳转到函数;
  • 如果代码需要分配例如构造函数,则该数据将放置在堆上。

要深入了解堆栈使用模型,请参阅"Push/输入与评估/应用”

什么是 CAF?

“恒定应用形式”。例如,程序中为程序执行的生命周期分配的顶级常量。由于它们是静态分配的,因此它们必须是 由垃圾收集器专门处理


参考资料和进一步阅读

The majority of the technical information about the architecture and approach of the GHC system is in their wiki. I'll link to the key pieces, and some related papers that people may not know about.

What typical optimizations are applied?

The key paper on this is: A transformation-based optimiser for Haskell,
SL Peyton Jones and A Santos, 1998, which describes the model GHC uses of applying type-preserving transformations (refactorings) of a core Haskell-like language to improve time and memory use. This process is called "simplification".

Typical things that are done in a Haskell compiler include:

  • Inlining;
  • Beta reduction;
  • Dead code elimination;
  • Transformation of conditions: case-of-case, case elimiation.
  • Unboxing;
  • Constructed product return;
  • Full laziness transformation;
  • Specialization;
  • Eta expansion;
  • Lambda lifting;
  • Strictness analysis.

And sometimes:

  • The static argument transformation;
  • Build/foldr or stream fusion;
  • Common sub-expression elimination;
  • Constructor specialization.

The above-mentioned paper is the key place to start to understand most of these optimizations. Some of the simpler ones are given in the earlier book, Implementing Functional Languages, Simon Peyton Jones and David Lester.

What is the execution order when there are multiple candidates for evaluation

Assuming you're on a uni-processor, then the answer is "some order that the compiler picks statically based on heuristics, and the demand pattern of the program". If you're using speculative evaluation via sparks, then "some non-deterministic, out-of-order execution pattern".

In general, to see what the execution order is, look at the core, with, e.g. the ghc-core tool. An introduction to Core is in the RWH chapter on optimizations.

How are thunks represented?

Thunks are represented as heap-allocated data with a code pointer.

Heap object

See the layout of heap objects.
Specifically, see how thunks are represented.

How are the stack and the heap used?

As determined by the design of the Spineless Tagless G-machine, specifically, with many modifications since that paper was released. Broadly, the execution model:

  • (boxed) objects are allocated on the global heap;
  • every thread object has a stack, consisting of frames with the same layout as heap objects;
  • when you make a function call, you push values onto the stack and jump to the function;
  • if the code needs to allocate e.g. a constructor, that data is placed on the heap.

To deeply understand the stack use model, see "Push/Enter versus Eval/Apply".

What is a CAF?

A "Constant Applicative Form". E.g. a top level constant in your program allocated for the lifetime of your program's execution. Since they're allocated statically, they have to be treated specially by the garbage collector.


References and further reading:

生寂 2024-11-15 06:37:52

这可能不是您想要的介绍性文本,但是 Edward Yang 有一系列博客文章正在讨论 Haskell 堆、thunk 的实现方式等。

它很有趣,无论是插图还是凭借对于 Haskell 新手来说,无需深入研究太多细节即可解释事物。该系列涵盖了您的许多问题:

在更技术的层面上,有许多论文(与其他内容一起)涵盖了您想知道的部分内容:

  • SPJ、Simon Marlow 等人的论文on GC in Haskell - 我还没有读过它,但由于 GC 通常代表了 Haskell 所做工作的一个很好的部分,因此它应该提供见解。
  • Haskell 2010 报告 - 我相信您一定听说过这个,但它是太好了,无法链接到。可以在某些地方进行枯燥的阅读,但这是理解 Haskell 为何如此的最好方法之一,至少是我读过的部分。
  • Haskell 的历史 - 比名字所暗示的更具技术性,并为 Haskell 的设计以及设计背后的决策提供了一些非常有趣的观点。读完之后你不禁会更好地理解Haskell的实现。

This is probably not what you had in mind in terms of an introductory text, but Edward Yang has an ongoing series of blog posts discussing the Haskell heap, how thunks are implemented, etc.

It's entertaining, both with the illustrations and also by virtue of explicating things without delving into too much detail for someone new to Haskell. The series covers many of your questions:

On a more technical level, there are a number of papers that cover (in concert with other things), parts of what you're wanting to know.:

  • A paper by SPJ, Simon Marlow et al on GC in Haskell - I haven't read it, but since GC often represents a good porton of the work Haskell does, it should give insight.
  • The Haskell 2010 report - I'm sure you'll have heard of this, but it's too good not to link to. Can make for dry reading in places, but one of the best ways to understand what makes Haskell the way it is, at least the portions I've read.
  • A history of Haskell - is more technical than the name would suggest, and offers some very interesting views into Haskell's design, and the decisions behind the design. You can't help but better understand Haskell's implementation after reading it.
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文