LLVM vs. C--: how can LLVM fundamentally not be better for Haskell than C--?
I've been excited about LLVM being low-level enough to model any system,
and saw it as promising that Apple was adopting it; but then again, Apple doesn't specifically support Haskell.
And some think that Haskell would be better off with C--:
That LLVM'ers haven't solved the problem of zero-overhead garbage collection
isn't too surprising.
Solving this while staying agnostic of the data model
is an open question in computer science.
Having worked a bit with the new code generation backend which manipulates C--, I can say there are a number of reasons why C-- can be better than LLVM, and also why they’re not really at all the same thing.
C-- operates at a higher level of abstraction than LLVM; for example, we can generate code in C-- where the stack pointer is entirely implicit, and only manifest it later during the compilation process. This makes applying certain types of optimizations much easier, because the higher-level representation allows for more code motion with fewer invariants.
While we’re actively looking to fix this, LLVM suffers from the same problem that the via-C backend suffered from: it requires us to create proc points. What are proc points? Essentially, because Haskell does not use the classic call/ret calling convention, whenever we make the moral equivalent of a subprocedure call, we need to push a continuation onto the stack and then jump to the subprocedure. This continuation is usually a local label, but LLVM requires it to be an actual procedure, so we need to break functions into smaller pieces (each piece being called a proc point). This is bad news for optimizations, which work at the procedure level.
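To make the proc point idea concrete, here is a minimal sketch in Python. It is an illustrative model, not GHC code, and every name in it is hypothetical. It models a function f that calls g and then adds one. Because "the rest of f after g returns" has to be something that can be pushed and jumped to, f is split into f_entry and f_after_g, the latter playing the role of a proc point. A small trampoline loop stands in for the jumps.

```python
# Illustrative model only: all names here are hypothetical, not GHC internals.

def g_entry(stack, x):
    """Callee: compute a result, then 'return' by jumping to the continuation."""
    result = x * 2
    continuation = stack.pop()       # pop the return continuation
    return continuation, result      # jump to it, passing the result

def f_after_g(stack, g_result):
    """Proc point: the part of f that runs after g returns."""
    result = g_result + 1
    continuation = stack.pop()
    return continuation, result

def f_entry(stack, x):
    """First half of f: push the continuation, then jump to g (no call/ret)."""
    stack.append(f_after_g)
    return g_entry, x

def halt(stack, value):
    """Final continuation: tells the trampoline to stop."""
    return None, value

def run(entry, arg):
    """Trampoline: keep jumping until a step yields no next code pointer."""
    stack = [halt]
    code, value = entry, arg
    while code is not None:
        code, value = code(stack, value)
    return value

print(run(f_entry, 5))
```

In this model, f_entry and f_after_g are separate procedures, so anything that analyzes one procedure at a time never sees the call to g and the following addition together. That is the cost described above.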
C-- and LLVM take different approaches to dataflow optimization. LLVM uses traditional SSA style with phi-nodes; C-- uses a cool framework called Hoopl, which doesn’t require you to maintain the SSA invariant. I can confirm: programming optimizations in Hoopl is a lot of fun, though certain types of optimizations (inlining of one-time used variables comes to mind) are not exactly the most natural in this dataflow setting.
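As a rough illustration of dataflow analysis without SSA, here is a small constant-propagation sketch in Python. This is not Hoopl's API (Hoopl is a Haskell library); it is only a generic lattice-plus-transfer-function analysis of the kind such frameworks are built around, and the block format, the names, and the example program are all assumptions made up for this sketch. Note that x is assigned more than once, which SSA would forbid.

```python
# Generic sketch, not Hoopl: facts map variable -> known constant or TOP.
TOP = "not-constant"

def join(a, b):
    """Merge facts from two incoming edges; disagreeing values become TOP."""
    out = dict(a)
    for var, val in b.items():
        if var in out and out[var] != val:
            out[var] = TOP
        else:
            out[var] = val
    return out

def transfer(stmts, fact):
    """Push a fact forward through one block's statements."""
    fact = dict(fact)
    for dest, expr in stmts:
        if expr[0] == "const":
            fact[dest] = expr[1]
        elif expr[0] == "copy":
            # A variable with no known value is treated as not-constant.
            fact[dest] = fact.get(expr[1], TOP)
        elif expr[0] == "add":
            a, b = fact.get(expr[1], TOP), fact.get(expr[2], TOP)
            fact[dest] = TOP if TOP in (a, b) else a + b
    return fact

def analyze(blocks, entry):
    """Worklist iteration to a fixed point; returns the fact at each block entry."""
    in_facts = {entry: {}}
    worklist = [entry]
    while worklist:
        name = worklist.pop()
        stmts, succs = blocks[name]
        out = transfer(stmts, in_facts[name])
        for s in succs:
            new = out if s not in in_facts else join(in_facts[s], out)
            if in_facts.get(s) != new:
                in_facts[s] = new
                worklist.append(s)
    return in_facts

# Hypothetical program: block name -> (statements, successor blocks).
EXAMPLE = {
    "entry": ([("x", ("const", 1)), ("y", ("const", 2))], ["loop"]),
    "loop": ([("z", ("add", "x", "y")), ("x", ("add", "x", "y"))], ["loop", "exit"]),
    "exit": ([], []),
}

facts = analyze(EXAMPLE, "entry")
print(facts["exit"])
```

At the exit block only y is still known to be constant: x changes on every trip around the loop, and z depends on x. No phi-nodes are needed because the merge happens in join, on facts rather than on program variables.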
Well, there is a project at UNSW to translate GHC Core to LLVM.
Remember: it wasn't clear 10 years ago that LLVM would build up all the infrastructure C-- wasn't able to. Unfortunately, LLVM has the infrastructure for portable, optimized code, but not the infrastructure for nice high level language support, that C-- ha(s)d.
An interesting project would be to target LLVM from C-- ...
Update: as of GHC 7, GHC can use LLVM for code generation. Use the -fllvm flag. This has improved numerical performance for some low-level programs. Otherwise, performance is similar to the old GCC backend.
GHC now officially has an LLVM backend, and it turns out to be competitive with the GCC and native-codegen backends, and actually faster in some cases. The LLVM project has also accepted the new calling convention David Terei created for Haskell on LLVM, so, amazingly, the two projects are actually working together now.
One issue in practice is LLVM has been much more of a moving target.
GHC has had some trouble trying to support multiple versions of LLVM.
There is an active discussion on the ghc-dev mailing list about this.
Btw, currently the LLVM backend in GHC runs after the Haskell is translated to the Cmm language (which I believe is mostly just C-- extended with certain registers from the STG language), and due to the above to-be-addressed difficulties, redundant optimizations are being done, which slows down compilation.
Also, historically, and currently AFAIK, the LLVM project doesn't prioritize providing a portable platform, and some developers have made a point of articulating that it is a compiler IR and not a form of portable assembly language.
The LLVM IR you write for one intended target may not be useful at all for a different intended target. For comparison, the C-- website actually refers to it as portable assembly. "You would be much happier with one portable assembly language that could be ..." is a quote from their website. That website also mentions a runtime interface to ease portable implementation of garbage collection and exception handling.
So you could think of C-- as a portable common ground for all of the front ends, one that has a bit more in common with CIL and Java bytecode, and of LLVM IR as an expressive common ground for all of your backends, one that facilitates unifying low-level optimizations common to multiple targets. LLVM IR also provides the added bonus that the LLVM project will implement a lot of those low-level optimizations. That being said, in some ways LLVM IR could actually be considered higher level than C--; for example, LLVM IR has different types, whereas in C-- everything is just bit vectors.