Why don't compilers translate to simpler languages?

Posted 2024-10-02 04:57:59

Usually compilers translate from the language they support to assembly. Or at most to an assembly-like language (bytecode), like GIMPLE/GENERIC for GCC or Python/Java/.NET bytecode.

Wouldn't it be simpler for a compiler to translate to a simpler language that already implements a big subset of its grammar?

For example, an Objective-C compiler, which is 100% compatible with C, could add semantics only for the syntax that extends C, translating it into C. I can see many advantages to this: one could use this Objective-C compiler to translate code into C and then compile the generated C with a different compiler that doesn't support Objective-C (but that optimizes more, compiles quicker, or targets more architectures). Or one could use the generated C code in a project where only C is allowed.

I guess/hope that if things worked like this, it would be a lot easier to write extensions for current languages (e.g. adding keywords to C++ to ease the implementation of common patterns, or, still in C++, removing the declare-before-use rule by moving inline member functions to the end of header files).
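To make the idea concrete, here is a minimal sketch of such a source-to-source extension. The `unless (cond)` keyword and the `lower_unless` function are hypothetical, invented only for illustration; the translator rewrites the new keyword into plain C and copies everything else through unchanged.

```python
import re

# Hypothetical extension: `unless (cond)` is lowered to `if (!(cond))`.
_UNLESS = re.compile(r"\bunless\s*\(")

def lower_unless(source: str) -> str:
    """Rewrite every `unless (cond)` into `if (!(cond))`; copy the rest as-is."""
    out = []
    i = 0
    while True:
        m = _UNLESS.search(source, i)
        if m is None:
            out.append(source[i:])
            break
        out.append(source[i:m.start()])
        # Scan forward to the parenthesis that closes the condition.
        depth = 1
        j = m.end()
        while j < len(source) and depth > 0:
            if source[j] == "(":
                depth += 1
            elif source[j] == ")":
                depth -= 1
            j += 1
        if depth != 0:
            raise SyntaxError("unbalanced parentheses after 'unless'")
        cond = source[m.end():j - 1]
        out.append(f"if (!({cond}))")
        i = j
    return "".join(out)

print(lower_unless("unless (f(x) == 0) { g(); }"))
```

Note that this sketch knows nothing about strings, comments, or macros, so an `unless (` inside a string literal would be rewritten too. Handling those cases correctly means parsing the host language properly, which is where most of the real work lies.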

What kind of penalties would there be? Would the generated code be very difficult for humans to understand? Would compilers be unable to optimize as much as they can now? What else?

Comments (5)

冰雪梦之恋 2024-10-09 04:57:59

This is actually used by a lot of languages, through intermediate languages. The biggest example is Pascal, which had the Pascal-P system: Pascal was compiled into a hypothetical assembly language. Porting Pascal then only meant writing a compiler for this assembly language, a task a lot simpler than porting the entire Pascal compiler. After writing it, you'd only need to compile the (machine-independent) Pascal compiler that was itself expressed in this assembly language.
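The split between a portable front end and a small, easily rewritten back end can be sketched as follows. This is not the real Pascal-P instruction set: the `PUSH`/`ADD`/`SUB`/`MUL` instructions and the function names are made up for illustration, and Python's own `ast` module stands in for a parser. Only `run` would need to be rewritten for a new machine.

```python
import ast

# Made-up stack-machine opcodes, keyed by the Python AST operator they lower.
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

def compile_expr(text):
    """Front end: translate an integer arithmetic expression to stack code."""
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in OPS:
            emit(node.left)   # operands first ...
            emit(node.right)
            code.append((OPS[type(node.op)], None))  # ... then the operator
        else:
            raise ValueError("unsupported expression")

    emit(ast.parse(text, mode="eval").body)
    return code

def run(code):
    """Back end: the only machine-specific part, here a tiny interpreter."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append({"ADD": a + b, "SUB": a - b, "MUL": a * b}[op])
    return stack.pop()

program = compile_expr("2 + 3 * 4")
print(program)
print(run(program))
```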

Bootstrapping is also used quite often in programming language design. Many languages have their compilers written in the same language (Haskell comes to mind here). With this approach, adding a new feature to the language simply means expressing that idea in the current language, putting it into the compiler, and recompiling.

I don't think the problem with this method is really the readability of the generated code (personally, I don't sift through the assembly or bytecode my compilers generate), but rather optimization. Many ideas in higher-level programming languages (weak typing comes to mind) are hard to translate automatically into lower-level system languages such as C. There's a reason why GCC tends to do its optimization before code generation.

But for the most part, compilers do translate into simpler languages, except perhaps for the most basic system languages.

囚你心 2024-10-09 04:57:59

Incidentally, as a counterexample, Tcl is known to be very, very hard (if not totally impossible) to translate to C. Over the last 20 years a couple of projects have tried, and there was even one promise of a commercial product, but none have materialized.

In part it is because Tcl is a very dynamic language (as any language with an eval function is). In part it is because the only way to know if something is code or data is to run the program.
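The code-versus-data problem can be illustrated by analogy in Python rather than Tcl. In this sketch (the `process` function is hypothetical), the same string is either returned as data or evaluated as code, and nothing in the program text tells a translator which will happen.

```python
def process(payload: str, treat_as_code: bool):
    """Return payload as data, or evaluate it as an expression, decided at run time."""
    if treat_as_code:
        # Empty builtins keep this demo limited to plain expressions.
        return eval(payload, {"__builtins__": {}}, {})
    return payload

print(process("1 + 2", treat_as_code=False))  # the string, untouched
print(process("1 + 2", treat_as_code=True))   # the value of the expression
```

A translator emitting C ahead of time cannot compile `payload`, because its contents may not exist until the program runs. The generated program would have to carry an interpreter or compiler for the whole source language with it.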

世态炎凉 2024-10-09 04:57:59

Since Objective-C is a strict superset of C and C++ contains a very large amount that is a lot like C, to parse either you effectively already need to be able to parse C. In that case, emitting machine code and emitting more C code aren't substantially different in processing cost; the main cost to the user is that compiling now takes as long as it originally did plus the time the second compiler takes.

Any attempt to copy and paste the stuff that looks like C and translate the rest around it would be prone to problems. Firstly, C++ isn't a strict superset of C, so things that look like C don't necessarily compile exactly the same anyway (especially versus C99). And even if they did, supposing a user made an error in their C stuff, compilers don't tend to provide error information in a machine-readable format, so it'd be really hard for the Objective-C-to-C layer to give the user a meaningful error after receiving e.g. "error at line 99".
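One commonly used partial mitigation for the line-number half of this problem, which the answer above does not mention, is the C preprocessor's `#line` directive. A translator can emit it so that the downstream C compiler reports positions in the original file. The sketch below assumes a hypothetical translator that already has a list of `(original_line, c_text)` pairs; `emit_c` and the file name are illustrative only.

```python
def emit_c(chunks, source_name):
    """Emit generated C, tagging each chunk with the line it came from.

    chunks: list of (original_line_number, c_text) pairs.
    """
    out = []
    for line_no, c_text in chunks:
        # `#line N "file"` makes the C compiler treat the next line as line N of file.
        out.append(f'#line {line_no} "{source_name}"')
        out.append(c_text)
    return "\n".join(out) + "\n"

print(emit_c([(12, "int x = 1;"), (99, "x = y;")], "demo.m"))
```

This only fixes where an error is reported, not what it says: the message is still phrased in terms of the generated C, which may look nothing like what the user wrote.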

That said, many compiler suites, such as GCC and, even more so, the upcoming Clang + LLVM, use an intermediate form to decouple the part that knows the specifics of one architecture from the part that knows the specifics of a particular language. However, it tends to be more of a data structure than something intentionally easy to express as a written language.

So: compilers don't work like this for purely practical reasons.

一梦浮鱼 2024-10-09 04:57:59

Haskell is actually compiled this way: the GHC compiler first translates the source code to an intermediate functional language (which is less rich than Haskell itself), performs optimizations, and then lowers the whole thing to C code, which is then compiled by GCC. This solution has problems, though, and projects were started to replace this backend.

http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html

巨坚强 2024-10-09 04:57:59

There is a compiler construction stack that is fully based on this idea. Any new language is implemented as a trivial translation into a lower-level language, or a combination of languages, already defined within this stack.

http://www.meta-alternative.net/mbase.html

However, in order to be able to do so, you'd need at least some metaprogramming capabilities in every little language you add to the hierarchy. This requirement places some severe limitations on language semantics.
