How can a language be interpreted by itself (like Rubinius)?
I've been programming in Ruby for a while now with just the standard MRI implementation of Ruby, but I've always been curious about the other implementations I hear so much about.
I was reading about Rubinius the other day, a Ruby interpreter written in Ruby. I tried looking it up in various places, but I was having a hard time figuring out exactly how something like this works. I've never had much experience in compilers or language writing but I'm really interested to figure it out.
How exactly can a language be interpreted by itself? Is there a basic step in compiling that I don't understand where this makes sense? Can someone explain this to me like I'm an idiot (because that wouldn't be too far off base anyway)?
It's simpler than you think.
Rubinius is not 100% written in Ruby, just mostly.
From http://rubini.us/
The concept you are looking for is compiler bootstrapping.
Basically, bootstrapping means writing a compiler (or an interpreter) for language x in language x. The first version is produced either by writing a basic compiler by hand in a lower-level language (e.g. writing a C compiler in assembly) or by using a different high-level language.
Read more about bootstrapping on Wikipedia. Greg's answer regarding metacircular evaluators is also highly recommended, including the relevant chapter in SICP.
In the case of Rubinius, the VM is written in C++ and handles all the low-level (operating-system-related) work and the base operations. The VM has its own bytecode format (just as the JVM has its own), and when Rubinius starts, it launches the VM, which then executes the bytecode.
However, most of Rubinius' standard library (which is part of Ruby the language) is implemented in Ruby, whereas MRI implements it in C and JRuby in Java. The Rubinius bytecode compiler is also written in Ruby.
So yes, at some point early on they had to use the standard Ruby interpreter (MRI) to bootstrap Rubinius. That shouldn't be the case anymore (although I'm not sure whether you might still need it, since its build system uses rake).
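To make the compiler/VM split a bit more concrete, here is a minimal sketch in Python (chosen only for illustration; the same shape works in Ruby). The instruction names and the nested-tuple input format are invented for this example and have nothing to do with Rubinius's real bytecode. `compile_expr` plays the role of the bytecode compiler (the part Rubinius writes in Ruby), and `run` plays the role of the VM (the part Rubinius writes in C++).

```python
# Hypothetical instruction set, made up for this sketch; it is NOT Rubinius bytecode.

def compile_expr(tree, code=None):
    """Compile a nested tuple such as ("+", 1, ("*", 2, 3)) into stack instructions."""
    if code is None:
        code = []
    if isinstance(tree, (int, float)):
        # A literal number just gets pushed onto the operand stack.
        code.append(("PUSH", tree))
    else:
        op, left, right = tree
        compile_expr(left, code)    # code that leaves the left value on the stack
        compile_expr(right, code)   # code that leaves the right value on the stack
        code.append((op, None))     # instruction that combines the two values
    return code


def run(code):
    """The 'VM' half: execute the instructions against an operand stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
            continue
        right = stack.pop()
        left = stack.pop()
        if op == "+":
            stack.append(left + right)
        elif op == "-":
            stack.append(left - right)
        elif op == "*":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()


bytecode = compile_expr(("+", 1, ("*", 2, 3)))
print(bytecode)       # the instruction list handed to the VM
print(run(bytecode))  # prints 7
```

Once a VM like `run` exists, the compiler only has to produce instructions the VM understands, so the compiler itself can be written in any language, including the one being compiled.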
Suppose the language you are working with is some language, say Lisp, though it doesn't matter. (Could be C++, Java, Ruby, anything.)
Well you have an implementation of Lisp. Call this implementation Imp (just some made up name short for IMPlementation). Since Imp is a program in itself, your computer can run it. Now you write your own implementation for Lisp written in Lisp and you call it Circ. Circ is just a program compiled (or interpreted if you will) from Lisp code. Your code is written so it reads in a file, parses it (processes it into meaningful data), and does something with the data. What is this something? In the case of Circ, it executes the data.
But how does it do so?
Well, suppose for a simple case that the code Circ reads in and parses is something simple, like doing some math and outputting the result. Circ processes the code into easy-to-use data (well, for a language like Lisp it's easy to begin with, but that's beside the point) and stores it. In Lisp you can write code to crunch numbers, so the code written for Circ can do so too, because it is written in Lisp. So the processed data is plugged into some addition-processing code... and voila! You have the numerical result! Then your Circ program outputs the result.
The same thing can be done with more complex things than simple math. In fact, you can compile/interpret other aspects of the language. Write enough of these 'other aspects' and glue them together, and you get a compiler for Lisp written in Lisp.
Since the compiler is compiled by Imp, it can be run by your machine, and presto! You are done.
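The read / parse / execute steps described for Circ can be sketched roughly as follows. This is only an illustration under some assumptions: it is written in Python rather than Lisp, it handles nothing but integer arithmetic in prefix notation, and the names `tokenize`, `parse`, `evaluate` and `circ` are made up for the example. Note how `evaluate` never implements addition itself; it borrows it from the host language, which is exactly the trick described above.

```python
import operator
from functools import reduce

# Arithmetic is borrowed from the host language rather than reimplemented.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def tokenize(text):
    """Split source text like "(+ 1 2)" into ["(", "+", "1", "2", ")"]."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front and build nested lists (the 'meaningful data')."""
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return expr
    try:
        return int(token)
    except ValueError:
        return token  # an operator symbol such as "+"


def evaluate(expr):
    """Execute the parsed data: numbers are themselves, lists are operations."""
    if isinstance(expr, int):
        return expr
    op, *args = expr
    values = [evaluate(arg) for arg in args]
    return reduce(OPS[op], values)


def circ(source):
    return evaluate(parse(tokenize(source)))


print(circ("(+ 1 (* 2 3))"))  # prints 7
```

Supporting "other aspects" of the language means adding more cases to `evaluate` (variables, conditionals, function definitions and so on), each one leaning on the host implementation in the same way.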
This technique is generally called a metacircular evaluator and was first introduced several decades ago in the context of Lisp.
A good description of the technique can be found in Structure and Interpretation of Computer Programs, chapter 4.
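As a rough illustration of the idea (not the evaluator from SICP, which is written in Scheme), here is a sketch of a Python evaluator for a tiny subset of Python, written in Python. It follows the usual eval/apply split: `evaluate` dispatches on the kind of expression, and `apply` calls either a user-defined `Closure` or a primitive borrowed directly from the host. The names `Closure`, `evaluate`, `apply` and `interpret` are chosen for this example, and parsing is delegated to the standard `ast` module.

```python
import ast
import operator

# Binary operators are implemented by handing the work to the host language.
BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class Closure:
    """A user-defined function: parameter names, body, and defining environment."""

    def __init__(self, params, body, env):
        self.params, self.body, self.env = params, body, env


def evaluate(node, env):
    """Evaluate one expression node in the given environment (a dict)."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp):
        left = evaluate(node.left, env)
        right = evaluate(node.right, env)
        return BINOPS[type(node.op)](left, right)
    if isinstance(node, ast.Lambda):
        return Closure([a.arg for a in node.args.args], node.body, env)
    if isinstance(node, ast.Call):
        func = evaluate(node.func, env)
        args = [evaluate(arg, env) for arg in node.args]
        return apply(func, args)
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def apply(func, args):
    """Call a closure by evaluating its body, or fall through to a host primitive."""
    if isinstance(func, Closure):
        local_env = {**func.env, **dict(zip(func.params, args))}
        return evaluate(func.body, local_env)
    return func(*args)


def interpret(source, env=None):
    tree = ast.parse(source, mode="eval")
    return evaluate(tree.body, dict(env or {}))


print(interpret("(lambda x: x * x + 1)(6)"))                    # prints 37
print(interpret("(lambda f, n: f(f(n)))(lambda k: k + 3, 1)"))  # prints 7
print(interpret("largest(2, 9)", {"largest": max}))             # prints 9
```

The circularity is visible in the last line of `apply` and in `BINOPS`: the interpreted language's functions and arithmetic are defined in terms of the same features in the language running the interpreter.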