How do you create a language these days?
I need to get around to writing that programming language I've been meaning to write. How do you kids do it these days? I've been out of the loop for over a decade; are you doing it any differently now than we did back in the pre-internet, pre-windows days? You know, back when "real" coders coded in C, used the command line, and quibbled over which shell was superior?
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily) but how do you build the compiler and standard libraries and so forth? What tools do you kids use these days?
Comments (19)
One consideration that's new since the punched card era is the existence of virtual machines already bountifully provided with "standard libraries." Targeting the JVM or the .NET CLR instead of ye olde "language walled garden" saves you a lot of bootstrapping. If you're creating a compiled language, you may also find Java bytecode or MSIL an easier compile target than machine code (of course, if you're in this for the fun of creating a tight optimising compiler then you'll see this as a bug rather than a feature).
On the negative side, the idioms of the JVM or CLR may not be what you want for your language. So you may still end up building "standard libraries" just to provide idiomatic interfaces over the platform facilities. (An example is that every language and its dog seems to provide its own method for writing to the console, rather than leaving users to manually call System.out.println or Console.WriteLine.) Nevertheless, it enables incremental development of the idiomatic libraries, and means that the more obscure libraries for which you never get round to building idiomatic interfaces are still accessible, even if in an ugly way.
If you're considering an interpreted language, .NET also has support for efficient interpretation via the Dynamic Language Runtime (DLR). (I don't know if there's an equivalent for the JVM.) This should help free you up to focus on the language design without having to worry so much about the optimisation of the interpreter.
I've written two compilers now in Haskell for small domain-specific languages, and have found it to be an incredibly productive experience. The parsec library makes playing with syntax easy, and interpreters are very simple to write over a Haskell data structure. There is a description of writing a Lisp interpreter in Haskell that I found helpful.
If you are interested in a high-performance backend, I recommend LLVM. It has a concise and elegant bytecode and the best x86/amd64 generating backend you can find. There is an optional garbage collector, and some experimental backends that target the JVM and CLR.
You can write a compiler in any language that produces LLVM bytecode. If you are adventurous enough to learn Haskell but want LLVM, there is a set of Haskell-LLVM bindings.
What has changed considerably but hasn't been mentioned yet is IDE support and interoperability:
Nowadays we pretty much expect Intellisense, step-by-step execution and state inspection "right in the editor window", new types that tell the debugger how to treat them, and rather helpful diagnostic messages. The old "compile .x -> .y" executable is not enough to create a language anymore. The environment is nothing to focus on first, but it affects willingness to adopt.
Also, libraries have become much more powerful, and no one wants to implement all that in yet another language. Try to borrow: make it easy to call existing code, and make it easy to be called by other code.
Targeting a VM, as itowlson suggested, is probably a good way to get started. If that turns out to be a problem, it can still be replaced by a native compiler.
I'm pretty sure you do what's always been done.
Write some code, and show your results to the world.
As compared to the olden times, there are some tools to make your job easier though. Might I suggest ANTLR for parsing your language grammar?
Speaking as someone who just built a very simple assembly-like language and interpreter, I'd start out with the .NET framework or similar. Nothing can beat the powerful syntax of C# plus the backing of the entire .NET community when attempting to write most things. From there I designed a simple bytecode format and assembly syntax and proceeded to write my interpreter and assembler.
Like I said, it was a very simple language.
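For anyone wondering what "a simple bytecode format and assembly syntax" can look like, here is a minimal sketch. It is in Python rather than C#, and the instruction set (PUSH, ADD, MUL, PRINT, HALT), the opcode numbers and the ";" comment syntax are all invented for illustration; they are not the format from the answer above.

```python
# Hypothetical instruction set, invented for this sketch.
OPCODES = {"PUSH": 0, "ADD": 1, "MUL": 2, "PRINT": 3, "HALT": 4}
HAS_OPERAND = {"PUSH"}  # mnemonics followed by one integer operand


def assemble(source):
    """Turn assembly text into a flat list of integers (the 'bytecode')."""
    code = []
    for line in source.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        code.append(OPCODES[mnemonic])
        if mnemonic in HAS_OPERAND:
            code.append(int(parts[1]))
    return code


def run(code):
    """Execute the bytecode on a stack machine; return everything PRINTed."""
    stack, output, pc = [], [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == OPCODES["PUSH"]:
            stack.append(code[pc])  # the operand sits right after the opcode
            pc += 1
        elif op == OPCODES["ADD"]:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == OPCODES["MUL"]:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == OPCODES["PRINT"]:
            output.append(stack.pop())
        elif op == OPCODES["HALT"]:
            return output
        else:
            raise ValueError(f"unknown opcode {op}")


program = """
    PUSH 2      ; stack: 2
    PUSH 3      ; stack: 2 3
    ADD         ; stack: 5
    PUSH 4      ; stack: 5 4
    MUL         ; stack: 20
    PRINT
    HALT
"""
bytecode = assemble(program)
result = run(bytecode)
```

Once this skeleton works, jumps and labels are the usual next step, and that is where the assembler starts earning its keep.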
You should not accept wimpy solutions like using the latest tools. You should bootstrap the language by writing a minimal compiler in Visual Basic for Applications or a similar language, then write all the compilation tools in your new language and then self-compile it using only the language itself.
Also, what is the proposed name of the language?
I think recently there have not been languages with ALL CAPITAL LETTER names like COBOL and FORTRAN, so I hope you will call it something like MIKELANG with all capital letters.
Not so much an implementation but a design decision which affects implementation: if you make every statement of your language have a unique parse tree without context, you'll get something for which it's easy to hand-code a parser, and which doesn't require large amounts of work to provide syntax highlighting for. Similarly, simple things like using a different symbol for module namespaces and object namespaces (unlike Java, which uses "." for both package and class namespaces) mean you can parse the code without loading every module that it refers to.
Standard libraries: include the equivalent of everything in the C99 standard libraries other than setjmp. Add whatever else you need for your domain. Work out an easy way to do this, either something like SWIG or an in-line FFI such as Ruby's [can't remember module name] and Python's ctypes.
Building as much of the language as possible in the language itself is an option, but projects which start out doing that either give up (Rubinius moved to using C++ for parts of its standard library) or are only for research purposes (Mozilla Narcissus).
I am actually a kid, haha. I've never written an actual compiler before or designed a language, but I have finished The Red Dragon Book, so I suppose I have somewhat of an idea (I hope).
It would depend firstly on the grammar. If it's LR or LALR I suppose tools like Bison/Flex would work well. If it's more LL, I'd use Spirit, which is a component of Boost. It allows you to write the language's grammar in C++ in an EBNF-like syntax, so no muddling around with code generators; the C++ compiler compiles the grammar for you. If any of these fail, I'd write an EBNF grammar on paper, and then proceed to do some heavy recursive descent parsing, which seems to work; if C++ can be parsed pretty well using RDP (as GCC does it), then I suppose with enough unit tests and patience you could write entire compilers using RDP.
Once I have a parser running and some sort of intermediate representation, it then depends on how it runs. If it's some bytecode or native code compiler, I'll use LLVM or libJIT to process it. LLVM is more suited for general compilation, but I like the libJIT API and documentation better. Alternatively, if I'm really lazy, I'll generate C code and let GCC do the actual compilation. Another alternative is to target an existing VM, like Parrot, the JVM or the CLR. Parrot is the VM being designed for Perl. If it's just an interpreter, I'll walk the syntax tree.
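To show how little code the hand-written route needs, here is a rough sketch of a recursive descent parser for arithmetic, followed by an interpreter that just walks the resulting syntax tree. It is in Python rather than C++ to keep it short. The grammar, the tuple-based tree and the names (tokenize, Parser, parse, evaluate) are assumptions made up for this example.

```python
import operator
import re

# Grammar assumed for this sketch:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def tokenize(text):
    """Split into integer and operator tokens (other characters are ignored)."""
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    # One method per grammar rule: that is all "recursive descent" means.
    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        tok = self.take()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return ("num", int(tok))


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


def evaluate(node):
    """Tree-walking interpreter: recurse over the tuples built by the parser."""
    if node[0] == "num":
        return node[1]
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


tree = parse("1 + 2 * 3")
value = evaluate(tree)
```

Swapping evaluate for a function that emits C source or LLVM IR from the same tree is how you would get from here to the compiled options mentioned above.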
A radical alternative is to use Prolog, which has syntax features which remarkably simulate EBNF. I have no experience with it though, and if I am not wrong (which I am almost certainly going to be), Prolog would be quite slow if used to parse heavy duty programming languages with a lot of syntactical constructs and quirks (read: C++ and Perl).
All this I'll do in C++, if only because I am more used to writing in it than C. I'd stay away from Java/Python or anything of that sort for the actual production code (writing compilers in C/C++ helps to make them portable), but I could see myself using them as prototyping languages, especially Python, which I am partial towards. Of course, I've never actually done any of this before, so I'm not one to say.
On lambda-the-ultimate there's a link to Create Your Own Programming Language by Marc-André Cournoyer, which appears to describe how to leverage some modern tools for creating little languages.
Just a hint: Look at some quite different languages first, before designing a new language (i.e. languages with a very different evaluation strategy). Haskell and Oz come to mind, though you should also know Prolog and Scheme. A year ago I was also like "hey, let's design a language that behaves exactly as I want", but fortunately I looked at those other languages first (or you could also say unfortunately, because now I don't know how I want a language to behave anymore...).
Before you start creating a language you should read this:
Hanspeter Moessenboeck, The Art of Niklaus Wirth
ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe00b.pdf
There's a big shortcut to implementing a language that I don't see in the other answers here. If you use one of Lukasiewicz's "unparenthesized" forms (i.e. Forward Polish or Reverse Polish) you don't need a parser at all! With Reverse Polish, the dependencies go right-to-left, so you simply execute each token as it's scanned. With Forward Polish, it's the reverse of that, so you actually execute the program "backwards", simplifying subexpressions until reaching the starting token.
To understand why this works, you should investigate the 3 primary tree-traversal algorithms: pre-order, in-order, post-order. These three traversals are the inverse of the parsing task that a language reader (i.e. a parser) has to perform. Only the in-order notation "requires" a recursive descent to reconstruct the expression tree. With the other two, you can get away with just a stack.
This may require more "thinking" and less "implementing".
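Here is a small Python sketch of the stack trick for both notations. It assumes whitespace-separated tokens and only binary operators; the fixed arity is what makes the notation unambiguous without parentheses. The function names eval_rpn and eval_prefix are made up for the example.

```python
import operator

# Binary operators only: fixed arity is what lets us skip the parser.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def eval_rpn(source):
    """Reverse Polish: scan left to right, executing each token as it arrives."""
    stack = []
    for token in source.split():
        if token in OPS:
            right = stack.pop()  # the most recent value is the right operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


def eval_prefix(source):
    """Forward Polish: same stack, but the tokens are scanned backwards."""
    stack = []
    for token in reversed(source.split()):
        if token in OPS:
            left = stack.pop()  # scanning backwards, the left operand is on top
            right = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


postfix_result = eval_rpn("2 3 + 4 *")    # (2 + 3) * 4
prefix_result = eval_prefix("* + 2 3 4")  # the same expression, operator first
```

Both calls evaluate (2 + 3) * 4. The only difference between the two functions is the scan direction and which popped value counts as the left operand.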
BTW, if you've already found an answer (this question is a year old), you can post that and accept it.
Real coders still code in C. Just that it's a little sharper.
Hmmm... language design? or writing a compiler?
If you want to write a compiler, you'd use Flex + Bison. (google)
Not an easy answer, but...
You essentially want to define a set of rules for the pieces of text your language is made of (tokens), and then a parser that checks the token stream against the grammar rules and assembles the tokens into larger fragments.
http://www.mactech.com/articles/mactech/Vol.16/16.07/UsingFlexandBison/
People can spend years on this. The above article talks about using two tools (Flex and Bison) that can be used to turn text into code you can feed to a compiler.
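To give a feel for the "rules in, tokens out" half of that job, here is a minimal sketch of a rule-driven tokenizer in Python. It only mimics the idea behind a Flex specification; it is not how Flex works internally, and the token names and patterns are assumptions picked for the example.

```python
import re

# Each rule is (token name, regular expression), tried in this order.
# The names and patterns are invented for this sketch.
TOKEN_RULES = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),  # anything no other rule matched
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES)
)


def tokenize(text):
    """Return a list of (kind, text) pairs, dropping whitespace."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup  # name of the rule that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at offset {match.start()}"
            )
        tokens.append((kind, match.group()))
    return tokens


tokens = tokenize("x = 3.14 * (y + 2)")
```

The resulting list is what you would hand to the parser, which is the part Bison generates from your grammar rules.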
First I spent a year or so actually thinking about what the language should look like. At the same time I helped develop Ioke (www.ioke.org) to learn about language internals.
I chose Objective-C as the implementation platform, as it's a fast (enough), simple and rich language. It also provides a test framework, so an agile approach is a go. It also has a rich standard library I can build upon.
Since my language is simple on the syntactic level (no keywords, only literals, operators and messages), I could go with Ragel (http://www.complang.org/ragel/) for building the scanner. It's fast as hell and simple to use.
Now I have a working object model, scanner and simple operator shuffling, plus standard library bootstrap code. I can even run simple programs, as long as they fit in one file, that is :)
Older techniques are of course still common (e.g. using Flex and Bison), but many newer language implementations combine the lexing and parsing phases by using a parser based on a parsing expression grammar (PEG). This works for recursive descent parsers created using combinators, or for memoizing Packrat parsers. Many compilers are also built using the ANTLR framework.
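As a rough illustration of the PEG approach, here is a sketch in Python of a few parser combinators with a Packrat-style cache. There is no separate lexer: each token parser matches its own regex and skips trailing whitespace. All the names (memo, token, seq, choice, many, action, lazy) and the toy grammar are made up for this example and do not come from any particular library.

```python
import re


def memo(parser):
    """Packrat-style cache: remember the result at each (text, position)."""
    cache = {}

    def wrapped(text, pos):
        key = (text, pos)
        if key not in cache:
            cache[key] = parser(text, pos)
        return cache[key]
    return wrapped


def token(pattern):
    """Match a regex at pos and swallow trailing whitespace."""
    rx = re.compile("(" + pattern + r")\s*")

    def parse(text, pos):
        m = rx.match(text, pos)
        return (m.group(1), m.end()) if m else None
    return memo(parse)


def seq(*parsers):
    """All parsers must match, one after another."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return memo(parse)


def choice(*parsers):
    """PEG ordered choice: the first alternative that matches wins."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return memo(parse)


def many(parser):
    """Zero or more repetitions (greedy, never fails)."""
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                return values, pos
            value, pos = result
            values.append(value)
    return memo(parse)


def action(parser, fn):
    """Transform the value of a successful parse."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, pos = result
        return fn(value), pos
    return memo(parse)


def lazy(thunk):
    """Defer a lookup so rules can refer to rules defined later."""
    return lambda text, pos: thunk()(text, pos)


def fold(values):
    """Combine [first, [[op, operand], ...]] into a single number."""
    first, rest = values
    for op, operand in rest:
        first = first * operand if op == "*" else first + operand
    return first


# Toy grammar: expr := product ('+' product)*; product := atom ('*' atom)*
number = action(token(r"\d+"), int)
atom = choice(
    number,
    action(seq(token(r"\("), lazy(lambda: expr), token(r"\)")), lambda v: v[1]),
)
product = action(seq(atom, many(seq(token(r"\*"), atom))), fold)
expr = action(seq(product, many(seq(token(r"\+"), product))), fold)


def parse(text):
    text = text.strip()
    result = expr(text, 0)
    if result is None or result[1] != len(text):
        raise SyntaxError("no parse")
    return result[0]


answer = parse("2 * (3 + 4) + 1")
```

The cache here is keyed on the whole input string and never cleared, which is fine for a toy but not something you would ship. A real Packrat parser would scope the cache to a single parse.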
Use Bison/Flex, the free replacements for yacc/lex (Bison is the GNU one). This book is extremely helpful.
The reason to use Bison is that it catches any conflicts in the language. I used it and it made my life many years easier (OK, so I'm on my 2nd year, but the first 6 months were a few years ago, writing it in C++, and the parsing/conflicts/results were terrible! :( ).
If you want to write a compiler, obviously you need to read the Dragon Book ;)
Here is another good book that I have just read. It is practical and easier to understand than the Dragon Book:
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0
Mike --
If you're interested in an efficient native-code-generating compiler for Windows so you can get your bearings -- without wading through all the unnecessary widgets, gadgets, and other nonsense that clutter today's machines -- I recommend the Osmosian Order's Plain English development system. It includes a unique interface, a simplified file manager, a friendly text editor, a handy hexadecimal dumper, the compiler/linker (of course), and a wysiwyg page-layout application for documentation. Written entirely in Plain English, it is a quick download (less than a megabyte), small enough to understand in short order (about 25,000 lines of Plain English code, with just 4,000 in the compiler/linker), yet powerful enough to reproduce itself on a bottom-of-the-line Dell in less than three seconds. Really: three seconds. And it's free to all who write and ask for a copy, including the source code and a rather humorous tongue-in-cheek 100-page manual. See www.osmosian.com for details on how to get a copy, or write to me directly with questions or comments: [email protected]