How to create a JVM programming language?
I have written a compiler in C (using Lex and Bison) for a dynamically typed programming language that supports loops, function declarations inside functions, recursive calls, and so on. I also wrote a virtual machine that runs the intermediate code the compiler produces.
I am now thinking about compiling to Java bytecode instead of my own intermediate code.
I saw that the question of creating a JVM language has already been asked, but I did not find the answers very informative.
So here are my questions:
- I assume that reading the JVM specification is a must for creating a JVM language. What other books can you suggest (apart from the Dragon Book, of course)? I am mostly interested in books or tutorials on how to create a JVM language, not on compilers in general.
- There are many Java libraries for reading, writing and modifying `.class` files, such as jclasslib, bcel and gnu bytecode (gnu.org/software/kawa/api/gnu/bytecode/package-summary.html). Which one would you suggest? Also, are you aware of any C libraries that do the same job?
- I was thinking of looking at another language that targets the JVM, such as Clojure, Jython or JRuby. But all of these languages are very high-level and complicated (to write a compiler for). I am looking for a simpler programming language (I don't mind if it is unknown or unused) that targets the JVM and has an open-source compiler. Any ideas?
Comments (9)
I would also recommend ASM, but have a look at Jasmin as well. I used it (or rather, had to use it) for a university project, and it worked quite well. I wrote a lexer-parser-analyzer-optimizer-generator combination for a programming language using Java and Jasmin, so it generated JVM code. I uploaded the code here; the interesting part should be the source code itself. In the file `bytecode/InsanelyFastByteCodeCreator.java` you will find a piece of code that transforms an AST into the input format of the Jasmin assembler. It is quite straightforward.
The source language (which the lexer-parser-analyzer transforms into the AST) is a subset of Java called MiniJava. It lacks some "complicated" features such as inheritance, constructors, static methods, and private fields and methods. None of those features is difficult to implement, but there was a second task of writing an x86 backend (that is, generating machine assembly), and those things tend to get difficult when there is no JVM to handle some of them for you.
In case you wonder about the strange class name: the task of the university project was to transform the AST into an SSA graph (representing the input code), optimize the graph, and then turn it into Java bytecode. That was about three quarters of the project work, and `InsanelyFastByteCodeCreator` was just a shortcut to test everything.
Have a look at the book "Java Virtual Machine" by Jon Meyer and Troy Downing. It references the Jasmin assembler heavily and is quite helpful for understanding the JVM internals.
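As a rough illustration of the AST-to-Jasmin step described in this answer, here is a minimal sketch in Java. It is not taken from the MiniJava project: the `Expr`, `Num` and `BinOp` types and the `JasminEmitter` class are hypothetical, and the fixed `.limit stack 16` is a simplification (a real generator would compute the needed depth). It only emits Jasmin text; assembling it into a class file is left to Jasmin.

```java
import java.util.ArrayList;
import java.util.List;

final class JasminEmitter {
    /** Hypothetical AST node: emits code that leaves an int on the operand stack. */
    interface Expr {
        void emit(List<String> out);
    }

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
        public void emit(List<String> out) { out.add("ldc " + value); }
    }

    static final class BinOp implements Expr {
        final char op;
        final Expr left;
        final Expr right;
        BinOp(char op, Expr left, Expr right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        public void emit(List<String> out) {
            // Post-order walk: both operands first, then the operator.
            left.emit(out);
            right.emit(out);
            switch (op) {
                case '+': out.add("iadd"); break;
                case '-': out.add("isub"); break;
                case '*': out.add("imul"); break;
                default: throw new IllegalArgumentException("unknown operator: " + op);
            }
        }
    }

    /** Wraps the expression in a Jasmin class whose main method prints its value. */
    static String emitClass(String className, Expr body) {
        List<String> code = new ArrayList<>();
        body.emit(code);
        StringBuilder sb = new StringBuilder();
        sb.append(".class public ").append(className).append('\n');
        sb.append(".super java/lang/Object\n\n");
        sb.append(".method public static main([Ljava/lang/String;)V\n");
        sb.append("    .limit stack 16\n"); // simplification: fixed upper bound
        sb.append("    getstatic java/lang/System/out Ljava/io/PrintStream;\n");
        for (String insn : code) {
            sb.append("    ").append(insn).append('\n');
        }
        sb.append("    invokevirtual java/io/PrintStream/println(I)V\n");
        sb.append("    return\n");
        sb.append(".end method\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // 1 + 2 * 3
        Expr e = new BinOp('+', new Num(1), new BinOp('*', new Num(2), new Num(3)));
        System.out.print(emitClass("Demo", e));
    }
}
```

The post-order traversal is the whole trick on a stack machine: once both operands have been emitted, the operator instruction finds them on top of the stack, so no register allocation is needed.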
Last semester I attended a "Compiler Construction" course. Our project was exactly what you want to do.
The language I used to write my language was Scala. It runs on the JVM but supports a lot of advanced features that Java doesn't (while remaining fully compatible with a plain Java JVM).
To output Java bytecode I used the Scala CAFEBABE library. It is well documented, and you don't have to dig deep into Java class files to understand what to do.
Besides the book, I think you can find a lot of information by going through the labs we did during the course.
ASM can be a solution for generating bytecode. To start, check the topics on generating elements in the manual.
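To give a feel for what a bytecode library such as ASM takes care of, here is a sketch that does not use ASM at all: it writes a tiny class file by hand with the standard library and loads it. The class name `Answer`, its `answer()` method and the helper names are made up for the example; the byte layout follows the class file format, with version 52 (Java 8) chosen arbitrarily.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ClassFileByHand {
    /** Exposes the protected defineClass so raw bytes can be turned into a Class. */
    static final class BytesLoader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    /** Serializes: public class Answer { public static int answer() { return 42; } } */
    static byte[] buildClass() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);         // magic
            out.writeShort(0);                // minor version
            out.writeShort(52);               // major version (Java 8)
            out.writeShort(8);                // constant_pool_count = entries + 1
            utf8(out, "Answer");              // #1
            classRef(out, 1);                 // #2 this class
            utf8(out, "java/lang/Object");    // #3
            classRef(out, 3);                 // #4 super class
            utf8(out, "answer");              // #5 method name
            utf8(out, "()I");                 // #6 descriptor: no args, returns int
            utf8(out, "Code");                // #7 attribute name
            out.writeShort(0x0021);           // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                // this_class
            out.writeShort(4);                // super_class
            out.writeShort(0);                // interfaces_count
            out.writeShort(0);                // fields_count
            out.writeShort(1);                // methods_count
            out.writeShort(0x0009);           // ACC_PUBLIC | ACC_STATIC
            out.writeShort(5);                // name_index
            out.writeShort(6);                // descriptor_index
            out.writeShort(1);                // attributes_count (just Code)
            byte[] code = { 0x10, 42, (byte) 0xAC }; // bipush 42; ireturn
            out.writeShort(7);                // attribute_name_index -> "Code"
            out.writeInt(12 + code.length);   // attribute_length
            out.writeShort(1);                // max_stack
            out.writeShort(0);                // max_locals
            out.writeInt(code.length);        // code_length
            out.write(code);
            out.writeShort(0);                // exception_table_length
            out.writeShort(0);                // attributes_count of Code
            out.writeShort(0);                // attributes_count of the class
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void utf8(DataOutputStream out, String s) throws IOException {
        out.writeByte(1);   // CONSTANT_Utf8 tag
        out.writeUTF(s);    // u2 length + modified UTF-8, as the class file expects
    }

    private static void classRef(DataOutputStream out, int nameIndex) throws IOException {
        out.writeByte(7);   // CONSTANT_Class tag
        out.writeShort(nameIndex);
    }

    /** Loads the generated class and calls its static method reflectively. */
    static int run() {
        try {
            Class<?> c = new BytesLoader().define("Answer", buildClass());
            return (Integer) c.getMethod("answer").invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Every index and length here is bookkeeping that a library does for you, along with stack map frames once the code contains branches, which is the main argument for using ASM or a similar library in a real compiler.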
Last weekend I asked myself the same question, because I wanted to port my toy language to the JVM.
I spent only a few hours searching for information, so take these references with a grain of salt.
Language Implementation Patterns.
I hate ANTLR, but this book looks very good. If you don't like ANTLR either, there is a very good book about parsing: "Parsing Techniques: A Practical Guide".
"Learn to build configuration file readers, data readers, model-driven code generators, source-to-source translators, source analyzers, and interpreters. You don't need a background in computer science: ANTLR creator Terence Parr demystifies language implementation by breaking it down into the most common design patterns. Pattern by pattern, you'll learn the key skills you need to implement your own computer languages."
Chapter 10 covers this topic in 30 pages (too fast, IMO), but there are other chapters that will probably interest you.
http://pragprog.com/titles/tpdsl/language-implementation-patterns
The Implementation of Lua 5.0. This is a great paper about register-based bytecode machines. Go and read it, even just for its own sake.
Lisp in Small Pieces. This book teaches how to write two Scheme compilers that compile to C. Many lessons can be learned from it. I own a copy, and it is really good for anyone interested in Lisp, but it may not be your cup of tea.
"This is a comprehensive account of the semantics and the implementation of the whole Lisp family of languages, namely Lisp, Scheme and related dialects. It describes 11 interpreters and 2 compilers..."
http://www.amazon.com/Lisp-Small-Pieces-Christian-Queinnec/dp/0521562473
Check the Dalvik VM, a register-based VM. The DVM operates on bytecode that is transformed from the Java class files produced by a Java compiler.
There is a mailing list about the topic, jvm-languages.
Are you planning to upload the code anywhere? I would like to take a look.
Suggestion: you could have a look at the Lua programming language; there are JVM implementations of it, such as LuaJ.
(Not to be confused with LuaJava, which uses native libraries through JNI.)
I would recommend that you first learn how JVM assembly works, if you don't already know it.
Many instructions have the form `?name`, where `?` is `i` if the instruction works with an integer type and `a` if it works with a reference type.
Basically, the JVM is a stack machine with no registers, so all instructions work with data directly on the stack. You can push constants with instructions such as `bipush`, `sipush` and `ldc`, discard the top value with `pop`, and move data between local variables (frame slots referenced by index) and the top of the stack using `?store`/`?load`. Some other important instructions are `invoke???` and `if_???`.
For my university's compiler course we used Jasmin to assemble the programs. I don't know if this is the best way, but at least it is an easy place to start.
Here is an instruction reference for an old version of the JVM, which might contain fewer instructions than a newer one.
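To make the stack-machine model concrete, here is a toy interpreter in Java for a handful of integer instructions. The mnemonics are real JVM ones, but the interpreter itself (the `TinyStackMachine` class and its text-based program format) is only an illustration and not how the JVM is implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyStackMachine {
    /** Runs a straight-line program and returns the value passed to ireturn. */
    static int run(String[] program, int maxLocals) {
        Deque<Integer> stack = new ArrayDeque<>();  // operand stack
        int[] locals = new int[maxLocals];          // local variable slots
        for (String line : program) {
            String[] parts = line.trim().split("\\s+");
            switch (parts[0]) {
                case "bipush":  // push a small constant
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "istore":  // pop the top of the stack into a local slot
                    locals[Integer.parseInt(parts[1])] = stack.pop();
                    break;
                case "iload":   // push a local slot onto the stack
                    stack.push(locals[Integer.parseInt(parts[1])]);
                    break;
                case "iadd":    // pop two ints, push their sum
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "imul":    // pop two ints, push their product
                    stack.push(stack.pop() * stack.pop());
                    break;
                case "ireturn": // pop the result and stop
                    return stack.pop();
                default:
                    throw new IllegalArgumentException("unsupported: " + line);
            }
        }
        throw new IllegalStateException("program ended without ireturn");
    }

    public static void main(String[] args) {
        // Roughly: int a = 6; int b = 7; return a * b;
        String[] program = {
            "bipush 6", "istore 0",
            "bipush 7", "istore 1",
            "iload 0", "iload 1", "imul", "ireturn"
        };
        System.out.println(run(program, 2)); // prints 42
    }
}
```

Note how every arithmetic instruction takes its operands from the stack and puts its result back there; the `i` prefix marks the int variants, matching the `?name` pattern described above.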
Of course one could use Java to write a new language. With the Java reflection API you can achieve a lot. If speed doesn't matter too much, I would prefer Java over ASM. Programming in Java is easier and less error-prone (IMHO). Take a look at the RPN language 7th.
It is written entirely in Java.
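As one example of what the reflection API makes possible, an interpreter written in Java can resolve method calls of a dynamic language at run time instead of generating bytecode for them. The sketch below is an assumption about how that might look, not code from 7th: the `ReflectiveCall` class is hypothetical, and it resolves overloads naively by name and argument count.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class ReflectiveCall {
    /** Calls the first public method with a matching name and arity that accepts the arguments. */
    static Object call(Object receiver, String name, Object... args) {
        for (Method m : receiver.getClass().getMethods()) {
            if (!m.getName().equals(name) || m.getParameterCount() != args.length) {
                continue;
            }
            try {
                return m.invoke(receiver, args);
            } catch (IllegalArgumentException e) {
                // Argument types did not fit this overload; try the next candidate.
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException(e);
            }
        }
        throw new IllegalArgumentException("no method " + name + " taking " + args.length + " argument(s)");
    }

    public static void main(String[] args) {
        System.out.println(call("hello", "length"));            // prints 5
        System.out.println(call("hello", "concat", " world"));  // prints hello world
    }
}
```

This is easy to write and debug, at the cost of a lookup on every call, which fits the answer's caveat about speed.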
First I'd back off and modify my compiler to output actual Java instead of Java bytecode (which means creating more of a translator than a compiler), then compile the Java output with whatever Java environment is convenient (which would probably generate better object code than my own compiler).
You could use the same technique to generate CLI bytecode (e.g., by compiling to C#), or compile to Pascal to generate P-code, etc.
It's not clear why you're considering Java bytecode instead of using your own VM, but if it's for performance, then of course you should also consider compiling to actual machine code.
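The translate-to-Java route can be sketched with the JDK's own compiler API (`javax.tools`). In this sketch the translation step is faked: `toJavaSource` simply wraps a ready-made Java expression, standing in for whatever a real translator would produce. The class and method names are hypothetical, a JDK (not just a JRE) is assumed, and the temporary directory is not cleaned up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class TranspileToJava {
    /** Stand-in for the translator: wraps a Java expression in a class with a static method. */
    static String toJavaSource(String className, String javaExpr) {
        return "public class " + className + " {\n"
             + "    public static int eval() { return " + javaExpr + "; }\n"
             + "}\n";
    }

    /** Writes the source to a temp directory, compiles it with javac, loads it and calls eval(). */
    static int compileAndRun(String className, String javaExpr) {
        try {
            Path dir = Files.createTempDirectory("transpile");
            Path src = dir.resolve(className + ".java");
            Files.write(src, toJavaSource(className, javaExpr).getBytes(StandardCharsets.UTF_8));

            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac == null) {
                throw new IllegalStateException("no system Java compiler; a JDK is required");
            }
            // null streams mean System.in / System.out / System.err.
            int status = javac.run(null, null, null, "-d", dir.toString(), src.toString());
            if (status != 0) {
                throw new IllegalStateException("javac failed with status " + status);
            }

            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
                return (Integer) loader.loadClass(className).getMethod("eval").invoke(null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileAndRun("GeneratedExpr", "1 + 2 * 3")); // prints 7
    }
}
```

The appeal of this design is that javac handles constant pools, stack depths and verification for you; the price is that the source language has to map onto Java's static types and control flow, which can be awkward for a dynamically typed language.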
These days I'd suggest Truffle as the perfect starting point. Once you have your AST, you can use Truffle's tooling and Graal for compilation. Since JDK 9, the Graal compiler can be used straight from the JDK itself. Truffle's API is, IMHO, as friendly as it gets, and by using Graal you go in the same direction as Java itself.