Serializing an AST in a parsable format
I have a DSL with a Java front end, and I would like to serialize the AST produced by the front end in some easily parsable format, to make it easier to write back ends that generate code in different programming languages. Is there anything better than XML for this purpose?
Comments (1)
XML produces lots of text, and ASTs can be phenomenally big. (I build parsers, and) our parsers will produce XML because our customers demanded it... but none of them actually use it. IMHO, it is better to design a custom format that encodes your tree densely, to cut the time it takes to read and write the ASTs. For instance, you might settle for:
    (<nodetype>[=<value>]<childnodes>)
where ( ) are literal parentheses, with ( opening the tree node; <nodetype> is an integer representing the node type [maybe even in a high-radix format to minimize character count]; = is present only if the node carries a value [you don't really need the = sign if you think about it], and is followed by the value itself. The child nodes are inlined before the closing ); if they are present, they necessarily start with a left paren. No spaces needed! If you actually need to read this sometimes, you can build a simple dumb tool to indent the parentheses when you need to see it.
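To make the encoding concrete, here is a minimal sketch of a writer and a reader for it. Python is used only for brevity; a Java front end would follow the same shape. Everything named here (`Node`, `dump`, `load`, base 36 as the "high-radix" choice, and backslash-escaping of parentheses inside values) is an assumption made for illustration, not part of the format described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


@dataclass
class Node:
    type_id: int                      # non-negative integer code for the node type
    value: Optional[str] = None       # literal/identifier text, if the node has one
    children: List["Node"] = field(default_factory=list)


def to_base36(n: int) -> str:
    """Render a non-negative integer in base 36 (assumed high-radix choice)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def escape(text: str) -> str:
    # Assumption: backslash-escape the characters that would confuse the reader.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def dump(node: Node) -> str:
    """Serialize a tree as (<nodetype>[=<value>]<childnodes>) with no whitespace."""
    parts = ["(", to_base36(node.type_id)]
    if node.value is not None:
        parts.append("=" + escape(node.value))
    parts.extend(dump(child) for child in node.children)
    parts.append(")")
    return "".join(parts)


def _parse(text: str, pos: int) -> Tuple[Node, int]:
    if text[pos] != "(":
        raise ValueError(f"expected '(' at offset {pos}")
    pos += 1
    start = pos
    while text[pos] not in "=()":     # node type runs up to '=', '(' or ')'
        pos += 1
    type_id = int(text[start:pos], 36)
    value = None
    if text[pos] == "=":              # optional value, up to an unescaped paren
        pos += 1
        chars = []
        while text[pos] not in "()":
            if text[pos] == "\\":     # skip the backslash, keep the next char
                pos += 1
            chars.append(text[pos])
            pos += 1
        value = "".join(chars)
    children = []
    while text[pos] == "(":           # children always start with a left paren
        child, pos = _parse(text, pos)
        children.append(child)
    if text[pos] != ")":
        raise ValueError(f"expected ')' at offset {pos}")
    return Node(type_id, value, children), pos + 1


def load(text: str) -> Node:
    """Parse a string produced by dump() back into a Node tree."""
    node, pos = _parse(text, 0)
    if pos != len(text):
        raise ValueError(f"trailing data at offset {pos}")
    return node


# Hypothetical type codes: 1 = assignment, 2 = identifier, 3 = add, 4 = int literal.
tree = Node(1, children=[Node(2, "x"), Node(3, children=[Node(4, "1"), Node(4, "2")])])
print(dump(tree))  # (1(2=x)(3(4=1)(4=2)))
```

Both functions are recursive, so a very deep tree would need an explicit stack instead; truncated input surfaces as an `IndexError` rather than a friendly message. That is acceptable for a sketch, less so for production.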
[Old-timers will recognize this as LISP S-expressions, which have been around as a tree encoding since the late 1950s.]
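The "simple dumb tool" for indenting can be just as small, because it only has to count parentheses and needs no knowledge of node types. A rough sketch, assuming (as an illustration only) that a backslash escapes a literal parenthesis inside a value; the function name `indent` and the two-space step are likewise made up here:

```python
def indent(text: str, step: str = "  ") -> str:
    """Put each node on its own line, indented by nesting depth (LISP style)."""
    lines = []
    current = []          # pieces of the line being built
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Assumed escape convention: copy the escaped pair untouched.
            current.append(text[i:i + 2])
            i += 2
            continue
        if ch == "(":
            if current:   # finish the previous line before opening a node
                lines.append("".join(current))
            current = [step * depth, "("]
            depth += 1
        elif ch == ")":
            depth -= 1
            current.append(")")
        else:
            current.append(ch)
        i += 1
    if current:
        lines.append("".join(current))
    return "\n".join(lines)


print(indent("(1(2=x)(3(4=1)(4=2)))"))
# (1
#   (2=x)
#   (3
#     (4=1)
#     (4=2)))
```

Closing parentheses are left trailing on the last child's line, as is conventional for S-expressions, so the dense file and the indented view differ only in whitespace.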
(We pretty much avoid this altogether, because reading and writing trees is expensive no matter how you do it, and it's often just easier and more efficient to finish processing the tree you have in memory and just spit out the final answer.)