Can Xtext be used for parsing general-purpose programming languages?

Posted on 2024-10-31 17:38:12

I'm currently developing a general-purpose agent-based programming language (its syntax is somewhat inspired by Java, and the language also uses objects).

Since the beginning of the project we were unsure whether to use ANTLR or Xtext. At that time we found that Xtext implemented only a subset of ANTLR's features. So we decided to use ANTLR for our language, losing the possibility of getting a full-fledged Eclipse editor for our language for free (a nice feature that Xtext provides).

However, to the best of my knowledge, this summer the Xtext project took a big step forward. Quoting from the link:

What are the limitations of Xtext?
Sven: You can implement almost any kind of programming language or DSL
with Xtext. There is one exception, that is if you need to use so
called 'Semantic Predicates' which is a rather complicated thing I
don't think is worth being explained here. Very few languages really
need this concept. However the prominent example is C/C++. We want to
look into that topic for the next release.

And that is also reinforced in the Xtext documentation:

What is Xtext? No matter if you want to create a small textual domain-specific language (DSL) or you want to implement a full-blown
general purpose programming language. With Xtext you can create your
very own languages in a snap. Also if you already have an existing
language but it lacks decent tool support, you can use Xtext to create
a sophisticated Eclipse-based development environment providing
editing experience known from modern Java IDEs in a surprisingly short
amount of time. We call Xtext a language development framework.

If Xtext has got rid of its past limitations, why is it still not possible to find a complex Xtext grammar for the best-known programming languages (Java, C#, etc.)?

On the ANTLR website you can find tons of such grammar examples, whereas for Xtext the only sample I was able to find is the one in the documentation. So maybe Xtext is still not mature enough to be used for implementing a general-purpose programming language? I'm a bit worried about this... I would not want to start rewriting the grammar in Xtext only to find out that it is not suited for that.


凡尘雨 2024-11-07 17:38:12

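I think no one has implemented Java or C++ because it is a lot of work (even with Xtext) and the existing tools and compilers are excellent.
However, you could have a look at Xbase and Xtend, the expression language we ship with Xtext. It is built with Xtext and is a good proof of what you can build with Xtext. We did that in about 4 person-months.

I did a couple of screencasts on Xtend: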

http://blog.efftinge.de/2011/03/xtend-screencast-part-1-basics.html
http://blog.efftinge.de/2011/03/xtend-screencast-part-2-switch.html
http://blog.efftinge.de/2011/03/xtend-screencast-part-3-rich-strings-ie.html

Note that you can simply embed Xbase expressions into your language.


罪歌 2024-11-07 17:38:12

I can't speak for what Xtext is or does well.

I can speak to the problem of developing robust tools for processing real languages, based on our experience with the DMS Software Reengineering Toolkit, which we imagine is a language manipulation framework.

First, parsing real languages usually involves something messy in lexing and/or parsing, due to the historical ways these languages have evolved. Java is pretty clean. C# has context-dependent keywords and a rudimentary preprocessor, sort of like C's. C has a full-blown preprocessor. C++ is famously "hard to parse" due to ambiguities in the grammar and shenanigans with template syntax. COBOL is fairly ugly, doesn't have any reference grammars, and comes in a variety of dialects. PHP will turn you to stone if you look at it because it is so poorly defined. (DMS has parsers for all of these, used in anger on real applications.)

Yet you can parse all of these with most of the available parsing technologies if you try hard enough, usually by abusing the lexer or the parser to achieve your goals (how the GNU guys abused Bison to parse C++ by tangling lexical analysis with symbol table lookup is a nice ugly case in point). But it takes a lot of effort to get the language details right, and the reference manuals are only close approximations of the truth with respect to what the compilers really accept.
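To make the "tangled lexer" idea concrete, here is a minimal sketch of a tokenizer that consults a set of known type names while classifying identifiers. It is only an illustration: `tokenize`, `classify_statement`, and the token kinds are hypothetical names, and this is not how GCC or DMS is actually implemented.

```python
import re

# One identifier or one punctuation character, with optional leading spaces.
TOKEN_RE = re.compile(r"\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<punct>[*;=()]))")


def tokenize(source, type_names):
    """Return (kind, text) pairs; identifiers are classified via type_names.

    type_names stands in for a symbol table (an assumption for this sketch).
    """
    pos = 0
    tokens = []
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            if source[pos:].strip() == "":
                break  # only trailing whitespace is left
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = match.end()
        if match.group("ident"):
            text = match.group("ident")
            # The lexer decision depends on semantic information.
            kind = "TYPE_NAME" if text in type_names else "IDENT"
        else:
            text = match.group("punct")
            kind = text
        tokens.append((kind, text))
    return tokens


def classify_statement(source, type_names):
    """Decide whether `a * b;` is a pointer declaration or a multiplication."""
    kinds = [kind for kind, _ in tokenize(source, type_names)]
    if kinds == ["TYPE_NAME", "*", "IDENT", ";"]:
        return "declaration"
    if kinds == ["IDENT", "*", "IDENT", ";"]:
        return "expression"
    return "unknown"
```

The same text `T * x;` comes out as a declaration when `T` is in `type_names` and as an expression otherwise, which is why the lexer and the symbol table end up coupled.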

If Xtext has a decent parsing engine, one can likely do this with Xtext. From a brief perusal of the Xtext site, it sounds like the lexers and parsers are fairly decent. I didn't see anything about semantic predicates; we have them in DMS and they are lifesavers in some of the really dark corners of parsing. Even using really good parsing technology (we use GLR parsers), it would be very hard to parse COBOL data declarations (extracting their nesting structure during the parse) without them.
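As a rough illustration of why COBOL data declarations call for a semantic check, the sketch below nests entries by comparing their numeric level numbers, something a context-free grammar cannot express on its own. It is heavily simplified (it ignores special levels such as 66, 77, and 88), the function name `nest_by_level` is made up for this example, and it does not reflect how DMS does it.

```python
def nest_by_level(entries):
    """Build a tree from (level, name) pairs, as in COBOL data declarations.

    The comparison on the numeric level value plays the role of a semantic
    predicate: it decides whether the next entry is a child or a sibling.
    """
    root = {"level": 0, "name": "<root>", "children": []}
    stack = [root]
    for level, name in entries:
        if level < 1:
            raise ValueError(f"invalid level number {level} for {name!r}")
        # An entry nests under the innermost open group only if its level
        # number is strictly greater than that group's level number.
        while level <= stack[-1]["level"]:
            stack.pop()
        node = {"level": level, "name": name, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


records = nest_by_level([
    (1, "CUSTOMER"),
    (5, "NAME"),
    (10, "FIRST"),
    (10, "LAST"),
    (5, "AGE"),
])
```

Here `NAME` and `AGE` become children of `CUSTOMER`, while `FIRST` and `LAST` nest under `NAME`, purely because of the values 1, 5, and 10.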

You have an interesting problem in that your language isn't well defined yet. That will make your initial parsers somewhat messy, and you'll revise them a lot. Here's where strong parsing technology helps you: if you can revise your grammar easily you can focus on what you want your language to look like, rather than focusing on fighting the lexer and parser. The fact that you can change your language definition means in fact that if Xtext has some limitations, you can probably bend your language syntax to match without huge amounts of pain. ANTLR does have the proven ability to parse a language pretty much as you imagine it, modulo the usual amount of parser hacking.

What is never discussed is what else is needed to process a language for real. The first thing you need to be able to do is to construct ASTs, which ANTLR and YACC will help you do; I presume Xtext does also. You also need symbol tables, control and data flow analysis (both local and global), and machinery to transform your language into something else (presumably more executable). You will find that doing just symbol tables is surprisingly hard; C++ has several hundred pages of "how to look up an identifier"; Java generics are a lot tougher to get right than you might expect. You might also want to pretty-print the AST back to source code, if you want to offer refactorings. (EDIT: Here both ANTLR and Xtext offer what amounts to text-template driven code generation.)

Yet these are complex mechanisms that take as much time as building the parser, if not more. The reason DMS exists isn't because it can parse (we view this just as the ante in a poker game), but because all of this other stuff is very hard and we wanted to amortize the cost of doing it all (DMS has, we think, excellent support for all of these mechanisms, but YMMV).
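For a sense of where symbol tables start, here is a minimal sketch of nested scopes with shadowing. The `Scope` class is hypothetical, and real languages need far more than this (overloading, namespaces, inheritance, generics), which is exactly the hard part.

```python
class Scope:
    """A single lexical scope that falls back to its parent on lookup."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}

    def define(self, name, info):
        # Reject a second definition of the same name in the same scope.
        if name in self.symbols:
            raise KeyError(f"duplicate definition of {name!r}")
        self.symbols[name] = info

    def lookup(self, name):
        # Walk outward from the innermost scope; the first hit wins,
        # so inner definitions shadow outer ones.
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None


global_scope = Scope()
global_scope.define("x", "int")
inner_scope = Scope(parent=global_scope)
inner_scope.define("x", "string")
```

Looking up `x` from `inner_scope` finds the inner definition, while `global_scope` still sees the outer one; a name defined nowhere yields `None`.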

On reading the Xtext overview, it sounds like they have some support for symbol tables but it is unclear what kind of assumption is behind it (e.g., for C++ you have to support multiple inheritance and namespaces).

If you have already started down the ANTLR road and have something running, I'd be tempted to stay the course; I doubt Xtext will offer you a lot of additional help. If you really, really want Xtext's editor, then you can probably switch at the price of restructuring what grammar you have (this is a pretty typical price to pay when changing parsing paradigms). Expect most of your work to come after you get the parser right, and to be done in an ad hoc way. I doubt you will find Xtext or ANTLR much different here.


帝王念 2024-11-07 17:38:12

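I guess the simplest answer to your question is: many general-purpose languages can be implemented using Xtext. But since there is no general answer to which parser capabilities a general-purpose language needs, there is no general answer to your question either.

However, I've got a few pointers: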

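With Xtext 2.0 (released this summer), Xtext supports syntactic predicates. This is one of the most requested features for handling ambiguous syntax without enabling ANTLR's backtracking.
You might want to look at the brand-new languages Xbase and Xtend, which are (judging by their capabilities) general-purpose and which were developed using Xtext. Sven has some nice screencasts on his blog: http://blog.efftinge.de/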
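To illustrate what a syntactic predicate does in general, the sketch below picks between two alternatives that both start with `(` by checking whether the upcoming tokens match a given syntax fragment, without consulting any semantic information. This is a hand-written illustration under my own assumptions; the function names and the lambda-like syntax are hypothetical and are not taken from Xtext or ANTLR.

```python
def matches_lambda_head(tokens, pos):
    """Syntactic predicate: does `( ident {, ident} ) =>` start at pos?"""
    if pos >= len(tokens) or tokens[pos] != "(":
        return False
    pos += 1
    # Optional comma-separated identifier list.
    if pos < len(tokens) and tokens[pos] != ")":
        while True:
            if pos >= len(tokens) or not tokens[pos].isidentifier():
                return False
            pos += 1
            if pos < len(tokens) and tokens[pos] == ",":
                pos += 1
                continue
            break
    # The fragment must end with `)` followed by `=>`.
    return (
        pos + 1 < len(tokens)
        and tokens[pos] == ")"
        and tokens[pos + 1] == "=>"
    )


def choose_alternative(tokens):
    """Pick the grammar alternative for input that may start with `(`."""
    if matches_lambda_head(tokens, 0):
        return "lambda"
    if tokens and tokens[0] == "(":
        return "parenthesized"
    return "other"
```

The check only looks at token shapes, so `( a , b ) => a` is taken as a lambda while `( a ) * b` falls through to the parenthesized-expression alternative. No general backtracking is needed; the lookahead is limited to the one decision that is ambiguous.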

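Regarding your question of why we don't see Xtext grammars for Java, C++, etc.:
With Xtext, a language is more than just a grammar, so a grammar that describes a language's syntax is a good starting point, but usually not an artifact valuable enough to ship. The reason is that with an Xtext grammar you also define the structure of the AST (Abstract Syntax Tree, which is in fact an Ecore model), including true cross references. Since this model is the main internal API of your language, people usually put a lot of thought into designing it. Furthermore, to resolve cross references (also known as linking) you need to implement scoping (as it is called in Xtext). Without a proper implementation of scoping, you either cannot have true cross references in your model or you will get many linking errors.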
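As a language-neutral sketch of what "linking" means here: after parsing, each reference node holds only a name, and a separate pass replaces that name with a pointer to the declaration that is visible in scope, reporting an error when none is found. The classes and the `link` function below are hypothetical and far simpler than Xtext's actual scoping and linking machinery.

```python
class Declaration:
    """A named element in the model, e.g. a class or variable."""

    def __init__(self, name):
        self.name = name


class Reference:
    """A use of a name; target stays None until linking succeeds."""

    def __init__(self, name):
        self.name = name
        self.target = None


def link(declarations, references):
    """Resolve references by name and return a list of linking errors."""
    # The "scope": which declarations are visible, keyed by name.
    visible = {decl.name: decl for decl in declarations}
    errors = []
    for ref in references:
        target = visible.get(ref.name)
        if target is None:
            errors.append(f"cannot resolve reference to {ref.name!r}")
        else:
            ref.target = target  # a true cross reference, not just a string
    return errors


decls = [Declaration("Agent")]
refs = [Reference("Agent"), Reference("Missing")]
errors = link(decls, refs)
```

After the pass, the first reference points at the `Agent` declaration object itself, and the second one shows up as a linking error. A flat dictionary is the simplest possible scope; deciding what belongs in `visible` at each point in the program is where the real design effort goes.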

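The point I want to make is that creating a grammar, designing the AST model, and implementing scoping takes only a little more effort than taking a grammar from some language zoo and translating it into Xtext's syntax.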
