Writing a code formatting tool for a programming language

Posted on 2024-11-13 10:36:24

I'm looking into the feasibility of writing a code formatting tool for the Apex language, a Salesforce.com variation on Java, and perhaps for VisualForce, its tag-based markup language.

I have no idea where to start, apart from feeling/knowing that writing a language parser from scratch is probably not the best approach.

I have a fairly thin grasp of what ANTLR is and what it does, but conceptually, I'm imagining one could 'train' ANTLR to understand the syntax of Apex. I could then get a structured version of the code in a data structure (an AST?), which I could then walk to produce correctly formatted code.

Is this the right concept? Is ANTLR a tool that can do that? Are there any links to a brief synopsis on this? I'm looking to invest a few days in this task, not months, and I'm not sure if it's even vaguely achievable.

Comments (4)

天气好吗我好吗 2024-11-20 10:36:24

Since Apex syntax is similar to Java, I'd look at Eclipse's JDT. Edit down the Java grammar to match Apex, then do the same with the formatting rules and options. This is more than a few days of work.

最单纯的乌龟 2024-11-20 10:36:24

Steven Herod wrote:

... I'm imagining one could 'train' antlr to understand the syntax of Apex. ...

What do you mean by "'train' antlr"? "Train" as in artificial intelligence (training a neural-net)? If so, then you are mistaken.

Steven Herod wrote:

... get a structured version of the code in a data structure (AST?) which I could then walk to produce correctly formatted code.

Is this the right concept? Is Antlr a tool to do that?

Yes, more or less. You write a grammar that precisely defines the language you want to parse. Then you use ANTLR, which generates a lexer (tokenizer) and a parser from the grammar file. You can let the parser create an AST from your input source, and then walk the AST and emit (custom) output/code.
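To make the lexer, parser, AST, and tree-walk stages concrete, here is a minimal sketch in Python. It is hand-written rather than generated by ANTLR, and it handles only a toy brace language (statements ending in `;`, blocks in `{ }`), which is an assumption for illustration and nowhere near Apex's real grammar. The names `tokenize`, `parse`, `emit`, and `format_source` are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Statement:
    words: list  # tokens of a statement, without the trailing ';'


@dataclass
class Block:
    header: list  # tokens before the '{', e.g. ['class', 'A']
    body: list = field(default_factory=list)  # nested Statement/Block nodes


def tokenize(source):
    """Lexer: split the input into '{', '}', ';' and whitespace-free words."""
    return re.findall(r"[{};]|[^\s{};]+", source)


def parse(tokens):
    """Parser: build the AST (a list of nodes) by recursive descent."""
    pos = 0

    def parse_items(inside_block):
        nonlocal pos
        items, words = [], []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == ";":
                items.append(Statement(words))
                words = []
            elif tok == "{":
                items.append(Block(words, parse_items(True)))
                words = []
            elif tok == "}":
                if not inside_block:
                    raise SyntaxError("unexpected '}'")
                if words:
                    raise SyntaxError("missing ';' before '}'")
                return items
            else:
                words.append(tok)
        if inside_block:
            raise SyntaxError("missing '}'")
        if words:
            raise SyntaxError("missing ';' at end of input")
        return items

    return parse_items(False)


def emit(nodes, depth=0, indent="    "):
    """Tree walk: turn the AST back into consistently indented lines."""
    lines = []
    for node in nodes:
        pad = indent * depth
        if isinstance(node, Statement):
            lines.append(pad + " ".join(node.words) + ";")
        else:
            header = " ".join(node.header)
            lines.append(pad + (header + " {" if header else "{"))
            lines.extend(emit(node.body, depth + 1, indent))
            lines.append(pad + "}")
    return lines


def format_source(source):
    return "\n".join(emit(parse(tokenize(source))))


if __name__ == "__main__":
    print(format_source("class A{void f(){x=1;if(x){y=2;}}}"))
```

The toy lexer treats everything between delimiters as opaque words, so it knows nothing about strings, comments, or operators; handling those correctly is where a real grammar earns its keep.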

Steven Herod wrote:

... I'm looking to invest a few days in this task, not months, and I'm not sure if it's even vaguely achievable.

Well, I don't know you, of course, but I'd say that writing a grammar for a language similar to Java and then emitting output by walking the AST, all within just a couple of days, is impossible, even more so for someone new to ANTLR. I am fairly familiar with ANTLR, but I couldn't do it in just a few days. Note that I'm only talking about the "parsing part"; after you've done that, you'll need to integrate it into some text editor. This looks more like a project of several months than of weeks, let alone days.

So, in short, if all you want to do is write a custom code highlighter, ANTLR isn't your best choice.

You could have a look at Xtext which uses ANTLR under the hood. To quote their website:

With Xtext you can easily create your own programming languages and domain-specific languages (DSLs). The framework supports the development of language infrastructures including compilers and interpreters as well as full blown Eclipse-based IDE integration. ...

But I doubt you'll have an Eclipse plugin up and running within just a few days.

Anyway, best of luck!

与之呼应 2024-11-20 10:36:24

Our DMS Software Reengineering Toolkit is designed to do this as a kind of poker-pot ante, necessary for any kind of automated software reengineering project.

DMS allows one to define a grammar in a style similar to ANTLR's (and that of other parser generators). Unlike ANTLR (and other parser generators), DMS uses a GLR parser, which means you don't have to bend the language grammar rules to meet the requirements of the parser generator. If you can write a context-free grammar, DMS will convert it into a parser for that language. This means that you can in fact get a working, correct grammar up considerably faster than with typical LL or L(AL)R parser generators.

Unlike ANTLR (and other parser generators), there is no additional work to build the AST; it is automatically constructed. This means you spend zero time writing tree-building rules and none debugging them.

DMS additionally provides a pretty-printing specification language, in which text boxes are stacked vertically, horizontally, or indented, and in which you define the "format" used to convert the AST back into completely legal, nicely formatted source text. None of the well-known parser generators provides any help here; if you want to prettyprint the tree, you have to do a great deal of custom coding. For more details on this, see my SO answer to Compiling an AST back to source. What this means is that you can build a prettyprinter for your grammar in an (intense) afternoon by simply annotating the grammar rules with box layout directives.
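The box idea can be illustrated with a small sketch in Python. This is not DMS's specification language; the combinators `text`, `vertical`, `horizontal`, and `indented` are hypothetical names, and a box is simply modeled as a list of text lines.

```python
def text(s):
    """A one-line box."""
    return [s]


def vertical(*boxes):
    """Stack boxes on top of each other."""
    return [line for box in boxes for line in box]


def indented(box, width=4):
    """Shift a whole box to the right."""
    return [" " * width + line for line in box]


def horizontal(*boxes, sep=" "):
    """Place boxes side by side, tops aligned, each padded to its own width."""
    boxes = [box for box in boxes if box]
    if not boxes:
        return []
    height = max(len(box) for box in boxes)
    rows = [""] * height
    for i, box in enumerate(boxes):
        width = max(len(line) for line in box)
        is_last = i == len(boxes) - 1
        for r in range(height):
            cell = box[r] if r < len(box) else ""
            if not is_last:
                cell = cell.ljust(width) + sep
            rows[r] += cell
    return [row.rstrip() for row in rows]


def if_statement(condition, body):
    """Layout rule for a hypothetical 'if' node, expressed purely with boxes."""
    return vertical(
        horizontal(text("if"), text("(" + condition + ")"), text("{")),
        indented(body),
        text("}"),
    )


if __name__ == "__main__":
    body = vertical(text("y = 2;"), text("z = 3;"))
    print("\n".join(if_statement("x", body)))
```

Because each rule only says how its children's boxes are arranged, nesting depth and indentation fall out of composition instead of being tracked by hand.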

DMS's lexer is very careful to capture comments and "lexical formats" (Was that number octal? What kind of quotes did that string have? Were there escaped characters?) so that they can be regenerated correctly. Parsing to an AST and then prettyprinting the AST back to text turns arbitrarily ugly code into formatted code that follows the prettyprinting rules. (This round trip is the poker ante: if you want to go further and actually manipulate the AST, you still need to be able to regenerate valid source text.)
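A rough sketch of what "capturing lexical formats" can mean, in Python: each token keeps its exact source spelling next to its kind, so a number written in hex or octal and a string's original quotes survive regeneration, and comments are kept as tokens rather than discarded. This is an illustration under assumed C/Java-style token rules, not DMS's lexer; `Token`, `lex`, and `number_value` are hypothetical names.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str  # 'comment', 'string', 'number', 'name' or 'punct'
    text: str  # exact source spelling, kept so it can be regenerated


# Assumed C/Java-style lexical rules; integers only, no floats.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>'(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*")
  | (?P<number>0[xX][0-9a-fA-F]+|\d+)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<punct>[^\sA-Za-z_0-9])
    """,
    re.VERBOSE | re.DOTALL,
)


def lex(source):
    """Return every token, comments included, with its original spelling."""
    return [Token(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]


def number_value(token):
    """Semantic value of a number token, independent of how it was written."""
    spelling = token.text
    if spelling[:2].lower() == "0x":
        return int(spelling, 16)
    if len(spelling) > 1 and spelling[0] == "0":
        return int(spelling, 8)  # C/Java-style octal literal
    return int(spelling)


if __name__ == "__main__":
    for tok in lex('mask = 0x1F; /* low bits */ mode = 017; s = "hi";'):
        print(tok.kind, tok.text)
```

A formatter that emitted `number_value(token)` instead of `token.text` would silently rewrite `0x1F` as `31`; keeping the spelling on the token avoids that.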

We recently built a parser/prettyprinter for EGL. This took about a week, end to end. Granted, we are experts with our tools.

You can download any of a number of different formatters built using DMS from our web site, to see what such formatting can do.

EDIT July 2012: Last week (5 days) using DMS, from scratch we (I personally) built a fully compliant IEC61131-3 "Structured Text" (industrial control language, Pascal-like) parser and prettyprinter. (It handles all the examples from the standards documents).

寂寞笑我太脆弱 2024-11-20 10:36:24

Reverse engineering a language to get a parser is hard. Very hard! Even if it's very close to Java.

But why reinvent the wheel?

There is a wonderful Apex parser implementation as part of the Force.com IDE on GitHub. It's just a jar without source code but you can use it for whatever you want. And the developers behind it are really supportive and helpful.

We are currently building an Apex module of the famous Java static code analyzer PMD here, and we use the Salesforce.com internal parser. It works like a charm.

And hey, it's an open source project and we need contributors of any kind ;-)
