ANTLRWorks 1.4.3 can't properly read extended-ASCII characters

Posted 2024-12-19 04:40:57

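I'm working on a fairly standard compiler project, and I chose ANTLR as the parser generator. While updating an existing grammar from v2 to v3, I noticed that ANTLRWorks (the official IDE for ANTLR) doesn't properly display any of the extended-ASCII characters in the file. Even after converting the file from ASCII to UTF-8 with Notepad++, it still shows those characters as squares. In Notepad++ they display fine.

Because this glitch means ANTLRWorks corrupts the file whenever I save it, I can no longer use it as an editor, which is rather annoying. Has anyone here run into this problem and possibly solved it? Many thanks.

[Edit]: This is with the latest version of ANTLRWorks (downloaded yesterday) and the grammar I got from http://www.antlr.org/grammar/1086696923011/vhdlams/index.html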

月下凄凉 2024-12-26 04:40:57

I cannot reproduce this with ANTLRWorks 1.4.3.

If I create a dummy grammar:

grammar T;
parse : . ;
Any   : . ;

and paste the complete extended ASCII set in a multi-line comment:

grammar T;

/*
€

‚
ƒ

...

ÿ
*/

parse : . ;
Any   : . ;

there's no problem. It doesn't matter whether I copy the characters with ANTLRWorks, or copy them with a normal editor and then edit the existing grammar with ANTLRWorks: the characters all stay the same after saving from ANTLRWorks.

On a related note: ANTLR versions 3.0 to 3.3 still depend on some ANTLR 2.7 classes, which might cause org.antlr.Tool to trip over certain characters outside the ASCII set. In that case, use ANTLR 3.4, which no longer has these old dependencies.

EDIT

I suspect there's an odd byte somewhere in the original grammar that is causing all the mayhem. I quickly copied only the rules from the original grammar, converted all v2.7 syntax to v3 syntax (double-quoted literals became single-quoted ones, protected became fragment, and I commented out some custom code), and saved the result in a new file. That file can be opened (and saved) by ANTLRWorks or a plain text editor without the extended-ASCII characters being mangled.

Here is the ANTLR v3 version of said grammar: http://pastebin.com/zU4xcvXt (the grammar is too big to post on SO...)
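
If you want to hunt for such odd bytes in the original file yourself, a small check along these lines might help. It is only a sketch: the class name OddByteScan and both helper methods are made up for illustration, and it assumes the grammar is supposed to be either plain ASCII or valid UTF-8. It lists the offset of every byte above 0x7F and reports whether the whole file decodes as strict UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ANTLR or ANTLRWorks.
class OddByteScan {

    /** Returns the offsets of all bytes outside the 7-bit ASCII range. */
    static List<Integer> nonAsciiOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0xFF) > 0x7F) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    /** True if the bytes decode as strict UTF-8 (no malformed sequences). */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java OddByteScan <grammar-file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("valid UTF-8: " + isValidUtf8(data));
        for (int offset : nonAsciiOffsets(data)) {
            System.out.printf("offset %d: 0x%02X%n", offset, data[offset] & 0xFF);
        }
    }
}
```

A file that contains bytes above 0x7F but is not valid UTF-8 was most likely saved in a single-byte code page, which is one way an editor expecting a different encoding could end up showing squares.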

EDIT II

"Is the grammar name useful for anything beyond just giving it a label?"

No, it's not. As you mentioned, it is only used to give the generated parser or lexer a name.

There are four types of grammars in ANTLR v3:

combined grammar, which looks like grammar T; and generates TLexer.java and TParser.java source files;
parser grammar, which looks like parser grammar TP; and generates a TP.java source file;
lexer grammar, which looks like lexer grammar TL; and generates a TL.java source file;
tree grammar, which looks like tree grammar TWalker; and generates a TWalker.java source file.
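
To make the naming rule concrete, here is a rough sketch that maps a grammar header line to the Java source files listed above. The class GeneratedFiles and its method are hypothetical and purely illustrative (this is not how ANTLR itself works internally), and it only covers the .java files mentioned in this answer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the naming rule; not part of ANTLR.
class GeneratedFiles {

    // Matches headers such as "grammar T;" or "tree grammar TWalker;".
    private static final Pattern HEADER =
            Pattern.compile("^\\s*(?:(lexer|parser|tree)\\s+)?grammar\\s+(\\w+)\\s*;");

    static List<String> generatedJavaFiles(String header) {
        Matcher m = HEADER.matcher(header);
        if (!m.find()) {
            throw new IllegalArgumentException("not a grammar header: " + header);
        }
        String kind = m.group(1); // null for a combined grammar
        String name = m.group(2);
        if (kind == null) {
            // A combined grammar yields both a lexer and a parser.
            return Arrays.asList(name + "Lexer.java", name + "Parser.java");
        }
        // Lexer, parser and tree grammars each yield one file named after the grammar.
        return Collections.singletonList(name + ".java");
    }

    public static void main(String[] args) {
        System.out.println(generatedJavaFiles("grammar T;"));
        System.out.println(generatedJavaFiles("tree grammar TWalker;"));
    }
}
```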