Unexpected Antlr4 parser error

Posted on 2025-01-26 15:10:03

I found some odd behavior in Antlr4 (versions 4.10 and 4.10.1 both give the same result).

When I try to use the grammar

grammar Paths;

cfg: NL? (entry (NL | EOF))* EOF;

entry: path ':' value;

path: SEGMENT ('.' SEGMENT)*;

value: USTRING;

SEGMENT: [a-zA-Z0-9]+;

USTRING: [a-zA-Z0-9]+;

NL: [\n\r]+;

WS: [ \t]+ -> skip;

on the string "key1:value1\nkey2.sub:value2\nkey3.sub1.sub2:value3", I see these error messages:

line 1:5 mismatched input 'value1' expecting {':', '.'}
line 2:9 mismatched input 'value2' expecting {':', '.'}
line 3:15 mismatched input 'value3' expecting {':', '.'}

If I replace the definition of value with value: SEGMENT, everything works as expected.

What is wrong in the first definition?

The parse tree printed in both cases is the same:

(cfg (entry (path key1) : (value value1)) \n (entry (path key2 . sub) : (value value2)) \n (entry (path key3 . sub1 . sub2) : (value value3)) <EOF> <EOF>)

I tried to simplify the grammar:

grammar Paths;

cfg: NL? (entry (NL | EOF))* EOF;

entry: path ':' value;

path: SEGMENT;

value: USTRING;

SEGMENT: [a-zA-Z0-9]+;

USTRING: [a-zA-Z0-9]+;

NL: [\n\r]+;

WS: [ \t]+ -> skip;

In this case I get these errors (the parsed string is "key1:value1\nkey2:value2\nkey3:value3"):

line 1:5 mismatched input 'value1' expecting USTRING
line 2:5 mismatched input 'value2' expecting USTRING
line 3:5 mismatched input 'value3' expecting USTRING

Again, everything is fine if I replace USTRING with SEGMENT in the value definition. The output is:

(cfg (entry (path key1) : (value value1)) \n (entry (path key2) : (value value2)) \n (entry (path key3) : (value value3)) <EOF> <EOF>)

Comments (2)

谈场末日恋爱 2025-02-02 15:10:03

That’s because ANTLR’s lexer processes the input stream of characters to produce a token stream, and the parser then processes that token stream. The recursive descent parser only ever sees tokens the lexer has already produced, so nothing in the parser rules affects how the lexer tokenizes the input.

The lexer rules SEGMENT and USTRING are identical, so both match exactly the same run of input characters. When two lexer rules match the same longest run, ANTLR picks the rule that is defined first in the grammar, so all of these tokens are SEGMENT tokens and a USTRING token is never produced.

If you’ve run through the standard setup (and created the grun alias), you can run it with the -tokens option to get a dump of your token stream. This is generally a good idea for validating that your lexer rules are creating the token stream you expect.
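As a minimal sketch of the fix the question already hints at, the grammar can keep a single lexer rule for alphanumeric runs and let the parser rules (path vs. value) carry the distinction. This is only the original grammar with USTRING dropped, not the only possible design:

```antlr
grammar Paths;

cfg: NL? (entry (NL | EOF))* EOF;

entry: path ':' value;

path: SEGMENT ('.' SEGMENT)*;

// The parser rule, not the token type, marks this SEGMENT as a value.
value: SEGMENT;

// One lexer rule for all alphanumeric runs, so there is no tie to break.
SEGMENT: [a-zA-Z0-9]+;

NL: [\n\r]+;

WS: [ \t]+ -> skip;
```

With only one rule able to match key1 or value1, the token stream no longer depends on rule order, and the parse tree still labels each value through the value rule.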

芸娘子的小脾气 2025-02-02 15:10:03

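It happens because Antlr assigns the type SEGMENT to the values. The lexer ignores the parser rules, and if a token could be assigned to several types, the lexer assigns it to a single one. The choice is not random: among rules that match the same longest text, the one defined first in the grammar wins.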

This piece of code helped me:

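import org.antlr.v4.runtime.{CharStreams, CommonTokenStream}
import scala.jdk.CollectionConverters._

// Dump every token with its numeric type and display name to see what the lexer produced.
val lexer = new PathsLexer(CharStreams.fromString("key1:value1\nkey2.sub:value2\nkey3.sub1.sub2:value3"))
val tokens = new CommonTokenStream(lexer)
tokens.fill()
println(tokens.getTokens.asScala.map { token =>
  s"$token: ${token.getType} -> ${lexer.getVocabulary.getDisplayName(token.getType)}"
}.mkString("\n"))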

I probably need to learn more about Antlr lexer modes.
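For reference, here is a hedged sketch of how lexer modes could keep USTRING as its own token type. Modes are only allowed in lexer grammars, so the combined grammar is split in two. The names COLON, DOT, VALUE and VALUE_WS are assumptions introduced for illustration and do not appear in the original grammar:

```antlr
lexer grammar PathsLexer;

SEGMENT: [a-zA-Z0-9]+;
DOT: '.';
// After a colon, switch to the VALUE mode so the next word is lexed as USTRING.
COLON: ':' -> pushMode(VALUE);
NL: [\n\r]+;
WS: [ \t]+ -> skip;

mode VALUE;

// Only rules in the current mode are considered, so SEGMENT cannot win here.
USTRING: [a-zA-Z0-9]+ -> popMode;
VALUE_WS: [ \t]+ -> skip;
```

```antlr
parser grammar PathsParser;

options { tokenVocab = PathsLexer; }

cfg: NL? (entry (NL | EOF))* EOF;

entry: path COLON value;

path: SEGMENT (DOT SEGMENT)*;

value: USTRING;
```

One limitation of this sketch: a line with an empty value (a colon followed directly by a newline) leaves the lexer in the VALUE mode, where NL is not defined, so it would need extra rules to recover. For a grammar this small, using a single SEGMENT token is likely the simpler option.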
