
antlr Java antlr4

Unexpected Antlr4 parser error

Posted 2025-01-26 15:10:03


I ran into some odd behavior in Antlr4 (I tried versions 4.10 and 4.10.1, with the same result).

When I try to use the grammar

grammar Paths;

cfg: NL? (entry (NL | EOF))* EOF;

entry: path ':' value;

path: SEGMENT ('.' SEGMENT)*;

value: USTRING;

SEGMENT: [a-zA-Z0-9]+;

USTRING: [a-zA-Z0-9]+;

NL: [\n\r]+;

WS: [ \t]+ -> skip;

on the string "key1:value1\nkey2.sub:value2\nkey3.sub1.sub2:value3", I see error messages:

line 1:5 mismatched input 'value1' expecting {':', '.'}
line 2:9 mismatched input 'value2' expecting {':', '.'}
line 3:15 mismatched input 'value3' expecting {':', '.'}

If I replace the value definition with value: SEGMENT;, everything works as expected.

What is wrong with the first definition?

The printed parse tree is the same in both cases:

(cfg (entry (path key1) : (value value1)) \n (entry (path key2 . sub) : (value value2)) \n (entry (path key3 . sub1 . sub2) : (value value3)) <EOF> <EOF>)

I tried to simplify the grammar:

grammar Paths;

cfg: NL? (entry (NL | EOF))* EOF;

entry: path ':' value;

path: SEGMENT;

value: USTRING;

SEGMENT: [a-zA-Z0-9]+;

USTRING: [a-zA-Z0-9]+;

NL: [\n\r]+;

WS: [ \t]+ -> skip;

In this case I get errors (the parsed string is "key1:value1\nkey2:value2\nkey3:value3"):

line 1:5 mismatched input 'value1' expecting USTRING
line 2:5 mismatched input 'value2' expecting USTRING
line 3:5 mismatched input 'value3' expecting USTRING

And again, everything is fine if I replace USTRING with SEGMENT in the value definition.
The output is:

(cfg (entry (path key1) : (value value1)) \n (entry (path key2) : (value value2)) \n (entry (path key3) : (value value3)) <EOF> <EOF>)


谈场末日恋爱 2025-02-02 15:10:03

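This is because ANTLR's lexer processes the input stream of characters to produce a stream of tokens, and the parser then processes that token stream. ANTLR's recursive descent operates on the token stream only; it has no influence on how the lexer interprets the input.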
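To make the parser side concrete, here is a hypothetical Java sketch of the check a generated parser effectively performs when it enters the value rule: it compares the token *type* of the current token with the type the rule expects and never looks at the characters. ValueRuleDemo, Token and matchValue are illustrative names, not ANTLR API; the token list is hard-coded to SEGMENT ':' SEGMENT, which is what the lexer produces for key1:value1 (as the next paragraph explains), and the error text only mimics the messages quoted in the question. Real ANTLR error reporting and recovery are more involved.

```java
import java.util.List;

// Hypothetical model of a parser rule's token check; not ANTLR's generated code.
class ValueRuleDemo {
    // A token as the parser sees it: a type plus the matched text.
    record Token(String type, String text) {}

    // Succeeds only if the current token's TYPE equals the expected type;
    // the token text ("value1") plays no part in the decision.
    static String matchValue(Token current, String expectedType) {
        if (current.type().equals(expectedType)) {
            return "ok";
        }
        // Message format modeled on the errors quoted in the question.
        return "mismatched input '" + current.text() + "' expecting " + expectedType;
    }

    public static void main(String[] args) {
        // Token stream for "key1:value1" as the lexer actually types it.
        List<Token> tokens = List.of(
                new Token("SEGMENT", "key1"),
                new Token("':'", ":"),
                new Token("SEGMENT", "value1"));
        Token valueToken = tokens.get(2);
        System.out.println(matchValue(valueToken, "USTRING")); // value: USTRING;
        System.out.println(matchValue(valueToken, "SEGMENT")); // value: SEGMENT;
    }
}
```

Running main prints "mismatched input 'value1' expecting USTRING" followed by "ok": the same token fails value: USTRING and passes value: SEGMENT, which matches the behavior described in the question.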

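As a result: your lexer rules SEGMENT and USTRING are identical, so both rules match exactly the same input characters. When two rules match input of the same length, ANTLR uses the rule defined first in the grammar, so all of those words become SEGMENT tokens and no USTRING token is ever produced.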
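The tie-breaking rule described above can be sketched in plain Java without the ANTLR runtime. This is a simplified, hypothetical model: at each position every rule is tried, the longest match wins, and on a tie the rule listed first keeps the token. TieBreakDemo and its regex-based rules are illustrative only; ANTLR's generated lexers are DFA-based and do not use java.util.regex. The literals ':' and '.' are listed first here just for convenience (no other rule matches those characters, so their position does not affect the result).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of lexer rule selection: longest match wins,
// and on a tie the rule listed first wins. Not ANTLR's real code.
class TieBreakDemo {
    // Rule names and patterns, mirroring the Paths grammar.
    static final String[] NAMES = {"':'", "'.'", "SEGMENT", "USTRING", "NL", "WS"};
    static final Pattern[] RULES = {
            Pattern.compile(":"),
            Pattern.compile("\\."),
            Pattern.compile("[a-zA-Z0-9]+"), // SEGMENT
            Pattern.compile("[a-zA-Z0-9]+"), // USTRING: identical to SEGMENT
            Pattern.compile("[\\n\\r]+"),    // NL
            Pattern.compile("[ \\t]+"),      // WS
    };

    // Returns "TYPE text" for each token; WS is dropped, like `-> skip`.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestRule = -1;
            int bestLen = 0;
            for (int r = 0; r < RULES.length; r++) {
                Matcher m = RULES[r].matcher(input).region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on equal length the earlier rule keeps the token.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestRule = r;
                    bestLen = m.end() - pos;
                }
            }
            if (bestRule < 0) {
                throw new IllegalStateException("no rule matches at offset " + pos);
            }
            if (!NAMES[bestRule].equals("WS")) {
                String text = input.substring(pos, pos + bestLen)
                        .replace("\n", "\\n").replace("\r", "\\r");
                out.add(NAMES[bestRule] + " " + text);
            }
            pos += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        tokenize("key1:value1\nkey2.sub:value2").forEach(System.out::println);
    }
}
```

For this input every word, including value1 and value2, comes out as SEGMENT, and USTRING never appears, because it can never produce a strictly longer match than the SEGMENT rule defined before it.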

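If you have gone through the standard setup (and created the grun alias), you can run it with the -tokens option to get a dump of the token stream. That is usually a good way to verify that you are producing the token stream you expect.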


芸娘子的小脾气 2025-02-02 15:10:03


It happens because Antlr assigns the type SEGMENT to the values. The lexer knows nothing about the parser rules, and if a piece of input could be matched by several token types, the lexer assigns it a single type (the rule defined first in the grammar).

This piece of code helped me:

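import org.antlr.v4.runtime.{CharStreams, CommonTokenStream}
import scala.jdk.CollectionConverters._

// PathsLexer is the lexer class ANTLR generates from the Paths grammar
val lexer = new PathsLexer(CharStreams.fromString("key1:value1\nkey2.sub:value2\nkey3.sub1.sub2:value3"))
val tokens = new CommonTokenStream(lexer)
tokens.fill() // run the lexer over the whole input
// Print each token with its numeric type and symbolic name
println(tokens.getTokens.asScala.map { token =>
  s"$token: ${token.getType} -> ${lexer.getVocabulary.getDisplayName(token.getType)}"
}.mkString("\n"))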

Probably I need to learn more about Antlr modes.

