ANTLR parser question

Posted 2024-08-22 19:56:01

I'm trying to parse a number of text records where the elements in a record are separated by a '+' char and the entire record is terminated by a '#' char. For example: E1+E2+E3+E4+E5+E6#

Individual elements can be required or optional. If an element is optional, its value is simply missing. For example, if E2 were missing, the input string would be: E1++E3+E4+E5+E6#.

When dealing with empty trailing elements, however, the separator char ('+') may be missing as well. If, for example, the last 3 elements were missing, the string could be E1+E2+E3#, but it could also be E1+E2+E3+++#

I have tried the following rule in ANTLR:

r1 : E1 '+' E2 '+' E3 '+'? E4? '+'? E5? '+'? E6? '#' ;

but ANTLR complains that it's ambiguous, which of course is correct (every token following E3 could be E4, E5 or E6). The input syntax is fixed (it comes from a legacy mainframe system), so I was wondering if anybody has a solution to this problem?
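To make the ambiguity concrete, here is a small Python sketch (not ANTLR) that counts how many ways the part of the rule after E3 can match a given token sequence. The token kinds PLUS, E and END and the function count_parses are hypothetical; E stands for any element, since E4, E5 and E6 look identical to the parser.

```python
# Hypothetical token kinds: "PLUS" for '+', "E" for any element, "END" for '#'.
# The part of the rule after E3,  '+'? E4? '+'? E5? '+'? E6? '#',  written as
# (token kind, is_optional) pairs. E4, E5 and E6 all share the kind "E".
TAIL_PATTERN = [
    ("PLUS", True), ("E", True),  # '+'? E4?
    ("PLUS", True), ("E", True),  # '+'? E5?
    ("PLUS", True), ("E", True),  # '+'? E6?
    ("END", False),               # '#'
]


def count_parses(tokens, pattern=TAIL_PATTERN):
    """Return how many distinct ways `tokens` can match `pattern`.

    Each optional slot either consumes the next token (when the kinds agree)
    or is skipped. A result greater than 1 means the input is ambiguous.
    """
    def walk(t, p):
        if p == len(pattern):
            return 1 if t == len(tokens) else 0
        kind, optional = pattern[p]
        ways = 0
        if t < len(tokens) and tokens[t] == kind:
            ways += walk(t + 1, p + 1)  # the slot consumes this token
        if optional:
            ways += walk(t, p + 1)      # the slot is left empty
        return ways

    return walk(0, 0)


# Tokens after E3 for three example records:
print(count_parses(["END"]))                          # E1+E2+E3#     -> 1
print(count_parses(["PLUS", "PLUS", "PLUS", "END"]))  # E1+E2+E3+++#  -> 1
print(count_parses(["PLUS", "E", "END"]))             # E1+E2+E3+X#   -> 6
```

The last input has six readings: X can be taken as E4, E5 or E6, and the single '+' can fill any optional separator slot before it. That is the ambiguity ANTLR reports.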

An alternative would be to specify all the different permutations in the rule, but that would be a major task.

Best regards and thanks,

Michael

面如桃花 2024-08-29 19:56:01

That task sounds like overkill for ANTLR. Is there any reason you're not just splitting the string into an array using '+' as the separator?

If it's coming from a mainframe, it most likely was intended to be processed in a trivial way.

For example:
C++: http://www.cplusplus.com/reference/clibrary/cstring/strtok/
PHP: https://www.php.net/manual/en/function.explode.php
Java: http://java.sun.com/javase/6/docs/api/java/lang/String.html#split%28java.lang.String%29
C#: http://msdn.microsoft.com/en-us/library/system.string.split%28VS.71%29.aspx

Just a thought.
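A minimal Python version of the splitting approach might look like the sketch below. It assumes every record has exactly six elements and that elements never contain '+' or '#'; the names split_record and FIELD_COUNT are hypothetical. Padding to a fixed length makes E1+E2+E3# and E1+E2+E3+++# come out identical.

```python
FIELD_COUNT = 6  # assumption: every record has exactly six elements, E1..E6


def split_record(record, field_count=FIELD_COUNT):
    """Split a '+'-separated, '#'-terminated record into a fixed-size list.

    Assumes elements never contain '+' or '#'. Empty elements (as in E1++E3)
    become None, and missing trailing elements are padded with None whether
    or not their '+' separators were written out.
    """
    if not record.endswith("#"):
        raise ValueError("record must end with '#': %r" % record)
    fields = record[:-1].split("+")
    if len(fields) > field_count:
        raise ValueError(
            "expected at most %d elements, got %d" % (field_count, len(fields))
        )
    fields += [""] * (field_count - len(fields))
    return [field or None for field in fields]


print(split_record("E1+E2+E3#"))         # ['E1', 'E2', 'E3', None, None, None]
print(split_record("E1+E2+E3+++#"))      # ['E1', 'E2', 'E3', None, None, None]
print(split_record("E1++E3+E4+E5+E6#"))  # ['E1', None, 'E3', 'E4', 'E5', 'E6']
```

Padding also sidesteps the fact that some split functions, such as Java's String.split(String), discard trailing empty strings. Checking that required elements are present is left out here.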

何时共饮酒 2024-08-29 19:56:01

If this is ambiguous, it's likely because your Es all have the same format. (A more complicated case would be that your Es merely start with the same k characters, where k is your lookahead. I'm going to assume that's not the case; if it is, this will still work, it will just require an extra step.)

So it looks like you can have up to 6 Es and up to 5 +s. Let's call an optional E followed by a + a "segment": you can have 1 to 5 segments, followed by an optional trailing E.

This grammar can be represented roughly like this (imperfect ANTLR syntax since I'm not very familiar with it):

r     : (e_opt? PLUS){1,5} e_opt? END ;
e_opt : E ;    // whatever your E is
PLUS  : '+' ;
END   : '#' ;

ANTLR has no {1,5}-style bounded repetition operator (braces in a rule denote an embedded action), so the repetition has to be spelled out with nested optional groups:

r : (e_opt? PLUS) ((e_opt? PLUS) ((e_opt? PLUS) ((e_opt? PLUS) (e_opt? PLUS)?)?)?)? e_opt? END ;

which is not that clean, so maybe there is a nicer way to do it.
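As a rough check that the segment grammar resolves the ambiguity, here is a hand-written Python recognizer (not ANTLR-generated code) that follows `r : (e_opt? PLUS){1,5} e_opt? END` directly. The names tokenize, parse_record, MAX_SEGMENTS and KINDS are hypothetical, and the lexer assumes elements never contain '+' or '#'. The key point is the two-token lookahead: an E followed by PLUS starts another segment, while an E followed by END is the trailing element.

```python
import re

MAX_SEGMENTS = 5  # up to five "e_opt? PLUS" segments, then one optional trailing E
KINDS = {"+": "PLUS", "#": "END"}  # anything else is an element token "E"


def tokenize(text):
    """Split text into (kind, value) pairs: '+', '#', or a run of other chars."""
    return [(KINDS.get(tok, "E"), tok) for tok in re.findall(r"[+#]|[^+#]+", text)]


def parse_record(text):
    """Recognize  r : (e_opt? PLUS){1,5} e_opt? END  and return six elements.

    Empty element slots are returned as None; a ValueError is raised when the
    input does not match the rule.
    """
    tokens = tokenize(text) + [("EOF", "")]
    pos = 0
    elements = []

    def kind(offset=0):
        # Kind of the token `offset` places ahead, clamped to the final EOF.
        return tokens[min(pos + offset, len(tokens) - 1)][0]

    def e_opt():
        # Optional element: take an E if one is next, otherwise record None.
        nonlocal pos
        if kind() == "E":
            elements.append(tokens[pos][1])
            pos += 1
        else:
            elements.append(None)

    def expect(expected):
        nonlocal pos
        if kind() != expected:
            raise ValueError(
                "expected %s at token %d, got %s" % (expected, pos, kind())
            )
        pos += 1

    segments = 0
    # Two-token lookahead: PLUS, or E followed by PLUS, starts another segment;
    # E followed by END is the trailing element instead.
    while segments < MAX_SEGMENTS and (
        kind() == "PLUS" or (kind() == "E" and kind(1) == "PLUS")
    ):
        e_opt()
        expect("PLUS")
        segments += 1
    if segments == 0:
        raise ValueError("a record needs at least one '+' separator")
    e_opt()  # optional trailing element
    expect("END")
    expect("EOF")
    # Pad so that records with omitted trailing separators also have six slots.
    return elements + [None] * (MAX_SEGMENTS + 1 - len(elements))


print(parse_record("E1+E2+E3#"))         # ['E1', 'E2', 'E3', None, None, None]
print(parse_record("E1+E2+E3+++#"))      # ['E1', 'E2', 'E3', None, None, None]
print(parse_record("E1++E3+E4+E5+E6#"))  # ['E1', None, 'E3', 'E4', 'E5', 'E6']
```

Every choice in this rule (continue with another segment or stop, take an element or leave the slot empty) can be decided by looking at most two tokens ahead, which is why the segment form is no longer ambiguous the way the original rule was.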
