如何阅读 EBNF 语法中的替代项
我有一个 EBNF 语法,它有一些这种模式的规则:
sequence ::=
item
| item extra* sequence
上面的内容是否等同于下面的内容?
sequence ::=
item (extra* sequence)*
编辑
由于你们中的一些人观察到两个序列中的错误或歧义,我将给出一个具体的例子。 SVG 规范 提供了 路径数据语法。这个语法有几个具有这种模式的生成器:
lineto-argument-sequence:
coordinate-pair
| coordinate-pair comma-wsp? lineto-argument-sequence
上面的内容可以重写如下吗?
lineto-argument-sequence:
coordinate-pair (comma-wsp? lineto-argument-sequence)*
I have an EBNF grammar that has a few rules with this pattern:
sequence ::=
item
| item extra* sequence
Is the above equivalent to the following?
sequence ::=
item (extra* sequence)*
Edit
Due to some of you observing bugs or ambiguities in both sequences, I'll give a specific example. The SVG specification provides a grammar for path data. This grammar has several producers with this pattern:
lineto-argument-sequence:
coordinate-pair
| coordinate-pair comma-wsp? lineto-argument-sequence
Could the above be rewritten as the following?
lineto-argument-sequence:
coordinate-pair (comma-wsp? lineto-argument-sequence)*
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
事实并非如此,他们似乎有不同的错误。第一个序列围绕“item”是不明确的,因为“extra”是可选的。您可以将其重写为以下内容以消除歧义:
第二个在“extra”周围不明确,因为它基本上是两个都以“extra”开头的嵌套循环。您可以将其重写为以下内容以消除歧义:
您的第一个版本可能会因由单个“项目”组成的输入序列(这取决于解析器实现)而阻塞,因为它不会消除歧义。
我的重写假设您想要匹配以“item”开头的序列,并且可以选择后跟一系列(0个或更多)“item”或“extra”(以任意顺序)。
例如,
如果没有其他信息,我个人会倾向于我标记为“sequence4”的选项,因为所有其他选项仅使用递归作为昂贵的循环构造。如果您愿意给我更多信息,我也许可以给出更好的答案。
编辑:基于乔恩的出色观察(带有一个小模型)。
如果您重写“sequence3”以删除递归,您会得到以下结果:
它认为这将是我更喜欢的版本,而不是“sequence4”。
我必须指出,上述所有三个版本在功能上都是等效的(作为识别器或生成器)。 3 的解析树与 4 和 5 不同,但我认为这不会影响性能以外的任何事情。
编辑:
关于以下内容:
此产生式表示,
lineto-argument-sequence
由至少一个coordinate-pair
后跟零个或多个coordinate-pair 组成
由可选的白色/逗号分隔。以下任何内容都将构成一个lineto-argument-sequence
(读作“成为”):因此
坐标对
实际上是任何2个连续的数量。
我在 ANTLR 中模拟了一个似乎有效的语法。请注意,
lineto_argument_sequence
使用的模式与 Jorn 和我之前推荐的模式类似。给定以下输入:
它生成这个解析树。
替代文本 http://www.freeimagehosting.net/uploads/85fc77bc3c.png
Not really, they seem to have different bugs. The first sequence is ambiguous around "item" seeing that "extra" is optional. You could rewrite it as the following to remove ambiguity:
The second one is ambigous around "extra", seeing as it is basically two nested loops both starting with "extra". You could rewrite it as the following to remove ambiguity:
Your first version will likely choke on an input sequence consisting of a single "item" (it depends on the parser implementation) because it won't disambiguate.
My rewrites assume you want to match a sequence starting with "item" and optionally followed by a series of (0 or more) "item" or "extra" in any order.
e.g.
Without additional information I would be personally inclined towards the option I labled "sequence4" as all the other options are merely using recursion as an expensive loop construct. If you are willing to give me more information I may be able to give a better answer.
EDIT: based on Jorn's excellent observation (with a small mod).
If you rewrite "sequence3" to remove recursion you get the following:
It think this will be my prefered version, not "sequence4".
I have to point out that all three versions above are functionally equivalent (as recognizers or generators). The parse trees for 3 would be different to 4 and 5, but I cannot think that that would affect anything other than perhaps performance.
EDIT:
Concerning the following:
What this production says is that a
lineto-argument-sequence
is composed of at least onecoordinate-pair
followed by zero or morecoordinate-pair
s seperated by optional white/comma. Any of the following would constitute alineto-argument-sequence
(read -> as 'becomes'):So a
coordinate-pair
is really any 2 consecutivenumber
s.I have mocked up a grammar in ANTLR that seems to work. Note the pattern used for
lineto_argument_sequence
is similar to the one Jorn and I recommended previously.Given the following input:
it produces this parse tree.
alt text http://www.freeimagehosting.net/uploads/85fc77bc3c.png
此规则也相当于
sequence ::= (item extra*)*
,从而删除sequence
上的递归。This rule would also be equivalent to
sequence ::= (item extra*)*
, thus removing the recursion onsequence
.是的,这两种语法描述的是同一种语言。
但这真的是 EBNF 吗? 关于 EBNF 的维基百科文章不包括 Kleene 星运算符。
Yes, those two grammars describe the same language.
But is that really EBNF? Wikipedia article on EBNF does not include the Kleene star operator.