PEG 和 CFG 有什么区别?

发布于 2024-10-29 11:15:12 字数 732 浏览 0 评论 0原文

从这个 wikipedia 页面:

两者的根本区别 上下文无关语法和解析 表达式语法是 PEG 的 选择运算符是有序的。如果 第一种选择成功,第二种选择成功 替代方案被忽略。如此订购 选择是不可交换的,与 上下文无关的无序选择 语法和正则表达式。 有序选择类似于软选择 某些逻辑中可用的剪切运算符 编程语言。

为什么PEG的选择运算符会短路匹配?是因为要最小化内存使用(由于记忆)吗?

我不确定正则表达式中的选择运算符是什么,但我们假设它是这样的: /[aeiou]/ 匹配元音。所以这个正则表达式是可交换的,因为我可以用 5 个中的任何一个来编写它! (五个阶乘)元音字符的排列?即 /[aeiou]/ 的行为与 /[eiaou]/ 相同。它可交换的优点是什么? (参见 PEG 的非交换性)

结果是,如果 CFG 是 直接音译为 PEG,任何 前者的歧义通过以下方式解决 确定性地选择一个解析 可能的解析树。经过 仔细选择顺序 替代语法是 指定,程序员有一个伟大的 对哪个解析树的控制 已选择。

这是不是说PEG的语法优于CFG呢?

From this wikipedia page:

The fundamental difference between
context-free grammars and parsing
expression grammars is that the PEG's
choice operator is ordered. If the
first alternative succeeds, the second
alternative is ignored. Thus ordered
choice is not commutative, unlike
unordered choice as in context-free
grammars and regular expressions.
Ordered choice is analogous to soft
cut operators available in some logic
programming languages.

Why does PEG's choice operator short circuits the matching? Is it because to minimize memory usage (due to memoization)?

I'm not sure what the choice operator is in regular expressions but let's suppose it is this: /[aeiou]/ to match a vowel. So this regex is commutative because I could have written it in any of the 5! (five factorial) permutations of the vowel characters? i.e. /[aeiou]/ behaves the same as /[eiaou]/. What is the advantage of it being commutative? (c.f. PEG's non-commutativity)

The consequence is that if a CFG is
transliterated directly to a PEG, any
ambiguity in the former is resolved by
deterministically picking one parse
tree from the possible parses. By
carefully choosing the order in which
the grammar alternatives are
specified, a programmer has a great
deal of control over which parse tree
is selected.

Is this saying that PEG's grammar is superior to CFG's?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

永言不败 2024-11-05 11:15:12

CFG 语法是非确定性的,这意味着某些输入可能会产生两个或更多可能的解析树。尽管大多数基于 CFG 的解析器生成器对语法的可确定性都有限制。如果有两个或多个选择,它将给出警告或错误。

PEG 语法是确定性的,这意味着任何输入只能以一种方式进行解析。

举一个经典的例子; 语法

if_statement := "if" "(" expr ")" statement "else" statement
              | "if" "(" expr ")" statement;

应用于输入的

if (x1) if (x2) y1 else y2

可以被解析为

if_statement(x1, if_statement(x2, y1, y2))

if_statement(x1, if_statement(x2, y1), y2)

CFG 解析器将生成 Shift/Reduce 冲突,因为它无法决定是否应该移动(读取另一个标记)或减少(完成节点),当到达“else”关键字。当然,有一些方法可以解决这个问题。

PEG 解析器总是会选择第一个选择。

哪一个更好由您决定。我的观点是,通常 PEG 语法更容易编写,而 CFG 语法更容易分析。

A CFG grammar is non-deterministic, meaning that some input could result in two or more possible parse-trees. Though most CFG-based parser-generators have restrictions on the determinability of the grammar. It will give a warning or error if it has two or more choices.

A PEG grammar is deterministic, meaning that any input can only be parsed one way.

To take a classic example; The grammar

if_statement := "if" "(" expr ")" statement "else" statement
              | "if" "(" expr ")" statement;

applied to the input

if (x1) if (x2) y1 else y2

could either be parsed as

if_statement(x1, if_statement(x2, y1, y2))

or

if_statement(x1, if_statement(x2, y1), y2)

A CFG-parser would generate a Shift/Reduce-conflict, since it can't decide if it should shift (read another token), or reduce (complete the node), when reaching the "else" keyword. Of course, there are ways to get around this problem.

A PEG-parser would always pick the first choice.

Which one is better is for you to decide. My opinion is that often PEG-grammars is easier to write, and CFG grammars easier to analyze.

黯淡〆 2024-11-05 11:15:12

我认为您将 CFG 与 LR 以及歧义混淆了。语法不是确定性/非确定性的,尽管它们的解析器可能是。如果符合定义,二义性语法仍然是 CFG,并且可以为其构建确定性解析器,执行 PEG 的操作。

I think you're confusing CFG with LR and with ambiguity. Grammars are not deterministic/nondeterministic, though their parsers may be. An ambiguous grammar is still CFG if it complies with the definition, and a deterministic parser can be built for it doing what PEG does.

爱情眠于流年 2024-11-05 11:15:12

PEG 和 CFG 是指定语言的两种不同方式。如果您手动编写一个解析器,那么您很有可能会编写一个所谓的递归下降解析器。递归下降解析器将自动解决语法中的任何歧义,但会默默地进行,并且可能不会按照您想要的方式进行。这样做的问题是,除非您彻底测试您的解析器,否则您永远不会发现有自动解决的歧义。 PEG 基本上是递归下降解析器的形式化,因此会出现这个问题。有关此问题的示例,请参阅回溯如何影响语言被解析器识别?,以及https://cs .stackexchange.com/questions/143480/dragon-book-4-4-5-exercise/143975

CFG 有很多理论支持,但 PEG 却没有那么多。可以由 CFG 编码的语言集和可以由 PEG 编码的语言集部分重叠,但两者都不包含对方。

为了更彻底地回顾这一点,我推荐这篇优秀的文章 哪种解析方法?

PEGs and CFGs are two different ways of specifying a language. If you write a parser by hand, chances are very good that you will write a so-called recursive descent parser. A recursive descent parser will automatically resolve any ambiguities in your grammar, but does so silently and likely not in the way you would have wanted. The problem with this is that you never find out that there were ambiguities that got automatically resolved, unless you thoroughly test your parser. PEGs are basically a formalization of recursive descent parsers, and so come with this problem. For examples of this problem see How does backtracking affect the language recognized by a parser?, and https://cs.stackexchange.com/questions/143480/dragon-book-4-4-5-exercise/143975.

CFGs have a lot of theory to back them up, but PEGs not so much. The set of languages that can be encoded by CFG and those that can be encoded by PEG partially overlap, but neither encompasses the other.

For a more thorough review of this I recommend the excellent essay Which Parsing Approach?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文