上下文无关语法

发布于 2024-10-04 05:44:23 字数 36 浏览 2 评论 0原文

是否有一种算法可以根据给定的上下文无关语法生成所有字符串？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

旧人 2024-10-11 05:44:23

莱昂纳多的回答从技术上来说是正确的；没有终止算法可以返回通用 CFG 的单词集，因为该集合通常是无限的。

但是有些算法可以为 CFG 生成单词序列，每个单词都匹配“最终”出现的语法。其中之一应该足以满足您的目的。使用具有 yield 的语言（例如 Python）编写其中一个相当容易。

这种算法的草图（恐怕是相当无望的伪代码）：

generate(g):
    if g is empty:
        yield ""
    otherwise if g is a terminal a:
        yield "a"
    otherwise if g is a single nonterminal:
        for c in every construction of g:
            start a generator for generate(c)
        until all generators are exhausted:
            looping over each nonexhausted generator gen:
                yield "a" where a = next(gen)
    otherwise if g is a pair of symbols m and n:
        for c in every construction of m:
            start a generator in set 1 for generate(c)
        for d in every construction of m:
            start a generator in set 2 for generate(d)
        until all in set 1 or all in set 2 are exhausted:
            loop over all pairs gen1,gen2 of nonexhausted in set 1 and set 2:
                yield "a b" where a = next(gen1) and b = next(gen2)

假设语法已被转换，以便每个构造都是零到两个终端，这将在所有解析树的树上运行广度优先搜索语法。 BFS 是必要的，因为任何给定子树的大小可能是无限的——DFS 可能会永远卡在向下查找其中一个子树上。

等待给定任何解析树实现的时间和内存成本可能是任意的，但对于该解析树来说是有限的。

Leonardo's answer is technically correct; there's no terminating algorithm that will return the set of words for a general CFG, because oftentimes that set is infinite.

But there are algorithms that will generate a sequence of words for a CFG, with every word matching the grammar appearing "eventually". One of these should be enough for your purposes. It is fairly easy to write one of these in a language with yield, such as Python.

A sketch of such an algorithm (in rather hopeless pseudocode, I'm afraid):

generate(g):
    if g is empty:
        yield ""
    otherwise if g is a terminal a:
        yield "a"
    otherwise if g is a single nonterminal:
        for c in every construction of g:
            start a generator for generate(c)
        until all generators are exhausted:
            looping over each nonexhausted generator gen:
                yield "a" where a = next(gen)
    otherwise if g is a pair of symbols m and n:
        for c in every construction of m:
            start a generator in set 1 for generate(c)
        for d in every construction of m:
            start a generator in set 2 for generate(d)
        until all in set 1 or all in set 2 are exhausted:
            loop over all pairs gen1,gen2 of nonexhausted in set 1 and set 2:
                yield "a b" where a = next(gen1) and b = next(gen2)

Assuming the grammar has been converted so that each construction is zero to two terminals, this will run a breadth first search over the tree of all parse trees of the grammar. BFS is necessary, because the size of any given subtree may be infinite -- a DFS could be stuck looking down one of them forever.

The time and memory cost of waiting for given any parse tree to materialise may be arbitrary, but it is finite for that parse tree.

回复收藏 0 原文