正则表达式与cfg交集的算法

发布于 2024-10-01 12:43:59 字数 219 浏览 6 评论 0原文

我正在寻找一种算法,该算法输出正则表达式和上下文无关语法的交集是否为空。我知道这个问题是可判定的,但是,我找不到任何示例实现(以伪代码形式)。

如果可能的话,有人可以在.NET 中为我提供这样的算法吗?但这不是必须的。这个问题也被称为“正则交叉”。谷歌搜索只给我提供了几何算法或相关理论。

编辑

任何人。我真的被困住了,还找不到任何东西。

Im looking for an algorithm which outputs if the intersection of a regular expression and a contex free grammar is empty or not. I know that this problem is decidable, however, I cannot find any example implementation (in pseudocode).

Can someone provide me with such an algorithm, in .NET if possible but this is not a must. This problem is also called "regular intersection". Googling for it only gives me the geometrical algorithm or the theory about it.

edit:

Anybody. Im really stuck on it, and cannot find anything yet.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

太阳公公是暖光 2024-10-08 12:43:59

这是我想到的一种方法的草图。我认为这应该可行,但它可能不是最好的方法,因为它使用从 PDA 到 CFG 的极其混乱的转换。

将正则表达式转换为非确定性有限自动机 (NFA),并将其简化为最小确定性有限自动机 (DFA)。将上下文无关语法(CFG)转换为下推自动机(PDA)。这些第一步都是众所周知且相当简单的算法。

拿DFA和PDA的交集来说,它也是一个PDA。我们可以说 DFA 具有状态 S1、起始状态 s1、最终状态 F1 和形式为 ((源、触发)、目的地) 的转换 delta1,而 PDA 具有状态 S2、起始状态 s2、最终状态 F2 和转换形式为 ((源、触发、弹出)、(目标、推送)) 的 delta2。新的 PDA 具有状态 S1 X S2,每个状态由一对标记。它具有最终状态 F1 X F2 和开始状态 (s1,s2)。现在进行过渡。

对于每个转移 d 一个 delta2 的元素,对于每个状态 s 一个元素 s1,找到形式为 ((s,d.trigger),?) 的转移 t 一个 delta1 的元素。进行新的转换 (((d.source, s), d.trigger, d.pop),((d.destination, t.destination),d.push))。

这个新的 PDA 应该接受 RE 和 CFG 生成的语言的交集。要测试语言是否为空,您需要将其转换回 CFG。该算法虽然混乱且庞大,但它确实有效。完成此操作后,标记每个终端符号。然后标记每个有规则的符号,即只有右侧有标记的符号,然后重复,直到无法标记更多符号为止。如果可以标记开始符号,则该语言不为空。否则,语言就是空的。

Here is a sketch of an approach that occurs to me. I think this should work but it is probably not the best way to do it since it uses the terribly messy conversion from PDA to CFG.

Convert the regular expression into a nondeterministic finite automaton (NFA) and reduce it down to the minimal determinsitic finite automaton (DFA). Convert the context free grammar (CFG) into a pushdown automoton (PDA). These first steps are all well known and fairly simple algorithms.

Take the intersection of the DFA and PDA, which is also a PDA. We will say the DFA has states S1, start state s1, final states F1, and transitions delta1 of the form ((source,trigger),destination), and the PDA has states S2, start state s2, final states F2, and transitons delta2 of the form ((source,trigger,pop),(destination,push)). The new PDA has states S1 X S2, each state labeled by a pair. It has final states F1 X F2, and start state (s1,s2). Now for the transitions.

For each transition d an element of delta2, for each state s an element s1, find the transition t an element of delta1 of the form ((s,d.trigger),?). Make a new transition (((d.source, s), d.trigger, d.pop),((d.destination, t.destination),d.push)).

This new PDA should accept the intersection of the languages produced by the RE and the CFG. To test if the language is empty you will need to convert it back to a CFG. The algorithm for that is messy and large, but it works. Once you have done that, mark each terminal symbol. Then mark each symbol which has a rule where there are only marked symbols on the right hand side, and repeat until you can mark no more symbols. If you can mark the start symbol, the language is not empty. Otherwise, the language is empty.

葬﹪忆之殇 2024-10-08 12:43:59

事实上,有一种更简单的算法来计算上下文无关语法和正则表达式之间的交集。它不使用下推自动机,从具有多次转换的 CFG 中获得下推自动机的成本可能很高。

该解决方案提出于:

Y。 Ba-Hillel、M. Prles 和 E. Shamir。 1965.论形式属性
简单的短语结构语法。 Z. Phonetik,Sprachwissen。科姆。 15
(I961),143-172。 Y. Bar-Hillel,《语言与信息》,
艾迪生-韦斯利,雷丁,马萨诸塞州 (1965),116–150。

但您可以在以下位置找到更简单的版本:

理查德·贝格尔和威廉·加萨奇。 .. 交叉点的证明
上下文无关语言和常规语言是上下文无关的
哪个不使用下推自动机。
http://www.cs.umd.edu/~gasarch/BLOGPAPERS/ cfg.pdf (.).

如果有帮助,此解决方案已在 Pyformlang 中实现(https://pyformlang.readthedocs.io/),你可以在 Python 的 Github 上找到它(https:// github.com/Aunsiels/pyformlang/blob/master/pyformlang/cfg/cfg.py

In fact, there is a simpler algorithm for computing the intersection between a context-free grammar and a regular expression. It does not use push-down automata, which can be costly to obtain from CFG with several conversions.

This solution was presented in:

Y. Ba-Hillel, M. Prles, and E. Shamir. 1965. On formal properties of
simple phrase structure grammars. Z. Phonetik, Sprachwissen. Komm. 15
(I961), 143-172. Y. Bar-Hillel, Language and Information,
Addison-Wesley, Reading, Mass (1965), 116–150.

but you can find a simpler version in:

Richard Beigel and William Gasarch. .. A Proof that the intersection
of a context-free language and a regular language is Context-Free
Which Does Not use pushdown automata.
http://www.cs.umd.edu/~gasarch/BLOGPAPERS/ cfg.pdf (.).

If it help, this solution was implemented in Pyformlang (https://pyformlang.readthedocs.io/), and you can find it on Github for Python (https://github.com/Aunsiels/pyformlang/blob/master/pyformlang/cfg/cfg.py)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文