How do I convert a regular grammar to a regular expression?

Posted on 2024-12-27 14:56:44

Is there an algorithm or tool to convert a regular grammar to a regular expression?

Comments (4)

向地狱狂奔 2025-01-03 14:56:44

Answer from dalibocai:

My goal was to convert a regular grammar to a DFA. In the end, I found an excellent tool: JFLAP.

A tutorial is available here: https://www2.cs.duke.edu/csed/jflap/tutorial/framebody.html

永不分离 2025-01-03 14:56:44

The algorithm is pretty straightforward if you can compute an automaton from your regular expression. For instance, for (aa*b|c), an automaton would be (arrows go to the right):

          a
         / \
      a  \ / b
-> 0 ---> 1 ---> 2 ->
    \___________/
          c

Then just "enumerate" your transitions as rules. Below, consider that 0, 1, and 2 are nonterminal symbols, and of course a, b and c are the tokens.

0: a1 | c2
1: a1 | b2
2: epsilon

Or, if you don't want empty right-hand sides:

0: a1 | c
1: a1 | b

And of course, the route in the other direction provides one means to convert a regular grammar into an automaton, hence a rational expression.
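
The "enumerate your transitions as rules" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transition list is read off the diagram above, and the names `TRANSITIONS`, `FINAL_STATES` and `automaton_to_grammar` are hypothetical, not part of any library.

```python
# Transitions of the automaton for (aa*b|c), as (source, symbol, target).
# Read off the diagram above; state 0 is initial, state 2 is final.
TRANSITIONS = [(0, "a", 1), (1, "a", 1), (1, "b", 2), (0, "c", 2)]
FINAL_STATES = {2}


def automaton_to_grammar(transitions, final_states, epsilon_free=False):
    """Turn each transition into a right-linear rule (hypothetical helper).

    Returns a dict mapping each state (used as a nonterminal) to its list
    of alternatives, written as strings such as "a1" or "epsilon".
    """
    has_outgoing = {src for src, _, _ in transitions}
    rules = {}
    for src, sym, dst in transitions:
        alternatives = rules.setdefault(src, [])
        if not epsilon_free:
            # Plain variant: one rule "sym dst" per transition.
            alternatives.append(f"{sym}{dst}")
            continue
        # Epsilon-free variant: stop at final states instead of entering them.
        if dst in has_outgoing and f"{sym}{dst}" not in alternatives:
            alternatives.append(f"{sym}{dst}")
        if dst in final_states and sym not in alternatives:
            alternatives.append(sym)
    if not epsilon_free:
        for state in final_states:
            rules.setdefault(state, []).append("epsilon")
    return rules


for variant in (False, True):
    grammar = automaton_to_grammar(TRANSITIONS, FINAL_STATES, epsilon_free=variant)
    for head, alternatives in grammar.items():
        print(f"{head}: {' | '.join(alternatives)}")
    print()
```

Note that the epsilon-free variant, as sketched here, cannot express the empty word, so it is only adequate when the initial state is not final.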

空城旧梦 2025-01-03 14:56:44

From a theoretical point of view, an algorithm to solve this problem works by creating a regular expression from each rule in the grammar, and solving the resulting system of equations for the initial symbol.

For example, for the regular grammar ({S,A},{a,b,c},P,S):

P:
   S -> aA | cS | a  | c
   A -> aA | bS | c

  1. Take each nonterminal symbol and generate a regular expression from its right-hand side:

    S = aA + cS + a + c
    A = aA + bS + c
    
  2. Solve the equation system for the initial symbol S:

    A = a(aA + bS + c) + bS + c
    A = a⁺bS + a⁺c + bS + c
    
    S = aA + c(aA + cS + a + c) + a + c
    S = aA + c⁺aA + a + c⁺a + c⁺
    
    S = a(a⁺bS + a⁺c + bS + c) + c⁺a(a⁺bS + a⁺c + bS + c) + a + c⁺a + c⁺
    S = a⁺bS + a⁺c + c⁺a⁺bS + c⁺a⁺c + a + c⁺a + c⁺
    
    S = (c⁺ + ε)a⁺bS + a⁺c + a + c⁺(a⁺c + a + ε)
    
    substitution: x = (c⁺ + ε)a⁺b
    
    S = x(xS + a⁺c + a + c⁺(a⁺c + a + ε)) + a⁺c + a + c⁺(a⁺c + a + ε)
    S = x⁺a⁺c + x⁺a + x⁺c⁺(a⁺c + a + ε) + a⁺c + a + c⁺(a⁺c + a + ε)
    S = x*(a⁺c + a + c⁺(a⁺c + a + ε))
    
    S = ((c⁺ + ε)a⁺b)*(a⁺c + a + c⁺(a⁺c + a + ε))
    

Because every step preserves equivalence, ((c⁺ + ε)a⁺b)*(a⁺c + a + c⁺(a⁺c + a + ε)) is a regular expression for exactly the words that can be derived from the initial symbol S, that is, for the language generated by the grammar.

It ain't pretty, but I purposefully picked a grammar including cycles to portray the way the algorithm works. The hardest part is recognizing that S = xS | x is equivalent to S = x⁺ (more generally, S = xS | y is equivalent to S = x*y), then just doing the substitutions.
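
The substitution procedure can also be mechanized. Below is a minimal Python sketch, assuming the equations from step 1 (S = aA + cS + a + c and A = aA + bS + c) as input. It eliminates one nonterminal at a time using the rule that X = rX + s has the solution X = r*s, then compares the resulting pattern against the grammar on all short words. The names `EQUATIONS`, `solve` and `generates` are hypothetical, and the pattern it prints will generally have a different shape from a hand-derived expression while describing the same language.

```python
import re
from itertools import product

# Equations from step 1; each alternative is (terminal, nonterminal or None).
EQUATIONS = {
    "S": [("a", "A"), ("c", "S"), ("a", None), ("c", None)],
    "A": [("a", "A"), ("b", "S"), ("c", None)],
}


def union(parts):
    """Alternation of regex fragments, wrapped so it can be concatenated."""
    return parts[0] if len(parts) == 1 else "(?:" + "|".join(parts) + ")"


def solve(equations, start):
    """Solve the system for `start`; returns a pattern for Python's re module."""
    # rows[X][Y] is the coefficient of Y in X's equation; key None is the constant.
    rows = {}
    for head, rhs in equations.items():
        grouped = {}
        for terminal, target in rhs:
            grouped.setdefault(target, []).append(re.escape(terminal))
        rows[head] = {target: union(ts) for target, ts in grouped.items()}

    for x in [v for v in rows if v != start] + [start]:
        row = rows[x]
        loop = row.pop(x, None)
        if loop is not None:
            # X = loop X + rest  becomes  X = loop* rest
            row = {y: "(?:" + loop + ")*" + r for y, r in row.items()}
            rows[x] = row
        # Substitute the solved X into every other equation.
        for z, zrow in rows.items():
            if z != x and x in zrow:
                prefix = zrow.pop(x)
                for y, r in row.items():
                    term = prefix + r
                    zrow[y] = union([zrow[y], term]) if y in zrow else term
    return rows[start].get(None)  # None means the language is empty


def generates(equations, start, word):
    """Check directly whether the grammar derives `word`."""
    pending = {start}  # nonterminals still to expand; None marks a finished derivation
    for ch in word:
        pending = {target for head in pending if head is not None
                   for terminal, target in equations[head] if terminal == ch}
    return None in pending


pattern = solve(EQUATIONS, "S")
print(pattern)

# Compare pattern and grammar on every word over {a, b, c} up to length 6.
mismatches = [
    "".join(letters)
    for n in range(7)
    for letters in product("abc", repeat=n)
    if bool(re.fullmatch(pattern, "".join(letters)))
    != generates(EQUATIONS, "S", "".join(letters))
]
print(mismatches)
```

An exhaustive comparison on short words is not a proof of equivalence, but it is a cheap way to catch a dropped term in a derivation done by hand.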

请别遗忘我 2025-01-03 14:56:44

I'll leave this as an answer to this old question, in case anybody finds it useful:

I have recently released a library for exactly that purpose:

https://github.com/rindPHI/grammar2regex

You can convert regular grammars precisely, and also compute approximate regular expressions for more general context-free grammars. The output format can be configured to be a custom ADT type or the regular expression format of the z3 SMT solver (z3.ReRef).

Internally, the tool converts grammars to finite automata. If you're interested in the automaton itself, you can call the method right_linear_grammar_to_nfa.
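
To illustrate what a grammar-to-automaton step looks like in general, here is a standalone Python sketch. It does not use grammar2regex and makes no claim about how that library implements right_linear_grammar_to_nfa; the function names, the `ACCEPT` state, and the input format (alternatives written as a one-character terminal followed by an optional nonterminal name) are all assumptions made for this example.

```python
from collections import defaultdict


def grammar_to_nfa(rules, start, accept="ACCEPT"):
    """Build an NFA from a right-linear grammar (hypothetical helper).

    `rules` maps a nonterminal to alternatives such as "a1" (terminal, then
    nonterminal), "a" (terminal only) or "" (empty right-hand side).
    """
    delta = defaultdict(set)  # (state, symbol) -> set of successor states
    accepting = {accept}
    for head, alternatives in rules.items():
        for alt in alternatives:
            if alt == "":
                accepting.add(head)                 # X -> epsilon
            elif len(alt) == 1:
                delta[(head, alt)].add(accept)      # X -> a
            else:
                delta[(head, alt[0])].add(alt[1:])  # X -> aY
    return delta, start, accepting


def accepts(nfa, word):
    """Simulate the NFA by tracking the set of reachable states."""
    delta, start, accepting = nfa
    current = {start}
    for ch in word:
        current = set().union(*(delta.get((q, ch), set()) for q in current))
    return bool(current & accepting)


# The grammar for (aa*b|c) from the earlier answer, with states as nonterminals.
nfa = grammar_to_nfa({"0": ["a1", "c2"], "1": ["a1", "b2"], "2": [""]}, start="0")
for word in ["ab", "aaab", "c", "a", "cb", ""]:
    print(repr(word), accepts(nfa, word))
```

From such an NFA, a regular expression can then be obtained by state elimination, or a DFA by the usual subset construction.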
