Simplify this regular expression
I'm doing some pre-exam exercises for my compilers class, and needed to simplify this regular expression.
(a U b)*(a U e)b* U (a U b)*(b U e)a*
Quite obviously, the e is the empty string, and the U stands for union.
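To experiment with the expression in a real engine, it helps to transliterate the course notation first. This is a minimal Python sketch. It assumes the alphabet is just a and b, that U only ever means union, and that e only ever means the empty string. The helper name `to_python_regex` is hypothetical.

```python
def to_python_regex(expr: str) -> str:
    """Hypothetical helper: rewrite course notation into Python `re` syntax.

    Assumes 'U' always means union and 'e' always means the empty string
    (never a literal symbol). An empty alternative such as "(a|)" matches ""
    in Python's `re`, which plays the role of (a U e).
    """
    return expr.replace("U", "|").replace("e", "").replace(" ", "")


print(to_python_regex("(a U b)*(a U e)b* U (a U b)*(b U e)a*"))
# prints (a|b)*(a|)b*|(a|b)*(b|)a*
```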
So far, I think one of the (a U b)* can be removed, as the union of a U a = a. However, I can't find any other simplifications, and am not doing so well with the other problems thus far. :(
Any help is appreciated, thanks very much!
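Before simplifying by hand, a bounded brute-force comparison can at least catch a wrong guess. This is a minimal Python sketch. It assumes the expression has already been written in Python `re` syntax, with U as `|` and e as an empty alternative. `same_language` is a hypothetical helper. Agreement up to a fixed length is evidence, not a proof.

```python
import re
from itertools import product


def same_language(p: str, q: str, alphabet: str = "ab", max_len: int = 8) -> bool:
    """Return True if p and q fully match exactly the same strings of length <= max_len.

    This is a bounded sanity check over `alphabet`, not a proof of equivalence.
    """
    rp, rq = re.compile(p), re.compile(q)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if (rp.fullmatch(s) is None) != (rq.fullmatch(s) is None):
                return False
    return True


original = r"(a|b)*(a|)b*|(a|b)*(b|)a*"
print(same_language(original, r"(a|b)*"))  # True
print(same_language(r"(a|b)*", r"a*b*"))   # False: "ba" is only in the first
```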
Answers (4)
First translate to an English description of the language:
(a U b)*(a U e)b* U (a U b)*(b U e)a*
Translates to:
Any sequence of a's or b's, followed by an optional a, followed by any number of b's.
OR
Any number of a's and b's, followed by an optional b, followed by any number of a's.
There is a lot of overlap here. At least
(a U b)*(a U e)
is exactly the same as
(a U b)*
because choosing e shows that it matches every sequence of a's and b's, and appending an optional a to a sequence of a's and b's still gives a sequence of a's and b's. The same goes for (a U b)*(b U e), so those groups can be eliminated, leaving
(a U b)*b* U (a U b)*a*
Translates to:
Any sequence of a's or b's, followed by any number of b's.
OR
Any number of a's and b's, followed by any number of a's.
Now the first section of the two outermost groups is the same, so let's collapse them into one:
(a U b)*(a* U b*)
Translates to:
Any sequence of a's or b's, followed by any number of a's OR by any number of b's.
Now hold on a minute: "any sequence of a's and b's" necessarily ends with "any sequence of a's OR any sequence of b's" (possibly an empty one), which means anything which matches the first part can match the whole regex, because the second part can have a length of zero. Conversely, whatever the whole regex matches is still just a sequence of a's and b's. So why don't we just make it
(a U b)*
Ta-da. Simple.
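The chain of rewrites in this answer can be spot-checked mechanically. This is a sketch in Python that assumes the alphabet {a, b} and writes each intermediate expression in Python `re` syntax. The helper `accepted` and the length bound 8 are arbitrary choices for illustration. Equal sets up to that length support each step but do not prove it.

```python
import re
from itertools import product


def accepted(pattern: str, max_len: int = 8) -> frozenset:
    """All strings over {a, b} of length <= max_len that `pattern` fully matches."""
    rx = re.compile(pattern)
    return frozenset(
        s
        for n in range(max_len + 1)
        for s in ("".join(t) for t in product("ab", repeat=n))
        if rx.fullmatch(s)
    )


# Each step of the derivation, written in Python `re` syntax.
steps = [
    r"(a|b)*(a|)b*|(a|b)*(b|)a*",  # original expression
    r"(a|b)*b*|(a|b)*a*",          # optional a / optional b absorbed into (a|b)*
    r"(a|b)*(a*|b*)",              # common prefix factored out
    r"(a|b)*",                     # suffix absorbed by the prefix
]

languages = [accepted(p) for p in steps]
print(all(lang == languages[0] for lang in languages))  # True
print(len(languages[0]))  # 511
```

All 511 strings over {a, b} of length at most 8 are accepted at every step, which is consistent with the conclusion that the language is all of (a U b)*.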
Little rusty on regex, but if * still represents "zero or more occurrences", you can replace:
which leaves the first part with:
On the right side, you have that
Now, since a U b = b U a, you get:
on the right hand side, which leaves just
I think that's it...
I think the whole thing is equivalent to
(a U b)*
(or in most regex grammars, (a|b)*).
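One caveat when carrying this over to a practical engine: (a|b)* only describes the language if it has to match the whole input. In Python, for instance, `re.match` anchors only at the start of the string, so it succeeds on "abc" by matching the prefix "ab". By contrast, `re.fullmatch` requires the entire string to match. Here is a small sketch:

```python
import re

pattern = re.compile(r"(a|b)*")

for s in ["", "abba", "abc"]:
    # match(): only the start is anchored, so "abc" succeeds via the prefix "ab".
    # fullmatch(): the entire string must belong to (a|b)*.
    print(repr(s), bool(pattern.match(s)), bool(pattern.fullmatch(s)))

# prints:
# '' True True
# 'abba' True True
# 'abc' True False
```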
I'll give you an idea of how I would solve it (not very formal, and no guarantee):
Look at the left side of the main U:
(a U b)*: what does it mean? A combination of a's and b's of length n, where n >= 0.
Next comes (a U e). What do we have here? An a or the empty word. If we wanted that a, we could just have gotten it in the previous part already. If we want the e, well, we can leave it out anyway. Note that we don't have to take an a, because we have the option to choose e. So we can skip this whole part.
What is next? b*. What is that? As many b's as we want. We could have gotten those in the first part as well, so we can leave that out!
So the only thing on the left is (a U b)*. Nothing more can sneak in either, since every piece only ever produces a's and b's.
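The argument above (take the whole string in (a U b)*, pick e for (a U e) and zero b's for b*) can be made concrete. For any string w of a's and b's, the split (w, "", "") is a witness that w matches the left side. This is a minimal Python sketch. `left_side_split` and `is_valid_split` are hypothetical helper names.

```python
import re

# The three pieces of the left side (a U b)*(a U e)b*, in Python `re` syntax.
PIECES = (r"(a|b)*", r"(a|)", r"b*")


def left_side_split(w: str) -> tuple[str, str, str]:
    """Hypothetical helper: all of w goes to (a U b)*, e to (a U e), zero b's to b*."""
    return (w, "", "")


def is_valid_split(parts: tuple[str, str, str]) -> bool:
    """True if each part fully matches its corresponding piece of the left side."""
    return all(re.fullmatch(p, s) is not None for p, s in zip(PIECES, parts))


# Every string over {a, b} has such a split, so the left side accepts all of (a U b)*.
# Conversely, each piece only produces a's and b's, so it accepts nothing else.
for w in ["", "a", "ba", "abba", "bbbab"]:
    parts = left_side_split(w)
    assert "".join(parts) == w
    assert is_valid_split(parts)
```

The right side works the same way with the roles of a and b swapped.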
Let's have a look at the right side:
OK, this is easy now: we can use the same idea, just with the letters swapped.
We will also get (a U b)* in the same way.
So in the end we have (a U b)* U (a U b)* which you know is equal to (a U b)*.