Regular expression defining the regular language over {a,b} that does not contain a substring of exactly 3 b's (bbb)
Pretty much what the question says. I came up with
(ba)?(a + bb + bbbbb + aba)*(ab)?
Is there anything more readable? Or is this incorrect?
I know you shouldn't really be doing this sorta thing with Regex when you can just go !~/bbb/ in your code, but it's a theory exercise.
Thanks.
Edit for clarification: I'm not using | to represent the OR bit in the Regex; I'm using + instead. Sorry for the confusion.
Edit 2: {a,b} is for a language with just 'a' and 'b' characters, not {minimum, maximum}. Sorry again.
Edit 3: Because this is part of a theory class, we're just dealing with the basics of Regex. The only things you're allowed to use are +, ?, () and *. You cannot use {minimum, maximum}.
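One way to answer "is this incorrect?" for an exercise like this is to translate the expression into a real regex engine's syntax and compare it against a brute-force definition of the language on every short string. The following is a minimal Python sketch. It assumes the "no run of exactly three b's" reading used in the answers, the helper names are hypothetical, and the question's union + is rewritten as Python's |:

```python
import itertools
import re

def in_language(s: str) -> bool:
    """Brute-force definition: no maximal run of b's has length exactly 3."""
    # Splitting on 'a' yields the maximal runs of b's (some possibly empty).
    return all(len(run) != 3 for run in s.split("a"))

def counterexamples(pattern: str, max_len: int):
    """Yield (string, should_match) pairs where the regex disagrees with in_language."""
    compiled = re.compile(pattern)
    for n in range(max_len + 1):
        for chars in itertools.product("ab", repeat=n):
            s = "".join(chars)
            should_match = in_language(s)
            if (compiled.fullmatch(s) is not None) != should_match:
                yield s, should_match

# The question's regex, with the exercise's union "+" written as "|".
ASKER = r"(ba)?(a|bb|bbbbb|aba)*(ab)?"

if __name__ == "__main__":
    # Show the first few disagreements, shortest first.
    for s, should_match in itertools.islice(counterexamples(ASKER, 6), 5):
        print(repr(s), "should match" if should_match else "should not match")
```

The first disagreement reported is the one-character string "b". It is in the language under either reading of the title, but the expression above cannot produce it.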
Comments (3)
I think I have a working regex. Let b° (a notation I just invented) be the regex that matches zero or more b's, except that it won't match exactly three of them. It can be replaced by (ε | b | bb | bbbb+), so don't worry that I'm using magic or anything. Now I think that matching strings can be seen as repeating subpatterns of zero or more a's followed by b°, which could be (a*b°)*, but you need at least one "a" between sequences of b's (otherwise two adjacent b° pieces, such as b and bb, could combine into bbb). So your final regex is a*b°(a+b°)*.

Since b° can match the empty string, the initial a* is superfluous: the a+ can pick up the initial a's just fine, so the regex can be optimized down to b°(a+b°)* (thanks, wrikken).
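A quick way to sanity-check this construction is to run it through a real engine. Here is a sketch in Python's re syntax. It assumes the answer's own notation: | is union and + means "one or more" (unlike the question's union +), and ε becomes an optional group, so b° is written (?:b|bb|bbbb+)?. It compares the optimized regex b°(a+b°)* against a brute-force definition on all strings up to a fixed length. The helper names are hypothetical:

```python
import itertools
import re

# b° = (ε | b | bb | bbbb+): any run of b's except exactly three.
B_DEGREE = r"(?:b|bb|bbbb+)?"
# Optimized form from the answer: b°(a+b°)*, where + means "one or more".
ANSWER = re.compile(B_DEGREE + r"(?:a+" + B_DEGREE + r")*")

def in_language(s: str) -> bool:
    # No maximal run of b's (the pieces between a's) has length exactly 3.
    return all(len(run) != 3 for run in s.split("a"))

def agrees_up_to(max_len: int) -> bool:
    """Check the regex against the brute-force definition on all strings up to max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product("ab", repeat=n):
            s = "".join(chars)
            if (ANSWER.fullmatch(s) is not None) != in_language(s):
                return False
    return True

if __name__ == "__main__":
    print(agrees_up_to(10))
```

Using fullmatch() rather than search() matters here. The theory exercise asks whether the whole string belongs to the language, which corresponds to an implicitly anchored match.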
Hmm, something like this?

edit:
Pfff, talking about tying your hands behind your back... Simple solution: you cannot do it (^ and $ are requirements for it ever to work), and we need the |. So, come up with better conditions. Dropping the lookbehind and lookahead could be done, but isn't going to be pretty (at least, not without violating DRY):
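For contrast, here is roughly what a version looks like when lookbehind and lookahead are available. This is a sketch in Python's re syntax and not the answerer's original pattern, which is missing here. It also relies on features the exercise rules out. A negative lookahead rejects the string if it contains a run of three b's that is neither preceded nor followed by another b:

```python
import re

# (?<!b)bbb(?!b) finds a run of exactly three b's; the outer negative
# lookahead at the start of the string rejects any input containing one.
NO_EXACT_BBB = re.compile(r"(?!.*(?<!b)bbb(?!b))[ab]*")

def accepts(s: str) -> bool:
    # fullmatch() supplies the ^...$ anchoring implicitly.
    return NO_EXACT_BBB.fullmatch(s) is not None

if __name__ == "__main__":
    for s in ["", "abba", "abbba", "bbbb", "babbb"]:
        print(repr(s), accepts(s))
```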
You're matching a string without precisely 3 b's in a row. That means you're looking at substrings like "aa", "aba", "abba", and "abbbbb*a", where any of the exterior a's could be the beginning or end of the string, can be overlapped, and can be multiple. This suggests something like:
with appropriate additions to account for the missing a at the beginning of the string. There's a lot of repetition, but that's how regular expressions work in basic form.
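The answer's regex itself is missing here, but its idea can still be sketched in Python: chain the pieces aa, aba, abba and abbbbb*a so that neighbouring pieces share an exterior a. As a shortcut for the "appropriate additions" at the string boundaries, this hypothetical helper pads the input with an a on each side instead of spelling out extra alternatives. The pattern uses only ?, * and grouping, with Python's | standing in for the exercise's union +:

```python
import itertools
import re

# Each repetition adds one piece from {aa, aba, abba, abbbbb*a}: the leading
# "a" is shared with the previous piece, so pieces overlap on their exterior a's.
CHAIN = re.compile(r"a(?:(?:b|bb|bbbbb*)?a)*")

def accepts(s: str) -> bool:
    # Hypothetical shortcut: pad with an "a" at both ends so runs of b's at the
    # start or end of s also sit between two a's.
    return CHAIN.fullmatch("a" + s + "a") is not None

def in_language(s: str) -> bool:
    # Brute-force definition: no maximal run of b's has length exactly 3.
    return all(len(run) != 3 for run in s.split("a"))

if __name__ == "__main__":
    ok = all(
        accepts(s) == in_language(s)
        for n in range(11)
        for s in ("".join(p) for p in itertools.product("ab", repeat=n))
    )
    print(ok)
```

Padding is a convenience for the code, not part of a pure regex answer. Removing it means adding alternatives for a leading or trailing run of b's, which is where the repetition mentioned above comes from.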