所有正则表达式都会停止吗?

发布于 2024-07-30 13:29:42 字数 37 浏览 2 评论 0原文

对于某些输入字符串,是否有任何正则表达式可以永远搜索匹配项?

Is there any regular expression that will, for some input string, search for a match forever?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

牵你手 2024-08-06 13:29:42

对于有限输入,不存在不会停止的正式正则表达式。

任何正式的正则表达式都可以转换为确定性有限自动机。 DFA 一次读取一个字符,在输入结束时,您要么处于接受状态,要么处于不接受状态。 如果状态为接受,则输入与正则表达式匹配。 否则,就不会。

现在,大多数“正则表达式”库都支持非正则表达式的内容,例如反向引用。 只要你远离这些功能,并且输入有限,你就一定会停止。 如果您不这样做...取决于您所使用的具体内容,您很可能无法保证停止。 例如,Perl 允许插入任意代码,并且不保证任意的图灵机等效代码会停止。

现在,如果输入是无限的,那么可以找到永远不会停止的简单正则表达式。 例如,“.*”。

For a finite input, there is no formal regular expression that will not halt.

Any formal regular expression can be translated into a Deterministic Finite Automata. A DFA reads the input one character at a time, and, at the end of the input, you are either in an accepting state or in a non-accepting state. If the state is accepting, then the input matches the regular expression. Otherwise, it doesn't.

Now, most "regular expression" libraries support things which are not regular expressions, such as back references. As long as you keep away from those features, and have a finite input, you are guaranteed a halt. If you don't... depending on exactly what you are using, you might well not be guaranteed a halt. Perl allows arbitrary code to be inserted, for instance, and arbitrary, turing-machine equivalent code is not guaranteed to halt.

Now, if the input is infinite, then trivial regular expressions can be found which will never halt. For example, ".*".

一个人的夜不怕黑 2024-08-06 13:29:42

形式正则表达式实际上是一种描述用于解析字符串的确定性有限自动机的方法。 如果 DFA 在输入结束时处于接受状态,则正则表达式“匹配”。 由于 DFA 顺序读取其输入,因此当它到达输入末尾时,它总是会停止,而是否存在匹配仅是检查它停止在 DFA 的哪个状态的问题。

子字符串匹配实际上是相同的,只不过 DFA 不是在一次读取字符串结束时被迫停止,而是在读取每个可能的子字符串一次后被迫停止 - 仍然是有限情况。 (是的,大多数正则表达式引擎以更优化的方式实现这一点,而不是仅仅将每个可能的子字符串扔到 DFA 中 - 但从概念上讲,限制仍然存在)。

因此,DFA 不会停止的唯一可能情况是输入无限,这通常被认为超出了停止问题的范围。

Formal regex is actually a method of describing a deterministic finite automaton for parsing strings. The regex "matches" if the DFA winds up in an accepting state at the end of input. Since the DFA reads its input sequentially, it will always halt when it reaches the end of the input, and whether or not there is a match is merely a matter of examining which state of the DFA it halts at.

Substring matching is effectively the same, except instead of being forced to halt at the end of one read-through of the string, the DFA would instead be forced to halt after reading each possible substring once - still a finite case. (Yes, most regex engines implement this in a bit more optimized manner than just throwing every possible substring at a DFA - but conceptually it the limit is still there).

Thus the only possible case in which the DFA would not halt is if the input were infinite, which is generally considered beyond the scope of the halting problem.

埖埖迣鎅 2024-08-06 13:29:42

我想,不可能找到一个不停止的正则表达式。

您的输入的大小是有限的。 正则表达式的任何匹配子组的最大大小为输入的大小。

除非使用的算法非常愚蠢(多次检查案例),否则匹配子组的数量也将是有限的。

所以,它会停止。

I imagine, it is not possible to find a regular expression that doesn't halt.

The size of your input is finite. The maximum size of any matched subgroup of the regular expression is, at max, the size of your input.

Unless the algorithm being used is pretty stupid (going over cases multiple times), the number of matched subgroups, will too, be finite.

So, it will halt.

浪漫人生路 2024-08-06 13:29:42

根据这个问题,每个正则表达式停止。

According to this question, every regular expression halts.

野鹿林 2024-08-06 13:29:42

不是您所描述的意义上的,您可能有一些非常低效的正则表达式,它们会占用大量资源并最终杀死正则表达式引擎,这与停止不同。

我认为停止在这里并不适用,正如这篇文章的其他评论者如此敏锐地指出的那样。
http://en.wikipedia.org/wiki/Halting_problem

Not in the sense you are describing, you can have some very inefficient regular expressions that take up loads of resources and end up killing the regex engine, this is not the same as halting.

I don't think halting really applies here, as the other commenters of this post have so astutely pointed out.
http://en.wikipedia.org/wiki/Halting_problem

久夏青 2024-08-06 13:29:42

我无法想象一个输入字符串将被永远解析,尽管无限长的字符串将被永远解析。 鉴于正则表达式可以描述正则语言,而正则语言可能是无限的单词集,那么正则表达式可以描述无限单词的语言,包括无限长度的单词。 但是,输入字符串不可能无限长,因此在某些时候它必须停止。

例如,如果 a*b 在语言中被接受,并且您有一个无限长的 'a' 字符串,那么是的,正则表达式将永远不会停止。 但实际上,这是不可能的。

I can't imagine an input string that would be parsed forever, although an infinitely long string would be parsed forever. Given that a regular expression can describe a regular language, which is potentially an infinite set of words, then a regex can describe a language of infinite words, including words of infinite length. However, no input string can be infinitely long, so at some point it would have to halt.

For example, if a*b is accepted in the language, and you have an infinitely long string of 'a's, then yes, the regex would never halt. Practically, though, this is impossible.

盛夏已如深秋| 2024-08-06 13:29:42

是的。

正则表达式可以用有限状态机表示。 每次收到原子输入时,它都会导致任何定义明确的 FSM 转换到已知状态。

例外情况是当您有无限输入时,但这不适用于停止问题,因为它处理有限输入。 当您拥有有限状态机和有限输入时,始终可以确定您的机器是否会停止。

http://en.wikipedia.org/wiki/Finite_state_machine

Yes.

A regular expression can be represented by a finite state machine. Every time you receive an atomic input, it will cause any well-defined FSM to transition to a known state.

The exception is when you have infinite input, but this is not applicable to the halting problem because it deals with finite input. When you have a finite state machine and finite input, it is always possible to determine whether your machine will halt or not.

http://en.wikipedia.org/wiki/Finite_state_machine

单挑你×的.吻 2024-08-06 13:29:42

Daniel的答案+1:所有有限输入都会导致真正的正则表达式(即,没有反向引用或其他非正则表达式功能)停止,并且正则表达式相当于DFA。

奖励:正则表达式匹配可以简单而快速
(但在 Java、Perl、PHP、Python、Ruby 等中速度很慢)

http: //swtch.com/~rsc/regexp/regexp1.html

请注意,文章顶部的两个图表在 y 轴上具有不同的刻度:一个是秒,另一个是微秒< /em>!

+1 for Daniel's answer: all finite inputs cause true regex's (i.e., without backreferences or other non-regex features) to halt, and regex's are equivalent to DFAs.

Bonus: Regular Expression Matching Can Be Simple And Fast
(but is slow in Java, Perl, PHP, Python, Ruby, ...)

http://swtch.com/~rsc/regexp/regexp1.html

Note that the two graphs at the top of the article have different scales on the y-axis: one is seconds, the other is microseconds!

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文