Match n occurrences (excluding the last occurrence)

Posted on 2025-01-18 12:07:10 · 264 words · 3 views · 0 comments

I have a question about regex. I don't know why I cannot do the following.

Sample sentence:

"This is a test string with five t's"

The regex I use:

^(.*?(?=t)){3}

I want the regex to match the following.

"This is a test s"

But it doesn't work, does anyone know why?

Comments (2)

晨光如昨 2025-01-25 12:07:10

The point here is that the whole .*?(?=t) group pattern can match an empty string. It stops before the first t and cannot "hop thru" because it remains where it is when the lookahead pattern (a non-consuming pattern) matches.

You cannot do it like this: each iteration of the group must consume at least one char (and so move the regex index forward).
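A small Python sketch of the behaviour described above, using the sample sentence from the question. The variable names are only illustrative. It shows that the original pattern stops right before the first lowercase t, and that the group matches an empty string once it is parked there.

```python
import re

s = "This is a test string with five t's"

# The asker's pattern: three repetitions of "lazy run, then lookahead for t".
m = re.match(r'^(.*?(?=t)){3}', s)
print(repr(m.group(0)))  # 'This is a '

# At index 10 (just before the first lowercase t) the group already
# succeeds without consuming anything, so repeating it makes no progress.
group = re.compile(r'.*?(?=t)')
print(group.match(s, 10).span())  # (10, 10)
```

The second and third repetitions are both zero-length, which is why the overall match never gets past the first t.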

An alternative solution for this concrete case is

^(?:[^t]*t){2}[^t]*

See the regex demo. The ^(?:[^t]*t){2}[^t]* pattern matches the start of string (^), then consumes two occurrences ({2}) of any chars other than t ([^t]*) followed by t, and then consumes any 0+ chars other than t ([^t]*), so the match ends right before the third t.
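As a rough illustration, the same pattern can be built for any single character and any n. The helper name prefix_before_nth_char is hypothetical (it is not from the answer), and the sketch assumes n >= 1 and a one-character delimiter.

```python
import re

def prefix_before_nth_char(text, ch, n):
    """Return the prefix of text that ends just before the n-th ch.

    Hypothetical helper: builds ^(?:[^c]*c){n-1}[^c]* for the given char.
    Returns None if text has fewer than n-1 occurrences of ch.
    """
    c = re.escape(ch)  # keep special chars such as ] or ^ literal
    pattern = rf'^(?:[^{c}]*{c}){{{n - 1}}}[^{c}]*'
    m = re.match(pattern, text)
    return m.group(0) if m else None

s = "This is a test string with five t's"
print(repr(prefix_before_nth_char(s, "t", 3)))  # 'This is a test s'
```

Note that if the text contains exactly n-1 occurrences, the trailing [^c]* simply runs to the end of the string, so the whole text is returned.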

Or, a general case solution (if t is a multicharacter string):

^(?:.*?t){2}(?:(?!t).)*

See another regex demo. The (?:.*?t){2} pattern matches two occurrences of any 0+ chars, as few as possible, up to and including the next t, and then (?:(?!t).)* matches any char, 0+ occurrences, that does not start a t char sequence.
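A minimal sketch of the general form with a multicharacter delimiter. The helper name prefix_before_nth_token and the sample string "a--b--c--d" are made up for illustration, and n >= 1 is assumed.

```python
import re

def prefix_before_nth_token(text, token, n):
    """Return the prefix of text that ends just before the n-th token.

    Hypothetical helper: builds ^(?:.*?tok){n-1}(?:(?!tok).)* for the token.
    Returns None if text has fewer than n-1 occurrences of token.
    """
    tok = re.escape(token)
    pattern = rf'^(?:.*?{tok}){{{n - 1}}}(?:(?!{tok}).)*'
    m = re.match(pattern, text)
    return m.group(0) if m else None

print(repr(prefix_before_nth_token("a--b--c--d", "--", 3)))  # 'a--b--c'
print(repr(prefix_before_nth_token(
    "This is a test string with five t's", "t", 3)))  # 'This is a test s'
```

Since . does not match a newline by default, this sketch only applies to single-line input; passing re.DOTALL to re.match would be one way to lift that restriction.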

渔村楼浪 2025-01-25 12:07:10

As said by @CertainPerformance, .* will match zero or more characters in the pattern, but you use its lazy version .*?.
The lazy version of a quantifier will have it match as few characters as possible.
With a quantifier that can match the empty string, this leads to a zero-length match as soon as the lookahead already succeeds at the current position.

You need to use the + quantifier instead (.+? in place of .*?), in order to prevent an empty string match.

Demonstration with Python:

>>> import re
>>> s = "This is a test string with five t's"
>>> r = r'^(.+?(?=t)){3}'
>>> re.match(r, s)
<_sre.SRE_Match object; span=(0, 16), match='This is a test s'>