Match n occurrences (excluding the last occurrence)

Posted on 2025-01-18 12:07:10

I have a question about regex. I don't know why I cannot do the following.

Sample sentence:

"This is a test string with five t's"

The regex I use:

^(.*?(?=t)){3}

I want the regex to match the following.

"This is a test s"

But it doesn't work, does anyone know why?

晨光如昨 2025-01-25 12:07:10

The point here is that the whole .*?(?=t) group pattern can match an empty string. The first repetition stops right before the first t, and the following repetitions cannot "hop through" it: a lookahead is a non-consuming pattern, so when it matches, the regex index stays where it is and the remaining repetitions match an empty string at that same position.

You cannot do it like this; each repetition must consume at least one char (and so move the regex index).
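
To see the stall concretely, here is a minimal Python sketch (Python is only an illustrative choice here; the string and the pattern are the ones from the question). It checks where the original pattern stops:

```python
import re

s = "This is a test string with five t's"

# The original pattern: (?=t) only looks ahead, it never consumes the t.
m = re.match(r'^(.*?(?=t)){3}', s)

matched = m.group(0)        # text consumed by the whole pattern
first_t = s.index('t')      # position of the first lowercase t

# The match ends exactly at the first t: repetitions 2 and 3 added nothing.
print(repr(matched), m.end(), first_t)
```

The end of the match coincides with the index of the first t, which is what "remaining where it is" means in practice.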

An alternative solution for this concrete case is

^(?:[^t]*t){2}[^t]*

See the regex demo. The ^(?:[^t]*t){2}[^t]* pattern matches the start of string (^), then consumes two occurrences ({2}) of any chars other than t ([^t]*) followed with t, and then consumes zero or more chars other than t ([^t]*), which stops right before the third t.
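
As a quick check, the pattern can be run against the sample sentence in Python. This is only a sketch: the variable n is an illustrative addition (the answer hard-codes {2}, i.e. n - 1 for n = 3).

```python
import re

s = "This is a test string with five t's"

n = 3  # stop right before the n-th t (illustrative parameter)

# (?:[^t]*t){n-1} consumes the first n-1 t's together with the text before them,
# then [^t]* consumes everything up to, but not including, the n-th t.
pattern = rf'^(?:[^t]*t){{{n - 1}}}[^t]*'

m = re.match(pattern, s)
result = m.group(0)
print(result)  # This is a test s
```

Every repetition of the group consumes at least the t itself, so the regex index always advances.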

Or, a general case solution (if t is a multicharacter string):

^(?:.*?t){2}(?:(?!t).)*

See another regex demo. The (?:.*?t){2} pattern matches two occurrences of any 0+ chars, as few as possible, up to and including the next t, and then (?:(?!t).)* matches any char, 0+ occurrences, that does not start a t char sequence.
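
A possible way to wrap the general pattern in Python is sketched below. The helper name text_before_nth, its parameters, and the use of re.DOTALL (so that . also crosses line breaks) are assumptions made for illustration, not part of the answer.

```python
import re

def text_before_nth(text: str, delim: str, n: int):
    """Hypothetical helper: text before the n-th occurrence of `delim`.

    Returns None when `text` has fewer than n - 1 occurrences of `delim`.
    """
    d = re.escape(delim)  # treat the delimiter literally, even if it is e.g. "." or "+"
    # (?:.*?d){n-1}  : lazily consume up to and including each of the first n-1 delimiters
    # (?:(?!d).)*    : then consume chars one by one while no delimiter starts here
    pattern = rf'^(?:.*?{d}){{{n - 1}}}(?:(?!{d}).)*'
    m = re.match(pattern, text, flags=re.DOTALL)
    return m.group(0) if m else None

print(text_before_nth("This is a test string with five t's", "t", 3))
print(text_before_nth("a--b--c--d", "--", 3))
```

Escaping the delimiter with re.escape keeps the sketch safe for delimiters that contain regex metacharacters.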

渔村楼浪 2025-01-25 12:07:10

As said by @CertainPerformance, .* matches zero or more characters, and you are using its lazy version, .*?.
The lazy version of a quantifier matches as few characters as possible.
Because .*? is allowed to match the empty string, every repetition that starts right before a t is a zero-length match.

You need to use the + quantifier instead of *, in order to prevent an empty string match.

Demonstration with Python:

>>> import re
>>> s = "This is a test string with five t's"
>>> r = r'^(.+?(?=t)){3}'
>>> re.match(r, s)
<_sre.SRE_Match object; span=(0, 16), match='This is a test s'>
