Regex: match the words between two given strings (without whitespace or the like)

Posted on 2025-01-23 20:31:25

I am trying to write a regex that gets the words, but not the whitespace, between two given strings. At the moment I have this one:

(?<=STR1)(?:\s*)(.*?)(?:\s*)(?=STR2)

I want to use it on the following input:

WORD0 STR1    WORD1 WORD2 WORD3  
WORD4 WORD5 STR2 WORD6

I want a regex that matches WORD1, WORD2, WORD3, WORD4 and WORD5.

PS: I am working with Python. Thank you.
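For reference, here is a minimal sketch of what the pattern above returns on the sample input with Python's built-in re module. The variable names are illustrative, and the final split step is only one possible way to get from the captured span to individual words; it is not something the question itself proposes.

```python
import re

text = "WORD0 STR1    WORD1 WORD2 WORD3  \nWORD4 WORD5 STR2 WORD6"
pattern = r"(?<=STR1)(?:\s*)(.*?)(?:\s*)(?=STR2)"

# Without re.DOTALL, "." cannot cross the newline, so nothing matches.
no_dotall = re.findall(pattern, text)

# With re.DOTALL, the capture group holds the whole span as a single string,
# inner whitespace and newline included.
with_dotall = re.findall(pattern, text, re.DOTALL)

# Splitting that span on whitespace yields the individual words.
words = with_dotall[0].split() if with_dotall else []

print(no_dotall)
print(with_dotall)
print(words)
```

So the pattern, once "." is allowed to cross lines, captures one span rather than five separate words, which is the gap the answers below address.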

咋地 2025-01-30 20:31:25

You cannot do that with re because 1) it does not support variable-length lookbehind patterns and 2) it has no support for the \G operator, which can be used to match strings in between two strings.

So, what you can do is install the third-party regex package (pip install regex) and then use:

import regex
text = "WORD0 STR1    WORD1 WORD2 WORD3  \nWORD4 WORD5 STR2 WORD6"
print( regex.findall(r"(?<=STR1.*)\w+(?=.*STR2)", text, regex.DOTALL) )
# => ['WORD1', 'WORD2', 'WORD3', 'WORD4', 'WORD5']

See the Python demo. Details:

(?<=STR1.*) - a positive lookbehind matching STR1 and any zero or more chars immediately to the left of the current location
\w+ - one or more word chars
(?=.*STR2) - a positive lookahead matching any zero or more chars and STR2 immediately to the right of the current location.
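The first limitation mentioned above can be checked directly with the standard library. This is a small sketch, not part of the original answer: it tries to compile the same pattern with re and records whether that fails. The names compiled and message are illustrative.

```python
import re

# The lookbehind (?<=STR1.*) has no fixed width, which the built-in re
# module rejects at compile time.
try:
    re.compile(r"(?<=STR1.*)\w+(?=.*STR2)")
    compiled = True
    message = ""
except re.error as exc:
    compiled = False
    message = str(exc)

print(compiled)
print(message)
```

The pattern is rejected before any matching happens, which is why the answer switches to the regex package for this approach.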


肩上的翅膀 2025-01-30 20:31:25

Assuming 'STR1' and 'STR2' are known to be present, you can write the following:

import re

# Named "text" rather than "str" to avoid shadowing the built-in type.
text = "WORD0 STR1    WORD1 WORD2 WORD3\nWORD4 WORD5 STR2 WORD6"

rgx = r'\b(?!.*\bSTR1\b)\w+(?=.*\bSTR2\b)'

re.findall(rgx, text, re.S)
  #=> ['WORD1', 'WORD2', 'WORD3', 'WORD4', 'WORD5']

re.S (same as re.DOTALL) causes periods to match all characters, including line terminators.
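To see why the flag matters for this input, the following sketch runs the same pattern with and without re.S. It is an illustration under the sample data only; the variable names are assumptions.

```python
import re

text = "WORD0 STR1    WORD1 WORD2 WORD3\nWORD4 WORD5 STR2 WORD6"
rgx = r'\b(?!.*\bSTR1\b)\w+(?=.*\bSTR2\b)'

# With re.S, ".*" in both lookaheads can cross the newline.
with_flag = re.findall(rgx, text, re.S)

# Without it, each lookahead only sees the rest of the current line, so the
# words on the first line can no longer "see" STR2 on the second line.
without_flag = re.findall(rgx, text)

print(with_flag)     # ['WORD1', 'WORD2', 'WORD3', 'WORD4', 'WORD5']
print(without_flag)  # ['WORD4', 'WORD5']
```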

Regex demo <- \(ツ)/ -> Python demo

The regular expression can be broken down as follows.

\b          # match a word boundary
(?!         # begin a negative lookahead
  .*        # match zero or more characters
  \bSTR1\b  # match 'STR1' with word boundaries
)           # end negative lookahead
\w+         # match one or more word characters
(?=         # begin a positive lookahead
  .*        # match zero or more characters
  \bSTR2\b  # match 'STR2' with word boundaries
)           # end positive lookahead

Note that the negative lookahead ensures that the matched word (\w+) is not followed by 'STR1' anywhere later in the string. Since 'STR1' is assumed to be present, the word must therefore be preceded by it.

Depending on requirements, \w+ might be replaced with [A-Z]+\d+ or something else.

Also note that the word boundary (\b) at the beginning of the expression is to avoid matching 'TR1'.
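The last two notes can be tried out with a short sketch. The helper words_between and its word parameter are hypothetical names introduced here for illustration; they are not part of the answer. The helper assumes both delimiters begin and end with word characters, since it wraps them in \b.

```python
import re

def words_between(text, start, end, word=r"\w+"):
    """Hypothetical helper: tokens after the last `start` and before an `end`.

    `word` is the token sub-pattern; it should not contain capturing groups,
    otherwise re.findall returns the groups instead of the whole match.
    """
    pattern = (
        rf"\b(?!.*\b{re.escape(start)}\b)"   # no `start` later in the text
        rf"{word}"                           # the token itself
        rf"(?=.*\b{re.escape(end)}\b)"       # an `end` later in the text
    )
    return re.findall(pattern, text, re.S)

text = "WORD0 STR1    WORD1 WORD2 WORD3\nWORD4 WORD5 STR2 WORD6"
default_words = words_between(text, "STR1", "STR2")

# A stricter token pattern skips tokens such as "foo" and "42".
mixed = "a STR1 foo WORD1 42 WORD2 STR2 b"
strict_words = words_between(mixed, "STR1", "STR2", word=r"[A-Z]+\d+")

# Dropping the leading \b lets a match start inside 'STR1', giving 'TR1'.
no_boundary = re.findall(r'(?!.*\bSTR1\b)\w+(?=.*\bSTR2\b)', text, re.S)

print(default_words)  # ['WORD1', 'WORD2', 'WORD3', 'WORD4', 'WORD5']
print(strict_words)   # ['WORD1', 'WORD2']
print(no_boundary)    # ['TR1', 'WORD1', 'WORD2', 'WORD3', 'WORD4', 'WORD5']
```

Escaping the delimiters with re.escape is a design choice for the sketch, so that delimiters containing regex metacharacters are treated literally.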
