Regex: match the words between two given strings (without whitespace or the like)
I am trying to write a regex that gets the words, without the blank spaces, between two given strings. At the moment I have this one:
(?<=STR1)(?:\s*)(.*?)(?:\s*)(?=STR2)
I want to use it on the following input:
WORD0 STR1 WORD1 WORD2 WORD3
WORD4 WORD5 STR2 WORD6
I want a regex that matches WORD1, WORD2, WORD3, WORD4 and WORD5.
PS: I am working with Python. Thank you.
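To make the problem concrete, here is a small check of what the pattern from the question does with Python's built-in re module. It assumes the sample input is one string containing a newline; the variable names are only illustrative. The last step, splitting the captured block on whitespace, is a plain two-step workaround and not something proposed in the question.

```python
import re

# Sample input from the question (two lines, so the string contains a newline).
TEXT = "WORD0 STR1 WORD1 WORD2 WORD3\nWORD4 WORD5 STR2 WORD6"

# The pattern from the question, unchanged.
PATTERN = r"(?<=STR1)(?:\s*)(.*?)(?:\s*)(?=STR2)"

# Without DOTALL, "." stops at the newline, so the pattern finds nothing.
no_flag = re.search(PATTERN, TEXT)

# With DOTALL the pattern matches, but group 1 is one block of text,
# not a list of separate words.
with_flag = re.search(PATTERN, TEXT, flags=re.DOTALL)
block = with_flag.group(1)

# Assumption: a second step that splits on whitespace is acceptable.
words = block.split()

print(no_flag)      # None
print(repr(block))  # 'WORD1 WORD2 WORD3\nWORD4 WORD5'
print(words)        # ['WORD1', 'WORD2', 'WORD3', 'WORD4', 'WORD5']
```

So the pattern itself is not far off; it returns the whole span as a single capture, which is why a different pattern or an extra split step is needed to get individual words.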
Comments (2)
You cannot do that with re, because 1) it does not support variable-length lookbehind patterns and 2) it has no support for the \G operator, which can be used to match strings in between two strings. So what you can do is pip install regex, and then use a pattern made of the following parts:
(?<=STR1.*) - a positive lookbehind matching STR1 and any zero or more chars immediately to the left of the current location
\w+ - one or more word chars
(?=.*STR2) - a positive lookahead matching any zero or more chars and STR2 immediately to the right of the current location.
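The three parts described above can be put together roughly as follows. This is only a sketch: it needs the third-party regex package (pip install regex), the helper name words_between is made up for illustration, and the DOTALL flag is an assumption added here so that "." can cross the line break in the sample input.

```python
try:
    import regex  # third-party package: pip install regex
except ImportError:  # keep the sketch importable when the package is missing
    regex = None

# Lookbehind with ".*" is variable-length, which the built-in re rejects.
PATTERN = r"(?<=STR1.*)\w+(?=.*STR2)"

def words_between(text):
    """Hypothetical helper: words with STR1 somewhere before and STR2 somewhere after."""
    if regex is None:
        raise RuntimeError("install the third-party 'regex' package first")
    # DOTALL (an assumption, see above) lets "." match newlines too.
    return regex.findall(PATTERN, text, flags=regex.DOTALL)

if regex is not None:
    sample = "WORD0 STR1 WORD1 WORD2 WORD3\nWORD4 WORD5 STR2 WORD6"
    print(words_between(sample))
```

With the sample input from the question this should list WORD1 through WORD5, since STR1 is not preceded by itself and nothing after STR2 is followed by STR2.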

Assuming 'STR1' and 'STR2' are known to be present, you can write an expression that relies on the re.S flag (same as re.DOTALL), which causes periods to match all characters, including line terminators.
The main points of the regular expression are as follows.
Note that the negative lookahead ensures that the matched word (\w+) is not followed by 'STR1', in which case it must be preceded by that string. Depending on requirements, \w+ might be replaced with [A-Z]+\d+ or something else. Also note that the word boundary (\b) at the beginning of the expression is there to avoid matching 'TR1'.
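The expression from this answer is not preserved here, so the following is a reconstruction that is merely consistent with the notes above (leading \b, a negative lookahead for 'STR1', \w+, and the re.S flag); the positive lookahead for 'STR2' and the helper name words_between are assumptions. It uses only the built-in re module.

```python
import re

TEXT = "WORD0 STR1 WORD1 WORD2 WORD3\nWORD4 WORD5 STR2 WORD6"

def words_between(text, start="STR1", end="STR2", word=r"\w+"):
    """Hypothetical helper: words with no `start` at or after them
    and an `end` somewhere after them."""
    pattern = (
        r"\b"                         # word boundary, so 'TR1' cannot start a match
        rf"(?!.*{re.escape(start)})"  # no `start` at or after this position
        rf"{word}"                    # the word itself
        rf"(?=.*{re.escape(end)})"    # assumed: `end` still lies somewhere ahead
    )
    # re.S lets "." match line terminators, so both lookaheads span lines.
    return re.findall(pattern, text, flags=re.S)

print(words_between(TEXT))
# ['WORD1', 'WORD2', 'WORD3', 'WORD4', 'WORD5']

# A stricter word shape, as the answer suggests, gives the same result here.
print(words_between(TEXT, word=r"[A-Z]+\d+"))
# ['WORD1', 'WORD2', 'WORD3', 'WORD4', 'WORD5']
```

Because every candidate position scans ahead twice with ".*", this is fine for short inputs but can get slow on very long strings.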