Using a regular expression to match a string between two strings

Posted on 2024-08-16 05:21:20

How can I use a regular expression to match text that is between two strings, where those two strings are themselves enclosed by two other strings, with any amount of text between the inner and outer enclosing strings?

For example, I have this text:

outer-start some text inner-start text-that-i-want inner-end some more text outer-end

In this case, I want text-that-i-want because it is between inner-start and inner-end, which themselves are between outer-start and outer-end.

If I have

some text inner-start text-that-i-want inner-end some more text outer-end

then I don't want text-that-i-want, because although it is between inner-start and inner-end, there is no outer-start enclosing these strings.

Likewise, if I have

outer-start some text text-that-i-want inner-end some more text outer-end

then again, I don't want text-that-i-want, because there is no enclosing inner-start, although there are enclosing outer-start and outer-end strings.

Assume that outer-start, inner-start, inner-end and outer-end will only ever be used for the purposes of enclosing/delimiting.

I reckon that I can do this with a two-pass regular expression match, i.e. looking for any data between outer-start and outer-end, and then within that data looking for any text between inner-start and inner-end (if indeed those strings exist), but I would like to know if it can be done in one go.
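
For reference, the two-pass idea described above could be sketched in Python roughly as follows. The function name `extract_two_pass`, the use of `re.DOTALL` (so delimited text may span lines), and the trimming of surrounding whitespace are assumptions made for illustration, not part of the question.

```python
import re

# Pass 1: anything between the outer delimiters (non-greedy).
# re.DOTALL is an assumption so that the enclosed text may contain newlines.
OUTER = re.compile(r"outer-start(.*?)outer-end", re.DOTALL)
# Pass 2: anything between the inner delimiters, searched only inside pass 1 results.
INNER = re.compile(r"inner-start(.*?)inner-end", re.DOTALL)


def extract_two_pass(text):
    """Return every inner-delimited text that sits inside an outer-delimited region."""
    results = []
    for outer in OUTER.finditer(text):
        for inner in INNER.finditer(outer.group(1)):
            # Stripping whitespace is an illustrative choice, not a requirement.
            results.append(inner.group(1).strip())
    return results


wanted = extract_two_pass(
    "outer-start some text inner-start text-that-i-want inner-end some more text outer-end"
)
no_outer = extract_two_pass(
    "some text inner-start text-that-i-want inner-end some more text outer-end"
)
no_inner = extract_two_pass(
    "outer-start some text text-that-i-want inner-end some more text outer-end"
)
```

Here `wanted` holds the single wanted string, while `no_outer` and `no_inner` are empty because one of the enclosing delimiters is missing.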

Comments (2)

岁月静好 2024-08-23 05:21:20
/outer-start.*?inner-start(.*?)inner-end.*?outer-end/

You need to use minimal (non-greedy) matching to keep the regexp from matching too much when there are multiple "texts-that-i-want"s, for example:

"outer-start some text inner-start first-text-that-i-want inner-end some more text outer-end outer-start some text inner-start second-text-that-i-want inner-end some more text outer-end"

Without minimal matching, you'll get a single, puzzling match that spans the whole string and captures only "second-text-that-i-want".

The .*? means "eat zero or more characters, but only as many as you need to make the rest of the expression match." Without the ?, a regexp engine will eat as many characters as it can as long as the rest of the expression still matches.
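
The difference can be checked with a small Python sketch. The names `LAZY` and `GREEDY` are made up for this illustration, and stripping whitespace from the captured text is an assumption.

```python
import re

# The pattern from this answer, with minimal (non-greedy) quantifiers.
LAZY = re.compile(r"outer-start.*?inner-start(.*?)inner-end.*?outer-end")
# The same pattern with greedy quantifiers, for comparison only.
GREEDY = re.compile(r"outer-start.*inner-start(.*)inner-end.*outer-end")

text = (
    "outer-start some text inner-start first-text-that-i-want inner-end "
    "some more text outer-end outer-start some text inner-start "
    "second-text-that-i-want inner-end some more text outer-end"
)

# findall returns the contents of the single capture group for each match.
lazy_hits = [hit.strip() for hit in LAZY.findall(text)]
greedy_hits = [hit.strip() for hit in GREEDY.findall(text)]

# The two negative examples from the question: missing outer-start, missing inner-start.
no_outer = LAZY.search(
    "some text inner-start text-that-i-want inner-end some more text outer-end"
)
no_inner = LAZY.search(
    "outer-start some text text-that-i-want inner-end some more text outer-end"
)
```

With the non-greedy pattern, `lazy_hits` contains both wanted texts; with the greedy one, `greedy_hits` contains only the second. One caveat worth testing on real data: `.*?` is free to run past an outer-end, so input such as "outer-start a outer-end inner-start X inner-end outer-start b outer-end" still produces a match capturing X, which the two-pass approach from the question would reject.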

‖放下 2024-08-23 05:21:20

I imagine you can do something like:


outer-start .*? inner-start (.*?) inner-end .*? outer-end
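
Note that in most regex flavours the spaces in this pattern are literal unless an extended/verbose mode is enabled. A minimal Python sketch, assuming `re.VERBOSE` is what the spacing was meant to suggest (the variable names are illustrative only):

```python
import re

SPACED = r"outer-start .*? inner-start (.*?) inner-end .*? outer-end"

# With re.VERBOSE, unescaped whitespace in the pattern is ignored,
# so the spaces only serve readability.
VERBOSE_PATTERN = re.compile(SPACED, re.VERBOSE)
# Without the flag, every space must appear literally in the input.
LITERAL_PATTERN = re.compile(SPACED)

spaced_text = (
    "outer-start some text inner-start text-that-i-want inner-end some more text outer-end"
)
compact_text = "outer-startinner-startXinner-endouter-end"

match = VERBOSE_PATTERN.search(spaced_text)
wanted = match.group(1).strip() if match else None

verbose_compact = VERBOSE_PATTERN.search(compact_text)
literal_compact = LITERAL_PATTERN.search(compact_text)
```

On `compact_text`, which has no spaces around the delimiters, only the verbose-mode pattern finds a match.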