Regex to find HTML elements that don't contain a specific phrase inside the tag
I need to match <output_channels> elements which don't contain the phrase 'Story' between the opening <output_channels> and closing </output_channels> tags. <output_channels> elements are never nested, so I think I should be able to do this with regex - please don't reply that it's impossible unless it genuinely is!
Here's an example of the text I'll be searching in, using either perl or vim (I find it easier to test regexes in vim):
<output_channels>
<output_channel>RSS</output_channel>
<output_channel>Story</output_channel>
</output_channels>
<output_channels>
<output_channel>RSS</output_channel>
</output_channels>
I'm thinking I need to run something like the following, but this matches both <output_channels> blocks:
<output_channels>.*?((?!Story).)*?<\/output_channels>
Comments (3)
Use search term:
This will match only the 2nd <output_channels> element above, since it doesn't have <output_channel>Story</output_channel>.
\_s matches any whitespace character, including a newline
\_. matches any character, including a newline
\{-} makes a pattern non-greedy in vim
\@! negates the preceding pattern match
\( and \) group the pattern
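For readers more at home with Perl-style syntax, here is a rough sketch of how the vim atoms listed above map onto Python's re module. The mapping and the sample string are assumptions made for illustration. This is not the answer's search term, which is not preserved in this copy of the thread.

```python
import re

# Rough Python counterparts of the vim atoms listed above (an assumed mapping):
#   \_s       -> \s                 whitespace, newline included
#   \_.       -> . with re.DOTALL   any character, newline included
#   \{-}      -> *?                 non-greedy repetition
#   \(x\)\@!  -> (?!x)              negative lookahead
#   \( \)     -> ( ) or (?: )       grouping
sample = "<a>\n keep </a><a>\n Story </a>"  # made-up sample text

# Non-greedy "any character including newline" between a pair of tags.
lazy_any = re.findall(r"<a>(.*?)</a>", sample, re.DOTALL)

# Same, but refusing to step onto any position where "Story" begins.
no_story = re.findall(r"<a>((?:(?!Story).)*?)</a>", sample, re.DOTALL)

# Leading whitespace, newline included, matched by \s.
leading_ws = re.match(r"\s*", "\n \tx").group(0)
```

Here lazy_any picks up both elements, while no_story keeps only the one without "Story".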
This might need some adjustment depending on what your whole XML file looks like, but it works with your example:
You need to get rid of that first .*?. What's happening is that after the ((?!Story).)*? part correctly fails to match content with Story in it, the regex engine backtracks and gives the .*? a crack at it, and of course it succeeds. This assumes, of course, that you're matching in /s (single-line or dot-matches-all) mode.
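A minimal sketch of the fix described above, using Python's re module as a stand-in for Perl (the lookahead syntax is the same for this pattern, and re.DOTALL plays the role of /s). The sample text is the one from the question. The names TEXT, ORIGINAL and FIXED are made up for illustration.

```python
import re

# Sample text from the question: the first block contains "Story", the second does not.
TEXT = """<output_channels>
<output_channel>RSS</output_channel>
<output_channel>Story</output_channel>
</output_channels>
<output_channels>
<output_channel>RSS</output_channel>
</output_channels>"""

# Original attempt: on backtracking, the leading .*? is free to swallow "Story".
ORIGINAL = re.compile(
    r"<output_channels>.*?((?!Story).)*?</output_channels>", re.DOTALL
)

# With the leading .*? removed, every character between the tags is checked
# by the lookahead, so a block containing "Story" can no longer match.
FIXED = re.compile(
    r"<output_channels>((?:(?!Story).)*?)</output_channels>", re.DOTALL
)

original_matches = [m.group(0) for m in ORIGINAL.finditer(TEXT)]
fixed_matches = [m.group(0) for m in FIXED.finditer(TEXT)]
```

On this sample the original pattern matches both blocks, while the fixed pattern matches only the second one.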