Regex to find HTML elements that don't contain a specific phrase between their tags

Posted on 2024-11-05 10:09:40

I need to match <output_channels> elements which don't contain the phrase 'Story' between the opening <output_channels> and closing </output_channels> tags. <output_channels> elements are never nested, so I think I should be able to do this with regex - please don't reply that it's impossible unless it genuinely is!

Here's an example of the text I'll be searching in, using either perl or vim (I find it easier to test regexes in vim):

<output_channels>
  <output_channel>RSS</output_channel>
  <output_channel>Story</output_channel> 
</output_channels>

<output_channels>
  <output_channel>RSS</output_channel>
</output_channels>

I'm thinking I need to run something like the following, but this matches both <output_channels> blocks:

<output_channels>.*?((?!Story).)*?<\/output_channels>

浮萍、无处依 2024-11-12 10:09:41

Use this search pattern in Vim:

<output_channels>\_s\{-}\(\(<output_channel>\_s\{-}Story\_s\{-}<\/output_channel>\)\@!\_.\)\{-}\_s\{-}<\/output_channels>

This matches only the second <output_channels> element above, since it doesn't contain <output_channel>Story</output_channel>.

\_s matches any whitespace character, including a newline
\_. matches any character, including a newline
\{-} makes the preceding atom non-greedy (match as few as possible)
\@! is a negative lookahead: it succeeds, with zero width, only if the preceding atom does not match here
\( and \) group the pattern
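
For anyone testing outside Vim, the pattern can be transliterated to PCRE-style syntax. The sketch below uses Python's `re` module as a stand-in for Perl. The transliteration is my own assumption about the equivalent syntax, and the names `SAMPLE`, `VIM_STYLE`, and `matches` are illustrative, not part of the original answer.

```python
import re

# Sample text from the question: the first block has a Story channel, the second does not.
SAMPLE = """<output_channels>
  <output_channel>RSS</output_channel>
  <output_channel>Story</output_channel>
</output_channels>

<output_channels>
  <output_channel>RSS</output_channel>
</output_channels>
"""

# Assumed PCRE-style equivalent of the Vim pattern:
#   \_s\{-}     ->  \s*?     (lazy whitespace, newlines included)
#   \_.         ->  [\s\S]   (any character, newlines included)
#   \(...\)\@!  ->  (?!...)  (negative lookahead)
VIM_STYLE = re.compile(
    r"<output_channels>\s*?"
    r"(?:(?!<output_channel>\s*?Story\s*?</output_channel>)[\s\S])*?"
    r"\s*?</output_channels>"
)

def matches(pattern, text):
    """Return the full text of every non-overlapping match."""
    return [m.group(0) for m in pattern.finditer(text)]

found = matches(VIM_STYLE, SAMPLE)
print(len(found))  # count of blocks that have no Story channel

# The lookahead only rejects a channel whose whole content is "Story";
# a channel such as "Story Feed" is still accepted.
STORY_FEED = "<output_channels><output_channel>Story Feed</output_channel></output_channels>"
accepts_story_feed = bool(VIM_STYLE.search(STORY_FEED))
```

Because the lookahead spells out the full `<output_channel>Story</output_channel>` element, this approach is stricter than "does not contain the word Story anywhere"; whether that is desirable depends on the data.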

痴骨ら 2024-11-12 10:09:41

This might need some adjustment depending on what your whole XML file looks like, but it works with your example:

<output_channels>(?:\s*<output_channel>(?!Story)[^<]+<\/output_channel>\s*)+<\/output_channels>
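
A quick way to check this pattern against the sample from the question is sketched below, using Python's `re` module as a stand-in for Perl (the `\/` escapes are dropped because Python has no regex delimiters). The names `SAMPLE`, `PER_CHANNEL`, and `TOP_STORY` are illustrative assumptions.

```python
import re

# Sample text from the question.
SAMPLE = """<output_channels>
  <output_channel>RSS</output_channel>
  <output_channel>Story</output_channel>
</output_channels>

<output_channels>
  <output_channel>RSS</output_channel>
</output_channels>
"""

# One or more <output_channel> children, none of whose text starts with "Story".
PER_CHANNEL = re.compile(
    r"<output_channels>"
    r"(?:\s*<output_channel>(?!Story)[^<]+</output_channel>\s*)+"
    r"</output_channels>"
)

found = [m.group(0) for m in PER_CHANNEL.finditer(SAMPLE)]
print(len(found))  # count of blocks accepted by the pattern

# Caveat: the lookahead is anchored at the start of the channel text,
# so "Story" appearing later in the text is not rejected.
TOP_STORY = "<output_channels><output_channel>Top Story</output_channel></output_channels>"
accepts_top_story = bool(PER_CHANNEL.search(TOP_STORY))
print(accepts_top_story)
```

This variant describes the expected child elements explicitly, so it needs no dot-matches-all flag, but it will also skip any block that contains an empty channel or other child elements.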

天赋异禀 2024-11-12 10:09:41

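You need to remove the first .*?. What happens is that, after the ((?!Story).)*? part correctly fails to match content that has Story in it, the regex engine backtracks and gives the .*? a crack at it, which of course succeeds. That's assuming, of course, that you're matching in /s (single-line, or dot-matches-all) mode.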
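
The effect of dropping the leading `.*?` can be checked with a small sketch. It uses Python's `re` module as a stand-in for Perl, with `re.DOTALL` playing the role of the `/s` flag; the names `SAMPLE`, `ORIGINAL`, `FIXED`, and `count_matches` are illustrative assumptions.

```python
import re

# Sample text from the question.
SAMPLE = """<output_channels>
  <output_channel>RSS</output_channel>
  <output_channel>Story</output_channel>
</output_channels>

<output_channels>
  <output_channel>RSS</output_channel>
</output_channels>
"""

# Pattern from the question: the leading .*? can swallow "Story" after backtracking.
ORIGINAL = re.compile(r"<output_channels>.*?((?!Story).)*?</output_channels>", re.DOTALL)

# Same pattern with the leading .*? removed: every character between the tags
# must now pass the (?!Story) check.
FIXED = re.compile(r"<output_channels>((?!Story).)*?</output_channels>", re.DOTALL)

def count_matches(pattern, text):
    """Count non-overlapping matches of pattern in text."""
    return sum(1 for _ in pattern.finditer(text))

original_count = count_matches(ORIGINAL, SAMPLE)
fixed_count = count_matches(FIXED, SAMPLE)
print(original_count, fixed_count)

# The single match of the fixed pattern is the block without "Story".
fixed_match = FIXED.search(SAMPLE).group(0)
```

This relies on the elements never being nested, as stated in the question: the tempered dot cannot cross a `Story`, so a match can only start at an opening tag whose block is free of that phrase.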

About the author: 囚你心
