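How do I match a string that does not contain a word?
To match a string that contains some word, I can use the pattern "/.*word.*/". But how do I match a string that does not contain that word?
Example:
I need to find a substring in a large text that is enclosed by the two tags <div> and </div> and has some string like "Hello" inside. The best I have come up with:
"@<div>(.*?Hello.*?)</div>@i"
But it also matches the sequence:
<div>Bye.</div><div>Hello!</div>
and I do not want the first pair of div tags to be matched. So I want to replace ".*?" with something like "match any string, except one that contains a div tag".
Test case:
For the input string:
<div>Bye.</div><div>Hello!</div>
I need to capture
<div>Hello!</div>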
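To make the problem concrete, here is a minimal Python sketch of the over-match described above. It assumes the intended pattern is the one in the question with the lazy quantifiers written as ".*?"; the variable names are illustrative only.

```python
import re

# The question's pattern, with PHP's "@...@i" delimiters replaced by re.IGNORECASE.
naive = re.compile(r"<div>(.*?Hello.*?)</div>", re.IGNORECASE)

text = "<div>Bye.</div><div>Hello!</div>"
match = naive.search(text)

# The lazy ".*?" still starts at the FIRST <div> and simply keeps expanding
# (across "</div><div>") until "Hello" can match, so both elements are swallowed.
whole_match = match.group(0)
captured = match.group(1)
```

Laziness only changes where the match ends, not where it starts, which is why the first div is included.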
Comments (3)
A better title for the question might be: "Match a DIV element containing a specific substring." First, it must be said that regex is not the best tool for this job. It would be much better to use an HTML parser to parse the markup and then search the contents of each DIV element for the desired substring. That said, since you want to know more about how to use regex to match stuff that is not something else, the following describes a limited way of doing this with a regex.

As Dogbert correctly points out, this question really is a duplicate of "Regular expression to match string not containing a word?". However, I see that you have looked at that question but need to know how to apply this technique to a subpattern.
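Before turning to the regex, here is a rough sketch of the parser route recommended above, using Python's standard html.parser module. The class name DivTextCollector and the function divs_containing are hypothetical helpers invented for this illustration, and the sketch assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collects the text found directly inside each DIV element."""

    def __init__(self):
        super().__init__()
        self._open = []   # one text buffer per DIV that is currently open
        self.divs = []    # texts of finished DIVs, in the order they closed

    def handle_starttag(self, tag, attrs):
        if tag == "div":  # HTMLParser reports tag names in lower case
            self._open.append([])

    def handle_endtag(self, tag):
        if tag == "div" and self._open:
            self.divs.append("".join(self._open.pop()))

    def handle_data(self, data):
        if self._open:
            # Credit the text to the innermost DIV that is still open.
            self._open[-1].append(data)


def divs_containing(markup, word):
    """Return the direct text of every DIV whose text contains `word` (case-insensitive)."""
    parser = DivTextCollector()
    parser.feed(markup)
    parser.close()
    return [text for text in parser.divs if word.lower() in text.lower()]
```

For example, divs_containing("<div>Bye.</div><div>Hello!</div>", "hello") returns the text of the second element only. Because each DIV gets its own buffer, this approach also sees text that sits in an outer DIV next to a nested one.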
To match a part of a string (a subpattern) that does not include a specific word (or words), you need to apply a negative lookahead assertion before each and every character. Here is how you would do it for the text between opening and closing DIV tags. Note that with only a single regex, because DIV elements may be nested, it is only reasonable to find "HELLO" within the innermost of nested DIV elements.

Pseudo code:
1. Match the opening DIV tag.
2. Match zero or more characters, each of which is not the start of <div or </div.
3. Once "HELLO" is found, go ahead and match it.
4. Match zero or more characters, each of which is not the start of <div or </div.
5. Match the closing </div> tag.

Note that to match only the innermost DIV contents, it is necessary to exclude both <DIV and </DIV while scanning the element's contents one char at a time. The corresponding regex, written and tested as a PHP function, correctly matches the desired DIV element in your test data. It also correctly finds "HELLO" within the innermost of nested DIV elements. But, as stated earlier, it will NOT find a "HELLO" string located within a non-innermost nested DIV element. Handling that case requires a much more complicated solution.

There are lots of cases where this solution can fail. Once again, I recommend using an HTML parser.
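The pseudo code above can be sketched as follows. This is a Python approximation written for illustration, not the answer's PHP function: the names INNERMOST_DIV_WITH_HELLO and find_innermost_div are assumptions, as are the choices to hard-code "Hello", to match case-insensitively, and to make the first scan lazy and the second greedy.

```python
import re

INNERMOST_DIV_WITH_HELLO = re.compile(
    r"""
    <div\b[^>]*>             # 1. opening DIV tag (attributes allowed)
    (                        #    group 1: the element's contents
      (?:(?!</?div\b).)*?    # 2. lazily match chars, none starting a DIV tag
      Hello                  # 3. the required word
      (?:(?!</?div\b).)*     # 4. greedily match chars, same restriction
    )
    </div>                   # 5. closing DIV tag
    """,
    re.IGNORECASE | re.DOTALL | re.VERBOSE,
)


def find_innermost_div(text):
    """Return the first innermost DIV element containing "Hello", or None."""
    match = INNERMOST_DIV_WITH_HELLO.search(text)
    return match.group(0) if match else None
```

The negative lookahead runs before every single character, so the scan can never step over the start of another DIV tag. That is what prevents the match from beginning at the first div in the question's test string, and it is also why a "Hello" that sits outside a nested inner DIV is not found.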
Can't you just check whether you did not get a match?

If you're looking for anything but the word "word", test for "word" and negate the result: the code underneath the if then runs only if "word" was not found.
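A minimal sketch of that idea in Python, assuming a plain substring test is what is wanted; the helper name lacks_word is made up for this example.

```python
import re


def lacks_word(text, word="word"):
    """True when `word` does not occur anywhere in `text`."""
    # re.escape keeps any regex metacharacters in `word` literal.
    return re.search(re.escape(word), text) is None


sample = "nothing to see here"
if lacks_word(sample):
    # Reached only when "word" was not found in `sample`.
    result = "no match"
else:
    result = "match"
```

This answers the title question for a whole string, but it does not help with the subpattern case inside the div tags, where the negative lookahead technique is needed.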