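Problem capturing with a negative lookbehind regex
I am trying to match email addresses, but only if they are not preceded by "mailto:". I tried this regex:
"/(?<!mailto:)[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/"
on this string:
'<a href="mailto:[email protected]">EMAIL</a> ... [email protected] '
I want to capture only the second '[email protected]', but I also get the first address with its leading character cut off (note the missing 's'). I would like to know what goes wrong here. Can't I have a normal regex after a lookbehind assertion?
My whole PHP example looks like this: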
$testString = '<a href="mailto:[email protected]">EMAIL</a> ... [email protected] ';
$pattern = "/(?<!mailto:)[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/";
preg_match_all($pattern, $testString, $matches);
echo('<pre>');print_r($matches);echo('</pre>');
Thanks!
Comments (3)
Because after the s there is a string that matches your regex ([email protected] without its first character), and because the text right before that string ends in s rather than mailto:, the lookbehind is satisfied and it matches. Adding a word boundary there will work for most cases. Change:
(?<!mailto:)
to:
(?<!mailto:)\b
On a side note: use example.com for examples; domain.com is owned by an actual company.
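To make the suggestion concrete, here is a minimal sketch comparing the pattern from the question with the `\b` variant. The addresses (someemail@example.com, other@example.com) and the variable names `$tail`, `$before` and `$after` are made up for illustration, since the addresses in the question were redacted.

```php
<?php
// Hypothetical test string; the real addresses in the question were redacted.
$testString = '<a href="mailto:someemail@example.com">EMAIL</a> ... other@example.com ';

// Unchanged remainder of the pattern from the question, shared by both variants.
$tail = '[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})';

$original     = '/(?<!mailto:)' . $tail . '/';
$withBoundary = '/(?<!mailto:)\b' . $tail . '/';

preg_match_all($original, $testString, $before);
preg_match_all($withBoundary, $testString, $after);

// $before[0] holds the full matches of the original pattern,
// $after[0] those of the variant with the word boundary.
print_r($before[0]); // omeemail@example.com and other@example.com
print_r($after[0]);  // other@example.com only
```

With the word boundary, the engine can no longer start a match between the s and the o of someemail, because both are word characters.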
It tries to match at "someemail@", but fails because that position is immediately preceded by "mailto:". It then tries to match at "omeemail@", which succeeds because that position is not immediately preceded by "mailto:".
EDIT: I think that changing
(?<!mailto:)
to
(?!mailto:)
works best.
@Wrikken: The regex permits "." in the email address, so if you have
(?<!mailto:)\b
then "mailto:some.email@" will be matched from "email@".
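The objection to the word-boundary variant can be reproduced with a short sketch. The input string and the address some.email@example.com are hypothetical, chosen only to put a dot in the local part.

```php
<?php
// Hypothetical input: a mailto: link whose address has a dot in the local part.
$input = '<a href="mailto:some.email@example.com">EMAIL</a>';

// Pattern from the question with the \b suggested in the earlier answer.
$pattern = '/(?<!mailto:)\b[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/';

preg_match_all($pattern, $input, $matches);

// The "." before "email" creates a word boundary, and that position is not
// directly preceded by "mailto:", so a match starts there.
print_r($matches[0]); // email@example.com
```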
So with tips from @Wrikken and @MRAB we came up with the final, working regex:
"/(?<!mailto:)(?<=^|[^A-Za-z0-9_.+@-])[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/"
The important thing was to add a second lookbehind, placed right after the negative lookbehind, that serves as an "email boundary": a match may only start at the beginning of the string or after a character that cannot be part of an address.
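As a quick check, the final pattern can be run against the cases discussed in this thread. This is only a sketch: the three input strings and their addresses are hypothetical, and `$results` is just a helper array for collecting the matches.

```php
<?php
// Final pattern from this answer, written as a single-quoted PHP string.
$pattern = '/(?<!mailto:)(?<=^|[^A-Za-z0-9_.+@-])[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/';

// Hypothetical inputs covering the cases raised in the thread.
$inputs = [
    '<a href="mailto:someemail@example.com">EMAIL</a> ... other@example.com ',
    '<a href="mailto:some.email@example.com">EMAIL</a>',
    'plain@example.com at the very start',
];

$results = [];
foreach ($inputs as $input) {
    preg_match_all($pattern, $input, $matches);
    $results[] = $matches[0]; // full matches only
}

// Expected: only other@example.com for the first input, nothing for the
// second, and plain@example.com for the third (matched via the ^ branch).
print_r($results);
```

The second lookbehind rejects any start position that follows a letter, digit, or one of `_ . + @ -`, so a match cannot begin in the middle of an address.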