Problem with negative lookbehind regex capture

Posted on 2024-11-25 16:35:37

I am trying to match email addresses, but only when they are not preceded by "mailto:". I tried this regular expression:

"/(?<!mailto:)[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/"

against this string:
'<a href="mailto:someemail@domain.com">EMAIL</a> ... [email protected] '

I would expect to catch only '[email protected]', but I also receive 'omeemail@domain.com' (note the missing 's'). I wonder what's wrong here. Can't I have a normal regex after the lookbehind assertion?

My whole example in PHP looks like:

$testString = '<a href="mailto:someemail@domain.com">EMAIL</a>  ...   [email protected] ';
$pattern = "/(?<!mailto:)[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/";
preg_match_all($pattern, $testString, $matches);
echo('<pre>');print_r($matches);echo('</pre>');

Thank you!

Comments (3)

枫以 2024-12-02 16:35:37

Because after the s there is a string that matches your regex, omeemail@domain.com, and because the s in front of it is hardly mailto:, it matches. Putting a word boundary in there will work for most cases:

Change:

(?<!mailto:)

To:

(?<!mailto:)\b

On a side note: use example.com for examples; domain.com is owned by an actual company.
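
To see what the word boundary changes, here is a minimal PHP sketch. The test string and both example.com addresses are made up for illustration and are not the asker's exact data; the two patterns are the one from the question and the \b variant suggested above.

```php
<?php
// Hypothetical test string: one address inside a mailto: link, one in plain text.
$testString = '<a href="mailto:someemail@example.com">EMAIL</a> ... other@example.com ';

// Shared tail of the pattern from the question (unchanged).
$email = '[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})';

$original     = '/(?<!mailto:)' . $email . '/';
$withBoundary = '/(?<!mailto:)\b' . $email . '/';

preg_match_all($original, $testString, $before);
preg_match_all($withBoundary, $testString, $after);

// Without \b the engine restarts one character later, so the "s" is lost.
print_r($before[0]); // omeemail@example.com, other@example.com

// With \b a match may only start at the beginning of a word.
print_r($after[0]);  // other@example.com
```

The \b helps here because there is no word boundary between the "s" and the "o" of "someemail", so the match can no longer start one character into the address.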

随心而道 2024-12-02 16:35:37

It tries to match at "someemail@", but fails because it's immediately preceded by "mailto:", so then it tries to match at "omeemail@", which succeeds because it's not immediately preceded by "mailto:".

EDIT: I think that changing (?<!mailto:) to (?!mailto:) works best.

@Wrikken: The regex permits "." in the email address, but if you have (?<!mailto:)\b then "mailto:some.email@" will be matched from "email@".
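
The dotted-address caveat can be reproduced with a short PHP sketch. The address some.email@example.com is an assumed example; the pattern is the \b variant discussed above.

```php
<?php
// Hypothetical input: a mailto: link whose local part contains a dot.
$dottedInput = '<a href="mailto:some.email@example.com">EMAIL</a>';

$boundaryPattern = '/(?<!mailto:)\b[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/';

preg_match_all($boundaryPattern, $dottedInput, $dottedMatches);

// There is a word boundary right after the dot, and the text before
// "email" is "some." rather than "mailto:", so the tail still matches.
print_r($dottedMatches[0]); // email@example.com
```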

潜移默化 2024-12-02 16:35:37

So, with tips from @Wrikken and @MRAB, we came up with the final, working regex:
"/(?<!mailto:)(?<=^|[^A-Za-z0-9_.+@-])[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/"

The important thing was to add a second, positive lookbehind that serves as an "email boundary" after the negative lookbehind.
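
As a quick check of that pattern, here is a minimal PHP sketch. The input string and the some.email@example.com address are assumptions chosen to cover both the mailto: case and the dotted local part; only the regex itself comes from the post above.

```php
<?php
// Hypothetical input: the same dotted address once inside mailto: and once in plain text.
$input = '<a href="mailto:some.email@example.com">EMAIL</a> ... some.email@example.com ';

// Final pattern from the post: a match may only start at the beginning of the
// string or after a character that cannot be part of an email address.
$finalPattern = '/(?<!mailto:)(?<=^|[^A-Za-z0-9_.+@-])[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/';

preg_match_all($finalPattern, $input, $finalMatches);

// Only the plain-text address is captured; no partial match from the link.
print_r($finalMatches[0]); // some.email@example.com
```

Inside the link, a match cannot start at "email" because the preceding "." is in the excluded character class, and it cannot start at "some" because "mailto:" precedes it.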
