Regex for replacing multiple email addresses

Posted 2024-12-06 02:40:25

OK, so here is my situation. I have a site that runs on WordPress. I need to ensure email obfuscation, so I installed a plugin called 'Graceful Email Obfuscation'. This already works great. The catch is that I want a catchall in case someone does not follow the plugin's rules for entering email addresses (i.e. [email] [email protected] [/email]).

The following regex works great at grabbing all the emails BUT I don't want it to touch the ones that are correctly written as [email][email protected][/email]. What do I need to add?

// Match any a href="mailto: AND make it optional
$monster_regex = '`(\<a([^>]+)href\=\"mailto\:)?';  

// Match any email address
$monster_regex .= '([^0-9:\\r\\n][A-Z0-9_]+([.][A-Z0-9_]+)*[@][A-Z0-9_]+([.][A-Z0-9_]+)*[.][A-Z]{2,4})'; 

// Now include all its attributes AND make it optional
$monster_regex .= '(\"*\>)?';

// Match any information enclosed in the <a> tag AND make it optional
$monster_regex .= '(.*)?'; 

// Match the closing </a> tag AND make it optional
$monster_regex .= '(\<\/a\>)?`'; 

$monster_regex .= 'im'; // Set the modifiers

preg_match_all($monster_regex, $content, $matches, PREG_SET_ORDER);

My inputs for testing are this:

<a href="mailto:[email protected]">Tester</a>
[email protected]
<a href="mailto:[email protected]">Hotmail Test</a>
[email][email protected][/email]

The output I am getting is this:

(
    [0] => Array
        (
            [0] => <a href="mailto:[email protected]">Tester</a>

            [1] => <a href="mailto:
            [2] =>  
            [3] => [email protected]
            [4] => 
            [5] => 
            [6] => ">
            [7] => Tester</a>

        )

    [1] => Array
        (
            [0] => [email protected]

            [1] => 
            [2] => 
            [3] => [email protected]
            [4] => 
            [5] => 
            [6] => 
            [7] => 

        )

    [2] => Array
        (
            [0] => <a href="mailto:[email protected]">Hotmail Test</a>

            [1] => <a href="mailto:
            [2] =>  
            [3] => [email protected]
            [4] => 
            [5] => 
            [6] => ">
            [7] => Hotmail Test</a>

        )

    [3] => Array
        (
            [0] => [email][email protected][/email]

            [1] => 
            [2] => 
            [3] => [email][email protected]
            [4] => 
            [5] => 
            [6] => 
            [7] => [/email]

        )
)

Thanks in advance.

百思不得你姐 2024-12-13 02:40:25

So you want to match anything that looks like an email address unless it's already enclosed in [email]...[/email] tags? Try this:

'%(?>\b[A-Z0-9_]+(?:\.[A-Z0-9_]+)*@[A-Z0-9_]+(?:\.[A-Z0-9_]+)*\.[A-Z]{2,4}\b)(?!\s*\[/email\])%i'

NB: This answer only addresses the problem of how to match something that's not contained in some larger structure. I don't intend to get into a debate over how (or whether) to match email addresses with regexes. I simply extracted the core regex from the question, bracketed it with word boundaries (\b) and wrapped it in an atomic group ((?>...)).

Once a potential match is found, the negative lookahead asserts that the address isn't followed by a closing [/email] tag. Assuming the tags are correctly paired, a closing tag there means the address is already properly tagged. And if they aren't correctly paired, it's the plugin's job to catch it.
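
To see the lookahead at work, here is a minimal sketch that runs the regex above through preg_match_all. The sample content and the addresses in it are made-up placeholders, not the asker's real data, and the variable names are only for illustration.

```php
<?php
// Regex quoted from the answer: an email-like token NOT followed by [/email].
$pattern = '%(?>\b[A-Z0-9_]+(?:\.[A-Z0-9_]+)*@[A-Z0-9_]+(?:\.[A-Z0-9_]+)*\.[A-Z]{2,4}\b)(?!\s*\[/email\])%i';

// Hypothetical sample content (placeholder addresses).
$content = <<<TXT
<a href="mailto:alice@example.com">Tester</a>
bob@example.org
[email]carol@example.net[/email]
[email] dave@example.net [/email]
TXT;

// $matches[0] holds the full text of every match.
preg_match_all($pattern, $content, $matches);

// Only the two untagged addresses are reported; the two inside
// [email]...[/email] are rejected by the negative lookahead.
print_r($matches[0]);
```

The last sample line has whitespace before the closing tag; the \s* inside the lookahead is what lets that one be skipped as well.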

While I'm here, I'd like to offer some comments on your regex:

The range expression A-z appeared in some of your character classes. Probably just a typo, but some people use that as an idiom for matching uppercase or lowercase letters. That's an error because it also matches several punctuation characters whose code points happen to lie between the two letter ranges. (I fixed that when I edited the question.)
The characters <, >, :, ", @, = and / have no special meaning in regexes and don't need to be escaped. It doesn't hurt anything, but regexes are hard enough to read already; why throw in a bunch of backslashes and square brackets you don't need?
The question mark in (.*)? belongs inside the parens: (.*?). That way it will reluctantly match everything before the next </a>. If there's nothing to match, it will match nothing. Making it optional is not only redundant, it could lead to serious performance penalties.
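
Two of the points above are easy to check directly. The following is a small sketch with made-up test strings (not taken from the question): the first part shows [A-z] accepting a punctuation character, the second compares greedy (.*) with reluctant (.*?) on a line containing two links.

```php
<?php
// [A-z] covers ASCII 65..122, which also includes [ \ ] ^ _ and the backtick.
$loose  = preg_match('/^[A-z]+$/', 'foo^bar');    // '^' falls inside the A-z range
$strict = preg_match('/^[A-Za-z]+$/', 'foo^bar'); // '^' is rejected here

// Hypothetical markup with two links on one line.
$html = '<a href="#">one</a> and <a href="#">two</a>';

// Greedy: (.*) runs to the LAST </a> on the line.
preg_match('%<a[^>]*>(.*)</a>%', $html, $greedy);

// Reluctant: (.*?) stops at the FIRST </a>.
preg_match('%<a[^>]*>(.*?)</a>%', $html, $lazy);

var_dump($loose, $strict, $greedy[1], $lazy[1]);
```

Note that neither pattern escapes <, > or /; with % as the delimiter they are all literal characters.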