Regex for emails that don't end with the replace script

Posted on 2024-09-03 07:53:28

I'm currently modifying my regex for this:

Extracting email addresses in an html block in ruby/rails

Basically, I'm making another obfuscator that uses ROT13 by parsing a block of text for all links that contain a mailto: href (using Hpricot). One case this doesn't catch is when the user just types in an email address (without turning it into a link via TinyMCE).

So here's the basic flow of my method:
1. Parse a block of text for all tags with href="mailto:...".
2. Replace each tag with a JavaScript function that changes it into ROT13 (using this script: http://unixmonkey.net/?p=20).
3. Once all links are obfuscated, pass the resulting block of text into another function that parses it for all emails (this one has an email regex that reverses each email address and then wraps it in a span that reverses it back for display).
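As a rough illustration of the two string transforms behind steps 2 and 3, here is a plain-Ruby sketch (no Hpricot, and not the linked unixmonkey script). The helper names rot13 and reverse_with_span are hypothetical, and the CSS that flips the reversed text back (unicode-bidi: bidi-override with direction: rtl) is an assumption about how the span does its job.

```ruby
# ROT13: shift each ASCII letter 13 places; applying it twice restores the input.
def rot13(str)
  str.tr('A-Za-z', 'N-ZA-Mn-za-m')
end

# Hypothetical step-3 helper: emit the address reversed and let CSS render it
# right-to-left so it reads normally on screen (assumed technique).
def reverse_with_span(email)
  %(<span style="unicode-bidi: bidi-override; direction: rtl;">#{email.reverse}</span>)
end

puts rot13('joe@example.com')             # prints wbr@rknzcyr.pbz
puts reverse_with_span('joe@example.com') # span wrapping moc.elpmaxe@eoj
```

The reversed text in the markup is what keeps naive scrapers from reading the address, while the bidi override makes browsers display it in the original order.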

Step 3 is supposed to clean the block of text of any remaining emails that AREN'T in href attributes (meaning they weren't parsed by Hpricot). The problem is that my regex still finds the emails that were already converted to ROT13. What I want to catch are only the emails that WEREN'T CONVERTED to ROT13.
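To make the problem concrete, here is a sketch using a made-up text block, since the rubular sample isn't reproduced here. It assumes each obfuscated link ends up as a document.write('...'.replace(...)) call with the ROT13 address right before '.replace; that shape is a guess, not the exact output of the script. A plain email pattern (the question's regex without the '.replace group, with the dots escaped) finds all four addresses, including the two that step 3 should leave alone.

```ruby
# Hypothetical input: two addresses already ROT13-obfuscated (each followed by
# '.replace) and two plain addresses typed straight into the text.
TEXT = <<~HTML
  <script>document.write('wbr@rknzcyr.pbz'.replace(/[a-z]/gi, rot13));</script>
  <script>document.write('znel@grfg.bet'.replace(/[a-z]/gi, rot13));</script>
  Contact jane@example.com or bob@test.org for details.
HTML

# Plain email pattern with no notion of the trailing '.replace.
PLAIN_EMAIL = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

found = TEXT.scan(PLAIN_EMAIL)
p found # => ["wbr@rknzcyr.pbz", "znel@grfg.bet", "jane@example.com", "bob@test.org"]
```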

How do I do this? Well, all emails that WERE CONVERTED have a trailing "'.replace" after them, meaning I need to get all emails WITHOUT that string. So far I have this regex:

/\b([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}('\.replace))\b/i

But this gets only the emails with the trailing '.replace. I want the opposite, and I'm currently stumped. Any help from regex gurus out there?

MORE INFO:

Here's the regex and the block of text I'm parsing:

http://www.rubular.com/r/NqXIHrNqjI

As you can see, the first two 'email addresses' are already obfuscated using ROT13. I need a regex that gets the emails [email protected] and [email protected].

Comments (1)

嗼ふ静 2024-09-10 07:53:28

On negative lookaheads

You can use a negative lookahead to assert that a pattern doesn't match.

For example, the following regex matches all strings that don't end with ".replace":

^(?!.*\.replace$).*$
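A quick check of that pattern in Ruby, with made-up sample strings. In Ruby, ^ and $ match at line boundaries rather than string boundaries, so this behaves as described only for single-line input; \A and \z would be the stricter choice.

```ruby
# Negative lookahead at the start of the line: fail if the line ends in ".replace".
NOT_ENDING_WITH_REPLACE = /^(?!.*\.replace$).*$/

samples = %w[foo.replace foo.replace.bar joe@example.com'.replace joe@example.com]
results = samples.map { |s| [s, NOT_ENDING_WITH_REPLACE.match?(s)] }.to_h
p results
```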

As another example, this regex matches every string of the form a*b* except aabb:

^(?!aabb$)a*b*$
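The same idea applied to the a*b* example, checked in Ruby against a few made-up strings:

```ruby
# Matches any string of the form a*b* except exactly "aabb".
EXCEPT_AABB = /^(?!aabb$)a*b*$/

samples = ['', 'ab', 'aab', 'aabb', 'aaabb', 'ba']
matches = samples.select { |s| EXCEPT_AABB.match?(s) }
p matches # => ["", "ab", "aab", "aaabb"]
```

"aaabb" still matches because the lookahead only rejects the exact string "aabb", and "ba" fails because it isn't of the form a*b* at all.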

Ideally,

See also


Specific solution

The following regex works in this scenario (see on rubular.com):

/\b([A-Z0-9._%+-]+@(?![A-Z0-9.-]*'\.replace\b)[A-Z0-9.-]+\.[A-Z]{2,4})\b/i
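Here is a sketch of that regex applied to a made-up sample in the assumed shape of the obfuscated output (ROT13 address immediately followed by '.replace). The lookahead sits right after the @, so an obfuscated address is rejected before the rest of the pattern runs. Because the pattern has a capture group, String#scan returns one array per match, hence the flatten. The span markup and class name in the gsub step are hypothetical.

```ruby
# The answer's regex. The negative lookahead right after "@" rejects any address
# whose domain is immediately followed by '.replace (i.e. the ROT13 output).
UNOBFUSCATED_EMAIL = /\b([A-Z0-9._%+-]+@(?![A-Z0-9.-]*'\.replace\b)[A-Z0-9.-]+\.[A-Z]{2,4})\b/i

# Hypothetical sample in the assumed shape of the obfuscated output.
TEXT = <<~HTML
  <script>document.write('wbr@rknzcyr.pbz'.replace(/[a-z]/gi, rot13));</script>
  <script>document.write('znel@grfg.bet'.replace(/[a-z]/gi, rot13));</script>
  Contact jane@example.com or bob@test.org for details.
HTML

# With one capture group, scan returns an array of one-element arrays; flatten it.
plain = TEXT.scan(UNOBFUSCATED_EMAIL).flatten
p plain # => ["jane@example.com", "bob@test.org"]

# Step 3 can then rewrite only those addresses. The span markup and class
# name below are hypothetical placeholders.
cleaned = TEXT.gsub(UNOBFUSCATED_EMAIL) do |email|
  %(<span class="reversed-email">#{email.reverse}</span>)
end
puts cleaned
```

The lookahead only inspects the characters directly after the domain, so an obfuscated address followed by anything other than '.replace (for example a closing tag) would still be matched.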