Does email obfuscation really make automatic harvesting harder?
Many users and forum programs, in an attempt to make automatic e-mail address harvesting harder, conceal addresses via obfuscation: @ is replaced with "at" and . is replaced with "dot", so
team@stackoverflow.com
now becomes
team at stackoverflow dot com
I'm not an expert in regular expressions and I'm really curious - does such obfuscation really make automatic harvesting harder? Is it really much harder to automatically identify such obfuscated addresses?
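To give a feel for how little effort the "at"/"dot" substitution costs a harvester, here is a minimal Python sketch. The pattern and the function name `deobfuscate` are illustrative assumptions, not taken from any real harvesting tool, and the pattern will also produce false positives on ordinary prose.

```python
import re

# Hypothetical separators: the bare words "at"/"dot" surrounded by spaces,
# or the same words wrapped in brackets such as [at] or (dot).
AT = r"(?:\s+at\s+|\s*[\[(]\s*at\s*[\])]\s*)"
DOT = r"(?:\s+dot\s+|\s*[\[(]\s*dot\s*[\])]\s*)"

# local part, then "at", then at least two domain labels joined by "dot"
PATTERN = re.compile(rf"([\w.+-]+){AT}([\w-]+(?:{DOT}[\w-]+)+)", re.IGNORECASE)


def deobfuscate(text):
    """Return the addresses recovered from "name at domain dot tld" text."""
    found = []
    for match in PATTERN.finditer(text):
        local, domain = match.group(1), match.group(2)
        # Turn every "dot" separator inside the domain back into "."
        domain = re.sub(DOT, ".", domain, flags=re.IGNORECASE)
        found.append(f"{local}@{domain}")
    return found


print(deobfuscate("Contact team at stackoverflow dot com for help"))
```

A handful of lines covers the most common variants, which is why this style of obfuscation adds little beyond filtering out the laziest scrapers.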
Definitely!
I read this article a while ago, which shows how effective the various methods are relative to one another.
Reversing an already reversed string seems to be fairly decent protection at the moment.
The following code sample:
Will output the email so it's readable at least.
That said, it is almost an arms race. But as long as you're ahead of the curve, it'll take more effort to harvest your address than ordinary un-obfuscated ones.
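The code sample this answer refers to is not shown here. As a rough sketch of the reversed-string idea, assuming it relies on the CSS properties `unicode-bidi: bidi-override` and `direction: rtl` to flip the text back for human readers, the markup could be generated like this. Both function names are hypothetical.

```python
import re
from html import escape, unescape


def reversed_markup(address):
    """Store the address backwards; CSS renders it right-to-left for humans."""
    reversed_text = address[::-1]
    return (
        '<span style="unicode-bidi: bidi-override; direction: rtl;">'
        f"{escape(reversed_text)}</span>"
    )


def harvest_reversed(markup):
    """What a site-specific harvester would need: grab the text, reverse it."""
    inner = re.search(r">([^<]+)<", markup).group(1)
    return unescape(inner)[::-1]


markup = reversed_markup("mail@example.com")
print(markup)
print(harvest_reversed(markup))
```

The second function illustrates the arms-race point: the protection holds only until a harvester bothers to add the matching reversal step.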
Obfuscation techniques fall into the same category as CAPTCHAs. They are not reliable and tend to hurt regular users more than bots.
JavaScript obfuscation seems to be praised, but it is no silver bullet: it is not that hard today to automate a browser for email sniffing. If it can be displayed in a browser, it can be harvested. You could even imagine a bot that takes screenshots of a browser window and uses OCR to extract addresses, beating your million-dollar obfuscation technique.
Depending on where and why you want to obfuscate emails, these techniques could be useful:
Restrict email visibility: you may hide emails on your website/forum from anonymous users, from new users (with little to no activity or posts to date), or even hide them completely and replace email contact between members with a built-in private messaging feature.
Use a dedicated spam-filtered email: you will get spammed, but it will be limited to this particular address. This is a good trade-off when you need to expose the email address to any user.
Use a contact form: while bots are pretty good at filling in forms, it turns out that they are too good at it. Hidden-field techniques can filter most of the spam coming through your contact form.
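The hidden-field idea from the last point can be sketched on the server side as follows. This is a minimal illustration, assuming the form contains an extra field (here called `website`, a made-up name) that is hidden from human visitors with CSS, so that only automated form fillers tend to populate it.

```python
from typing import Mapping


def is_probably_bot(form: Mapping[str, str], honeypot_field: str = "website") -> bool:
    """Treat a submission as spam when the hidden honeypot field is filled in."""
    return bool(form.get(honeypot_field, "").strip())


human = {"name": "Ann", "message": "Hello", "website": ""}
bot = {"name": "x", "message": "Buy now", "website": "http://spam.example"}

print(is_probably_bot(human))
print(is_probably_bot(bot))
```

Submissions flagged this way can be dropped silently, so the bot gets no hint about which field gave it away.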
When I see this type of obfuscation I also immediately think of regular expressions. It's a piece of cake to harvest emails "obfuscated" in this manner.
I once came up with an idea to publish my email address in this way:
You can mail me here:
Whoever cannot make it out has failed my basic intelligence test.
It will be difficult for the spammers as well as your users to identify the email address.
Wikipedia has a nice article on email obfuscation, also known as address munging.
I'm not sure if it really helps with spam - but I've learned to love the Escape Encode Obfuscation for mailto: tags/emails. An example tag:
Mails [email protected]
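The example tag above did not survive intact. As a hedged illustration only, assuming "escape encoding" here means writing every character of the `mailto:` target as a numeric HTML character reference (percent-encoding would be another reading), a sketch could look like this. The helper names are hypothetical.

```python
from html import unescape


def entity_encode(text):
    """Write each character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def mailto_link(address, label):
    """Build an anchor whose href contains no literal '@' or 'mailto:'."""
    href = entity_encode("mailto:" + address)
    return f'<a href="{href}">{label}</a>'


link = mailto_link("mail@example.com", "Mail us")
print(link)
# Any HTML parser, including Python's own, decodes it straight back:
print(unescape(entity_encode("mail@example.com")))
```

Browsers decode the references transparently, but so does any scraper that runs the page through an HTML parser first, which matches the doubt expressed above about how much this helps.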
It's analogous to putting a "protected by ADT" sticker on your front door.
Will that prevent a talented burglar from entering your house? Of course not.
Will it make the house next door with an unlocked door and an iPod in the window a more compelling target? Pretty likely.
A simple unobfuscated email scraper is going to get TONS of emails as it is. Maybe a very simple regex to pick up very common obfuscation methods is worth the effort. Past that, you're spending a lot of time trying to decipher an increasingly small percentage of emails.
All that to say, having some clever obfuscation is probably worth it.
For the record, my email has been on my public resume in plain text for years now, because I use Gmail, which has a spam filter that works.
I was wondering why nobody has mentioned A List Apart's (ALA's) solution so far.
Roel Van Gils wrote an article about Graceful Email Obfuscation in 2007.
Graceful Email Obfuscation is simply a JavaScript email obfuscation technique with a contact form fallback.
mailto:mail@example.com
→ contact/mail+example+com
→ contact/znvy+rknzcyr+pbz
When JavaScript is available, contact/znvy+rknzcyr+pbz is converted back to mailto:mail@example.com. Otherwise the link stays contact/znvy+rknzcyr+pbz as a fallback. The contact form will know where to send the email because of the URL.
http://www.alistapart.com/articles/gracefulemailobfuscation/
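The second step shown above is consistent with ROT13 applied to the letters of `mail+example+com`. A minimal Python sketch of both directions follows. The function names are hypothetical, and the decoding rule (first `+` becomes `@`, the rest become `.`) is an assumption made for this example, not something stated in the answer.

```python
import codecs


def to_contact_path(address):
    """mail@example.com -> contact/mail+example+com -> ROT13 of the letters."""
    plain = address.replace("@", "+").replace(".", "+")
    return "contact/" + codecs.encode(plain, "rot_13")


def from_contact_path(path):
    """Undo the encoding, as the client-side script (or contact form) would."""
    encoded = path.split("/", 1)[1]
    plain = codecs.decode(encoded, "rot_13")
    # Assumption: the first "+" stood for "@", every later "+" for "."
    local, _, domain = plain.partition("+")
    return f"mailto:{local}@{domain.replace('+', '.')}"


path = to_contact_path("mail@example.com")
print(path)
print(from_contact_path(path))
```

Because the same path drives both the script and the server-side contact form, visitors without JavaScript still reach the right recipient.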
It does make it harder, but there are so many really smart scrapers that it probably doesn't help much, since the big spammers use high-quality spam tools.
How to fight spammers? Make email addresses less recognizable to something without a brain (i.e. a computer).
Non-English speakers are your friends: if your user base is a non-English-speaking community, switch to obfuscating in another language: team_małpa_stackoverflow_kropka_com or team_Affenschwanz_stackoverflow_Punkt_com are perfectly recognizable email addresses for Polish- and German-speaking communities respectively. Some email harvesters know Polish or German, but chances are most harvesters will understand only English.
If you cannot leave English, then switch to descriptive phrases, like: "in order to send us a message, write team in your address field, then put the symbol AT, then write the name of our site!".
To provide a literal answer, yes, harvesting obfuscated addresses is harder than harvesting standardized addresses. The real question is whether the extra effort will be put in by harvesters and if the (major? minor?) barrier to the harvesters is worth the possible problems for your users.
If you are going to scramble addresses or otherwise transpose them away from the standard form, you should avoid being consistent in how you do so – at least on the same site.
For example, if every email address on a large community site is reversed in the markup and rendered properly with CSS, or token-replaced (@ becomes 'at'), or any other predictable method, the harvesters will just write a thin adapter for your site.
Think of it this way: if it only takes you one line of code to "scramble" them sitewide, it will only take the harvester one line of code to "unscramble" them for your site. Roughly speaking.
In my opinion, spam has become such a problem and so many DBs have been turned over that we're beyond hiding our addresses. Instead, consider looking at Defensio and Akismet, etc, to help classify and block spam.
I have a solution, well, more of a theory.
The problem is that the bots parse the page. They can get the text, even if it's being put into the page in some sophisticated way through JavaScript.
So, just use a CSS3 pseudo-element! It won't be a link, but your email will be visible and will never be actual text in the page. Something like this:
Again, it's a theory. I've no idea how far these evil people can go to get it, but I think this would be pretty safe (unless they parse the CSS files, which I don't think they do).
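The sample this answer refers to is not shown here. As a sketch of what such a rule might look like, assuming a hypothetical `.email` selector and the `content` property of an `::after` pseudo-element, the Python below generates the CSS and also shows how small the extra step is for a harvester that does parse stylesheets.

```python
import re


def pseudo_element_css(address, selector=".email"):
    """Emit a rule that renders the address via generated content only."""
    return f'{selector}::after {{ content: "{address}"; }}'


def harvest_from_css(css):
    """The caveat from the answer: scanning stylesheets for address-like strings."""
    return re.findall(r'content:\s*"([^"@\s]+@[^"\s]+)"', css)


css = pseudo_element_css("mail@example.com")
print(css)
print(harvest_from_css(css))
```

The address never appears in the HTML text, so the idea holds only as long as harvesters ignore CSS, which is exactly the uncertainty the answer itself flags.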
It does make it harder to a degree, but the simple substitutions users still rely on today ([dot] and [at]) are obsolete and can be captured easily by spammers with a simple regex. Using something as simple as an image would be helpful and readable for the intended human reader, without any effort to 'decrypt' the encoded email ID.
If you are still paranoid about spam bots equipped with character recognition, then something like this would be effective.
It uses an optical illusion to its advantage: the human mind completes letters that computer vision cannot easily understand. Applying a CAPTCHA-like overlay can also help, but I doubt you need to go that far.