An image presented as an HTML table is just a technical speed bump. There's no difficulty in extracting the pixels from such a document.
IMHO CAPTCHA puts the focus on the wrong thing – you're not interested in whether there's a human on the other side. You wouldn't want a human to spam you either. So take a step back and focus on the spam itself:
- Analyze the text (look for spammy keywords, use Bayesian filtering).
- Have a look at the source code of Sblam! (it's a completely transparent server-side comment-spam filter).
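For the text-analysis part, a minimal sketch of Bayesian-style scoring might look like the following (the token probabilities are made-up illustrations, not Sblam's actual model):

```python
import math
import re

# Hypothetical per-token spam probabilities, e.g. learned from a corpus of
# known spam and ham comments (illustrative values, not a real trained model).
TOKEN_SPAM_PROB = {
    "viagra": 0.99,
    "casino": 0.97,
    "cheap": 0.80,
    "http": 0.70,
    "thanks": 0.30,
    "article": 0.20,
}
DEFAULT_PROB = 0.40  # unseen tokens lean slightly toward ham

def spam_score(text: str) -> float:
    """Naive-Bayes-style score in [0, 1]: combine per-token spam probabilities."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return DEFAULT_PROB
    # Work in log space so long comments don't underflow to zero.
    log_spam = sum(math.log(TOKEN_SPAM_PROB.get(t, DEFAULT_PROB)) for t in tokens)
    log_ham = sum(math.log(1.0 - TOKEN_SPAM_PROB.get(t, DEFAULT_PROB)) for t in tokens)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

if __name__ == "__main__":
    print(spam_score("Cheap viagra from our casino http example"))  # close to 1.0
    print(spam_score("Thanks for the interesting article"))         # close to 0.0
```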
Alternatives to CAPTCHAs come from considering the problem from other angles. CAPTCHAs are built around the idea that a human and a computer actor can be distinguished, and as artificial intelligence progresses this distinction only becomes harder to draw, because the gap between computer and human users keeps shrinking.
The technique used here on Slashdot is for other users of the site to act as gatekeepers, flagging abuse and removing offending posts before they become noticeable to a wide audience.
Another technique is to detect spam-like posts directly, using the same technology used to filter spam from email. Obviously it isn't 100% effective for email, and it won't be for other uses either, but if you can filter out 75% of the spam with very few false positives, then other techniques only have to deal with the remaining 25%.
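As a sketch of that division of labour (the thresholds, function names, and the dummy scorer are illustrative assumptions, not any particular filter's behaviour), the statistical filter can make the easy calls and hand only the uncertain remainder to other techniques:

```python
from typing import Callable

# Illustrative thresholds, not tuned values.
REJECT_ABOVE = 0.95   # almost certainly spam: reject outright
ACCEPT_BELOW = 0.20   # almost certainly ham: publish immediately

def triage(post_text: str, spam_score: Callable[[str], float]) -> str:
    """Let a statistical filter decide the clear cases; queue the rest."""
    score = spam_score(post_text)
    if score >= REJECT_ABOVE:
        return "reject"
    if score <= ACCEPT_BELOW:
        return "publish"
    return "moderation-queue"   # the remaining ~25% go to the other checks

if __name__ == "__main__":
    def crude(text: str) -> float:
        # Dummy scorer for the demo: more links means more spammy.
        return min(1.0, 0.3 * text.count("http"))

    print(triage("Buy now http://a http://b http://c http://d", crude))  # reject
    print(triage("Nice write-up, thanks!", crude))                       # publish
```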
Keep a log of spam-related activity so that you can track trends in offending IP addresses, content of posts, claimed user agents, and so forth, and block abusive users at the routing level.
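One possible shape for such a log (the file name, record fields, and threshold are assumptions made for illustration): append one structured record per rejected post, then periodically aggregate by IP to build a candidate block list.

```python
import json
import time
from collections import Counter

LOG_PATH = "spam-events.log"   # hypothetical log file

def log_spam_event(ip: str, user_agent: str, excerpt: str) -> None:
    """Append one JSON line per rejected post so trends can be analysed later."""
    event = {
        "ts": time.time(),
        "ip": ip,
        "user_agent": user_agent,
        "excerpt": excerpt[:200],
    }
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

def repeat_offenders(min_events: int = 20) -> list[str]:
    """IPs with many rejected posts: candidates for blocking at the routing level."""
    counts = Counter()
    with open(LOG_PATH, encoding="utf-8") as f:
        for line in f:
            counts[json.loads(line)["ip"]] += 1
    return [ip for ip, n in counts.items() if n >= min_events]
```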
In nearly all cases, your users would rather put up with the slight inconvenience of abuse prevention than the huge inconvenience of a major spam problem.
Ultimately, the arms race between you and spammers is one of cost versus benefit. Initially, it will cost spammers close to nothing to spam your site, but you can change that and make it very difficult. Even if they continue to spam your site, the benefit they receive will never grow beyond a few innocent users falling for their schemes. Once the cost of spamming rises sharply above the benefit, the spammers will go away.
Another way to benefit from that is to allow advertising on your site. Make it inexpensive (but not free, of course) and easy for legitimate advertisers to post responsible marketing material for your users to see. Would-be spammers may find that it is a better deal to just pay you a few dollars and get their offering seen than to pursue clandestine methods.
Obviously most spammers won't fit in this category, since spamming is often more about getting your users to fall victim to malware exploits. You can do your part there by encouraging users to run modern, up-to-date browsers and plugins so that they are less vulnerable to those same exploits.
This article describes a technique based on hashed field names (changing with each page view), some of which are honeypot fields (i.e. the request is rejected if they're filled in) that are hidden from human users via various techniques.
Basically, it relies on spam scripts not being sophisticated enough to determine which form fields are actually visible. In a way, that is a CAPTCHA, since in order to solve it reliably, not only would they have to implement HTML, CSS and JavaScript fully, they'd also have to recognize when a field is too small to see, colored the same as the background, hidden behind another field, placed outside the browser's viewport, etc.
It's the same basic problem that makes Web Standards a farce: there is no algorithm to determine whether a webpage "looks right" - only a human can decide that.
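A minimal sketch of the hashed-field-name/honeypot idea described above (the secret, the token handling, and the CSS hook are assumptions; the article's actual implementation may differ):

```python
import hashlib
import hmac
import secrets

SECRET = b"server-side secret"   # hypothetical; never sent to the client

def field_names(form_token: str) -> dict[str, str]:
    """Derive per-page-view field names; the token changes on every render."""
    def name(label: str) -> str:
        mac = hmac.new(SECRET, f"{form_token}:{label}".encode(), hashlib.sha256)
        return "f" + mac.hexdigest()[:12]
    return {"email": name("email"), "comment": name("comment"), "honeypot": name("honeypot")}

def render_form() -> str:
    token = secrets.token_hex(8)
    names = field_names(token)
    # The honeypot field is hidden from humans via CSS (e.g. .hp { display: none }),
    # so only scripts that ignore the styling will fill it in.
    return f"""
    <form method="post" action="/comment">
      <input type="hidden" name="t" value="{token}">
      <input type="text" name="{names['email']}">
      <textarea name="{names['comment']}"></textarea>
      <input type="text" name="{names['honeypot']}" class="hp" autocomplete="off">
    </form>"""

def is_bot(token: str, posted: dict[str, str]) -> bool:
    """Reject the request if the honeypot field came back filled in."""
    names = field_names(token)
    return bool(posted.get(names["honeypot"], "").strip())
```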
I really think that Dinah hit the nail on the head. The fact seems to be that the beauty of the whole CAPTCHA setup is that there is no standard. Standardizing would only help the market to be more profitable.
Therefore it seems that the best way to handle the CAPTCHA problem is to come up with a system that is fairly hard for bots to crack and that is NOT used by anyone else on the planet. It could be a question system, a very custom image generator, or even a mix of JS calls that only real browsers respect.
By the time your site is big enough for spammers to care, you should have the budget to rethink your CAPTCHA setup and optimize it much more. In the meantime we should be monitoring our server logs and banning bad user agents, referrers, and IPs.
In my case I created a CAPTCHA image that I believe is very different from any other CAPTCHA I have seen. This should do fine for now alongside my Apache logs + htaccess banning and Akismet checking. Maybe I should spend time on a reporting feature as well.
Seen this?
It's a system with cute pictures instead of a CAPTCHA ;)
But I still think honeypots are a better solution - they're so cheap, easy, and invisible.
Although not a true image CAPTCHA, a good Turing test is to ask users a random question - common options are: Is ice hot or cold? 5 + 2 = ..? etc.
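A sketch of such a question-based check (the question pool and the plain-dict "session" are assumptions for illustration): pick a random question when rendering the form, remember the accepted answers server-side, and compare on submit.

```python
import random

# A small hand-written pool; in practice you would want many more questions,
# phrased so that a simple dictionary lookup alone can't answer them.
QUESTIONS = [
    ("Is ice hot or cold?", {"cold"}),
    ("What is 5 + 2?", {"7", "seven"}),
    ("What colour is the sky on a clear day?", {"blue"}),
]

def pick_question(session: dict) -> str:
    """Choose a random question and stash the accepted answers server-side."""
    question, answers = random.choice(QUESTIONS)
    session["turing_answers"] = answers
    return question

def check_answer(session: dict, submitted: str) -> bool:
    """Compare the visitor's answer against what was stored for this session."""
    return submitted.strip().lower() in session.get("turing_answers", set())

if __name__ == "__main__":
    session = {}
    print(pick_question(session))
    print(check_answer(session, "Cold"))  # True only if the ice question was picked
```

The important part is that the expected answer never appears in the page itself; only the question does.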