Stopping scripters from slamming your website
I've accepted an answer, but sadly, I believe we're stuck with our original worst case scenario: CAPTCHA everyone on purchase attempts of the crap. Short explanation: caching / web farms make it impossible to track hits, and any workaround (sending a non-cached web-beacon, writing to a unified table, etc.) slows the site down worse than the bots would. There is likely some pricey hardware from Cisco or the like that can help at a high level, but it's hard to justify the cost if CAPTCHA-ing everyone is an alternative. I'll attempt a more full explanation later, as well as cleaning this up for future searchers (though others are welcome to try, as it's community wiki).
Situation
This is about the bag o' crap sales on woot.com. I'm the president of Woot Workshop, the subsidiary of Woot that does the design, writes the product descriptions, podcasts, blog posts, and moderates the forums. I work with CSS/HTML and am only barely familiar with other technologies. I work closely with the developers and have talked through all of the answers here (and many other ideas we've had).
Usability is a massive part of my job, and making the site exciting and fun is most of the rest of it. That's where the three goals below derive from. CAPTCHA harms usability, and bots steal the fun and excitement out of our crap sales.
Bots are slamming our front page tens of times a second, screen-scraping (and/or scanning our RSS) for the Random Crap sale. The moment they see that, it triggers a second stage of the program that logs in, clicks 'I Want One', fills out the form, and buys the crap.
Evaluation
lc: On stackoverflow and other sites that use this method, they're almost always dealing with authenticated (logged in) users, because the task being attempted requires that.
On Woot, anonymous (non-logged) users can view our home page. In other words, the slamming bots can be non-authenticated (and essentially non-trackable except by IP address).
So we're back to scanning for IPs, which a) is fairly useless in this age of cloud networking and spambot zombies and b) catches too many innocents given the number of businesses that come from one IP address (not to mention the issues with non-static IP ISPs and potential performance hits to trying to track this).
Oh, and having people call us would be the worst possible scenario. Can we have them call you?
BradC: Ned Batchelder's methods look pretty cool, but they're pretty firmly designed to defeat bots built for a network of sites. Our problem is bots are built specifically to defeat our site. Some of these methods could likely work for a short time until the scripters evolved their bots to ignore the honeypot, screen-scrape for nearby label names instead of form ids, and use a javascript-capable browser control.
lc again: "Unless, of course, the hype is part of your marketing scheme." Yes, it definitely is. The surprise of when the item appears, as well as the excitement if you manage to get one is probably as much or more important than the crap you actually end up getting. Anything that eliminates first-come/first-serve is detrimental to the thrill of 'winning' the crap.
novatrust: And I, for one, welcome our new bot overlords. We actually do offer RSS feeds to allow 3rd party apps to scan our site for product info, but not ahead of the main site HTML. If I'm interpreting it right, your solution does help goal 2 (performance issues) by completely sacrificing goal 1, and just resigning ourselves to the fact that bots will be buying most of the crap. I up-voted your response, because the pessimism of your last paragraph feels accurate to me. There seems to be no silver bullet here.
The rest of the responses generally rely on IP tracking, which, again, seems to both be useless (with botnets/zombies/cloud networking) and detrimental (catching many innocents who come from same-IP destinations).
Any other approaches / ideas? My developers keep saying "let's just do CAPTCHA" but I'm hoping there are less intrusive methods for all the actual humans wanting some of our crap.
Original question
Say you're selling something cheap that has a very high perceived value, and you have a very limited amount. No one knows exactly when you will sell this item. And over a million people regularly come by to see what you're selling.
You end up with scripters and bots attempting to programmatically [a] figure out when you're selling said item, and [b] make sure they're among the first to buy it. This sucks for two reasons:
- Your site is slammed by non-humans, slowing everything down for everyone.
- The scripters end up 'winning' the product, causing the regulars to feel cheated.
A seemingly obvious solution is to create some hoops for your users to jump through before placing their order, but there are at least three problems with this:
- The user experience sucks for humans, as they have to decipher a CAPTCHA, pick out the cat, or solve a math problem.
- If the perceived benefit is high enough, and the crowd large enough, some group will find their way around any tweak, leading to an arms race. (This is especially true the simpler the tweak is; a hidden 'comments' form, re-arranged form elements, mis-labeled fields, and hidden 'gotcha' text will each work once, and then need to be changed once bots start targeting that specific form.)
- Even if the scripters can't 'solve' your tweak, it doesn't prevent them from slamming your front page and then sounding an alarm for the scripter to fill out the order manually. Given they get the advantage from solving [a], they will likely still win [b] since they'll be the first humans reaching the order page. Additionally, 1. still happens, causing server errors and decreased performance for everyone.
Another solution is to watch for IPs hitting too often, block them from the firewall, or otherwise prevent them from ordering. This could solve 2. and prevent [b] but the performance hit from scanning for IPs is massive and would likely cause more problems like 1. than the scripters were causing on their own. Additionally, the possibility of cloud networking and spambot zombies makes IP checking fairly useless.
A third idea, forcing the order form to be loaded for some time (say, half a second) would potentially slow the progress of the speedy orders, but again, the scripters would still be the first people in, at any speed not detrimental to actual users.
Goals
- Sell the item to non-scripting humans.
- Keep the site running at a speed not slowed by bots.
- Don't hassle the 'normal' users with any tasks to complete to prove they're human.
How about this: Create a form to receive an email if a new item is on sale, and add a caching system that will serve the same content to anyone refreshing in less than X seconds.
This way you win all the scenarios: you get rid of the scrapers (they can scrape their email account) and you give a chance to the people who won't code something just to buy on your site! I'm sure I would get the email on my mobile and log in to buy something if I really wanted to.
How about implementing something like SO does with the CAPTCHAs?
If you're using the site normally, you'll probably never see one. If you happen to reload the same page too often, post successive comments too quickly, or something else that triggers an alarm, make them prove they're human. In your case, this would probably be constant reloads of the same page, following every link on a page quickly, or filling in an order form too fast to be human.
If they fail the check x times in a row (say, 2 or 3), give that IP a timeout or other such measure. Then at the end of the timeout, dump them back to the check again.
Since you have unregistered users accessing the site, you do have only IPs to go on. You can issue sessions to each browser and track that way if you wish. And, of course, throw up a human-check if too many sessions are being (re-)created in succession (in case a bot keeps deleting the cookie).
As far as catching too many innocents, you can put up a disclaimer on the human-check page: "This page may also appear if too many anonymous users are viewing our site from the same location. We encourage you to register or login to avoid this." (Adjust the wording appropriately.)
Besides, what are the odds that X people are loading the same page(s) at the same time from one IP? If they're high, maybe you need a different trigger mechanism for your bot alarm.
Edit: Another option is if they fail too many times, and you're confident about the product's demand, to block them and make them personally CALL you to remove the block.
Having people call does seem like an asinine measure, but it makes sure there's a human somewhere behind the computer. The key is to have the block only be in place for a condition which should almost never happen unless it's a bot (e.g. fail the check multiple times in a row). Then it FORCES human interaction - to pick up the phone.
In response to the comment of having them call me, there's obviously that tradeoff here. Are you worried enough about ensuring your users are human to accept a couple phone calls when they go on sale? If I were so concerned about a product getting to human users, I'd have to make this decision, perhaps sacrificing a (small) bit of my time in the process.
Since it seems like you're determined to not let bots get the upper hand/slam your site, I believe the phone may be a good option. Since I don't make a profit off your product, I have no interest in receiving these calls. Were you to share some of that profit, however, I may become interested. As this is your product, you have to decide how much you care and implement accordingly.
The other ways of releasing the block just aren't as effective: a timeout (but they'd get to slam your site again after, rinse-repeat), a long timeout (if it was really a human trying to buy your product, they'd be SOL and punished for failing the check), email (easily done by bots), fax (same), or snail mail (takes too long).
You could, of course, instead have the timeout period increase per IP for each time they get a timeout. Just make sure you're not punishing true humans inadvertently.
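To make the escalation concrete, here's a minimal sketch in Python, assuming an in-memory store and illustrative names (a real site would keep this state somewhere shared, like memcached):

```python
import time
from collections import defaultdict

FAIL_LIMIT = 3        # consecutive failed human-checks before a block
BASE_TIMEOUT = 60     # seconds for the first block

failures = defaultdict(int)   # ip -> consecutive failed checks
blocks = {}                   # ip -> (blocked_until, times_blocked)

def is_blocked(ip):
    blocked_until, _ = blocks.get(ip, (0, 0))
    return time.time() < blocked_until

def record_check(ip, passed):
    """Update state after a human-check; return True if the IP is now blocked."""
    if passed:
        failures[ip] = 0
        return False
    failures[ip] += 1
    if failures[ip] >= FAIL_LIMIT:
        _, times = blocks.get(ip, (0, 0))
        # Double the timeout for every repeat offense, per the idea above.
        blocks[ip] = (time.time() + BASE_TIMEOUT * 2 ** times, times + 1)
        failures[ip] = 0
        return True
    return False
```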
You need to figure a way to make the bots buy stuff that is massively overpriced: 12mm wingnut: $20. See how many bots snap up before the script-writers decide you're gaming them.
Use the profits to buy more servers and pay for bandwidth.
我的解决方案是通过为“机器人和脚本”添加大约 10 分钟的延迟,使屏幕抓取变得毫无价值。
我的做法如下:
您不需要记录每次点击的每个 IP 地址。 仅跟踪大约每 20 个点击中的一个。 惯犯仍会出现在随机偶尔的跟踪中。
保留大约 10 分钟前的页面缓存。
当重复攻击者/机器人访问您的网站时,向他们提供 10 分钟前的缓存页面。
他们不会立即知道他们正在访问旧网站。 他们将能够争取到一切,但他们不会再赢得任何比赛,因为“真正的人”将有 10 分钟的领先优势。
优点:
缺点
My solution would be to make screen-scraping worthless by putting in a roughly 10 minute delay for 'bots and scripts.
Here's how I'd do it:
You don't need to log every IP address on every hit. Only track one out of every 20 hits or so. A repeat offender will still show up in a randomized occasional tracking.
Keep a cache of your page from about 10 minutes earlier.
When a repeat-hitter/bot hits your site, give them the 10-minute old cached page.
They won't immediately know they're getting an old site. They'll be able to scrape it, and everything, but they won't win any races anymore, because "real people" will have a 10 minute head-start.
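A rough sketch of that sampling-plus-stale-cache idea in Python (illustrative names; `page_cache` would be appended to by whatever job re-renders the front page, and `render_page` is assumed to exist elsewhere):

```python
import random
import time
from collections import defaultdict

SAMPLE_RATE = 20     # track roughly 1 in 20 hits
STALE_AGE = 600      # serve a ~10-minute-old page to repeat hitters
HIT_THRESHOLD = 5    # sampled hits before an IP counts as a repeat hitter

sampled_hits = defaultdict(int)   # ip -> number of sampled hits
page_cache = []                   # (timestamp, html), appended elsewhere

def serve_front_page(ip, render_page):
    if random.randrange(SAMPLE_RATE) == 0:       # sample ~1 in 20 hits
        sampled_hits[ip] += 1
    if sampled_hits[ip] >= HIT_THRESHOLD:
        # Repeat hitter: newest cached copy that is at least 10 minutes old.
        cutoff = time.time() - STALE_AGE
        for timestamp, html in reversed(page_cache):
            if timestamp <= cutoff:
                return html
    return render_page()
```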
Benefits:
Drawbacks:
Take a look at this article by Ned Batchelder. His article is about stopping spambots, but the same techniques could easily apply to your site.
Some other ideas:
EDIT: To be totally clear, Ned's article above describes methods to prevent the automated PURCHASE of items by preventing a BOT from going through the forms to submit an order. His techniques wouldn't be useful for preventing bots from screen-scraping the home page to determine when a Bandoleer of Carrots comes up for sale. I'm not sure preventing THAT is really possible.
With regard to your comments about the effectiveness of Ned's strategies: Yes, he discusses honeypots, but I don't think that's his strongest strategy. His discussion of the SPINNER is the original reason I mentioned his article. Sorry I didn't make that clearer in my original post:
Here is how you could implement that at WOOT.com:
Change the "secret" value that is used as part of the hash each time a new item goes on sale. This means that if someone is going to design a BOT to auto-purchase items, it would only work until the next item comes on sale!!
Even if someone is able to quickly re-build their bot, all the other actual users will have already purchased a BOC, and your problem is solved!
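For reference, a spinner along the lines Ned describes might look like this sketch: a hidden form field carrying a keyed hash of a timestamp and the client IP, checked on submit (all names here are illustrative, not Woot's actual code):

```python
import hashlib
import hmac
import time

SALE_SECRET = b"rotate-me-when-each-new-item-goes-live"   # the per-sale secret

def make_spinner(client_ip):
    """Value for the hidden 'spinner' field baked into the order form."""
    timestamp = str(int(time.time()))
    msg = f"{timestamp}:{client_ip}".encode()
    digest = hmac.new(SALE_SECRET, msg, hashlib.sha256).hexdigest()
    return f"{timestamp}:{digest}"

def spinner_is_valid(client_ip, submitted, max_age=3600):
    """Check the field on submit: right IP, right secret, not too old."""
    try:
        timestamp, digest = submitted.split(":", 1)
        age = time.time() - int(timestamp)
    except ValueError:
        return False
    msg = f"{timestamp}:{client_ip}".encode()
    expected = hmac.new(SALE_SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, digest) and 0 <= age <= max_age
```

Rotating SALE_SECRET the moment a new item goes live invalidates any form values a bot captured beforehand.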
The other strategy he discusses is to change the honeypot technique from time to time (again, change it when a new item goes on sale):
I guess my overall idea is to CHANGE THE FORM DESIGN when each new item goes on sale. Or at LEAST, change it when a new BOC goes on sale.
Which is what, a couple times/month?
Q: How would you stop scripters from slamming your site hundreds of times a second?
A: You don't. There is no way to prevent this behavior by external agents.
You could employ a vast array of technology to analyze incoming requests and heuristically attempt to determine who is and isn't human...but it would fail. Eventually, if not immediately.
The only viable long-term solution is to change the game so that the site is not bot-friendly, or is less attractive to scripters.
How do you do that? Well, that's a different question! ;-)
...
OK, some options have been given (and rejected) above. I am not intimately familiar with your site, having looked at it only once, but since people can read text in images and bots cannot easily do this, change the announcement to be an image. Not a CAPTCHA, just an image -
Run time-trials of real people responding to this, and ignore ('oops, an error occurred, sorry! please try again') responses faster than (say) half of this time. This event should also trigger an alert to the developers that at least one bot has figured out the code/game, so it's time to change the code/game.
Continue to change the game periodically anyway, even if no bots trigger it, just to waste the scripters' time. Eventually the scripters should tire of the game and go elsewhere...we hope ;-)
One final suggestion: when a request for your main page comes in, put it in a queue and respond to the requests in order in a separate process (you may have to hack/extend the web server to do this, but it will likely be worthwhile). If another request from the same IP/agent comes in while the first request is in the queue, ignore it. This should automatically shed the load from the bots.
EDIT: Another option, aside from the use of images, is to use JavaScript to fill in the buy/no-buy text; bots rarely interpret JavaScript, so they wouldn't see it.
I don't know how feasible this is: ... go on the offensive.
Figure out what data the bots are scanning for. Feed them the data that they're looking for when you're NOT selling the crap. Do this in a way that won't bother or confuse human users. When the bots trigger phase two, they'll log in and fill out the form to buy $100 Roombas instead of BOC. Of course, this assumes that the bots are not particularly robust.
Another idea is to implement random price drops over the course of the bag o' crap sale period. Who would buy a random bag o' crap for $150 when you CLEARLY STATE that it's only worth $20? Nobody but overzealous bots. But then 9 minutes later it's $35 ... then 17 minutes later it's $9. Or whatever.
Sure, the zombie kings would be able to react. The point is to make their mistakes become very costly for them (and to make them pay you to fight them).
All of this assumes you want to piss off some bot lords, which may not be 100% advisable.
So the problem really seems to be: the bots want their "bag 'o crap" because it has a high perceived value at a low perceived price. You sometimes offer this item and the bots lurk, waiting to see if it's available and then they buy the item.
Since it seems like the bot owners are making a profit (or potentially making a profit), the trick is to make this unprofitable for them by encouraging them to buy the crap.
First, always offer the "bag 'o crap".
Second, make sure that crap is usually crap.
Third, rotate the crap frequently.
Simple, no?
You'll need a permanent "why is our crap sometimes crap?" link next to the offer to explain to humans what's going on.
When the bot sees that there's crap and the crap is automatically purchased, the recipient is going to be awfully upset that they've paid $10 for a broken toothpick. And then an empty trash bag. And then some dirt from the bottom of your shoe.
If they buy enough of this crap in a relatively short period of time (and you have large disclaimers all over the place explaining why you're doing this), they're going to lose a fair "bag 'o cash" on your "bag 'o crap". Even human intervention on their part (checking to ensure that the crap isn't crap) can fail if you rotate the crap often enough. Heck, maybe the bots will notice and not buy anything that's been in the rotation for too short a time, but that means the humans will buy the non-crap.
Heck, your regular customers might be so amused that you can turn this into a huge marketing win. Start posting how much of the "crap" crap is being sold. People will come back just to see how hard the bots have been bitten.
Update: I expect that you might get a few calls up front with people complaining. I don't think you can stop that entirely. However, if this kills the bots, you can always stop it and restart it later.
You probably don't want to hear this, but #1 and #3 are mutually exclusive.
Well, nobody knows you're a bot either. There's no programmatic way to tell whether or not there's a human on the other end of the connection without requiring the person to do something. Preventing scripts/bots from doing stuff on the web is the whole reason CAPTCHAs were invented. It's not like this is some new problem that hasn't seen a lot of effort expended on it. If there were a better way to do it, one that didn't involve the hassle to real users that a CAPTCHA does, everyone would be using it already.
I think you need to face the fact that if you want to keep bots off your ordering page, a good CAPTCHA is the only way to do it. If demand for your random crap is high enough that people are willing to go to these lengths to get it, legitimate users aren't going to be put off by a CAPTCHA.
The method Woot uses to combat this issue is changing the game - literally. When they present an extraordinarily desirable item for sale, they make users play a video game in order to order it.
Not only does that successfully combat bots (they can easily make minor changes to the game to avoid automatic players, or even provide a new game for each sale) but it also gives the impression to users of "winning" the desired item while slowing down the ordering process.
It still sells out very quickly, but I think that the solution is good - re-evaluating the problem and changing the parameters led to a successful strategy where strictly technical solutions simply didn't exist.
Your entire business model is based on "first come, first served." You can't do what the radio stations did (they no longer make the first caller the winner, they make the 5th or 20th or 13th caller the winner) - it doesn't match your primary feature.
No, there is no way to do this without changing the ordering experience for the real users.
Let's say you implement all these tactics. If I decide that this is important, I'll simply get 100 people to work with me, we'll build software to work on our 100 separate computers, and hit your site 20 times a second (5 seconds between accesses for each user/cookie/account/IP address).
You have two stages:
- watching the front page for the item to appear, and
- going through the order process once it does.
You can't put a captcha blocking #1 - that's going to lose real customers ("What? I have to solve a captcha each time I want to see the latest woot?!?").
So my little group watches, timed together so we get about 20 checks per second, and whoever sees the change first alerts all the others (automatically), who will load the front page once again, follow the order link, and perform the transaction (which may also happen automatically, unless you implement captcha and change it for every wootoff/boc).
You can put a captcha in front of #2, and while you're loathe to do it, that may be the only way to make sure that even if bots watch the front page, real users are getting the products.
But even with captcha my little band of 100 would still have a significant first mover advantage - and there's no way you can tell that we aren't humans. If you start timing our accesses, we'd just add some jitter. We could randomly select which computer was to refresh so the order of accesses changes constantly - but still looks enough like a human.
First, get rid of the simple bots
You need to have an adaptive firewall that will watch requests, and if someone is doing the obvious stupid thing - refreshing more than once a second from the same IP - then employ tactics to slow them down (drop packets, send back refused or 500 errors, etc.).
This should significantly drop your traffic and alter the tactics the bot users employ.
Second, make the server blazingly fast.
You really don't want to hear this... but...
I think what you need is a fully custom solution from the bottom up.
You don't need to mess with the TCP/IP stack, but you may need to develop a very, very, very fast custom server that is purpose-built to correlate user connections and react appropriately to various attacks.
Apache, lighttpd, etc. are all great for being flexible, but you run a single-purpose website, and you really need to be able to do more than the current servers are capable of doing (both in handling traffic and in appropriately combating bots).
By serving a largely static webpage (updates every 30 seconds or so) on a custom server you should not only be able to handle 10x the number of requests and traffic (because the server isn't doing anything other than getting the request, and reading the page from memory into the TCP/IP buffer) but it will also give you access to metrics that might help you slow down bots. For instance, by correlating IP addresses you can simply block more than one connection per second per IP. Humans can't go faster than that, and even people using the same NATed IP address will only infrequently be blocked. You'd want to do a slow block - leave the connection alone for a full second before officially terminating the session. This can feed into a firewall to give longer term blocks to especially egregious offenders.
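As a toy illustration of the one-connection-per-second rule with a 'slow block' (in a purpose-built server this would happen at the socket layer rather than in a blocking handler like this):

```python
import time

last_seen = {}   # ip -> time of the last accepted request

def handle_request(ip, respond):
    now = time.time()
    if now - last_seen.get(ip, 0.0) < 1.0:
        # Slow block: hold the connection for a full second before dropping
        # it, so a hammering client ties up its own socket instead of
        # instantly retrying.
        time.sleep(1.0)
        return None
    last_seen[ip] = now
    return respond()
```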
But the reality is that no matter what you do, there's no way to tell a human apart from a bot when the bot is custom built by a human for a single purpose. The bot is merely a proxy for the human.
Conclusion
At the end of the day, you can't tell a human and a computer apart for watching the front page. You can stop bots at the ordering step, but the bot users still have a first mover advantage, and you still have a huge load to manage.
You can add blocks for the simple bots, which will raise the bar so that fewer people bother with it. That may be enough.
But without changing your basic model, you're out of luck. The best you can do is take care of the simple cases, make the server so fast regular users don't notice, and sell so many items that even if you have a few million bots, as many regular users as want them will get them.
You might consider setting up a honeypot and marking user accounts as bot users, but that will have a huge negative community backlash.
Every time I think of a "well, what about doing this..." I can always counter it with a suitable bot strategy.
Even if you make the front page a captcha to get to the ordering page ("This item's ordering button is blue with pink sparkles, somewhere on this page") the bots will simply open all the links on the page, and use whichever one comes back with an ordering page. There's just no way to win this.
Make the servers fast, put in a reCaptcha (the only one I've found that can't be easily fooled, but it's probably way too slow for your application) on the ordering page, and think about ways to change the model slightly so regular users have as good a chance as the bot users.
-Adam
Disclaimer: This answer is completely non-programming-related. It does, however, try to attack the reason for scripts in the first place.
Another idea is if you truly have a limited quantity to sell, why don't you change it from a first-come-first-served methodology? Unless, of course, the hype is part of your marketing scheme.
There are many other options, and I'm sure others can think of some different ones:
an ordering queue (pre-order system) - Some scripts might still end up at the front of the queue, but it's probably faster to just manually enter the info.
a raffle system (everyone who tries to order one is entered into the system) - This way the people with the scripts have just the same chances as those without.
a rush priority queue - If there is truly a high perceived value, people may be willing to pay more. Implement an ordering queue, but allow people to pay more to be placed higher in the queue.
auction (credit goes to David Schmitt for this one, comments are my own) - People can still use scripts to snipe in at the last minute, but not only does it change the pricing structure, people are expecting to be fighting it out with others. You can also do things to restrict the number of bids in a given time period, make people phone in ahead of time for an authorization code, etc.
No matter how secure the Nazis thought their communications were, the allies would often break their messages. No matter how you try to stop bots from using your site, the bot owners will work out a way around it. I'm sorry if that makes you the Nazi :-)
I think a different mindset is required
Get into the mindset that it doesn't matter whether the client of your site is a human or a bot, both are just paying customers; but one has an unfair advantage over the other. Some users without much of a social life (hermits) can be just as annoying for your site's other users as bots.
Record the time you publish an offer and the time an account opts to buy it.
Vary the time of day you publish offers.
Over time a picture will emerge.
01: You can see which accounts are regularly buying products within seconds of them going live, suggesting they might be bots.
02: You can also look at the window of time used for the offers, if the window is 1 hour then some early buyers will be humans. A human will rarely refresh for 4 hours though. If the elapsed time is quite consistent between publish/purchase regardless of the window duration then that's a bot. If the publish/purchase time is short for small windows and gets longer for large windows, that's a hermit!
Now instead of stopping bots from using your site you have enough information to tell you which accounts are certainly used by bots, and which accounts are likely to be used by hermits. What you do with that information is up to you, but you can certainly use it to make your site fairer to people who have a life.
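A sketch of how that publish-to-purchase classification might look, assuming you've logged (window length, elapsed seconds) pairs per account across several offers (thresholds are guesses to tune):

```python
from statistics import mean, pstdev

def classify(observations):
    """observations: list of (window_hours, seconds_from_publish_to_purchase),
    one entry per offer this account bought."""
    if len(observations) < 2:
        return "unknown"
    delays = [seconds for _, seconds in observations]
    if mean(delays) < 10:
        return "bot"      # regularly buys within seconds of an offer going live
    if pstdev(delays) < 0.1 * mean(delays):
        return "bot"      # delay is eerily consistent regardless of the window
    by_window = sorted(observations)
    if all(a[1] <= b[1] for a, b in zip(by_window, by_window[1:])):
        return "hermit"   # delay grows with the window: a heavy refresher
    return "human"
```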
I think banning the bot accounts would be pointless; it would be akin to phoning Hitler and saying "Thanks for the positions of your U-boats!" Somehow you need to use the information in a way that the account owners won't realise. Let's see if I can dream anything up.....
Process orders in a queue:
When the customer places an order they immediately get a confirmation email telling them their order is placed in a queue and that they will be notified when it has been processed. I experience this kind of thing with order/dispatch on Amazon and it doesn't bother me at all, I don't mind getting an email days later telling me my order has been dispatched as long as I immediately get an email telling me that Amazon knows I want the book. In your case it would be an email for
Users think they are in a fair queue. Process your queue every 1 hour so that normal users also experience a queue, so as not to arouse suspicion. Only process orders from bot and hermit accounts once they have been in the queue for the "average human ordering time + x hours". Effectively reducing bots to humans.
I say expose the price information using an API. This is the unintuitive solution but it does work to give you control over the situation. Add some limitations to the API to make it slightly less functional than the website.
You could do the same for ordering. You could experiment with small changes to the API functionality/performance until you get the desired effect.
There are proxies and botnets to defeat IP checks. There are captcha reading scripts that are extremely good. There are even teams of workers in India who defeat captchas for a small price. Any solution you can come up with can be reasonably defeated. Even Ned Batchelder's solutions can be stepped past by using a WebBrowser control or other simulated browser combined with a botnet or proxy list.
We are currently using the latest generation of BigIP load balancers from F5 to do this. The BigIP has advanced traffic management features that can identify scrapers and bots based on frequency and patterns of use, even from amongst a set of sources behind a single IP. It can then throttle these, serve them alternative content, or simply tag them with headers or cookies so you can identify them in your application code.
How about introducing a delay which requires human interaction, like a sort of "CAPTCHA game". For example, it could be a little Flash game where during 30 seconds they have to burst checkered balls and avoid bursting solid balls (avoiding colour blindness issues!). The game would be given a random number seed and what the game transmits back to the server would be the coordinates and timestamps of the clicked points, along with the seed used.
On the server you simulate the game mechanics using that seed to see if the clicks would indeed have burst the balls. If they did, not only were they human, but they took 30 seconds to validate themselves. Give them a session id.
You let that session id do what it likes, but if it makes too many requests, it can't continue without playing again.
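The server-side check works because both sides derive the same layout from the seed; here's a stripped-down sketch (the 'game' is reduced to static ball positions purely for illustration):

```python
import random

def ball_positions(seed, count=10):
    """Client and server derive the identical ball layout from the seed."""
    rng = random.Random(seed)
    return [(rng.uniform(0, 640), rng.uniform(0, 480)) for _ in range(count)]

def verify_clicks(seed, clicks, radius=20.0, min_duration=25.0):
    """clicks: list of (x, y, timestamp) reported by the client. Replay them
    against the seeded layout and sanity-check the elapsed time."""
    if not clicks:
        return False
    balls = ball_positions(seed)
    burst = {
        (bx, by)
        for x, y, _ in clicks
        for bx, by in balls
        if (x - bx) ** 2 + (y - by) ** 2 <= radius ** 2
    }
    duration = clicks[-1][2] - clicks[0][2]
    # Plausibly human: most balls burst, and it took close to the 30 seconds.
    return len(burst) >= 0.8 * len(balls) and duration >= min_duration
```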
First, let me recap what we need to do here. I realize I'm just paraphrasing the original question, but it's important that we get this 100% straight, because there are a lot of great suggestions that get 2 or 3 out of 4 right, but as I will demonstrate, you will need a multifaceted approach to cover all of the requirements.
Requirement 1: Getting rid of the 'bot slamming':
The rapid-fire 'slamming' of your front page is hurting your site's performance and is at the core of the problem. The 'slamming' comes from both single-IP bots and - supposedly - from botnets as well. We want to get rid of both.
Requirement 2: Don't mess with the user experience:
We could fix the bot situation pretty effectively by implementing a nasty verification procedure like phoning a human operator, solving a bunch of CAPTCHAs, or similar, but that would be like forcing every innocent airplane passenger to jump through crazy security hoops just for the slim chance of catching the very stupidest of terrorists. Oh wait - we actually do that. But let's see if we can not do that on woot.com.
Requirement 3: Avoiding the 'arms race':
As you mention, you don't want to get caught up in the spambot arms race. So you can't use simple tweaks like hidden or jumbled form fields, math questions, etc., since they are essentially obscurity measures that can be trivially autodetected and circumvented.
Requirement 4: Thwarting 'alarm' bots:
This may be the most difficult of your requirements. Even if we can make an effective human-verification challenge, bots could still poll your front page and alert the scripter when there is a new offer. We want to make those bots infeasible as well. This is a stronger version of the first requirement, since not only can't the bots issue performance-damaging rapid-fire requests -- they can't even issue enough repeated requests to send an 'alarm' to the scripter in time to win the offer.
Okay, so let's see if we can meet all four requirements. First, as I mentioned, no one measure is going to do the trick. You will have to combine a couple of tricks to achieve it, and you will have to swallow two annoyances:
I realize these are annoying, but if we can make the 'small' number small enough, I hope you will agree the positives outweigh the negatives.
First measure: User-based throttling:
Second measure: Some form of IP throttling, as suggested by nearly everyone:
Third measure: Cloaking the throttle with cached responses:
Fourth measure: reCAPTCHA:
Fifth measure: Decoy crap:
Sixth measure: Botnet Throttling:
Okay............ I've now spent most of my evening thinking about this, trying different approaches.... global delays.... cookie-based tokens... queued serving... 'stranger throttling'.... And it just doesn't work. It doesn't. I realized the main reason why you hadn't accepted any answer yet was that no one had proposed a way to thwart a distributed/zombie net/botnet attack.... so I really wanted to crack it. I believe I cracked the botnet problem for authentication in a different thread, so I had high hopes for your problem as well. But my approach doesn't translate to this. You only have IPs to go by, and a large enough botnet doesn't reveal itself in any analysis based on IP addresses.
So there you have it: My sixth measure is naught. Nothing. Zip. Unless the botnet is small and/or fast enough to get caught in the usual IP throttle, I don't see any effective measure against botnets that doesn't involve explicit human verification such as CAPTCHAs. I'm sorry, but I think combining the above five measures is your best bet. And you could probably do just fine with just abelenky's 10-minute-caching trick alone.
There are a few other / better solutions already posted, but for completeness, I figured I'd mention this:
If your main concern is performance degradation, and you're looking at true hammering, then you're actually dealing with a DoS attack, and you should probably try to handle it accordingly. One common approach is to simply drop packets from an IP in the firewall after a number of connections per second/minute/etc. For example, the standard Linux firewall, iptables, has a standard operation matching function 'hashlimit', which could be used to correlate connection requests per time unit to an IP-address.
Although, this question would probably be more apt for the next SO-derivative mentioned on the last SO podcast; it hasn't launched yet, so I guess it's ok to answer :)
EDIT:
As pointed out by novatrust, there are still ISPs that do NOT assign unique IPs to their customers, so effectively, a script-running customer of such an ISP would get every customer from that ISP blocked.
Eat up your bandwidth: make everyone wait a random amount of time of up to 45 seconds or something, depending on what you're looking for exactly. Exactly what are your timing constraints?
First of all, by definition, it is impossible to support stateless, i.e. truly anonymous, transactions while also being able to separate the bots from legitimate users.
If we can accept a premise that we can impose some cost on a brand-spanking-new woot visitor on his first page hit(s), I think I have a possible solution. For lack of a better name, I'm going to loosely call this solution "A visit to the DMV."
Let's say that there's a car dealership that offers a different new car each day, and that on some days, you can buy an exotic sports car for $5 each (limit 3), plus a $5 destination charge.
The catch is, the dealership requires you to visit the dealership and show a valid driver's license before you're allowed in through the door to see what car is on sale. Moreover, you must have said valid driver's license in order to make the purchase.
So, the first-time visitor (let's call him Bob) to this car dealer is refused entry, and is referred to the DMV office (which is conveniently located right next door) to obtain a driver's license.
Other visitors with a valid driver's license are allowed in after showing it. A person who makes a nuisance of himself by loitering around all day, pestering the salesmen, grabbing brochures, and emptying the complimentary coffee and cookies will eventually be turned away.
Now, back to Bob without the license -- all he has to do is endure the visit to the DMV once. After that, he can visit the dealership and buy cars anytime he likes, unless he accidentally left his wallet at home, or his license is otherwise destroyed or revoked.
The driver's license in this world is nearly impossible to forge.
The visit to the DMV involves first getting the application form at the "Start Here" queue. Bob has to take the completed application to window #1, where the first of many surly civil servants will take his application, process it, and if everything is in order, stamp the application for the window and send him to the next window. And so, Bob goes from window to window, waiting for each step of his application to go through, until he finally gets to the end and receives his driver's license.
There's no point in trying to "short circuit" the DMV. If the forms are not filled out correctly in triplicate, or any wrong answers given at any window, the application is torn up, and the hapless customer is sent back to the start.
Interestingly, no matter how full or empty the office is, it takes about the same amount of time to get serviced at each successive window. Even when you're the only person in line, it seems that the personnel likes to make you wait a minute behind the yellow line before uttering, "Next!"
Things aren't quite so terrible at the DMV, however. While all the waiting and processing to get the license is going on, you can watch a very entertaining and informative infomercial for the car dealership while you're in the DMV lobby. In fact, the infomercial runs just long enough to cover the amount of time you spend getting your license.
The slightly more technical explanation:
As I said at the very top, it becomes necessary to have some statefulness on the client-server relationship which allows you to separate humans from bots. You want to do it in a way that doesn't overly penalize the anonymous (non-authenticated) human visitor.
This approach probably requires an AJAX-y client-side processing. A brand-spanking-new visitor to woot is given the "Welcome New User!" page full of text and graphics which (by appropriate server-side throttling) takes a few seconds to load completely. While this is happening (and the visitor is presumably busy reading the welcome page(s)), his identifying token is slowly being assembled.
Let's say, for discussion, the token (aka "driver's license") consists of 20 chunks. In order to get each successive chunk, the client-side code must submit a valid request to the server. The server incorporates a deliberate delay (let's say 200 milliseconds) before sending the next chunk along with the 'stamp' needed to make the next chunk request (i.e., the stamps needed to go from one DMV window to the next). All told, about 4 seconds must elapse to finish the chunk-challenge-response-chunk-challenge-response-...-chunk-challenge-response-completion process.
At the end of this process, the visitor has a token which allows him to go to the product description page and, in turn, to the purchasing page. The token is unique to each visitor and can be used to throttle his activities.
On the server side, you only accept page views from clients that have a valid token. Or, if it's important that everyone can ultimately see the page, put a time penalty on requests that are missing a valid token.
Now, for this to be relatively benign to the legitimate human visitor, make the token-issuing process happen relatively non-intrusively in the background. Hence the need for the welcome page with entertaining copy and graphics that is deliberately slowed down slightly.
This approach forces a throttle-down of bots to either use an existing token, or take the minimum setup time to get a new token. Of course, this doesn't help as much against sophisticated attacks using a distributed network of faux visitors.
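A bare-bones sketch of the chunk-and-stamp handshake on the server side (hypothetical names; the deliberate delay is shown inline, though a real server would schedule it rather than sleep):

```python
import hashlib
import hmac
import os
import time

SERVER_KEY = os.urandom(32)   # regenerate periodically to expire old licenses
CHUNKS = 20
STEP_DELAY = 0.2              # seconds per "window", ~4 seconds in total

def _stamp(visitor_id, step):
    msg = f"{visitor_id}:{step}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()

def issue_chunk(visitor_id, step, presented_stamp=""):
    """Hand out chunk number `step`, or None if the caller skipped a window."""
    if step > 0 and not hmac.compare_digest(_stamp(visitor_id, step),
                                            presented_stamp):
        return None                            # torn up; back to the start
    time.sleep(STEP_DELAY)                     # the deliberate DMV-style wait
    chunk = _stamp(visitor_id, step)[:8]       # one piece of the final token
    next_stamp = "done" if step + 1 == CHUNKS else _stamp(visitor_id, step + 1)
    return chunk, next_stamp
```

The client strings the 20 chunks together into its 'license'; any attempt to fetch chunks out of order fails the stamp check, so the minimum cost of a fresh token is the full 4-second walk through the windows.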
Write a reverse-proxy on an apache server in front of your application which implements a Tarpit (Wikipedia Article) to punish bots. It would simply manage a list of IP addresses that connected in the last few seconds. You detect a burst of requests from a single IP address and then exponentially delay those requests before responding.
Of course, multiple humans can come from the same IP address if they're on a NAT'd network connection, but it's unlikely that a human would mind your response time going from 2 ms to 4 ms (or even 400 ms), whereas a bot will be hampered by the increasing delay pretty quickly.
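The delay schedule could be as simple as this sketch (illustrative thresholds; the proxy would sleep for the returned number of seconds before responding):

```python
import time
from collections import defaultdict

WINDOW = 2.0                 # "the last few seconds"
recent = defaultdict(list)   # ip -> timestamps of recent requests

def tarpit_delay(ip):
    """Exponentially growing delay for bursty IPs; zero for normal visitors."""
    now = time.time()
    hits = [t for t in recent[ip] if now - t < WINDOW]
    hits.append(now)
    recent[ip] = hits
    burst = len(hits) - 1          # extra requests inside the window
    if burst == 0:
        return 0.0
    return min(0.002 * 2 ** burst, 30.0)   # 4 ms, 8 ms, ... capped at 30 s
```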
I'm not seeing the great burden that you claim from checking incoming IPs. On the contrary, I've done a project for one of my clients which analyzes the HTTP access logs every five minutes (it could have been real-time, but he didn't want that for some reason that I never fully understood) and creates firewall rules to block connections from any IP addresses that generate an excessive number of requests unless the address can be confirmed as belonging to a legitimate search engine (google, yahoo, etc.).
This client runs a web hosting service and is running this application on three servers which handle a total of 800-900 domains. Peak activity is in the thousand-hits-per-second range and there has never been a performance issue - firewalls are very efficient at dropping packets from blacklisted addresses.
And, yes, DDOS technology definitely does exist which would defeat this scheme, but he's not seeing that happen in the real world. On the contrary, he says it's vastly reduced the load on his servers.
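A condensed sketch of that batch job, assuming a common-log-format access log where the IP is the first field and a whitelist check supplied from elsewhere (the actual firewall rule creation is left as a comment rather than invented):

```python
import re
from collections import Counter

THRESHOLD = 1000            # requests per 5-minute window before blocking
IP_RE = re.compile(r"^(\S+)")

def offenders(log_lines, is_legit_crawler):
    """Return IPs that exceeded the threshold and aren't known crawlers."""
    counts = Counter()
    for line in log_lines:
        match = IP_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return [ip for ip, n in counts.items()
            if n > THRESHOLD and not is_legit_crawler(ip)]

# Every five minutes: feed the newest slice of the access log through
# offenders() and add a firewall drop rule for each IP it returns.
```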
My approach would be to focus on non-technological solutions (otherwise you're entering an arms race you'll lose, or at least spend a great deal of time and money on). I'd focus on the billing/shipment parts - you can find bots by either finding multiple deliveries to same address or by multiple charges to a single payment method. You can even do this across items over several weeks, so if a user got a previous item (by responding really really fast) he may be assigned some sort of "handicap" this time around.
This would also have a side effect (beneficial, I would think, but I could be wrong marketing-wise for your case) of perhaps widening the circle of people who get lucky and get to purchase woot.
You can't totally prevent bots, even with a CAPTCHA. However, you can make it a pain to write and maintain a bot and therefore reduce their number. In particular, by forcing the scripters to update their bots daily, you'll cause most of them to lose interest.
Here are some ideas to make it harder to write bots:
Require running a JavaScript function. JavaScript makes it much more of a pain to write a bot. Maybe require a CAPTCHA if they aren't running JavaScript, to still allow the (minimal number of) actual non-JavaScript users.
Time the keystrokes when typing into the form (again via JavaScript; see the sketch after this list). If it's not human-like, reject it. It's a pain to mimic human typing in a bot.
Write your code to update your field IDs daily with new random values. This will force them to update their bot daily, which is a pain.
Write your code to re-order your fields on a daily basis (obviously in some way that doesn't look random to your users). If they're relying on field order, this will trip them up and, again, force daily maintenance of their bot code.
You could go even further and use Flash content. Flash is a total pain to write a bot against.
Generally, if you adopt the mindset of not preventing bots but making them more work to maintain, you can probably achieve the goal you're looking for.
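Here's a browser-side sketch of the keystroke-timing idea; the "quantity" field ID and the thresholds are guesses, not tuned values:

    // Runs in the browser; assumes the order form has an input #quantity.
    const gaps: number[] = [];
    let lastKey = 0;

    document.getElementById("quantity")?.addEventListener("keydown", () => {
      const now = performance.now();
      if (lastKey) gaps.push(now - lastKey);
      lastKey = now;
    });

    function looksHuman(): boolean {
      if (gaps.length < 3) return true; // too little data to judge
      const mean = gaps.reduce((a, b) => a + b, 0) / gaps.length;
      const variance =
        gaps.reduce((a, b) => a + (b - mean) ** 2, 0) / gaps.length;
      // Humans are slow-ish and irregular; bots are often instant and uniform.
      return mean > 30 && variance > 25;
    }

    // On submit, send `gaps` (or just looksHuman()) with the form so the
    // server can decide whether to demand a CAPTCHA instead.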
Stick a 5-minute delay on all product announcements for unregistered users. Casual users won't really notice it, and non-casual users will be registered anyhow.
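That delay is cheap to implement; a sketch, assuming announcements carry a publish timestamp and you can tell registered sessions apart:

    interface Announcement {
      title: string;
      publishedAt: number; // epoch ms
    }

    const DELAY_MS = 5 * 60_000; // the five-minute penalty for anonymity

    // Registered users see everything; anonymous users only see items that
    // have been public for at least DELAY_MS.
    function visibleAnnouncements(
      all: Announcement[],
      loggedIn: boolean
    ): Announcement[] {
      const now = Date.now();
      return loggedIn
        ? all
        : all.filter((a) => now - a.publishedAt >= DELAY_MS);
    }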
Time-block user agents that make too many requests per minute. E.g., if somebody requests a page exactly every 5 seconds for 10 minutes, they're probably not a user... but it could be tricky to get this right.
If they trigger an alert, redirect every request to a static page with as little DB I/O as possible, with a message letting them know they'll be allowed back on in X minutes.
It's important to add that you should probably only apply this to page requests and ignore all requests for media (JS, images, etc.).
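A sketch of that time-block, with the window, threshold, and block duration picked arbitrarily; note the media-request exemption from the last point:

    import { createServer } from "http";

    const WINDOW_MS = 60_000;
    const MAX_PAGE_HITS = 30;     // page requests allowed per window
    const BLOCK_MS = 10 * 60_000; // how long an offender stays blocked

    const pageHits = new Map<string, number[]>(); // IP -> hit timestamps
    const blockedUntil = new Map<string, number>();

    const isMedia = (url: string) => /\.(js|css|png|jpe?g|gif|ico)$/i.test(url);

    createServer((req, res) => {
      const ip = req.socket.remoteAddress ?? "unknown";
      const now = Date.now();

      if (!isMedia(req.url ?? "")) {
        const recent = (pageHits.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
        recent.push(now);
        pageHits.set(ip, recent);
        if (recent.length > MAX_PAGE_HITS) blockedUntil.set(ip, now + BLOCK_MS);
      }

      const until = blockedUntil.get(ip) ?? 0;
      if (until > now) {
        // Static response, no DB I/O, telling them when they can come back.
        const mins = Math.ceil((until - now) / 60_000);
        res.writeHead(429, { "Content-Type": "text/plain" });
        res.end(`Slow down! Try again in ${mins} minute(s).`);
        return;
      }

      res.writeHead(200, { "Content-Type": "text/plain" });
      res.end("Normal page would be served here.");
    }).listen(8080);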
Preventing DoS would satisfy the second of @davebug's goals outlined above, "Keep the site at a speed not slowed by bots," but wouldn't necessarily solve the first, "Sell the item to non-scripting humans."
I'm sure a scripter could write something that skates just under the excessive-request limit and is still faster than a human going through the ordering forms.
All right, so the spammers are out-competing regular people to win the "bag of crap" auction? Why not make the next auction a literal bag of crap? The spammers get to pay good money for a bag full of doggy doo, and we all laugh at them.
The important thing here is to change the system to remove load from your server and prevent bots from winning the bag of crap WITHOUT letting the botlords know you're gaming them, or they'll revise their strategy. I don't think there's any way to do this without some processing at your end.
So you record hits on your home page. Whenever someone hits the page, that connection is compared to its last hit; if it was too quick, it's sent a version of the page without the offer. This can be done by some sort of load-balancing mechanism that sends bots (the hits that are too fast) to a server that simply serves cached versions of your home page; real people get sent to the good server. This takes the load off the main server and makes the bots think they're still being served the pages correctly.
Even better if the offer can be declined in some way. Then you can still make the offers on the faux server, but when the bot fills out the form, say "Sorry, you weren't quick enough" :) Then they'll definitely think they're still in the game.
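A minimal sketch of the decoy routing; in production the too-fast check would live in the load balancer, and the two-second threshold and page bodies here are placeholders:

    import { createServer } from "http";

    const MIN_HUMAN_INTERVAL_MS = 2000;
    const lastHit = new Map<string, number>();

    const realPage = "<html><body>Random crap: I want one!</body></html>";
    const decoyPage = "<html><body>Nothing on sale right now.</body></html>";

    createServer((req, res) => {
      const ip = req.socket.remoteAddress ?? "unknown";
      const now = Date.now();
      const tooFast = now - (lastHit.get(ip) ?? 0) < MIN_HUMAN_INTERVAL_MS;
      lastHit.set(ip, now);

      // Too-fast hitters get the cached, offer-less page and never know it;
      // everyone else reaches the real offer.
      res.writeHead(200, { "Content-Type": "text/html" });
      res.end(tooFast ? decoyPage : realPage);
    }).listen(8080);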
Most purely technical solutions have already been offered. I'll therefore suggest another view of the problem.
As I understand it, the bots are set up by people genuinely trying to buy the bags you're selling. The problem is -
Instead of trying to avoid the bots, you could let potential bag-buyers subscribe to an email, or even SMS, update to get notified when a sale will take place. You can even give them a minute or two of head start (a special URL where the sale starts, randomly generated and sent with the mail/SMS).
When these buyers come to your site to buy, you can show them whatever you want in side banners or whatever. Those running the bots would prefer to simply register for your notification service.
The bot runners might still run bots on your notifications to finish the buy faster. One solution to that could be offering a one-click buy.
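To make the notification flow concrete, here's a sketch where sendMail, activateUrl, and publishOnHomepage are hypothetical stand-ins for the real mailer and CMS, and the two-minute head start is an assumption:

    import { randomBytes } from "crypto";

    interface Subscriber {
      email: string;
    }

    // 128 bits of randomness: the URL can't be found by scanning.
    const makeSaleUrl = () =>
      `https://example.com/sale/${randomBytes(16).toString("hex")}`;

    // Stubs standing in for the real mailer and publishing pipeline.
    const sendMail = (to: string, body: string) =>
      console.log(`mail to ${to}: ${body}`);
    const activateUrl = (url: string) => console.log(`live: ${url}`);
    const publishOnHomepage = (url: string) =>
      console.log(`homepage now links: ${url}`);

    function launchSale(subs: Subscriber[], headStartMs = 2 * 60_000): void {
      const url = makeSaleUrl();
      activateUrl(url); // subscribers can buy immediately via the secret URL
      for (const s of subs) sendMail(s.email, `Early access: ${url}`);
      // The public homepage only learns about the sale after the head start.
      setTimeout(() => publishOnHomepage(url), headStartMs);
    }

    launchSale([{ email: "fan@example.com" }]);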
By the way, you mentioned your users are not registered, but it sounds like those buying these bags are not random buyers, but people who look forward to these sales. As such, they might be willing to register to gain an advantage in trying to "win" a bag.
In essence, what I'm suggesting is to look at the problem as a social one rather than a technical one.
Asaf
You could try to make the price harder for scripts to read. The simplest way to do that is to convert it to an image, though a text-recognition algorithm could still get around that. If enough scripters get around it, you could try applying CAPTCHA-like distortions to the image, but obviously at the cost of user experience. Instead of an image, the price could go in a Flash app.
Alternately, you could try to devise a way to "shuffle" the HTML of a page in some way that doesn't affect the rendering. I can't think of a good example off the top of my head, but I'm sure it's somehow doable.
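One possible (untested) reading of that idea: interleave the real price with hidden decoy digits so that scrapers which strip tags read garbage, while browsers render the price normally. obfuscatePrice and the inline-style hiding are purely illustrative; a real version would rotate CSS classes so the hiding itself isn't a stable marker:

    // Returns HTML that renders as the given price but contains junk digits.
    function obfuscatePrice(price: string): string {
      const out: string[] = [];
      for (const ch of price) {
        const decoy = Math.floor(Math.random() * 10);
        out.push(`<span style="display:none">${decoy}</span>`);
        out.push(`<span>${ch}</span>`);
      }
      return out.join("");
    }

    // obfuscatePrice("$4.99") still renders as "$4.99", but the raw HTML is
    // different junk on every page load.
    console.log(obfuscatePrice("$4.99"));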