How can I make sure my website blocks automated scripts and bots?
I'd like to make sure that my website blocks automation tools like Selenium and QTP. Is there a way to do that?
What settings on a website is Selenium bound to fail with?
3 Answers
With due consideration to the comments on the original question asking "why on earth would you do this?", you basically need to follow the same strategy that any site uses to verify that a user is actually human. Methods such as asking users to authenticate or enter text from images or the like will probably work, but this will likely have the effect of blocking Google's crawlers and everything else.
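If you do go the image-text route, the usual pattern is a CAPTCHA widget on the page plus a server-side verification call before the protected content is served. A rough sketch, assuming Node 18+ and Google's reCAPTCHA siteverify endpoint (the environment variable name here is an illustration, not part of the original answer):

```js
// Verify a reCAPTCHA token server-side before treating the visitor as human.
// The widget posts its token as "g-recaptcha-response"; the secret key comes
// from the reCAPTCHA admin console (RECAPTCHA_SECRET is a placeholder name).
async function isHuman(token) {
  const params = new URLSearchParams({
    secret: process.env.RECAPTCHA_SECRET,
    response: token,
  });
  const res = await fetch("https://www.google.com/recaptcha/api/siteverify", {
    method: "POST",
    body: params,
  });
  const data = await res.json();
  return data.success === true;
}
```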
Doing anything based on user agent strings or anything like that is mostly useless. Those are trivial to fake.
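To see just how trivial the faking is, here is a sketch of a script presenting itself as an ordinary desktop browser (Node 18+ with built-in fetch; the URL and user-agent string are placeholders):

```js
// A scripted client can send any User-Agent it likes, so filtering on the header
// only stops bots that don't bother to lie.
(async () => {
  const res = await fetch("https://example.com/", {
    headers: {
      "User-Agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    },
  });
  console.log(res.status); // the server sees what looks like a normal Chrome visit
})();
```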
Rate-limiting connections or similar might have limited effectiveness, but it seems like you're going to inadvertently block any web crawlers too.
While this question seems strange, it is a funny one, so I tried to investigate the possibilities.
Besides adding a CAPTCHA, which is the best and the only ultimate solution, you can block Selenium by adding the following JavaScript to your pages (this example will redirect to the Google page, but you can do anything you want):
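A minimal sketch of such a check (an illustration of the described behaviour, not the answer's exact snippet), assuming the browser exposes the standard navigator.webdriver flag that WebDriver-controlled browsers set:

```js
// WebDriver-driven browsers (including Selenium) expose navigator.webdriver === true.
// If the flag is present, send the visitor to Google instead of the real page.
// Older drivers may not set the flag, and a determined user can strip it.
window.addEventListener("load", function () {
  if (navigator.webdriver) {
    window.location.href = "https://www.google.com/";
  }
});
```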
I do not know how you can block other automation tools, and I am not sure whether this will also catch Selenium IDE.
To be 100% certain that no automated bots or scripts can be run against your website, don't have a website online. That will meet your requirement with certainty.
CAPTCHAs are easy, if not cheap, to break, thanks to crowdsourcing and OCR methods.
Proxies can be found in the wild for free, or bought in bulk at extremely low cost. Again, limiting connection rates or trying to detect bots that way is useless.
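For concreteness, this is roughly the kind of per-IP throttling that the proxy argument defeats, sketched as plain Express middleware with in-memory counters (the 60-requests-per-minute numbers are arbitrary). Because the bucket key is the client IP, a bot rotating through cheap proxies gets a fresh quota for every address it uses.

```js
const express = require("express");
const app = express();

const WINDOW_MS = 60 * 1000; // 1-minute window
const MAX_REQUESTS = 60;     // allowed requests per IP per window
const hits = new Map();      // ip -> timestamps of recent requests

app.use((req, res, next) => {
  const now = Date.now();
  const recent = (hits.get(req.ip) || []).filter((t) => now - t < WINDOW_MS);
  recent.push(now);
  hits.set(req.ip, recent);
  if (recent.length > MAX_REQUESTS) {
    return res.status(429).send("Too many requests");
  }
  next();
});

app.get("/", (req, res) => res.send("hello"));
app.listen(3000);
```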
One possible approach is, in your application logic, to implement ways of increasing the time and cost of accessing the site, with things like phone verification or credit card verification. But then your website will never get off the ground, because nobody will trust your site in its infancy.
Solution: Do not put your website online and expect to be able to effectively eliminate bots and scripts from running.