Allowing Google to bypass CAPTCHA verification - sensible or not?
My web site has a database lookup; filling out a CAPTCHA gives you 5 minutes of lookup time. There is also some custom code to detect any automated scripts. I do this as I don't want someone data mining my site.
The problem is that Google does not see the lookup results when it crawls my site. If someone is searching for a string that is present in the result of a lookup, I would like them to find this page by Googling it.
The obvious solution to me is to use the PHP variable $_SERVER['HTTP_USER_AGENT'] to bypass the CAPTCHA and custom security code for the Google bots. My question is whether this is sensible or not.
People could then use Google's cache to view the lookup results without having to fill out the CAPTCHA, but would Google's own script detection methods prevent them from data mining these pages?
Or would there be some way for people to make $_SERVER['HTTP_USER_AGENT'] appear as Google to bypass the security measures?
Comments (2)
Definitely. The user agent is laughably easy to forge. See e.g. User Agent Switcher for Firefox. It's also easy for a spam bot to set its user agent header to the Google bot.
It might still be worth a shot, though. I'd say just try it out and see what the results are. If you get problems, you may have to think about another way.
An additional way to recognize the Google bot could be the IP range(s) it uses. I don't know whether the bot uses defined IP ranges; that may not be the case, so you'd have to find out.
Update: it seems to be possible to verify the Google Bot by analyzing its IP. From Google Webmaster Central: How to verify Googlebot
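The check Google describes is, as far as I know, a reverse DNS lookup on the requesting IP, a test that the resulting host name belongs to googlebot.com or google.com, and then a forward lookup to confirm that the name maps back to the same IP. Here is a minimal sketch of that idea. `isVerifiedGooglebot()` and its resolver parameters are hypothetical names introduced for illustration (the resolvers are only injectable so the logic can be tried without network access), and this version handles IPv4 only because `gethostbynamel()` returns IPv4 addresses.

```php
<?php
// Sketch only: verify a visitor that claims to be Googlebot via reverse + forward DNS.
// isVerifiedGooglebot() is a hypothetical helper, not part of PHP or any Google library.
function isVerifiedGooglebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    // Default to PHP's built-in resolvers when none are injected.
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the unmodified IP
    // when no host name is found, and false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }

    // Step 2: the host name must end in one of the crawler domains
    // (assumed here to be googlebot.com and google.com).
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }

    // Step 3: forward lookup must map the host name back to the same IP.
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}

// Possible use in the lookup page (illustrative):
// $claimsGoogle = stripos($_SERVER['HTTP_USER_AGENT'] ?? '', 'Googlebot') !== false;
// if ($claimsGoogle && isVerifiedGooglebot($_SERVER['REMOTE_ADDR'])) { /* skip the CAPTCHA */ }
```

DNS lookups are slow, so in practice you would probably want to cache the result per IP rather than resolve on every request.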
The $_SERVER['HTTP_USER_AGENT'] parameter is not secure; people can fake it if they really want to get your results. Your decision is a business one: do you wish to lower security and potentially allow people/bots to scrape your site, or do you want your results hidden from Google?
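To make the point concrete, here is a small sketch of the kind of check the question proposes and why it proves nothing about the sender. `looksLikeGooglebot()` is a hypothetical helper; the header value is simply whatever the client chooses to send.

```php
<?php
// Naive check: trusts the User-Agent string as sent by the client.
// looksLikeGooglebot() is a hypothetical helper for illustration.
function looksLikeGooglebot(string $userAgent): bool
{
    return stripos($userAgent, 'Googlebot') !== false;
}

// Any HTTP client can send this string, for example with
//   curl -A "<string>" https://example.com/lookup
$forged = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

// The forged header passes, so the CAPTCHA would be skipped for this client.
var_dump(looksLikeGooglebot($forged));
```

A check like this is only useful as a first filter before something the client cannot control, such as the connecting IP address.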