How can I stop Bing from periodically flooding my site?
Bingbot will hit my site pretty hard for a couple of hours each day, and will be extremely light for the rest of the time.
I'd either like to smooth out its crawls, reduce its rate limit, or block it altogether. It doesn't really send through any real visitors.
Is there a way I can smooth its crawling, or rate limit it?
Comments (3)
Their webmaster blog says that they support adding a crawl-delay parameter to your robots.txt file to throttle the bingbot. There's a bit more explanation in the webmaster FAQ PDF.
These other links might be helpful as well:
https://www.bing.com/webmasters/about
http://www.bing.com/community/webmaster/f/12252/t/651373.aspx
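As a rough illustration of the crawl-delay suggestion, here is a minimal sketch of what such a robots.txt entry might look like, together with a quick local check that it parses. The value 10 and the helper name crawl_delay_for are assumptions made for this example, not something taken from Bing's documentation. The check uses Python's standard urllib.robotparser, so it only confirms that the file is well-formed. It says nothing about how Bing itself interprets the value.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the value 10 is illustrative only.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay that applies to user_agent, or None if none is set."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "bingbot"))    # delay set for bingbot
    print(crawl_delay_for(ROBOTS_TXT, "Googlebot"))  # no matching entry
```

Only the two lines inside ROBOTS_TXT would go into the site's actual robots.txt file; the rest is just a way to sanity-check the syntax before deploying it.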
You can limit the number of simultaneous connections from the crawler to, for instance, 5 by setting up IPTables like this (requires root access to the firewall). The setting comes from the article at 2bits.com:
iptables -I INPUT -p tcp -m connlimit --connlimit-above 5 -j REJECT
This limits each IP address to no more than 5 simultaneous connections. It sort of "rations" connections and keeps any single crawler IP from opening many connections to the site at once. Note that, as written, the rule applies to every client and every TCP port, not only to Bingbot.
You can ban its IP address using .htaccess.
You can find more about that here: Blog about bot blocking