Prevent abusive bots from crawling?
Is this a good idea?
What does abusive crawling mean? How is that bad for my site?
2 Answers
Not really. Most "bad bots" ignore the robots.txt file anyway.
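For context, robots.txt only expresses a request, so rules like the sketch below are honoured by well-behaved crawlers and simply skipped by abusive ones. The bot name and paths here are placeholders, not recommendations:

```
# Advisory only: compliant crawlers read this, abusive ones ignore it.
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
```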
Abusive crawling usually means scraping: these bots show up to harvest email addresses or, more commonly, content.
As for how you can stop them? That's really tricky and often not wise: anti-crawl techniques tend to be less than perfect and end up causing problems for regular human visitors.
Sadly, like "shrinkage" in retail, it's a cost of doing business on the web.
A user-agent (which includes crawlers) is under no obligation to honour your robots.txt. The best you can do is try to identify abusive access patterns (via web-logs, etc.), and block the corresponding IP.
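As a minimal sketch of that log-based approach (the log path, format, and threshold are assumptions; a Common Log Format access log is assumed, where the client IP is the first field):

```python
# Scan a web server access log and report IPs whose request count exceeds
# a threshold, as candidates for manual review and possible blocking.
import re
from collections import Counter

LOG_PATH = "access.log"   # hypothetical log location
THRESHOLD = 1000          # requests per log file; tune for your traffic

# In Common Log Format, the first whitespace-delimited token is the client IP.
ip_pattern = re.compile(r"^(\S+)")

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = ip_pattern.match(line)
        if match:
            counts[match.group(1)] += 1

# Print the heaviest clients, busiest first. Verify an IP is actually abusive
# (not a shared proxy, office NAT, or legitimate search engine) before blocking.
for ip, hits in counts.most_common():
    if hits < THRESHOLD:
        break
    print(f"{ip}\t{hits} requests")
```

The actual block would then typically go into the firewall or the web server's deny rules, rather than into the script itself.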