How to stop scraping of links from my PHP page
I have a home page with some links and email addresses, and I need to stop scrapers from harvesting the URLs and email addresses from that page. I have used robots.txt, but most bad crawlers won't respect it.
Comments (3)
Well, you can always try obfuscating your URLs with JavaScript or images or something. But please don't do that. You'll just anger people with old browsers and blind people who use screen readers. Just use a spam filter to stop people spamming your email address.
If you have a content-heavy site and you want to stop people from scraping your content, you might try limiting visitors to ten hits every ten seconds. That'll be enough for most visitors, but it'll significantly decrease the speed of content scrapers. You can tweak this algorithm as you go, and ban the IPs of serious offenders.
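A rough sketch of the "ten hits every ten seconds" idea, written in Python purely for illustration (the same logic would need porting to PHP). The class name `SlidingWindowLimiter`, its parameters, and the choice of a sliding window are my assumptions, not something the answer specifies.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_hits` requests per `window` seconds for each client."""

    def __init__(self, max_hits=10, window=10.0, clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.hits = defaultdict(deque)  # client key (e.g. IP) -> timestamps of recent hits

    def allow(self, client):
        """Record a hit and return True, or return False if the client is over the limit."""
        now = self.clock()
        recent = self.hits[client]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False  # over the limit; a web app would typically answer HTTP 429
        recent.append(now)
        return True
```

Calling `allow(ip)` at the top of each request and refusing when it returns `False` gives the throttling described above. Note that this keeps state in memory in one process; a typical PHP setup handles each request separately, so the counters would probably have to live in a shared store instead.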
You could encode some links, e.g. publish an encoded form of the address (one that the browser still renders as foo@bar.com) instead of the plain-text address.
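The encoded example in this answer did not survive rendering, so the following is only a guess at what was meant: writing each character as a decimal HTML character reference. The function names `encode_email` and `mailto_link` are made up for this sketch, and Python is used just for illustration.

```python
def encode_email(address):
    """Return `address` with every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


def mailto_link(address):
    """Build a mailto anchor whose href and link text are both entity-encoded."""
    scheme = encode_email("mailto:")  # encoded too, so the literal scheme is absent from the source
    encoded = encode_email(address)
    return f'<a href="{scheme}{encoded}">{encoded}</a>'
```

A browser decodes the references, so visitors still see and can click the normal address, while a scraper that pattern-matches raw HTML for `@` finds nothing. Any scraper that decodes entities first will still get the address, so this only stops the naive ones.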
Use a honeypot link that is hidden from real users. Disallow the URL in robots.txt and add a nofollow to it so that respectable engines won't ever hit it. Hide the link with JavaScript when the page loads so legitimate users won't click it. Temporarily block the IP or session of anyone who hits the link.
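The server-side half of the honeypot could look something like this minimal sketch (Python for illustration only). The trap path `/trap-do-not-follow`, the one-hour block, and the class name `HoneypotGuard` are all hypothetical; the same path would also need a `Disallow` entry in robots.txt and a `rel="nofollow"` on the hidden link, as described above.

```python
import time


class HoneypotGuard:
    """Temporarily block any client that requests the hidden trap URL."""

    def __init__(self, trap_path="/trap-do-not-follow", block_seconds=3600,
                 clock=time.monotonic):
        self.trap_path = trap_path          # hypothetical path, also disallowed in robots.txt
        self.block_seconds = block_seconds  # hypothetical length of the temporary block
        self.clock = clock                  # injectable for testing
        self.blocked_until = {}             # client key (IP or session id) -> expiry time

    def check(self, client, path):
        """Return True if the request may proceed, False if it should be refused."""
        now = self.clock()
        expiry = self.blocked_until.get(client)
        if expiry is not None:
            if now < expiry:
                return False
            del self.blocked_until[client]  # the block has expired
        if path == self.trap_path:
            # Only a crawler ignoring robots.txt and nofollow should ever get here.
            self.blocked_until[client] = now + self.block_seconds
            return False
        return True
```

Running `check(ip, path)` before serving each page refuses the trap request itself and everything else from that client until the block expires. Keeping the block temporary limits the damage if a real user behind a shared IP gets caught.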