How do I prevent crawlers from following links?
I'm building a site that will allow sellers to:
- list their products on my site
- have each product link back to the seller's site
- be charged for each link clicked
What I need to do now is to somehow make sure that I am only logging actual human users following the links to the seller's site. If it's a bot crawling the site, I shouldn't be charging the sellers for that.
Is there a way for me to tell bots not to follow a certain link? I don't think it's nofollow, as that is not intended to block access to content.
3 Answers
The way to tell a bot not to follow a link is precisely to add rel=nofollow to your <a> tag.
Assuming you are also logging locally before forwarding to the external URL, you could also check the user agent string.
In fact, if you are going to ask people to pay based on the number of referrals, it might be an idea to log the IP address and user agent against each paid-for click, in case your stats are ever questioned.
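A minimal sketch of that idea, assuming a Flask app: the /out/<link_id> route, the LINKS table, the bot pattern list and charge_seller are all made-up placeholders, not anything from the question.

```python
import logging
from flask import Flask, redirect, request

app = Flask(__name__)
logging.basicConfig(filename="clicks.log", level=logging.INFO,
                    format="%(asctime)s %(message)s")

# Hypothetical link table: link id -> seller URL.
LINKS = {1: "https://seller.example.com/product/42"}

# A few common crawler User-Agent substrings; a real list needs regular upkeep.
BOT_PATTERNS = ("googlebot", "bingbot", "slurp", "duckduckbot", "baiduspider")

def looks_like_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(pattern in ua for pattern in BOT_PATTERNS)

def charge_seller(link_id: int) -> None:
    pass  # hypothetical billing hook

# Product links on the listing pages would point at this route, e.g.
#   <a href="/out/1" rel="nofollow">Product name</a>
@app.route("/out/<int:link_id>")
def track_and_redirect(link_id):
    url = LINKS.get(link_id)
    if url is None:
        return "Unknown link", 404
    ua = request.headers.get("User-Agent", "")
    # Log IP address and User-Agent for every click so the stats can be audited later.
    logging.info("link=%s ip=%s ua=%s", link_id, request.remote_addr, ua)
    if not looks_like_bot(ua):
        charge_seller(link_id)
    return redirect(url)
```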
You just add a robots.txt file, e.g. like this one.
You can find more info about robots.txt files on the net, e.g. in Wikipedia.
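For this use case, the rule you would want is one that keeps well-behaved crawlers off the click-tracking path; the /out/ path below just matches the assumption used in the sketch above.

```text
# robots.txt, served from the site root
User-agent: *
Disallow: /out/
```

Note that robots.txt only affects crawlers that choose to honour it, so it complements rather than replaces a user-agent check.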
Typically you can identify them by the user agent string. You can find a list here; can't say it's perfect, but it's a good base to extend: PHP/MySQL - an array filter for bots.
Robots.txt is another way; more about it here.
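The linked list is PHP, but the same idea works anywhere. Here is a rough post-processing sketch in Python that re-checks already-logged clicks against a substring list; the clicks.log name and the ua= field are assumptions carried over from the sketch in the first answer.

```python
# Re-check logged clicks against a list of bot User-Agent substrings.
BOT_PATTERNS = ("bot", "crawler", "spider", "slurp", "curl", "wget")

def is_billable(user_agent: str) -> bool:
    ua = user_agent.lower()
    return not any(pattern in ua for pattern in BOT_PATTERNS)

with open("clicks.log") as log:
    # Assumes each line contains a "ua=..." field, as in the earlier sketch.
    billable = [line for line in log if is_billable(line.split("ua=", 1)[-1])]

print(len(billable), "billable clicks")
```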