How to replace robots.txt with .htaccess
I have a small situation where I have to remove my robots.txt file, because I don't want any robots or crawlers to get the links.
I also want the links to stay accessible to users, and I don't want them to be cached by search engines.
On top of that, I cannot add any user authentication, for various reasons.
So I am thinking about using mod_rewrite to stop search engine crawlers from crawling the site while still allowing everyone else in.
The logic I am trying to implement is a condition that checks whether the incoming user agent is a search engine and, if it is, redirects it to a 401 response.
The only problem is that I don't know how to implement it. :(
Can somebody help me with it?
Thanks in advance.
Regards,
Comments (1)
I may be misunderstanding you, but I think a disallow-all rule (User-agent: * with Disallow: /) in robots.txt will do just what you want: keep every well-behaved crawler out while keeping the website open to normal users.
Or do you need to specifically remove robots.txt (for what reason?) from the web server?
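
If robots.txt really has to go, the user-agent check described in the question could be sketched in .htaccess roughly as follows. This is a minimal sketch, not a definitive setup: it assumes mod_rewrite and mod_headers are enabled and that AllowOverride lets .htaccess use them. The crawler names in the pattern are only an illustrative list, and the User-Agent header is easy to spoof, so this deters only crawlers that identify themselves honestly.

```apache
# Assumption: mod_rewrite is loaded and usable from .htaccess.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Match a few well-known crawler User-Agent substrings, case-insensitively.
    # The list is illustrative, not exhaustive; extend it as needed.
    RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|slurp|duckduckbot|baiduspider|yandexbot) [NC]
    # Leave the URL unchanged ("-") and answer with 403 Forbidden.
    RewriteRule ^ - [F]
</IfModule>

# Assumption: mod_headers is loaded. Crawlers that are not in the list above
# still receive a header asking them not to index or cache the pages.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, noarchive"
</IfModule>
```

The sketch answers with 403 rather than the 401 mentioned in the question, since 401 normally implies that authentication is available. If a 401 is really wanted, replacing the [F] flag with [R=401] should produce it, as mod_rewrite accepts non-redirect status codes in the R flag; treat that as something to verify on your own server.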