robots.txt configuration
I have a few doubts about this robots file.
User-agent: *
Disallow: /administrator/
Disallow: /css/
Disallow: /func/
Disallow: /images/
Disallow: /inc/
Disallow: /js/
Disallow: /login/
Disallow: /recover/
Disallow: /Scripts/
Disallow: /store/com-handler/
Disallow: /store/img/
Disallow: /store/theme/
Disallow: /store/StoreSys.swf
Disallow: config.php
This is going to disable crawlers for all files inside each folder, right?
Or do I have to add an asterisk at the end of each folder name?
I think this should do it, but I'm not sure whether I have to add Allow: /
right after User-agent; I suppose it isn't needed.
Is there anything wrong with this robots file?
PS: If someone can suggest a validation app for local use, I would be glad.
Thanks.
1 Answer
It's fine as is, if I understand what you want. A folder Disallow is a prefix match, so for example
/css/ itself and any file inside it (say /css/style.css) are both blocked, while a path that doesn't
start with one of the listed prefixes (say /index.html) is still allowed; there is no need for a
trailing asterisk or for a blanket Allow: / after User-agent.
Note that Allow is a less supported extension designed only to counter a previous Disallow. You might
use it if, for instance, despite your Disallow: /images/ you decide you want a particular image allowed;
with an Allow line for just that one file, all other images remain blocked.
You can see http://www.searchtools.com/robots/robots-txt.html for more info, including a list of checkers.
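For the PS about a local checker: one option that needs no extra install is Python's standard-library
urllib.robotparser. The sketch below is only an illustration with made-up paths (style.css, index.php,
index.html, ok_image.png and other.png are not taken from your site); it parses rules in the style of
your file and confirms the prefix-matching and Allow behaviour described above.

from urllib.robotparser import RobotFileParser

# Rules in the style of the file above; ok_image.png is an invented name.
# The standard-library parser applies rules in the order they appear, so the
# Allow line is listed before the Disallow it is meant to counter.
robots_txt = """\
User-agent: *
Allow: /images/ok_image.png
Disallow: /administrator/
Disallow: /css/
Disallow: /images/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A folder Disallow is a prefix match: everything beneath it is blocked,
# with no trailing asterisk required.
print(rp.can_fetch("*", "/css/style.css"))            # False
print(rp.can_fetch("*", "/administrator/index.php"))  # False

# Paths matching no rule stay crawlable, so a blanket Allow: / is not needed.
print(rp.can_fetch("*", "/index.html"))               # True

# The single Allow re-opens one image; the rest of /images/ stays blocked.
print(rp.can_fetch("*", "/images/ok_image.png"))      # True
print(rp.can_fetch("*", "/images/other.png"))         # False

Crawlers that support Allow use either first-match or longest-match semantics; putting the Allow line
before the broader Disallow, as above, gives the intended result under both interpretations.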