robots.txt with Disallow vs. a meta tag that allows indexing
I am responsible for a site with a base URL such as:
https://hello.world.com/my-site/
There is a robots.txt file at https://hello.world.com/robots.txt with the following content:
User-agent: *
Disallow: /
I cannot edit, delete, or otherwise affect that file.
I can, however, put <meta> tags in all the pages under https://hello.world.com/my-site/ .
I know I can add, for example:
<meta name="robots" content="index,follow">
My question is: will Google and other search engines give more preference to my meta tag under https://hello.world.com/my-site/ , or to https://hello.world.com/robots.txt?
Comments (3)
Robots.txt directives are crawler directives, while meta tags are indexer directives. All indexer directives require crawling. Therefore, nothing you do in your meta will make a difference if robots.txt is set to disallow.
From https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
(See also: http://moz.com/blog/robots-exclusion-protocol-101)
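To see what that Disallow rule means for a crawler, here is a minimal sketch using Python's standard urllib.robotparser. The helper name can_crawl and the "Googlebot" user-agent string are my own choices for illustration; the rules and the URL are the ones from the question. It only shows how a compliant parser reads the file, not how Google's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(can_crawl(ROBOTS_TXT, "Googlebot", "https://hello.world.com/my-site/"))
```

Run as a script, this prints False: "Disallow: /" covers every path on the host, including everything under /my-site/, so the page carrying the meta tag is never fetched.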
Register for and log in to your Google webmaster console to see whether you can override the robots.txt settings there. There is a section for it, but I don't know whether it lets you override the file or just gives tips.
Definitely keep trying to get the robots.txt file changed. Meta tags cannot override robots.txt, because robots.txt essentially says "crawl" / "nocrawl" rather than "index" / "noindex". When Google sees that it can't crawl a page, it never checks whether it may index it, and even if it could index it, it still couldn't crawl it.
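The "crawl first, index second" ordering can be sketched as a toy model. This is a simplification I am assuming for illustration, not Google's real pipeline; RobotsMetaParser and crawl_then_index are hypothetical names, and the crawl decision is passed in as a plain boolean.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of a <meta name="robots"> tag, if present."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives = {d.strip().lower() for d in content.split(",") if d.strip()}


def crawl_then_index(crawl_allowed: bool, html: str) -> str:
    """Toy model: the meta tag is only read if the page may be crawled."""
    if not crawl_allowed:
        # The page is never fetched, so its HTML (and meta tag) is never seen.
        return "not crawled, meta tag never read"
    meta = RobotsMetaParser()
    meta.feed(html)
    if "noindex" in meta.directives:
        return "crawled, not indexed"
    return "crawled, eligible for indexing"


PAGE = '<html><head><meta name="robots" content="index,follow"></head></html>'

if __name__ == "__main__":
    print(crawl_then_index(False, PAGE))
    print(crawl_then_index(True, PAGE))
```

In this model the "index,follow" tag only has an effect in the second call, where crawling is allowed.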
Google will use both: first robots.txt, to find out which paths it may access.
Then Google looks at the meta tag. With the meta tag you can control from a script, page by page, which pages are put in the index and/or have their links followed.
I think you should use both. Put all directories Google should not see, such as /js, in robots.txt, and set the meta tag from the controller script, so you can set "noindex,follow", for example. You can't do something like "noindex,follow" with robots.txt.
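As a rough sketch of setting the meta tag from a controller script: the function names, the PAGE_RULES table, and the example paths below are all made up for illustration, and a real controller would plug this into whatever templating it already uses.

```python
def robots_meta_tag(index: bool = True, follow: bool = True) -> str:
    """Build a robots meta tag from two per-page switches."""
    content = ",".join([
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ])
    return f'<meta name="robots" content="{content}">'


# Hypothetical per-path rules as (index, follow); paths not listed use the default.
PAGE_RULES = {
    "/my-site/search": (False, True),   # keep result pages out of the index
    "/my-site/private": (False, False),
}


def tag_for_path(path: str) -> str:
    """Look up the rule for a path and render the matching meta tag."""
    index, follow = PAGE_RULES.get(path, (True, True))
    return robots_meta_tag(index, follow)


if __name__ == "__main__":
    print(tag_for_path("/my-site/search"))
    print(tag_for_path("/my-site/"))
```

The first call yields the "noindex,follow" tag mentioned above. Such tags only take effect on pages that robots.txt allows crawlers to fetch.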