robots.txt with Disallow and an allowing meta tag

Posted 2024-10-18 17:40:49

I am responsible for a site with a base URL such as:
https://hello.world.com/my-site/

There is a robots.txt file in https://hello.world.com/robots.txt with the following content:

User-agent: *
Disallow: /

There is no way for me to edit, delete, or otherwise affect that file.

I can, however, put <meta> tags in all the pages under https://hello.world.com/my-site/.
I know I can add, for example:

<meta name="robots" content="index,follow">

My question is: will Google and other search engines give preference to my meta tag under https://hello.world.com/my-site/, or to https://hello.world.com/robots.txt?


Comments (3)

多像笑话 2024-10-25 17:40:49

Robots.txt directives are crawler directives, while meta tags are indexer directives. All indexer directives require crawling. Therefore, nothing you do in your meta will make a difference if robots.txt is set to disallow.

From https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag

Robots meta tags and X-Robots-Tag HTTP headers are discovered when a
URL is crawled. If a page is disallowed from crawling through the
robots.txt file, then any information about indexing or serving
directives will not be found and will therefore be ignored. If
indexing or serving directives must be followed, the URLs containing
those directives cannot be disallowed from crawling.

(See also: http://moz.com/blog/robots-exclusion-protocol-101)
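The crawler-first behavior this answer describes can be sketched with Python's standard `urllib.robotparser`. This is a simplified model of any polite crawler, not Google's actual pipeline: robots.txt is consulted *before* a page is fetched, so a disallowed page's meta tags are never even downloaded.

```python
from urllib.robotparser import RobotFileParser

# Parse the same rules as the site's robots.txt in the question.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

url = "https://hello.world.com/my-site/index.html"
if rp.can_fetch("Googlebot", url):
    print("fetch page, then read its robots meta tag")
else:
    # The fetch never happens, so any index/follow meta tag on the
    # page is unreachable -- exactly the situation in the question.
    print("skipped: disallowed by robots.txt")
```

With `Disallow: /` the `can_fetch` check fails for every path, which is why the meta tag never gets a chance to matter.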

﹎☆浅夏丿初晴 2024-10-25 17:40:49

Register for and log in to your Google webmaster console to see if you can override the robots.txt settings in there - there is a section for it, but I don't know whether it lets you override or just gives tips.

Definitely keep trying to change the robots.txt file - meta tags cannot override robots.txt files, because robots.txt essentially equates to a "crawl"/"nocrawl" message rather than "index"/"noindex". So when Google sees it can't crawl, it never checks whether it can index - and even if it could index, it still couldn't crawl.
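If the file's owner can ever be persuaded to change it, a more specific Allow rule would permit the /my-site/ subtree while keeping the rest of the host blocked (Google's robots.txt parser gives the more specific rule precedence). A hypothetical version of https://hello.world.com/robots.txt:

User-agent: *
Allow: /my-site/
Disallow: /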

茶色山野 2024-10-25 17:40:49

Google will use both: first robots.txt, to see which paths it may access.

Then Google looks at the meta tags; with the meta tags you can control more precisely, from a script, which pages are put in the index and/or followed.

I think you should use both. Put all directories Google should not see, like /js, in robots.txt, and control the meta tag from the controller script, so you can set "noindex,follow" as an example. You can't do something like "noindex,follow" with robots.txt alone.
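The per-page control "from the controller script" that this answer suggests could be sketched like this. The section names and rules below are purely illustrative assumptions, not anything from the question's site:

```python
# Hypothetical sketch: a controller decides each page's robots meta tag.
# Section names here are invented for illustration.
NOINDEX_SECTIONS = {"search", "cart"}  # assumed sections to keep out of the index

def robots_meta(section: str) -> str:
    """Return the robots <meta> tag for a page in the given section."""
    content = "noindex,follow" if section in NOINDEX_SECTIONS else "index,follow"
    return f'<meta name="robots" content="{content}">'

print(robots_meta("cart"))      # -> <meta name="robots" content="noindex,follow">
print(robots_meta("products"))  # -> <meta name="robots" content="index,follow">
```

Note, though, that per the first answer this only works for pages Google is allowed to crawl; under a blanket `Disallow: /` the tag is never seen.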
