robots.txt with Disallow and an allowing meta tag

Posted 2024-10-18 17:40:49

I am responsible for a site with a base URL such as:
https://hello.world.com/my-site/

There is a robots.txt file in https://hello.world.com/robots.txt with the following content:

User-agent: *
Disallow: /

There is no way for me to edit, delete, or otherwise affect that file.

I can, however, put <meta> tags in all the pages under https://hello.world.com/my-site/.
I know I can add, for example:

<meta name="robots" content="index,follow">

My question is: will Google and other search engines give preference to my meta tag under https://hello.world.com/my-site/, or to https://hello.world.com/robots.txt?


Comments (3)

多像笑话 2024-10-25 17:40:49

Robots.txt directives are crawler directives, while meta tags are indexer directives. All indexer directives require crawling. Therefore, nothing you do in your meta will make a difference if robots.txt is set to disallow.

From https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag

Robots meta tags and X-Robots-Tag HTTP headers are discovered when a
URL is crawled. If a page is disallowed from crawling through the
robots.txt file, then any information about indexing or serving
directives will not be found and will therefore be ignored. If
indexing or serving directives must be followed, the URLs containing
those directives cannot be disallowed from crawling.

(See also: http://moz.com/blog/robots-exclusion-protocol-101)
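The crawler-first behavior this answer describes can be sketched with Python's standard `urllib.robotparser`. This is a simplified model of any polite crawler, not Google's actual pipeline: robots.txt is consulted *before* a page is fetched, so a disallowed page's meta tags are never even downloaded.

```python
from urllib.robotparser import RobotFileParser

# Parse the same rules as the site's robots.txt in the question.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

url = "https://hello.world.com/my-site/index.html"
if rp.can_fetch("Googlebot", url):
    print("fetch page, then read its robots meta tag")
else:
    # The fetch never happens, so any index/follow meta tag on the
    # page is unreachable -- exactly the situation in the question.
    print("skipped: disallowed by robots.txt")
```

With `Disallow: /` the `can_fetch` check fails for every path, which is why the meta tag never gets a chance to matter.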

﹎☆浅夏丿初晴 2024-10-25 17:40:49

Register for and log in to your Google webmaster console to see if you can override the robots.txt settings in there - there is a section for it, but I don't know whether it lets you override or just gives tips.

Definitely keep trying to change the robots.txt file - meta tags cannot override robots.txt files, because robots.txt essentially equates to a "crawl"/"nocrawl" message rather than "index"/"noindex". So when Google sees it can't crawl, it never checks whether it can index - and even if it could index, it still couldn't crawl.
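If the file's owner can ever be persuaded to change it, a more specific Allow rule would permit the /my-site/ subtree while keeping the rest of the host blocked (Google's robots.txt parser gives the more specific rule precedence). A hypothetical version of https://hello.world.com/robots.txt:

User-agent: *
Allow: /my-site/
Disallow: /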

茶色山野 2024-10-25 17:40:49

Google will use both: first robots.txt, to see which paths it may access.

Then Google looks at the meta tags; with the meta tags you can control more precisely, from a script, which pages are put in the index and/or followed.

I think you should use both. Put all directories Google should not see, like /js, in robots.txt, and control the meta tag from the controller script, so you can set "noindex,follow" as an example. You can't do something like "noindex,follow" with robots.txt alone.
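The per-page control "from the controller script" that this answer suggests could be sketched like this. The section names and rules below are purely illustrative assumptions, not anything from the question's site:

```python
# Hypothetical sketch: a controller decides each page's robots meta tag.
# Section names here are invented for illustration.
NOINDEX_SECTIONS = {"search", "cart"}  # assumed sections to keep out of the index

def robots_meta(section: str) -> str:
    """Return the robots <meta> tag for a page in the given section."""
    content = "noindex,follow" if section in NOINDEX_SECTIONS else "index,follow"
    return f'<meta name="robots" content="{content}">'

print(robots_meta("cart"))      # -> <meta name="robots" content="noindex,follow">
print(robots_meta("products"))  # -> <meta name="robots" content="index,follow">
```

Note, though, that per the first answer this only works for pages Google is allowed to crawl; under a blanket `Disallow: /` the tag is never seen.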
