Blocking folders in between allowed content

Posted 2024-11-06 19:21:33


I have a site with the following structure:

http://www.example.com/folder1/folder2/folder3

I would like to disallow indexing in folder1 and folder2, but I would like robots to index everything under folder3.

Is there a way to do this with robots.txt?

From what I have read, I think that everything inside a specified (disallowed) folder is disallowed.

Would the following achieve my goal?

user-agent: *
Crawl-delay: 0

Sitemap: <Sitemap url>

Allow: /folder1/folder2/folder3
Disallow: /folder1/folder2/
Disallow: /folder1/
Allow: /


Comments (3)

把昨日还给我 2024-11-13 19:21:33


Yes, it works... however, Google has a tool to test your robots.txt file. You only need to go to Google Webmaster Tools (https://www.google.com/webmasters/tools/) and open the section "Site configuration -> Crawler access".
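If you would rather test locally, Python's standard-library urllib.robotparser can evaluate the same rules against sample URLs. A minimal sketch, assuming the placeholder host and paths from the question; note that Python's parser applies the first matching rule in file order rather than Google's longest-match precedence, which happens to give the same answers here because the Allow line comes first:

from urllib import robotparser

rp = robotparser.RobotFileParser()
# Feed the rules directly; no network fetch is needed.
rp.parse("""User-agent: *
Allow: /folder1/folder2/folder3
Disallow: /folder1/""".splitlines())

base = "http://www.example.com"  # placeholder host from the question
for path in ("/folder1/folder2/folder3/page.html",
             "/folder1/folder2/other.html",
             "/folder1/other.html",
             "/"):
    # Expected: only the folder3 URL and "/" come back allowed.
    print(path, "->", rp.can_fetch("*", base + path))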

深海夜未眠 2024-11-13 19:21:33


All you would need is:

user-agent: *
Crawl-delay: 0

Sitemap: 

Allow: /folder1/folder2/folder3
Disallow: /folder1/
Allow: /

At least Googlebot will see the more specific Allow for that one directory and disallow anything from folder1 on. This is backed up by a post from a Google employee.
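For Googlebot, the documented precedence is that the most specific (longest) matching rule wins, with Allow winning ties. A minimal sketch of that precedence applied to the rules above (wildcards like * and $ are ignored here, and the sample paths are hypothetical):

# Longest matching pattern wins; Allow outranks Disallow on a tie.
RULES = [("Allow", "/folder1/folder2/folder3"),
         ("Disallow", "/folder1/")]

def googlebot_allows(path):
    matches = [(len(p), d == "Allow") for d, p in RULES if path.startswith(p)]
    if not matches:
        return True  # no rule matches -> allowed by default
    return max(matches)[1]  # longest match; True (Allow) wins equal lengths

print(googlebot_allows("/folder1/folder2/folder3/page.html"))  # True
print(googlebot_allows("/folder1/secret.html"))                # False
print(googlebot_allows("/about.html"))                         # True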

叫嚣ゝ 2024-11-13 19:21:33


Line breaks in records are not allowed, so your original robots.txt should look like this:

user-agent: *
Crawl-delay: 0
Sitemap: <Sitemap url>
Allow: /folder1/folder2/folder3
Disallow: /folder1/folder2/
Disallow: /folder1/
Allow: /

Possible improvements:

  • Specifying Allow: / is superfluous, as it’s the default anyway.

  • Specifying Disallow: /folder1/folder2/ is superfluous, as Disallow: /folder1/ is sufficient.

  • As the Sitemap field does not belong to a record but applies to all bots, you could specify it as a separate block.

So your robots.txt could look like this:

User-agent: *
Crawl-delay: 0
Allow: /folder1/folder2/folder3
Disallow: /folder1/

Sitemap: http://example.com/sitemap

(Note that the Allow field is not part of the original robots.txt specification, so don’t expect all bots to understand it.)
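To illustrate that caveat, here is a minimal sketch, under the assumption of a bot that implements only the original 1994 spec (Disallow only): since it ignores the Allow line entirely, it would block /folder1/folder2/folder3 along with everything else under folder1.

# Spec-only view: the original robots.txt standard has no Allow field,
# so such a parser honors only the Disallow prefixes.
DISALLOWED = ["/folder1/"]

def spec_only_allows(path):
    return not any(path.startswith(p) for p in DISALLOWED)

print(spec_only_allows("/folder1/folder2/folder3/page.html"))  # False: blocked
print(spec_only_allows("/about.html"))                         # True: allowed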
