robots.txt: how to disallow all URLs except those in the sitemap

Posted on 2024-09-25 23:48:56

Comments (3)

我也只是我 2024-10-02 23:48:56

This isn't strictly a robots.txt answer; it concerns the Robots protocol as a whole. I have used this technique very often in the past, and it works like a charm.

As far as I understand, your site is dynamic, so why not use the robots meta tag? As x0n said, a 30MB file will likely cause problems for both you and the crawlers, and appending new lines to a 30MB file is an I/O headache.
Your best bet, in my opinion anyway, is to inject something like this into the pages you don't want indexed:

<META NAME="ROBOTS" CONTENT="NOINDEX" />

The page will still be crawled, but it won't be indexed. You can still submit your sitemaps through a Sitemap reference in robots.txt, and you don't have to worry about keeping pages that are excluded by the meta tag out of the sitemaps. The tag is supported by all the major search engines and, as far as I remember, by Baidu as well.
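As a rough illustration of the meta tag approach on a dynamic site, the sketch below decides per request whether to inject the tag. The names `SITEMAP_URLS`, `robots_meta_tag` and `render_head` are hypothetical and the URLs are placeholders; a real site would plug the same check into whatever templating it already uses.

```python
from html import escape

# Hypothetical set of URLs that should stay indexable, e.g. the same
# list that is used to generate the sitemap. Placeholder values.
SITEMAP_URLS = frozenset({
    "https://example.com/",
    "https://example.com/products/1",
})


def robots_meta_tag(url, indexable=SITEMAP_URLS):
    """Return the robots meta tag for `url`, or "" if the page may be indexed."""
    if url in indexable:
        return ""
    return '<META NAME="ROBOTS" CONTENT="NOINDEX" />'


def render_head(url, title):
    """Build a minimal <head>, injecting the tag only for non-sitemap pages."""
    parts = ["<title>" + escape(title) + "</title>"]
    tag = robots_meta_tag(url)
    if tag:
        parts.append(tag)
    return "<head>" + "".join(parts) + "</head>"
```

Because the decision is made per request, nothing has to be appended to a large robots.txt file when pages are added or removed.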

反差帅 2024-10-02 23:48:56

You will have to add an Allow entry for each URL in the sitemap. This is cumbersome, but it is easy to automate: write something that reads the sitemap and emits the entries, or, if the sitemap itself is generated programmatically, base it on the same code.

Note that Allow is an extension to the original robots.txt protocol and is not supported by all search engines, though Google does support it.
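A minimal sketch of the "read the sitemap and emit Allow entries" idea, using only the Python standard library. The function name `robots_txt_from_sitemap` is hypothetical. The sketch assumes a plain `urlset` sitemap (not a sitemap index), adds a final `Disallow: /` to block everything else, and appends `$` to each path so that an entry such as `/` does not match every URL by prefix; the `$` anchor is itself an extension that a given crawler may or may not honor.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by standard sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def robots_txt_from_sitemap(sitemap_xml, sitemap_url):
    """Build robots.txt text that allows only the URLs listed in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    lines = ["User-agent: *"]
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if not url:
            continue  # skip empty <loc> elements
        parts = urlsplit(url)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # "$" anchors the rule at the end of the URL (assumes crawler support).
        lines.append("Allow: " + path + "$")
    lines.append("Disallow: /")
    lines.append("Sitemap: " + sitemap_url)
    return "\n".join(lines) + "\n"
```

For a very large sitemap this produces one line per URL, so the resulting file can grow to a size that crawlers may truncate or reject; that is the practical limit of this approach.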

星軌x 2024-10-02 23:48:56

By signing in at http://www.google.com/webmasters/ you can submit sitemaps directly to Google's search engine.
