This answer is not about robots.txt specifically but about the Robots protocol as a whole. I used this technique extremely often in the past, and it works like a charm.
As far as I understand, your site is dynamic, so why not make use of the robots meta tag? As x0n said, a 30MB file will likely create issues both for you and for the crawlers, and appending new lines to a 30MB file is an I/O headache.
Your best bet, in my opinion anyway, is to inject into the head of the pages you don't want indexed something like: <meta name="robots" content="noindex">
The page will still be crawled, but it won't be indexed. You can still submit the sitemaps through a sitemap reference in robots.txt, and you don't have to take care to keep pages that are blocked with a meta tag out of the sitemaps. The meta tag is supported by all the major search engines and, as far as I remember, by Baidu as well.
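For a dynamic site, the injection described above could be done in whatever code renders the page head. The following is only a minimal sketch under assumptions: the names NOINDEX_PATHS, robots_meta_tag and render_head are hypothetical, and a real site would likely drive the decision from its own database or routing rules rather than a hard-coded set.

```python
from html import escape

# Hypothetical set of URL paths that should stay out of search indexes.
NOINDEX_PATHS = frozenset({"/search", "/cart", "/user/settings"})


def robots_meta_tag(path, noindex_paths=NOINDEX_PATHS):
    """Return a robots meta tag for pages that must not be indexed, else ''."""
    if path in noindex_paths:
        return '<meta name="robots" content="noindex">'
    return ""


def render_head(title, path):
    """Build a minimal <head>; the robots tag is injected only when needed."""
    parts = [f"<title>{escape(title)}</title>"]
    tag = robots_meta_tag(path)
    if tag:
        parts.append(tag)
    return "<head>" + "".join(parts) + "</head>"


if __name__ == "__main__":
    print(render_head("Shopping cart", "/cart"))
    print(render_head("Home", "/"))
```

Because the decision is made per request, nothing has to be appended to a large robots.txt file when a new page appears.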
You will have to add an Allow entry for each element in the sitemap. This is cumbersome, but it is easy to do programmatically with something that reads in the sitemap or, if the sitemap is itself created programmatically, by basing it on the same code.
Note that Allow is an extension to the robots.txt protocol and is not supported by all search engines, though it is supported by Google.
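As a rough illustration of generating the Allow entries programmatically, the sketch below reads a sitemap in the standard sitemaps.org XML format and emits one Allow line per URL. The function names are hypothetical, and the trailing "Disallow: /" line is an assumption about the intent (allow only what the sitemap lists); drop it if that is not what you want.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def allow_lines_from_sitemap(sitemap_xml):
    """Return one 'Allow: <path>' line per <loc> entry in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    lines = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        parts = urlsplit((loc.text or "").strip())
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        lines.append(f"Allow: {path}")
    return lines


def build_robots_txt(sitemap_xml, sitemap_url):
    """Assemble a robots.txt body from the sitemap contents."""
    lines = ["User-agent: *"]
    lines += allow_lines_from_sitemap(sitemap_xml)
    # Assumption: everything not listed in the sitemap should be blocked.
    lines.append("Disallow: /")
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://example.com/a/b.html</loc></url>"
        "</urlset>"
    )
    print(build_robots_txt(sample, "http://example.com/sitemap.xml"))
```

With a very large sitemap the resulting file grows accordingly, so it is worth checking the output size before publishing it.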
By signing in to http://www.google.com/webmasters/ you can submit sitemaps directly to Google's search engine.