robots.txt: how to disallow crawling of URLs that contain “http:

Posted 2024-09-18 18:28:55


Disallow: /*“http:

is the rule I've been using. My guess is that I need to escape the quotation mark somehow. Google Webmaster Tools, which lets you view the robots.txt file and test it against a few URLs, doesn't even read that quotation mark.

In Google Webmaster Tools, this line of the robots.txt file is shown without the quotation mark:

Disallow: /*http:
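To see why the two forms of the rule can behave differently, here is a minimal Python sketch of Google-style wildcard matching (`*` matches any run of characters, a trailing `$` anchors the end of the URL). It assumes, in the spirit of RFC 9309, that both the rule and the URL path are percent-encoded before they are compared, so a curly quote becomes `%E2%80%9C` and a straight quote becomes `%22`. The names `SAFE`, `normalize`, `rule_to_regex` and `is_disallowed` are hypothetical, and the escaping is a simplification, not Google's actual parser.

```python
import re
from urllib.parse import quote, urlsplit

# Hypothetical set of characters left as-is when percent-encoding (a simplification).
# '%' is kept so already-encoded sequences are not encoded twice;
# '*' and '$' are kept because they are robots.txt wildcards.
SAFE = "/:@!$&'()*+,;=-._~%?"


def normalize(path: str) -> str:
    """Percent-encode everything outside SAFE (non-ASCII characters as UTF-8 bytes)."""
    return quote(path, safe=SAFE)


def rule_to_regex(rule_path: str) -> "re.Pattern[str]":
    """Translate a Disallow value: '*' matches any run, a trailing '$' anchors the end."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    pieces = [re.escape(p) for p in normalize(body).split("*")]
    return re.compile(".*".join(pieces) + ("$" if anchored else ""))


def is_disallowed(rule_path: str, url: str) -> bool:
    """True if the rule matches the URL's path (plus query) starting from its beginning."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return rule_to_regex(rule_path).match(normalize(target)) is not None


# The crawl-error URL from the question, with its opening curly quote.
broken = ("http://www.domain.com/\u201chttp://www.domain.com/directory/"
          "directory/dir_ectory/dir_ectory/pagetitle\"")

print(is_disallowed("/*\u201chttp:", broken))  # True: same curly quote as the URL
print(is_disallowed('/*"http:', broken))       # False: a straight quote is a different character
print(is_disallowed("/*http:", broken))        # True, but it matches any path containing "http:"
```

Under these assumptions the quote does not need escaping as such; what matters is that the rule uses the same character as the URL (in this sketch, writing it pre-encoded as `/*%E2%80%9Chttp:` gives the same result). The shorter `/*http:` also catches these URLs, at the cost of blocking any other path that happens to contain `http:`.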

Any suggestions would be appreciated.

The main issue is that a script was incorrectly formatted, and the site now has crawl errors:

http://www.domain.com/“http://www.domain.com/directory/directory/dir_ectory/dir_ectory/pagetitle"

This is one of the URLs we get a crawl error for. My assumption is that fixing robots.txt will stop these pages from showing up in the crawl errors in Webmaster Tools.
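Since the errors come from a badly formatted script, its output may be worth checking as well. One plausible mechanism, assuming the script wrapped the href value in a typographic quote instead of `"`: an HTML parser does not treat `“` as an attribute delimiter, so the value keeps the quote, has no `http:` scheme, and gets resolved as a relative link against the page URL, yielding a path shaped like the one above. The sketch below uses Python's standard `html.parser` to flag such links; `HrefCollector`, `find_quoted_hrefs`, `QUOTE_CHARS` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

# Quote characters that should never begin an href value. Straight quotes are
# normally consumed as attribute delimiters; typographic ones never are.
QUOTE_CHARS = ("\"", "'", "\u201c", "\u201d", "\u2018", "\u2019")


class HrefCollector(HTMLParser):
    """Record the raw href value of every <a> tag, as an HTML parser reads it."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def find_quoted_hrefs(html: str) -> list[str]:
    """Return href values that start with a quote mark, i.e. mis-quoted links."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return [h for h in parser.hrefs if h.startswith(QUOTE_CHARS)]


# Hypothetical markup: an opening curly quote and a closing straight quote.
bad = '<a href=\u201chttp://www.domain.com/directory/pagetitle">link</a>'
good = '<a href="http://www.domain.com/directory/pagetitle">link</a>'

print(find_quoted_hrefs(bad))   # ['“http://www.domain.com/directory/pagetitle"']
print(find_quoted_hrefs(good))  # []
```

Note that in the "bad" case the trailing straight quote ends up inside the value too, which matches the `pagetitle"` ending of the crawl-error URL in the question.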
