How can robots.txt disallow engines from crawling URLs that contain “http:?
Disallow: /*“http:
is what I've been using; my guess is that I may need to escape the quotation mark somehow. Google Webmaster Tools doesn't even read that quotation mark (in the feature that lets you view the robots.txt file and test it against a few URLs).
On Google Webmaster Tools, it displays the robots.txt file without the quotes for this line.
Disallow: /*http:
Any suggestions would be appreciated.
The underlying issue is that a script was incorrectly formatted, and the site now has crawl errors:
http://www.domain.com/“http://www.domain.com/directory/directory/dir_ectory/dir_ectory/pagetitle"
is an example of one of the pages we get a crawl error for. My assumption is that fixing the robots.txt file will stop these pages from showing up in our crawl errors in Webmaster Tools.
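One possible direction (an assumption on my part, not something confirmed above): robots.txt has no escape syntax, but crawlers generally match Disallow rules against the percent-encoded form of the URL path. The quote in these broken URLs is a curly left double quote (U+201C), not a straight ASCII quote, and it percent-encodes to %E2%80%9C. A quick check with Python's standard library:

```python
from urllib.parse import quote

# The bad URLs contain a curly left double quote (U+201C), not a
# straight ASCII quote ("). Percent-encode both to see what a
# crawler would likely match a Disallow pattern against.
curly = '\u201c'       # the “ character seen in the crawl errors
print(quote(curly))    # -> %E2%80%9C
print(quote('"'))      # straight quote -> %22
```

So a rule such as `Disallow: /*%E2%80%9C` (or `/*%22` if the character turns out to be a straight quote) may be worth trying in the Webmaster Tools robots.txt tester instead of the raw character.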