Non-indexed files still found in Google (?)
How is it possible that my page /admin/login.asp is found in Google with the query "inurl:admin/login.asp" while it isn't with the "site:www.domain.xx" query?
I have these lines in my robots.txt:
User-agent: *
Disallow: /admin/
And this in the HTML code of the page:
<meta name="robots" content="noindex, nofollow" />
Any ideas?
Comments (2)
You can check in Google Webmaster Tools whether Google interprets your robots.txt correctly. You can also request the removal of a URL from the index there.
When you find the URL on the Google search results page (SERP), does it have the same title as the one in your <title> tag? And does it also have a description/snippet?
What I think is happening is that Google knows about the URL from a link on your site, so it'll attempt to crawl and index it. However, since it's blocked by robots.txt, it's not allowed to crawl the page, hence it can't see the noindex meta tag that's on your login page.
Since it doesn't know that it shouldn't index the page, Google will add the URL to its index. However, pages like this tend to have only a title and URL in the SERP, and they almost never have a description/snippet. Sometimes the title in the SERP looks as if Google has crawled the page, but what it is actually doing is generating a title from the anchor text of the links that point at it.
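To make the blocking step concrete, here is a minimal sketch using Python's standard `urllib.robotparser` to evaluate the robots.txt rules from the question. The helper name `is_crawlable` and the use of `http://www.domain.xx` as a full URL are assumptions for illustration; Googlebot's own parser may differ in edge cases, so treat this as an approximation rather than Google's actual behavior.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper: it only evaluates the rules locally and does not
    contact any server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


login_url = "http://www.domain.xx/admin/login.asp"
home_url = "http://www.domain.xx/"

# The login page is blocked, so a compliant crawler never downloads its HTML
# and therefore never reads the noindex meta tag inside it.
print("login page crawlable:", is_crawlable(ROBOTS_TXT, login_url))
print("home page crawlable:", is_crawlable(ROBOTS_TXT, home_url))
```

A blocked URL can still be listed from links alone, which is consistent with it turning up for an `inurl:` query without a snippet.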
The surefire way of keeping the page out of the SERP is to remove the
Disallow: /admin/
directive, and allow Googlebot to crawl the page and see the noindex,nofollow meta tag. The noindex directive will remove the page from the SERPs, and the nofollow will tell Googlebot not to give priority to the links it finds on your login page (this helps maintain your crawl efficiency, but does not guarantee that Google won't crawl the links it finds on the page).
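The effect of that change can be sketched in Python with only the standard library: a crawler that respects robots.txt sees the robots meta directives only when it is allowed to fetch the page. The names `RobotsMetaParser` and `visible_directives`, the sample HTML, and the full URL are all hypothetical and exist only for this illustration; this is not how Googlebot is implemented.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def visible_directives(robots_txt, url, html, user_agent="Googlebot"):
    """Return the meta robots directives a compliant crawler could see.

    If robots.txt blocks the URL, the HTML is never fetched, so the
    result is an empty set regardless of what the page contains.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return set()
    meta = RobotsMetaParser()
    meta.feed(html)
    return meta.directives


# Assumed sample page carrying the meta tag from the question.
PAGE_HTML = (
    '<html><head><meta name="robots" content="noindex, nofollow" />'
    "</head><body></body></html>"
)
URL = "http://www.domain.xx/admin/login.asp"

BLOCKED = "User-agent: *\nDisallow: /admin/\n"
UNBLOCKED = "User-agent: *\nDisallow:\n"  # an empty Disallow allows everything

print("with Disallow:", visible_directives(BLOCKED, URL, PAGE_HTML))
print("without Disallow:", visible_directives(UNBLOCKED, URL, PAGE_HTML))
```

With the Disallow rule in place the crawler sees no directives at all; once it is removed, both noindex and nofollow become visible.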