How do I stop search robots from crawling a deleted page?

Posted on 2024-10-31 14:49:55

Currently we are using Kentico CMS for our web site, and we used to have a page called pages/page1.aspx. We removed that page, but every day the Google, Bing, and Yahoo search robots try to read it. Because the page doesn't exist, the CMS throws the following error (in the log):

Event URL:  /pages/page1.aspx
URL referrer:   
User agent:     Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Message: The file '/pages/page1.aspx' does not exist.
Stack Trace:
at System.Web.UI.Util.CheckVirtualFileExists(VirtualPath virtualPath)
// and the rest of the stacktrace

When we get too many of these errors, the whole site crashes (we have to clear the .NET temp files and restart the app pool). Basically, I can go to a page that doesn't exist, hit refresh many times, and take the site down. Extremely bad. But first things first: how can I get the bots to stop trying to access this page?

Thanks in advance.

Comments (3)

时光礼记 2024-11-07 14:49:55

If it's just a single page or a few pages causing this, modify robots.txt to tell the legitimate search engines not to check them.
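
For example, a minimal robots.txt at the site root might look like the following. The path comes from the error log above; adjust it to match your actual removed URLs:

User-agent: *
Disallow: /pages/page1.aspx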

I'd also check what HTTP response you're sending when the page is not found. You might be sending something that causes the spider to think it should keep checking. Instead of a 404, maybe you should try permanently redirecting to your home page.
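
If you do go the redirect route, one possible sketch, assuming the site runs on IIS 7 or later with the HTTP Redirection module installed, is a per-URL httpRedirect rule in web.config (the path and destination here are illustrative):

<location path="pages/page1.aspx">
  <system.webServer>
    <!-- Issue a permanent (301) redirect for this one removed page -->
    <httpRedirect enabled="true" destination="/" httpResponseStatus="Permanent" exactDestination="true" />
  </system.webServer>
</location>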

Finally, WTF? I'd talk to the Kentico folks about this bug.

酒几许 2024-11-07 14:49:55

I think that you have a configuration error. While a robots.txt file would hopefully correct this issue, bots can choose to ignore that file.

A better solution would be to set up your error pages correctly. What happens when you go to a page that doesn't exist? It sounds like your system is showing a yellow screen, which is an unhandled exception bubbling all the way up to the user. I would check your error page setup so that users (and robots) get redirected to a 404 error page. I'm guessing that when Yahoo and the others see that 404 page, they will stop trying to index it.
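
As a rough sketch of that setup, a customErrors section in web.config could route missing pages to a friendly error page; NotFound.aspx is a hypothetical page you would create yourself, and redirectMode="ResponseRewrite" (available since .NET 3.5 SP1) serves it while preserving the 404 status code rather than issuing a 302 redirect:

<system.web>
  <!-- Serve a friendly page for unhandled 404s without masking the status code -->
  <customErrors mode="On" redirectMode="ResponseRewrite">
    <error statusCode="404" redirect="~/NotFound.aspx" />
  </customErrors>
</system.web>

Crawlers that consistently receive a real 404 (or a 410 Gone for pages removed on purpose) will eventually drop the URL from their index.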

不语却知心 2024-11-07 14:49:55

Have you tried using a robots.txt file?
