How can I verify that the URLs in a generated sitemap index return HTTP 200?
I have generated sitemap indexes for Google. My only remaining problem is how to verify that all of the generated URLs actually work. The guide says something like this:
you write a script to test each URL in the sitemap against your application
server and confirm that each link returns an HTTP 200 (OK) code. Broken links may indicate a mismatch
between the URL formatting configuration of the Sitemap Generator
Does anybody have experience writing such a script?
Comments (2)
Google Webmaster Tools reports HTTP errors and redirects (pretty much everything that is not an HTTP 200) under "Site configuration -> Sitemaps". Additionally, "Diagnostics -> Crawl Errors -> In Sitemaps" gives another view of the errors that occurred while crawling URLs listed in your sitemaps.
If that is not what you want, I would just grep the log files (grep for "googlebot" and an identifier of the URLs that you listed in your sitemaps).
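For what it's worth, the log grep could look roughly like the Python sketch below. It assumes the access log is in the common "combined" format and that matching the substring "googlebot" in the line is good enough to identify the crawler; `googlebot_errors` and `sitemap_paths` are names made up for this example.

```python
import re
from collections import Counter

# Assumed line shape (combined log format):
#   1.2.3.4 - - [date] "GET /path HTTP/1.1" 404 512 "referer" "user agent"
LINE_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def googlebot_errors(lines, sitemap_paths):
    """Count non-200 responses served to Googlebot for paths in the sitemap.

    lines         -- iterable of access log lines
    sitemap_paths -- set of URL paths listed in the sitemap, e.g. {"/a", "/b"}
    Returns a Counter keyed by (path, status).
    """
    errors = Counter()
    for line in lines:
        # Cheap user-agent filter, the equivalent of `grep -i googlebot`.
        if "googlebot" not in line.lower():
            continue
        match = LINE_RE.search(line)
        if not match:
            continue
        path, status = match.group("path"), int(match.group("status"))
        if path in sitemap_paths and status != 200:
            errors[(path, status)] += 1
    return errors
```

You would call it with an open log file, e.g. `googlebot_errors(open("access.log"), {"/a", "/b"})`, and print whatever comes back. Anyone can send a Googlebot user agent, so treat the counts as a hint rather than proof.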
You could probably write your own crawler to pre-check whether your pages return an HTTP 200, but a 200 for you now does not mean Googlebot will get a 200 next week, month, or year. So I recommend sticking with Google Webmaster Tools and log file analysis (visualized with e.g. Munin, Cacti, ...).
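If you still want the pre-check, here is a minimal sketch of such a crawler in Python, using only the standard library. It assumes the sitemap follows the sitemaps.org 0.9 schema and may be a sitemap index pointing at further sitemaps; the function names (`extract_locs`, `fetch`, `check_sitemap`) and the user agent string are invented for this example, not taken from any sitemap tool.

```python
import gzip
import sys
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects, so a 301/302 is reported instead of hidden."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


OPENER = urllib.request.build_opener(NoRedirect)


def extract_locs(xml_bytes):
    """Return (kind, urls); kind is 'sitemapindex' or 'urlset'."""
    if xml_bytes[:2] == b"\x1f\x8b":  # gzip magic number (.xml.gz sitemaps)
        xml_bytes = gzip.decompress(xml_bytes)
    root = ET.fromstring(xml_bytes)
    kind = root.tag.replace(SITEMAP_NS, "")
    urls = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
    return kind, urls


def fetch(url, timeout=10):
    """Return (status, body). status is None if the request failed outright."""
    req = urllib.request.Request(url, headers={"User-Agent": "sitemap-check/0.1"})
    try:
        with OPENER.open(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:  # 3xx, 4xx and 5xx end up here
        return err.code, b""
    except OSError:  # DNS failure, refused connection, timeout
        return None, b""


def check_sitemap(sitemap_url, fetch=fetch):
    """Return [(url, status), ...] for every listed URL that is not a 200."""
    status, body = fetch(sitemap_url)
    if status != 200:
        return [(sitemap_url, status)]
    kind, locs = extract_locs(body)
    bad = []
    for loc in locs:
        if kind == "sitemapindex":
            bad.extend(check_sitemap(loc, fetch))  # descend into child sitemap
        else:
            page_status, _ = fetch(loc)
            if page_status != 200:
                bad.append((loc, page_status))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    for url, status in check_sitemap(sys.argv[1]):
        print(status, url)
```

Run it with the sitemap (or sitemap index) URL as the only argument; it prints one line per URL that did not answer 200. It fetches pages one at a time with GET, so for a large site you would probably want to add a delay or some concurrency.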
How did you create the sitemap? I would think most sitemap tools only include URLs that responded with "200 OK".
Do note that some websites are misconfigured and always respond with 200 instead of e.g. 404 for invalid URLs. Such websites have trouble ahead :)
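A quick way to find out whether your own site is one of those: request a path that cannot exist and see what comes back. The Python sketch below does that; the `/__missing-...` path prefix and the function names are arbitrary choices for this example.

```python
import uuid
import urllib.error
import urllib.request
from urllib.parse import urljoin


def probe_url(base_url):
    """Build a URL on the same host that almost certainly does not exist."""
    return urljoin(base_url, "/__missing-" + uuid.uuid4().hex)


def get_status(url, timeout=10):
    """Return the final HTTP status code (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def answers_200_for_missing_pages(base_url, get_status=get_status):
    """True if the site answers 200 for a page that should be a 404."""
    return get_status(probe_url(base_url)) == 200
```

If `answers_200_for_missing_pages("http://www.example.com/")` returns True for your site, a 200 from a sitemap check tells you very little, and the error handling should be fixed first. Connection failures are not caught here and will raise.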