What are the benefits of updating sitemap.xml?

Posted 2024-08-03 04:50:06

The text below is from sitemaps.org. What are the benefits of doing that versus letting the crawler do its job?

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
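For reference, a minimal Sitemap in the form that passage describes might look like the following; the URL and metadata values are placeholders, and only the <loc> element is required (lastmod, changefreq, and priority are optional hints):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-08-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```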

Edit 1: I am hoping to find enough benefits so I can justify the development of that feature. At the moment our system does not provide sitemaps dynamically, so we have to create one with a crawler, which is not a very good process.
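For what it's worth, serving a sitemap dynamically can be a fairly small feature. Below is a minimal sketch in Python, assuming you can enumerate your pages and their last-modified dates from your own data store; get_pages() here is a hypothetical stand-in for that query, not part of any real system:

```python
from datetime import date
from xml.sax.saxutils import escape


def get_pages():
    # Hypothetical stand-in: in a real system this would query your own
    # database or routing table for (url, last_modified) pairs, instead
    # of crawling the live site.
    return [
        ("https://www.example.com/", date(2024, 8, 1)),
        ("https://www.example.com/about", date(2024, 7, 15)),
    ]


def build_sitemap(pages):
    # Emit the <urlset> document described by the sitemaps.org protocol.
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url, last_modified in pages:
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(url)}</loc>")
        lines.append(f"    <lastmod>{last_modified.isoformat()}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_sitemap(get_pages()))
```

The point of the sketch is that the sitemap is built from data you already have, so no crawl of the live site is needed.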

Comments (3)

南巷近海 2024-08-10 04:50:06

Crawlers are "lazy" too, so if you give them a sitemap with all your site URLs in it, they are more likely to index more pages on your site.

They also give you the ability to prioritize your pages, so the crawlers know how frequently each one changes and which ones are more important to keep updated. That way they don't waste time crawling pages that haven't changed, miss ones that have, or index pages you don't care much about (while missing pages that you do).

There are also lots of automated online tools that you can use to crawl your entire site and generate a sitemap. If your site isn't too big (fewer than a few thousand URLs), those work great.
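Once you have a sitemap, however it was generated, you can also point crawlers at it with the Sitemap directive in robots.txt, which is part of the sitemaps.org protocol (the URL below is a placeholder):

```
Sitemap: https://www.example.com/sitemap.xml
```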

彼岸花似海 2024-08-10 04:50:06

Well, as that paragraph says, sitemaps also provide metadata about a given URL that a crawler may not be able to infer purely by crawling. The sitemap acts as a table of contents for the crawler so that it can prioritize content and index what matters.

素衣风尘叹 2024-08-10 04:50:06

The sitemap helps tell the crawler which pages are more important, and also how often they can be expected to be updated. That is information that really can't be found out just by scanning the pages themselves.

Crawlers have a limit on how many pages of your site they scan and how many levels deep they follow links. If you have a lot of less relevant pages, a lot of different URLs for the same page, or pages that take many steps to reach, the crawler will stop before it gets to the most interesting pages. The sitemap offers an alternative way to find the most interesting pages easily, without having to follow links and sort out duplicates.
