How to generate a sitemap for a highly dynamic website?

Posted on 2024-07-26 16:23:35

Should a highly dynamic website that is constantly generating new pages use a sitemap? If so, how does a site like stackoverflow.com go about regenerating a sitemap? It seems like it would be a drain on precious server resources if it was constantly regenerating a sitemap every time someone adds a question. Does it generate a new sitemap at set intervals (e.g. every four hours)? I'm very curious how large, dynamic websites make this work.

Comments (5)

心作怪 2024-08-02 16:23:35

On Stack Overflow (and all Stack Exchange sites), a sitemap.xml file is created that contains a link to every question posted on the system. When a new question is posted, they simply append another entry to the end of the sitemap file. Appending to the end of the file isn't resource-intensive, although the file does get quite large.
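A minimal Python sketch of the append approach, assuming a single uncompressed sitemap.xml on local disk and only one writer at a time (there is no locking). Because a valid sitemap must end with </urlset>, "appending" in practice means overwriting that closing tag with the new <url> entry and writing the tag back, touching only the tail of the file. The helper names are hypothetical, not Stack Exchange's actual code. Note that the sitemap protocol caps a single file at 50,000 URLs (and 50 MB uncompressed), so a site of this size eventually needs several files listed in a sitemap index.

```python
import os
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CLOSING_TAG = b"</urlset>"

def create_sitemap(path: str) -> None:
    """Write an empty, valid sitemap (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(f'<urlset xmlns="{NS}">\n'.encode("utf-8"))
        f.write(CLOSING_TAG + b"\n")

def append_url(path: str, loc: str) -> None:
    """Insert one <url> entry just before the closing </urlset> tag."""
    entry = f"  <url><loc>{escape(loc)}</loc></url>\n".encode("utf-8")
    with open(path, "r+b") as f:
        # Read only the last few bytes, so the cost does not grow with the file.
        size = f.seek(0, os.SEEK_END)
        tail_start = max(0, size - 64)
        f.seek(tail_start)
        idx = f.read().rfind(CLOSING_TAG)
        if idx == -1:
            raise ValueError("no </urlset> near the end of the file")
        # Overwrite the closing tag with the new entry, re-add the tag, drop leftovers.
        f.seek(tail_start + idx)
        f.write(entry + CLOSING_TAG + b"\n")
        f.truncate()
```

Each call does a constant amount of I/O at the end of the file, which matches the point that appending is cheap even when the file itself is large.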

That is the only way search engines like Google can effectively crawl the site.

Jeff Atwood talks about it in a blog post: The Importance of Sitemaps

This is from Google's webmaster help page on sitemaps:

Sitemaps are particularly helpful if:

  • Your site has dynamic content.
  • Your site has pages that aren't easily discovered by Googlebot during the crawl process, for example, pages featuring rich AJAX or Flash.
  • Your site is new and has few links to it. (Googlebot crawls the web by following links from one page to another, so if your site isn't well linked, it may be hard for us to discover it.)
  • Your site has a large archive of content pages that are not well linked to each other, or are not linked at all.

青衫负雪 2024-08-02 16:23:35

There's no need to regenerate the Google sitemap XML each time a question is posted. It's far simpler just to have the XML file generated on-demand directly from the database (and a little caching).

To reduce load, the sitemap can be split into many sitemaps. Partitioning it by day/month would allow you to tell Google to retrieve today's sitemap frequently, but only fetch the sitemap from six months ago once in a while.
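One way to let crawlers tell old partitions from new ones is the <lastmod> element on each entry of a sitemap index: older monthly sitemaps keep an old date, while the current month's date keeps moving. The hypothetical helper below groups question dates by month and emits such an index. The /sitemap-YYYY-MM.xml naming is an assumption, and because it only looks at creation dates, edits to old questions would not bump a month's <lastmod>.

```python
def monthly_index(question_dates, base_url):
    """Build a sitemap index with one child sitemap per month.

    question_dates: iterable of datetime.date objects, one per question.
    Each month's <lastmod> is the newest date seen in that month.
    """
    newest = {}  # "YYYY-MM" -> latest date in that month
    for d in question_dates:
        key = f"{d.year:04d}-{d.month:02d}"
        if key not in newest or d > newest[key]:
            newest[key] = d
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for key in sorted(newest):
        lines.append(f"  <sitemap><loc>{base_url}/sitemap-{key}.xml</loc>"
                     f"<lastmod>{newest[key].isoformat()}</lastmod></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)
```

A crawler that compares these dates can skip re-fetching months whose <lastmod> has not changed; how often a given search engine actually re-fetches is up to that engine.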

温折酒 2024-08-02 16:23:35

I'd like to share my solution here just in case it helps someone as well.
It took me reading this question and many others to decide what to do.

My site structure.

Static pages

  • Home (Highly dynamic. Cached for 30 mins)
  • Artists, Albums, Songs, Playlists and Albums (Paginated List)
  • Legal (Static page with Terms etc)

...etc

Dynamic Pages

  • Artists, Albums, Songs, Playlists and Albums detail pages

My approach.

sitemap.xml: This URL generates a <sitemapindex /> whose first item is /sitemap-main.xml. The number of Artists, Albums, Songs, etc. is counted and divided by 1,000 (the number of URLs I want in each sitemap; the limit is 50,000). I round this number up.

So, for example, 1,900 songs / 1,000 = 1.9, which rounds up to 2.
I then add the URLs /sitemap-songs-0.xml and /sitemap-songs-1.xml to the index. I repeat this for all other item types. Basically, I am paginating.
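The counting-and-rounding step can be sketched as follows. The counts dict stands in for the per-type count queries, and the function name is hypothetical; the file naming follows the /sitemap-<type>-<page>.xml scheme described above.

```python
import math

URLS_PER_SITEMAP = 1000  # the page size chosen above; the protocol allows up to 50,000

def sitemap_index(counts, base_url):
    """Build the index: sitemap-main.xml first, then one file per page per item type."""
    locs = [f"{base_url}/sitemap-main.xml"]
    for item_type, count in counts.items():
        pages = math.ceil(count / URLS_PER_SITEMAP)  # e.g. 1900 / 1000 = 1.9 -> 2
        locs.extend(f"{base_url}/sitemap-{item_type}-{page}.xml" for page in range(pages))
    body = "\n".join(f"  <sitemap><loc>{loc}</loc></sitemap>" for loc in locs)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{body}\n</sitemapindex>")
```

With {"songs": 1900, "albums": 1000}, this lists sitemap-main.xml, sitemap-songs-0.xml, sitemap-songs-1.xml and sitemap-albums-0.xml.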

The output is returned uncached. I want this to always be fresh.


sitemap-main.xml: This lists all the static pages. You can actually use a static file for this as you will only need to update it once in a while.


sitemap-songs-0.xml, sitemap-albums-0.xml, etc: I use a single route for this in SlimPhp 2.

$app->get('/sitemap-:type-:page.xml', function ($type, $page) use ($app) {...

I use a simple switch statement to generate the relevant file. If a page contains the full 1,000 items (the limit specified above), I cache the file for 2 weeks.
Otherwise, I only cache it for a few hours.
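Here is a rough Python equivalent of the single route, the switch, and the two cache lifetimes. It is not Slim code: the regex plays the role of the /sitemap-:type-:page.xml route, a set lookup stands in for the switch, fetch_page is a hypothetical stand-in for the database query (with made-up counts), and three hours is an assumed value for "a few hours". Caching full pages for longer assumes items are ordered by a stable key such as an auto-increment ID, so a full page's contents do not shift when new items are added.

```python
import re

ITEMS_PER_PAGE = 1000
FULL_PAGE_TTL = 14 * 24 * 3600   # 2 weeks, in seconds
PARTIAL_PAGE_TTL = 3 * 3600      # "a few hours"; 3 hours is an assumed value

# Plays the role of the Slim route '/sitemap-:type-:page.xml'.
ROUTE = re.compile(r"^/sitemap-(?P<type>[a-z]+)-(?P<page>\d+)\.xml$")
KNOWN_TYPES = {"artists", "albums", "songs", "playlists"}

def fetch_page(item_type, page):
    """Hypothetical data access: return the URLs on one page of one item type."""
    fake_totals = {"songs": 1900, "albums": 1000}  # made-up counts for the example
    total = fake_totals.get(item_type, 0)
    start = page * ITEMS_PER_PAGE
    stop = min(start + ITEMS_PER_PAGE, total)
    return [f"https://example.com/{item_type}/{i}" for i in range(start, stop)]

def handle(path):
    """Return (xml, cache_ttl_seconds) for a sitemap page, or None for a 404."""
    m = ROUTE.match(path)
    if m is None or m.group("type") not in KNOWN_TYPES:
        return None
    urls = fetch_page(m.group("type"), int(m.group("page")))
    if not urls:
        return None  # page number beyond the last page
    # A full page is treated as stable and cached long; the last, partial page
    # is still filling up, so it gets a short TTL.
    ttl = FULL_PAGE_TTL if len(urls) == ITEMS_PER_PAGE else PARTIAL_PAGE_TTL
    body = "\n".join(f"  <url><loc>{u}</loc></url>" for u in urls)
    xml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
           f"{body}\n</urlset>")
    return xml, ttl
```

With these fake counts, /sitemap-songs-0.xml returns 1,000 URLs with the two-week TTL, while /sitemap-songs-1.xml returns the remaining 900 with the shorter one.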

I hope this helps others implement their own system.

活雷疯 2024-08-02 16:23:35

For a highly dynamic site, I wrote a cron job on my server that runs daily. It makes a REST call to my backend, which generates a new sitemap covering all newly generated content and returns it as an XML file. This new sitemap overwrites the previous one, so the sitemap stays up to date with all the changes. I don't think changing the sitemap for every newly added piece of dynamic content is a good approach.
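A sketch of such a daily job in Python, assuming the backend exposes a REST endpoint that returns the finished XML. The endpoint URL, output path, and schedule are hypothetical. Writing to a temporary file and then calling os.replace means crawlers never see a half-written sitemap while it is being overwritten.

```python
# Hypothetical crontab entry (runs once a day at 03:00), calling main() below:
#   0 3 * * * /usr/bin/python3 /opt/site/refresh_sitemap.py
import os
import tempfile
import urllib.request

SITEMAP_ENDPOINT = "http://localhost:8000/api/sitemap"  # hypothetical backend endpoint
SITEMAP_PATH = "/var/www/html/sitemap.xml"              # hypothetical web root

def fetch_sitemap(url: str = SITEMAP_ENDPOINT) -> bytes:
    """Ask the backend to build the sitemap and return the XML bytes."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()

def write_atomically(data: bytes, path: str = SITEMAP_PATH) -> None:
    """Overwrite the previous sitemap without ever exposing a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.chmod(tmp_path, 0o644)   # mkstemp creates 0600; let the web server read it
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise

def main() -> None:
    # When saved as a script, end it with: if __name__ == "__main__": main()
    write_atomically(fetch_sitemap())
```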

心如狂蝶 2024-08-02 16:23:35

Even on something like Stack Overflow, there is a certain amount of static organization: there are FAQs, tag pages, question pages, user pages, badge pages, etc. I'd say that on a very dynamic site, the best way to approach a sitemap is to have a map of the categorizations, where each node in the sitemap points to a page of the dynamically generated data (a node for a question page, a node for a user page, etc.).

Of course, a sitemap may not even be appropriate for a given site; there's a certain amount of judgment call required there.
