How do you generate a sitemap for a highly dynamic website?
Should a highly dynamic website that is constantly generating new pages use a sitemap? If so, how does a site like stackoverflow.com go about regenerating a sitemap? It seems like it would be a drain on precious server resources if it were constantly regenerating a sitemap every time someone adds a question. Does it generate a new sitemap at set intervals (e.g. every four hours)? I'm very curious how large, dynamic websites make this work.
On Stackoverflow (and all Stack Exchange sites), a sitemap.xml file is created which contains a link to every question posted on the system. When a new question is posted, they simply append another entry to the end of the sitemap file. It isn't that resource intensive to add to the end of the file but the file is quite large.
That is the only way search engines like Google can effectively crawl the site.
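The answer doesn't show how the append works, so what follows is only a minimal Python sketch, not Stack Overflow's implementation. It assumes the file always ends with exactly `</urlset>` plus a newline: that closing tag is overwritten by the new `<url>` entry and then written back, so the rest of the file is never rewritten. The helper names `create_sitemap` and `append_url` and the example.com URLs are hypothetical. Note that the sitemaps.org protocol caps a single sitemap file at 50,000 URLs and 50 MB uncompressed, so on a site this size the appends would in practice have to roll over into numbered files listed in a sitemap index.

```python
import os
from xml.sax.saxutils import escape

HEADER = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
          b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
CLOSING_TAG = b"</urlset>\n"


def create_sitemap(path: str) -> None:
    """Write an empty sitemap: header plus closing tag (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(HEADER + CLOSING_TAG)


def append_url(path: str, loc: str) -> None:
    """Append one <url> entry in place, assuming the file ends with CLOSING_TAG."""
    entry = f"  <url><loc>{escape(loc)}</loc></url>\n".encode("utf-8")
    with open(path, "r+b") as f:
        # Check the tail first so a malformed file is not silently corrupted.
        f.seek(-len(CLOSING_TAG), os.SEEK_END)
        if f.read() != CLOSING_TAG:
            raise ValueError(f"{path} does not end with </urlset>")
        # Overwrite the closing tag with the new entry, then restore the tag.
        f.seek(-len(CLOSING_TAG), os.SEEK_END)
        f.write(entry + CLOSING_TAG)
```

Only the last few bytes of the file are touched per new question, which is why appending stays cheap even when the file itself is large.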
Jeff Atwood talks about it in a blog post: The Importance of Sitemaps
This is from Google's webmaster help page on sitemaps:
There's no need to regenerate the Google sitemap XML each time a question is posted. It's far simpler just to have the XML file generated on-demand directly from the database (and a little caching).
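A minimal sketch of "generate on demand, then cache" in Python, using an in-memory SQLite database as a stand-in. The `questions` table with `id` and `last_activity` columns, the 15-minute cache lifetime, and example.com are all assumptions for illustration; a real site would more likely use its web framework's cache than a module-level dict.

```python
import sqlite3
import time
from typing import Optional
from xml.sax.saxutils import escape

BASE_URL = "https://example.com"   # hypothetical site root
CACHE_TTL_SECONDS = 15 * 60        # hypothetical: reuse a rendered copy for 15 minutes

_cache = {"xml": None, "built_at": 0.0}


def build_sitemap(conn: sqlite3.Connection) -> str:
    """Render the sitemap straight from the (hypothetical) questions table."""
    parts = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for qid, last_activity in conn.execute(
            "SELECT id, last_activity FROM questions ORDER BY id"):
        loc = escape(f"{BASE_URL}/questions/{qid}")
        parts.append(f"  <url><loc>{loc}</loc><lastmod>{last_activity}</lastmod></url>")
    parts.append("</urlset>")
    return "\n".join(parts)


def get_sitemap(conn: sqlite3.Connection, now: Optional[float] = None) -> str:
    """Serve the cached XML, re-querying the database only after the TTL expires."""
    now = time.time() if now is None else now
    if _cache["xml"] is None or now - _cache["built_at"] >= CACHE_TTL_SECONDS:
        _cache["xml"] = build_sitemap(conn)
        _cache["built_at"] = now
    return _cache["xml"]
```

Posting a question costs nothing extra here: the new row simply appears the next time the cache expires and the XML is rebuilt.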
To reduce load, the sitemap can be split into many sitemaps. Partitioning it by day/month would allow you to tell Google to retrieve today's sitemap frequently, but only fetch the sitemap from six months ago once in a while.
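A minimal Python sketch of day-based partitioning, under assumptions: questions are hypothetical `(id, created, modified)` tuples, child sitemaps are named `sitemap-YYYY-MM-DD.xml`, and example.com stands in for the real site. The index advertises a `<lastmod>` per partition, so a crawler can see that only today's file (or one holding a recently edited question) has changed; these values are hints to the crawler, not commands.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, Tuple
from xml.sax.saxutils import escape

BASE_URL = "https://example.com"  # hypothetical site root
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical row shape: (question id, created at, last modified)
Question = Tuple[int, datetime, datetime]


def group_by_day(questions: List[Question]) -> Dict[date, List[Question]]:
    """Partition questions by the day they were created."""
    days: Dict[date, List[Question]] = defaultdict(list)
    for q in questions:
        days[q[1].date()].append(q)
    return days


def build_index(questions: List[Question]) -> str:
    """One <sitemap> per day; its <lastmod> is the newest edit in that day's partition."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<sitemapindex xmlns="{NS}">']
    for day, items in sorted(group_by_day(questions).items()):
        loc = escape(f"{BASE_URL}/sitemap-{day.isoformat()}.xml")
        lastmod = max(modified for _, _, modified in items).date().isoformat()
        lines.append(f"  <sitemap><loc>{loc}</loc><lastmod>{lastmod}</lastmod></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)


def build_day_sitemap(questions: List[Question], day: date) -> str:
    """The child sitemap for a single day's partition."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<urlset xmlns="{NS}">']
    for qid, _, modified in group_by_day(questions).get(day, []):
        loc = escape(f"{BASE_URL}/questions/{qid}")
        lastmod = modified.date().isoformat()
        lines.append(f"  <url><loc>{loc}</loc><lastmod>{lastmod}</lastmod></url>")
    lines.append("</urlset>")
    return "\n".join(lines)
```

Old partitions keep an old `<lastmod>` unless one of their questions is edited, so they can be fetched rarely while today's partition is fetched often.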
I'd like to share my solution here just in case it helps someone as well.
It took me reading this question and many others to decide what to do.
My site structure:

Static pages
...etc

Dynamic pages
My approach:

sitemap.xml: This URL generates a `<sitemapindex />` with the first item being /sitemap-main.xml. The numbers of artists, albums, songs, etc. are counted and divided by 1,000 (the number of URLs I want in each sitemap; the limit is 50,000). I round the result up, so, for example, 1,900 songs / 1,000 = 1.9, which rounds up to 2.

I then add the URLs /sitemap-songs-0.xml and /sitemap-songs-1.xml to the index, and I repeat this for all the other item types. Basically, I am paginating. The index is returned uncached because I want it to always be fresh.
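The answer's code is in SlimPhp 2 and isn't shown; what follows is a minimal Python translation of the idea, not the author's implementation. The item counts are passed in as a dict (in a real app they would presumably come from `COUNT(*)` queries), and example.com is a placeholder. The ceiling is computed with integer arithmetic to avoid floating-point division.

```python
from typing import Dict
from xml.sax.saxutils import escape

BASE_URL = "https://example.com"  # hypothetical site root
URLS_PER_SITEMAP = 1_000          # the page size chosen in the answer
PROTOCOL_LIMIT = 50_000           # sitemaps.org maximum URLs per sitemap file


def page_count(total: int, per_page: int = URLS_PER_SITEMAP) -> int:
    """Number of paginated sitemaps needed: total / per_page, rounded up."""
    if not 0 < per_page <= PROTOCOL_LIMIT:
        raise ValueError("per_page must be between 1 and 50,000")
    return (total + per_page - 1) // per_page  # integer ceiling, e.g. 1900 -> 2


def build_sitemap_index(counts: Dict[str, int]) -> str:
    """Index listing sitemap-main.xml first, then sitemap-<kind>-<n>.xml pages."""
    paths = ["/sitemap-main.xml"]
    for kind, total in counts.items():
        paths.extend(f"/sitemap-{kind}-{n}.xml" for n in range(page_count(total)))
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    lines += [f"  <sitemap><loc>{escape(BASE_URL + p)}</loc></sitemap>" for p in paths]
    lines.append("</sitemapindex>")
    return "\n".join(lines)
```

For example, `build_sitemap_index({"artists": 250, "albums": 1000, "songs": 1900})` lists sitemap-main.xml, one page each for artists and albums, and sitemap-songs-0.xml plus sitemap-songs-1.xml.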
sitemap-main.xml: This lists all the static pages. You can actually use a static file for this as you will only need to update it once in a while.
sitemap-songs-0.xml, sitemap-albums-0.xml, etc.: I use a single route for these in SlimPhp 2.
I use a simple switch statement to generate the relevant file. If a page contains the full 1,000 items (the limit specified above), I cache the file for 2 weeks.
Otherwise, I only cache it for a few hours.
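A minimal Python sketch of the single route, with hypothetical names throughout: the `fetch` callback stands in for a paginated database query, `URL_PREFIX` maps each item type to a made-up public path (a dict lookup playing the role of the switch), and "a few hours" is taken as 4. Why a full page can be cached for two weeks is an inference, not something the answer states: if items are ordered by an ever-increasing ID, new items land only on the last page, so full pages don't change unless something is deleted.

```python
import re
from typing import Callable, List, Optional, Tuple
from xml.sax.saxutils import escape

BASE_URL = "https://example.com"   # hypothetical site root
URLS_PER_SITEMAP = 1_000           # same page size as the index
FULL_PAGE_TTL = 14 * 24 * 3600     # 2 weeks, as in the answer
PARTIAL_PAGE_TTL = 4 * 3600        # "a few hours": 4 is an assumption

# One route for every paginated sitemap, e.g. /sitemap-songs-1.xml
ROUTE = re.compile(r"^/sitemap-(artists|albums|songs)-(\d+)\.xml$")

# Stand-in for the switch: item type -> public URL prefix (hypothetical paths)
URL_PREFIX = {"artists": "/artist/", "albums": "/album/", "songs": "/song/"}

# fetch(kind, offset, limit) -> slugs, e.g. backed by "... ORDER BY id LIMIT ? OFFSET ?"
Fetch = Callable[[str, int, int], List[str]]


def handle(path: str, fetch: Fetch) -> Optional[Tuple[str, int]]:
    """Return (xml_body, cache_seconds) for a paginated sitemap, or None for a 404."""
    m = ROUTE.match(path)
    if m is None:
        return None
    kind, page = m.group(1), int(m.group(2))
    slugs = fetch(kind, page * URLS_PER_SITEMAP, URLS_PER_SITEMAP)
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for slug in slugs:
        lines.append(f"  <url><loc>{escape(BASE_URL + URL_PREFIX[kind] + slug)}</loc></url>")
    lines.append("</urlset>")
    # A full page keeps the same items while ordering is stable and new rows are
    # appended, so it can be cached much longer than the still-growing last page.
    ttl = FULL_PAGE_TTL if len(slugs) == URLS_PER_SITEMAP else PARTIAL_PAGE_TTL
    return "\n".join(lines), ttl
```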
I hope this helps others implement their own system.
For a highly dynamic site, I wrote a cron job on my server that runs daily. It makes a REST call to my backend, which generates a new sitemap covering all newly generated content and returns it as an XML file. This new sitemap overwrites the previous one, so the sitemap stays in sync with all the changes. I don't think changing the sitemap for every newly added piece of dynamic content is a good approach.
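The answer doesn't include its script, so here is a minimal Python sketch of the daily job under stated assumptions: `BACKEND_URL` is a hypothetical endpoint that returns the complete sitemap XML, and `OUTPUT_PATH` is a hypothetical location in the web root. A crontab entry such as `0 3 * * *` (every day at 03:00) running a script that calls `main()` would schedule it. The new file is validated, written next to the old one, and swapped in with `os.replace`, so a crawler never downloads a half-written sitemap and a bad response leaves the previous sitemap in place.

```python
import os
import tempfile
import urllib.request
import xml.etree.ElementTree as ET

BACKEND_URL = "http://localhost:8080/api/sitemap"  # hypothetical REST endpoint
OUTPUT_PATH = "/var/www/html/sitemap.xml"          # hypothetical web-root location


def fetch_sitemap(url: str = BACKEND_URL) -> bytes:
    """Ask the backend to build the sitemap and return the raw XML."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def replace_sitemap(body: bytes, path: str = OUTPUT_PATH) -> None:
    """Overwrite the published sitemap atomically, keeping the old one on failure."""
    ET.fromstring(body)  # raises ParseError (a SyntaxError subclass) if malformed
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")  # same filesystem as path
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(body)
        os.chmod(tmp, 0o644)   # mkstemp creates the file as 0600; let the web server read it
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)
        raise


def main() -> None:
    """Entry point for the daily cron job."""
    replace_sitemap(fetch_sitemap())
```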
Even on something like StackOverflow, there is a certain amount of static organization; there are FAQs, tag pages, question pages, user pages, badge pages, etc.; I'd say in a very dynamic site, the best way to approach a sitemap would be to have a map of the categorizations; each node in the sitemap can point to a page of the dynamically generated data (a node for a question page, a node for a user page, etc.).
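One way to picture this category map, as a minimal Python sketch: a small fixed tree whose nodes point at listing pages of dynamic data. The `Node` class and the paths below are hypothetical and only loosely modeled on Stack Overflow's sections.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One entry in a category-level sitemap (hypothetical structure)."""
    title: str
    url: str
    children: List["Node"] = field(default_factory=list)


def flatten(node: Node) -> List[str]:
    """Depth-first list of every URL in the category tree."""
    urls = [node.url]
    for child in node.children:
        urls.extend(flatten(child))
    return urls


SITE = Node("Home", "/", [
    Node("Questions", "/questions"),  # each node points at a listing page,
    Node("Tags", "/tags"),            # not at individual questions or users
    Node("Users", "/users"),
    Node("Badges", "/badges"),
    Node("FAQ", "/faq"),
])
```

Because each node points at a listing page rather than at individual items, the tree only changes when a new category is added, not every time content is posted.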
Of course, a sitemap may not even be appropriate for a given site; there's a certain amount of judgment call required there.