Creating a large sitemap on Google App Engine?

Posted on 2024-09-08 10:57:30

I have a site with around 100,000 unique pages.

(1) How do I create a Sitemap for all these links? Should I just list them flat in a large sitemap protocol compatible file?

(2) I need to implement this on Google App Engine, where there is a 1000-item query limit and all my individual site URLs are stored as separate entries. How do I solve this problem?

Comments (3)

没有伤那来痛 2024-09-15 10:57:30

A sitemap file can list no more than 50,000 URLs and is also capped in size (10MB when this was written; the sitemaps.org protocol now allows 50MB uncompressed), so you're going to need to break it up somehow.

You're going to need some kind of sharding strategy. I don't know what your data looks like, so for now let's say every time you create a page entity, you assign it a random integer between 1 and 500.

Next, create a Sitemap index, and spit out a sitemap link for each of your index values:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://example.appspot.com/sitemap?random=1</loc>
   </sitemap>
   <sitemap>
      <loc>http://example.appspot.com/sitemap?random=2</loc>
   </sitemap>
   ...
   <sitemap>
      <loc>http://example.appspot.com/sitemap?random=500</loc>
   </sitemap>
</sitemapindex>

Finally, on your sitemap page, query for pages and filter on your random index. With 100,000 pages spread over 500 values, each sitemap will hold about 200 URLs, comfortably under the 1000-item query limit.
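
The random-shard approach could be sketched roughly like this in plain Python. This is only an illustration: the in-memory `pages` list stands in for the stored page entities, and the names `assign_shard`, `render_sitemap_index` and `render_sitemap` are made up for this sketch, not App Engine APIs. In a real handler the `page["shard"] == shard` check would be a filter on the datastore query.

```python
import random
from xml.sax.saxutils import escape

NUM_SHARDS = 500  # same range as the random integer in the answer
BASE_URL = "http://example.appspot.com"  # placeholder host from the answer

def assign_shard(rng=random):
    """Pick the shard number stored on a page when it is created."""
    return rng.randint(1, NUM_SHARDS)

def render_sitemap_index(num_shards=NUM_SHARDS, base_url=BASE_URL):
    """Build the sitemap index with one <sitemap> entry per shard."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for shard in range(1, num_shards + 1):
        loc = escape("%s/sitemap?random=%d" % (base_url, shard))
        lines.append("  <sitemap><loc>%s</loc></sitemap>" % loc)
    lines.append("</sitemapindex>")
    return "\n".join(lines)

def render_sitemap(pages, shard):
    """Build one sitemap from the pages whose stored shard matches."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for page in pages:
        if page["shard"] == shard:  # on App Engine this would be a query filter
            lines.append("  <url><loc>%s</loc></url>" % escape(page["url"]))
    lines.append("</urlset>")
    return "\n".join(lines)

# In-memory stand-in for the stored page entities (hypothetical data).
pages = [{"url": "%s/page/%d" % (BASE_URL, i), "shard": assign_shard()}
         for i in range(1000)]
index_xml = render_sitemap_index()
first_sitemap_xml = render_sitemap(pages, 1)
```

Because the shard is chosen at random, the sitemaps will only be roughly equal in size, which is fine as long as each one stays well under the query limit.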

A slightly different strategy here would be to give each page an auto-incrementing numeric ID. To do so, you need a counter object that is transactionally locked and incremented each time a new page is created. The downside of this is that you can't parallelize creation of new page entities. The upside is that you would have a bit more control over how your pages are laid out, as your first sitemap could be pages 1-1000, and so on.
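
As a rough sketch of the sequential-ID variant, assuming 1000 pages per sitemap as in the example above: the `PageCounter` class below is an in-memory stand-in guarded by a thread lock, not a datastore transaction, and `sitemap_number` and `id_range` are hypothetical helper names. It only illustrates how IDs would map onto sitemaps.

```python
import threading

PAGES_PER_SITEMAP = 1000  # chunk size from the answer's "pages 1-1000" example

class PageCounter:
    """In-memory stand-in for the transactionally locked counter object."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_id = 0

    def next_id(self):
        # The lock serializes callers, which is why creation cannot run in parallel.
        with self._lock:
            self._last_id += 1
            return self._last_id

def sitemap_number(page_id):
    """Map a 1-based page ID to its 1-based sitemap number."""
    return (page_id - 1) // PAGES_PER_SITEMAP + 1

def id_range(sitemap_no):
    """Return the inclusive (first_id, last_id) range one sitemap covers."""
    first = (sitemap_no - 1) * PAGES_PER_SITEMAP + 1
    return first, first + PAGES_PER_SITEMAP - 1

counter = PageCounter()
ids = [counter.next_id() for _ in range(2500)]
```

With this layout a sitemap handler can turn its sitemap number into an ID range and query only that range, so each request touches at most 1000 entries.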

梦忆晨望 2024-09-15 10:57:30

You can use Query Cursors to get around the 1000-item query limit. Even with cursors, though, you probably won't solve the problem entirely, as generating a sitemap with 100,000 items in it could easily exceed the amount of time that a single request is allowed to run. Also, generating the sitemap dynamically could easily use up all or a large part of your resource quota.
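
The cursor loop has roughly the following shape. This is a plain-Python sketch only: `fetch_page` is a hypothetical stand-in that uses a list offset as its cursor, whereas a real query cursor is an opaque token returned by the datastore. The point is the pattern of fetching a batch, remembering where it ended, and resuming from there.

```python
PAGE_SIZE = 1000  # stay within the per-query limit mentioned in the question

# Hypothetical stand-in for the stored URLs; a real app would query the datastore.
ALL_URLS = ["http://example.appspot.com/page/%d" % i for i in range(2500)]

def fetch_page(cursor, limit=PAGE_SIZE):
    """Return (batch, next_cursor); next_cursor is None when nothing is left.

    The cursor here is just a list offset. A real query cursor is an opaque
    token handed back by the datastore, but the loop has the same shape.
    """
    start = cursor or 0
    batch = ALL_URLS[start:start + limit]
    end = start + len(batch)
    next_cursor = end if end < len(ALL_URLS) else None
    return batch, next_cursor

def iter_all_urls():
    """Yield every URL by following cursors from batch to batch."""
    cursor = None
    while True:
        batch, cursor = fetch_page(cursor)
        for url in batch:
            yield url
        if cursor is None:
            break

urls = list(iter_all_urls())
```

Since the cursor is all that needs to survive between batches, the work could also be split across several requests or background tasks rather than done in one long request.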

If your data is not very dynamic, I would consider generating a static sitemap file and including it as part of your deployment package. Even if your data is very dynamic, you probably want to adopt a strategy of regenerating it only once per day and doing a deployment to put it up on the server.
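
An offline script for the static approach might look something like the sketch below. It assumes you already have the full URL list available locally; the file names `sitemap-N.xml` and `sitemap-index.xml`, the function `write_static_sitemaps`, and the sample URLs are all made up for illustration. It splits the list into files of at most 50,000 URLs and writes an index that points at them.

```python
import os
import tempfile
from xml.sax.saxutils import escape

MAX_URLS_PER_FILE = 50000  # per-file URL cap from the sitemap protocol
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def write_static_sitemaps(urls, out_dir, base_url):
    """Write sitemap-N.xml files plus sitemap-index.xml; return the file names."""
    names = []
    for start in range(0, len(urls), MAX_URLS_PER_FILE):
        name = "sitemap-%d.xml" % (len(names) + 1)
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.write(XML_HEADER)
            f.write('<urlset xmlns="%s">\n' % NS)
            for url in urls[start:start + MAX_URLS_PER_FILE]:
                f.write("  <url><loc>%s</loc></url>\n" % escape(url))
            f.write("</urlset>\n")
        names.append(name)
    # The index lists every generated file so crawlers need only one entry point.
    with open(os.path.join(out_dir, "sitemap-index.xml"), "w", encoding="utf-8") as f:
        f.write(XML_HEADER)
        f.write('<sitemapindex xmlns="%s">\n' % NS)
        for name in names:
            loc = escape("%s/%s" % (base_url, name))
            f.write("  <sitemap><loc>%s</loc></sitemap>\n" % loc)
        f.write("</sitemapindex>\n")
    return names

# Hypothetical data: 100,000 URLs written to a temporary directory.
out_dir = tempfile.mkdtemp()
urls = ["http://example.appspot.com/page/%d" % i for i in range(100000)]
names = write_static_sitemaps(urls, out_dir, "http://example.appspot.com")
```

The generated files would then be shipped as static files with the deployment, and the script rerun whenever the sitemap should be refreshed.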

梦醒灬来后我 2024-09-15 10:57:30

I had a similar issue, but instead of reinventing the wheel I just plugged in the Google Sitemap Generator http://sitemap-generators.googlecode.com/svn/trunk/docs/en/sitemap-generator.html . It worked for me, as my app is Python based.
