I implemented a simple sitemap class using Django's default sitemap application. As it was taking a long time to execute, I added manual caching:
from django.contrib.sitemaps import Sitemap

# get_cache/set_cache and CACHE_SITEMAP_SHORT_REVIEWS are project-specific
# caching helpers and a cache-key constant; ShortReview is the model being
# indexed. All of them are defined elsewhere in the project.

class ShortReviewsSitemap(Sitemap):
    changefreq = "hourly"
    priority = 0.7

    def items(self):
        # Try to retrieve the queryset from the cache first
        result = get_cache(CACHE_SITEMAP_SHORT_REVIEWS, "sitemap_short_reviews")
        if result is not None:
            return result
        result = ShortReview.objects.all().order_by("-created_at")
        # Cache the freshly built queryset for subsequent requests
        set_cache(CACHE_SITEMAP_SHORT_REVIEWS, "sitemap_short_reviews", result)
        return result

    def lastmod(self, obj):
        return obj.updated_at
The problem is that Memcached allows objects of at most 1 MB. This one was bigger than 1 MB, so storing it in the cache failed:
SERVER_ERROR object too large for cache
However, Django already has an automated way of deciding when a sitemap file should be divided into smaller ones. According to the documentation:
You should create an index file if one of your sitemaps has more than 50,000 URLs. In this case, Django will automatically paginate the sitemap, and the index will reflect that.
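For reference, a sketch of how the index plus paginated sitemaps are typically wired up (this uses the modern URL API; the section name "shortreviews" is an assumption):

# Hypothetical urls.py wiring: the index view lists one URL per sitemap
# page, and the section view serves the individual (paginated) sitemaps.
from django.contrib.sitemaps import views as sitemap_views
from django.urls import path

sitemaps = {"shortreviews": ShortReviewsSitemap}  # assumed section name

urlpatterns = [
    path("sitemap.xml", sitemap_views.index, {"sitemaps": sitemaps}),
    path(
        "sitemap-<section>.xml",
        sitemap_views.sitemap,
        {"sitemaps": sitemaps},
        name="django.contrib.sitemaps.views.sitemap",
    ),
]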
What do you think would be the best way to enable sitemap caching?
- Hacking into the Django sitemaps framework to restrict a single sitemap's size to, say, 10,000 records seems like the best idea. Why was 50,000 chosen in the first place? Google's advice? A random number?
- Or maybe there is a way to allow Memcached to store bigger files?
- Or perhaps, once saved, the sitemaps should be made available as static files? This would mean that instead of caching with Memcached I'd have to manually store the results in the filesystem and retrieve them from there the next time the sitemap is requested (perhaps cleaning the directory daily in a cron job).
All of these seem very low-level, and I'm wondering whether an obvious solution exists...
4 Answers
50k is not a hard-coded parameter.
You can use the class django.contrib.sitemaps.GenericSitemap instead.
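For instance, a minimal sketch that lowers the per-page limit (Django's Sitemap class paginates on its limit attribute, and 50,000 is only the default; the value 2,000 here is an arbitrary example):

# Subclass GenericSitemap and lower the per-page limit so each generated
# sitemap page stays small enough to cache.
from django.contrib.sitemaps import GenericSitemap


class LimitGenericSitemap(GenericSitemap):
    limit = 2000  # at most 2,000 URLs per sitemap page

Register LimitGenericSitemap in your sitemaps dict in place of GenericSitemap, and each generated page should stay comfortably under Memcached's 1 MB cap.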
You can also serve sitemaps in gzip format, which makes them a lot smaller; XML is perfectly suited to gzip compression. What I sometimes do: create the gzipped sitemap file(s) in a cron job and re-render them as often as necessary. Usually, once a day will suffice. Just make sure to serve sitemap.xml.gz from your domain root.
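A minimal sketch of what such a cron-run script could look like (the sitemap class, the output path, and the use of the sites framework are assumptions; Django's bundled sitemap.xml template does the actual rendering):

# Render the sitemap once and write it gzipped to disk; run this from a
# daily cron job (e.g. via a management command).
import gzip

from django.contrib.sites.models import Site
from django.template import loader

from myproject.sitemaps import ShortReviewsSitemap  # hypothetical import


def write_gzipped_sitemap(path="/var/www/root/sitemap.xml.gz"):
    site = Site.objects.get_current()  # requires the sites framework
    urls = ShortReviewsSitemap().get_urls(site=site)
    xml = loader.render_to_string("sitemap.xml", {"urlset": urls})
    with gzip.open(path, "wb") as f:
        f.write(xml.encode("utf-8"))

With this in place, Memcached drops out of the picture entirely; the web server just serves a small static file.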
This should get you started.
Assuming you don't need all those pages in your sitemap, reducing the limit to get the file size down will work fine, as described in the previous answer.
If you do want a very large sitemap and do want to use Memcached you could split the content up into multiple chunks, store them under individual keys and then put them back together again on output. To make this more efficient, Memcached supports the ability to get multiple keys at the same time, although I'm not sure whether the Django client supports this capability yet.
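A hedged sketch of that idea using Django's cache API (set_many/get_many are part of the cache backends; the key scheme and chunk size are assumptions):

# Split a large payload across several Memcached keys and reassemble it
# with get_many() in a single round trip.
from django.core.cache import cache

CHUNK_SIZE = 900 * 1024  # stay safely below Memcached's 1 MB object limit


def set_chunked(key, data, timeout=3600):
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    cache.set(key, len(chunks), timeout)  # record how many chunks exist
    cache.set_many(
        {"%s:%d" % (key, i): chunk for i, chunk in enumerate(chunks)},
        timeout,
    )


def get_chunked(key):
    count = cache.get(key)
    if count is None:
        return None
    keys = ["%s:%d" % (key, i) for i in range(count)]
    chunks = cache.get_many(keys)  # one round trip for all chunks
    if len(chunks) != count:  # a chunk was evicted: treat as a cache miss
        return None
    return b"".join(chunks[k] for k in keys)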
For reference, the 1 MB limit is a consequence of how Memcached stores data: http://code.google.com/p/memcached/wiki/FAQ#What_is_the_maximum_data_size_you_can_store?_(1_megabyte)
I have about 200,000 pages on my site, so I had to have the index no matter what. I ended up doing the hack, limiting the sitemap to 250 links, and also implementing a file-based cache.
The basic algorithm is this: try to serve the requested sitemap page from disk; if it isn't there, generate it with the sitemap framework, and save it to disk if the page is complete, so it can be served directly next time.
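A hedged sketch of that flow as a view wrapper (the cache directory and filename scheme are assumptions, and unlike the approach described here this simplified version persists every page, not only complete ones):

# Wrap Django's sitemap view: serve a previously rendered page from disk,
# otherwise render it, write it to disk, and return it.
import os

from django.contrib.sitemaps import views as sitemap_views
from django.http import HttpResponse

SITEMAP_CACHE_DIR = "/var/cache/sitemaps"  # hypothetical location


def cached_sitemap(request, sitemaps, section=None):
    page = request.GET.get("p", "1")
    filename = "sitemap-%s-p%s.xml" % (section or "all", page)
    path = os.path.join(SITEMAP_CACHE_DIR, filename)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return HttpResponse(f.read(), content_type="application/xml")
    response = sitemap_views.sitemap(request, sitemaps, section=section)
    response.render()  # TemplateResponse in current Django; render before reading .content
    with open(path, "wb") as f:
        f.write(response.content)
    return response

Invalidation is then just deleting the files from the cache directory.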
The end result is that the first time a sitemap is requested, if it's complete, it's generated and saved to disk. The next time it's requested, it's simply served from disk. Since my content never changes, this works very well. However, if I do want to change a sitemap, it's as simple as deleting the file(s) from disk, and waiting for the crawlers to come regenerate things.
The code for the whole thing is here, if you're interested: http://bitbucket.org/mlissner/legal-current-awareness/src/tip/alert/alertSystem/sitemap.py
Maybe this will be a good solution for you too.