如何使用 django 获取每个作者最新的 3 本书

发布于 2024-11-02 01:06:39 字数 664 浏览 1 评论 0原文

使用以下 django 模型：

class Author(models.Model):
   name = models.CharField(max_length=100)
   age = models.IntegerField()

class Book(models.Model):
    name = models.CharField(max_length=300)
    author = models.ForeignKey(Author)
    pubdate = models.DateField()
    class Meta:
        ordering = ('-pubdate')

如何获取每位作者最新出版的五本书？

我曾考虑过迭代每个作者并将该作者出版的书籍切片为 5。

for a in Author.objects.all():
    books = Book.objects.filter(author = a )[:5]
    print books #and/or process the entries...

但是，如果表有很多记录（可能有数千本书），这可能会很慢且效率低下。

那么，有没有其他方法可以使用 django （或 sql 查询）来完成此任务？

原文

Using the following django models:

class Author(models.Model):
   name = models.CharField(max_length=100)
   age = models.IntegerField()

class Book(models.Model):
    name = models.CharField(max_length=300)
    author = models.ForeignKey(Author)
    pubdate = models.DateField()
    class Meta:
        ordering = ('-pubdate')

How can i get the five latest books published by each author?

I had considered iterate each author and get books published by the author slicing to 5.

for a in Author.objects.all():
    books = Book.objects.filter(author = a )[:5]
    print books #and/or process the entries...

But, if the tables has a lot of records (maybe thousands of books), this could be slow and inefficient.

So, is there any other way to accomplish this with django (or a sql query) ?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

已下线请稍等 2024-11-09 01:06:39

我建议：

for a in Author.objects.all():
    books = a.book_set.all().order_by('-pub_date')[:5]
    print books #and/or process the entries...

或者，如果顺序应该始终相同，正如您定义的 Meta 那样，

    books = a.book_set.all()[:5]

应该可以解决问题

I would suggest :

for a in Author.objects.all():
    books = a.book_set.all().order_by('-pub_date')[:5]
    print books #and/or process the entries...

or, if the order should always be the same, as you define Meta,

    books = a.book_set.all()[:5]

should do the trick

回复收藏 0 原文

傲娇萝莉攻 2024-11-09 01:06:39

如果您担心查询的速度，请在 pubdate 字段上构建索引：

pubdate = models.DateField(db_index=True)

这应该避免每次运行查询时扫描整个表。

postgres 中的 SQL 类似于：

select b1.name, b1.author
from books b1
where b1.id in (
    select b2.id
    from books b2
    where b1.author = b2.author
    order by b2.pubdate desc
    limit 3)
order by b1.author, b1.name

If you're worried about the speed of the query, build an index on your pubdate field:

pubdate = models.DateField(db_index=True)

This should avoid scanning the entire table each time you run the query.

The SQL, in postgres, would be something like:

select b1.name, b1.author
from books b1
where b1.id in (
    select b2.id
    from books b2
    where b1.author = b2.author
    order by b2.pubdate desc
    limit 3)
order by b1.author, b1.name

回复收藏 0 原文

冷夜 2024-11-09 01:06:39

对我来说，我有‍‍ 3 位作者，每个作者都至少有 300 篇帖子。
我测试了两种方法，差异非常引人注目！
根据django-debug-toolbar显示，第一种方法有3个类似的查询，其中3是这里的作者数量，它需要总共 8 个查询（+180 毫秒） 和 CPU（400 - 800 毫秒），但第二个总共需要 5 个查询（+80 毫秒） ) 和 CPU (+70 - 90 ms)。

我还看到有一个额外的查询SELECT ... FROMauthor，我无法弄清楚为什么它被生成它似乎根本没有被使用。

from .models import Author, Po‍‍st

最简单和最熟悉的方法

from itertools import chain

authors = Author.objects.all().prefetch_related("posts")

posts = list(chain.from_iterable(author.posts.order_by("-created")[:3]
             for author in authors)
        )

更复杂和不熟悉的方法

from django.db.models import OuterRef, Prefetch, Subquery

subquery = Subquery(
        Post.objects.filter(author__id=OuterRef("author__id"))
        .order_by("-created")
        .values_list("id", flat=True)[:4]
)

authors = Author.objects.all().prefetch_related(
    Prefetch("posts", queryset=Post.objects.filter(id__in=subquery))
)

posts = list(chain.from_iterable(source.posts.all()
            for source in feed_sources)
)

如果您只需要帖子而不是作者：

以下获取数据的方式会少做一次查询，并且只返回帖子。

subquery = Subquery(
    Post.objects.filter(author__id=OuterRef("author__id"))
    .order_by("-created")
    .values_list("id", flat=True)[:4]
)

posts = (
    Post.objects.select_related("author")
    .exclude(author=an_author)
    .filter(id__in=subquery)
)

For me, I had‍‍ 3 authors and each of which had more than at least 300 posts.
I tested two approaches and the difference is eye catching!
according to what django-debug-toolbar showed, first approach it had 3 similar queries which the 3 is the number of authors here and it takes a total of 8 queries (+180 ms) and CPU (400 - 800 ms), but the second one takes a total of 5 queries (+80 ms) and CPU (+70 - 90 ms).

also I see there is one extra query SELECT ... FROM author which I couldn't figure it out why it had been produced it seems not to be used at all.

from .models import Author, Po‍‍st

The simplest and most familiar approach

from itertools import chain

authors = Author.objects.all().prefetch_related("posts")

posts = list(chain.from_iterable(author.posts.order_by("-created")[:3]
             for author in authors)
        )

The more complicated and unfamiliar approach

from django.db.models import OuterRef, Prefetch, Subquery

subquery = Subquery(
        Post.objects.filter(author__id=OuterRef("author__id"))
        .order_by("-created")
        .values_list("id", flat=True)[:4]
)

authors = Author.objects.all().prefetch_related(
    Prefetch("posts", queryset=Post.objects.filter(id__in=subquery))
)

posts = list(chain.from_iterable(source.posts.all()
            for source in feed_sources)
)

If you just need the Posts and not the Authors:

the following way of fetching data does one less query and only returns the posts.

subquery = Subquery(
    Post.objects.filter(author__id=OuterRef("author__id"))
    .order_by("-created")
    .values_list("id", flat=True)[:4]
)

posts = (
    Post.objects.select_related("author")
    .exclude(author=an_author)
    .filter(id__in=subquery)
)

回复收藏 0 原文

~没有更多了~