Django 使用 Annotate 而不是 Distinct()

发布于 2024-12-02 00:19:25 字数 501 浏览 1 评论 0原文

我读到过，distinct() API 调用有时会出现一些性能问题。我想尝试通过 orm 重写查询，避免使用不同的（至少分析差异）。

我的理解是，values() 在底层执行 Group By。但是，当我测试这两种方法时，对象的计数会有所不同，具体取决于我使用的是distinct()还是values()/annotate()。

   zip_codes = Location.objects.values('zip_code').annotate(zip_count=Count('zip_code')).exclude(zip_code=None).count()

VS。

  zip_codes = Location.objects.values_list('zip_code', flat=True).exclude(zip_code=None).distinct()

对这里出了什么问题有什么想法吗？

谢谢！

原文

I have read that distinct() API call has some performance issues at times. I wanted to try to rewrite a query through the orm which avoided using distinct (at least profile the difference).

My understanding is that values() performs a Group By under the hood. When I test out the two methods, though, the Count of objects differs depending on whether I use distinct() or values()/annotate().

   zip_codes = Location.objects.values('zip_code').annotate(zip_count=Count('zip_code')).exclude(zip_code=None).count()

VS.

  zip_codes = Location.objects.values_list('zip_code', flat=True).exclude(zip_code=None).distinct()

any thoughts on what is wrong here?

Thanks!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

ゝ偶尔ゞ 2024-12-09 00:19:25

我刚刚用类似的查询快速对照我拥有的数据库检查了您的查询。计数是相同的，因此我不确定您的数据是否导致了问题。

不过，我也对这个前提表示高度怀疑。 DISTINCT确实是一个CPU密集型查询。但是，COUNT(*) 也是如此，您的第二个查询将首先使用 group by 运行计数聚合，然后对结果运行 COUNT。我会把钱花在更快的单个 DISTINCT 调用上（我还会检查您使用的任何数据库后端）。所有这些与 django 的 ORM 关系不大，而与数据库后端关系更大。

也请思考一下这一点。与基于注释的查询相比，基于不同的查询要完成的任务要清晰一个数量级。您是否有证据支持 DISTINCT 在您的情况下会很慢，或者更好的是它现在正在形成瓶颈？如果不是，您就已经进入了过早优化的范围，应该认真重新考虑您的路径。

过早优化。

优化只有在重要的时候才重要。当它很重要时，它就非常重要，但在你知道它很重要之前，不要浪费很多时间去做这件事。即使您知道它很重要，您也需要知道它的重要性在哪里。没有性能数据，您将不知道要优化什么，而且很可能会优化错误的东西。
结果将是晦涩难懂的，难以编写，难以调试，并且难以维护无法解决问题的代码。因此它具有双重缺点：(a) 增加软件开发和软件维护成本，(b) 对性能没有任何影响。

换句话说，清楚地编写您的软件，然后当您发现问题时，追溯到源头并修复它。在此之前你所做的任何事情都会适得其反。花时间考虑哪些索引对您的数据库很重要，以及在哪里使用 select_lated。这些比您在这里担心的有效 10000%（除非您一直在计算邮政编码，在这种情况下，让我向您介绍缓存）

I just quickly checked your queries against a database I have with a similar query. The counts was identical so I'm not sure what about your data is resulting in issues.

I'd also be HIGHLY skeptical of the premise though. DISTINCT is indeed a cpu intensive query. However, so is COUNT(*) and your second query is going to first run an count aggregate with a group by and then run a COUNT on the results. I'd be put money on the single DISTINCT call being faster (I'd also check with whichever database backend you're using to see). All of this has very little to do with django's ORM and a whole heck of a lot more to do with your database backend.

Also think about this. The distinct based query is an order of magnitude clearer as to what it's accomplishing compared to the annotate based one. Do you have evidence to support that DISTINCT is going to be slow in your situation, or better still that it's forming a bottlneck right now? If not you're well into the range of premature optimization and should heavily reconsider your path.

Premature Optimization.

Optimization matters only when it matters. When it matters, it matters a lot, but until you know that it matters, don't waste a lot of time doing it. Even if you know it matters, you need to know where it matters. Without performance data, you won't know what to optimize, and you'll probably optimize the wrong thing.
The result will be obscure, hard to write, hard to debug, and hard to maintain code that doesn't solve your problem. Thus it has the dual disadvantage of (a) increasing software development and software maintenance costs, and (b) having no performance effect at all.

In other words write your software clearly and then when you find a problem trace it to the source and fix it. Anything you do before that is counterproductive. Spend your time worrying about which indexes are going to matter on your db, and where to use select_related. Those are 10000% more effective than what you are worrying about here (unless you are counting zip codes all the time, in which case let me introduce you to caching)

回复收藏 0 原文

~没有更多了~