在 Django 中,检查空查询集的最有效方法是什么?

发布于 2024-11-26 19:27:27 字数 323 浏览 0 评论 0原文

我听到了使用以下内容的建议:

if qs.exists():
    ...

if qs.count():
    ...

try:
    qs[0]
except IndexError:
    ...

从下面的评论复制:“我正在寻找这样的语句”在 MySQL 和 PostgreSQL 中,count() 对于短查询更快,exists() 对于长查询更快,并使用 QuerySet [0] 当您可能需要第一个元素并且想要检查它是否存在时。然而,当 count() 更快时,它只会稍微快一些,因此建议在两者之间进行选择时始终使用 contains() 。”

I've heard suggestions to use the following:

if qs.exists():
    ...

if qs.count():
    ...

try:
    qs[0]
except IndexError:
    ...

Copied from comment below: "I'm looking for a statement like "In MySQL and PostgreSQL count() is faster for short queries, exists() is faster for long queries, and use QuerySet[0] when it's likely that you're going to need the first element and you want to check that it exists. However, when count() is faster it's only marginally faster so it's advisable to always use exists() when choosing between the two."

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

笑脸一如从前 2024-12-03 19:27:27

query.exists() 是最有效的方法。

特别是在 postgres 上,count() 可能非常昂贵,有时比普通的选择查询更昂贵。

exists() 运行没有 select_lated、字段选择或排序的查询,并且仅获取单个记录。这比通过表连接和排序来计算整个查询要快得多。

qs[0] 仍将包括 select_lated、字段选择和排序;所以会更贵。

Django源代码在这里(django/db/models/sql/query.py RawQuery.has_results):

https://github.com/django/django/blob/60e52a047e55bc4cd5a93a8bd4d07baed27e9a22/django/db/models/sql/query.py#L499

def has_results(self, using):
    q = self.clone()
    if not q.distinct:
        q.clear_select_clause()
    q.clear_ordering(True)
    q.set_limits(high=1)
    compiler = q.get_compiler(using=using)
    return compiler.has_results()

前几天让我陷入困境的另一个问题是调用if 语句中的 QuerySet。执行并返回整个查询!

如果变量 query_set 可能是 None (未设置函数的参数),则使用:

if query_set is None:
    # 

not:

if query_set:
   # you just hit the database

query.exists() is the most efficient way.

Especially on postgres count() can be very expensive, sometimes more expensive then a normal select query.

exists() runs a query with no select_related, field selections or sorting and only fetches a single record. This is much faster then counting the entire query with table joins and sorting.

qs[0] would still includes select_related, field selections and sorting; so it would be more expensive.

The Django source code is here (django/db/models/sql/query.py RawQuery.has_results):

https://github.com/django/django/blob/60e52a047e55bc4cd5a93a8bd4d07baed27e9a22/django/db/models/sql/query.py#L499

def has_results(self, using):
    q = self.clone()
    if not q.distinct:
        q.clear_select_clause()
    q.clear_ordering(True)
    q.set_limits(high=1)
    compiler = q.get_compiler(using=using)
    return compiler.has_results()

Another gotcha that got me the other day is invoking a QuerySet in an if statement. That executes and returns the whole query !

If the variable query_set may be None (unset argument to your function) then use:

if query_set is None:
    # 

not:

if query_set:
   # you just hit the database
铁憨憨 2024-12-03 19:27:27

contains() 通常比 count() 更快,但并非总是如此(请参见下面的测试)。 count() 可用于检查是否存在和长度。

仅当您确实需要该对象时才使用 qs[0]。如果您只是测试存在性,那么速度会慢得多。

在 Amazon SimpleDB 上,400,000 行:

  • qs:325.00 usec/pass
  • qs.exists():144.46 usec/pass
  • qs.count() 144.33 usec/pass
  • qs[0]: 324.98 usec/pass

在 MySQL 上,57 行:

  • qs:1.07 usec/pass
  • qs.exists():1.21 usec/pass
  • qs.count():1.16 usec/pass
  • qs[0]: 1.27 usec/pass

我对每个通道使用了随机查询来降低数据库级缓存的风险。测试代码:

import timeit

base = """
import random
from plum.bacon.models import Session
ip_addr = str(random.randint(0,256))+'.'+str(random.randint(0,256))+'.'+str(random.randint(0,256))+'.'+str(random.randint(0,256))
try:
    session = Session.objects.filter(ip=ip_addr)%s
    if session:
        pass
except:
    pass
"""

query_variatons = [
    base % "",
    base  % ".exists()",
    base  % ".count()",
    base  % "[0]"
    ]

for s in query_variatons:
    t = timeit.Timer(stmt=s)
    print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100000)

exists() is generally faster than count(), though not always (see test below). count() can be used to check for both existence and length.

Only use qs[0]if you actually need the object. It's significantly slower if you're just testing for existence.

On Amazon SimpleDB, 400,000 rows:

  • bare qs: 325.00 usec/pass
  • qs.exists(): 144.46 usec/pass
  • qs.count() 144.33 usec/pass
  • qs[0]: 324.98 usec/pass

On MySQL, 57 rows:

  • bare qs: 1.07 usec/pass
  • qs.exists(): 1.21 usec/pass
  • qs.count(): 1.16 usec/pass
  • qs[0]: 1.27 usec/pass

I used a random query for each pass to reduce the risk of db-level caching. Test code:

import timeit

base = """
import random
from plum.bacon.models import Session
ip_addr = str(random.randint(0,256))+'.'+str(random.randint(0,256))+'.'+str(random.randint(0,256))+'.'+str(random.randint(0,256))
try:
    session = Session.objects.filter(ip=ip_addr)%s
    if session:
        pass
except:
    pass
"""

query_variatons = [
    base % "",
    base  % ".exists()",
    base  % ".count()",
    base  % "[0]"
    ]

for s in query_variatons:
    t = timeit.Timer(stmt=s)
    print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100000)
上课铃就是安魂曲 2024-12-03 19:27:27

这取决于使用上下文。

根据文档

使用 QuerySet.count()

...如果您只想计数,而不是执行 len(queryset)。

使用 QuerySet.exists()

...如果您只想查找是否至少存在一个结果,而不是查询集。

但是:

不要过度使用count()和exists()

如果您需要查询集中的其他数据,只需对其进行评估即可。

因此,我认为如果您只想检查空的 QuerySet,QuerySet.exists() 是最推荐的方法。另一方面,如果您想稍后使用结果,最好对其进行评估。

我还认为你的第三个选择是最昂贵的,因为你需要检索所有记录只是为了检查是否存在。

It depends on use context.

According to documentation:

Use QuerySet.count()

...if you only want the count, rather than doing len(queryset).

Use QuerySet.exists()

...if you only want to find out if at least one result exists, rather than if queryset.

But:

Don't overuse count() and exists()

If you are going to need other data from the QuerySet, just evaluate it.

So, I think that QuerySet.exists() is the most recommended way if you just want to check for an empty QuerySet. On the other hand, if you want to use results later, it's better to evaluate it.

I also think that your third option is the most expensive, because you need to retrieve all records just to check if any exists.

魄砕の薆 2024-12-03 19:27:27

@Sam Odio 的解决方案 是一个不错的起点,但该方法存在一些缺陷,即:

  1. 随机 IP 地址可能会结束up 匹配 0 或很少的结果
  2. 异常会扭曲结果,所以我们应该避免处理异常

因此,我决定排除一些肯定不匹配的东西,而不是过滤可能匹配的东西,希望仍然避免数据库缓存,同时也保证行数相同。

我只针对本地 MySQL 数据库进行了测试,数据集为:

>>> Session.objects.all().count()
40219

Timing code:

import timeit
base = """
import random
import string
from django.contrib.sessions.models import Session
never_match = ''.join(random.choice(string.ascii_uppercase) for _ in range(10))
sessions = Session.objects.exclude(session_key=never_match){}
if sessions:
    pass
"""
s = base.format('count')

query_variations = [
    "",
    ".exists()",
    ".count()",
    "[0]",
]

for variation in query_variations:
    t = timeit.Timer(stmt=base.format(variation))
    print "{} => {:02f} usec/pass".format(variation.ljust(10), 1000000 * t.timeit(number=100)/100000)

outputs:

           => 1390.177710 usec/pass
.exists()  => 2.479579 usec/pass
.count()   => 22.426991 usec/pass
[0]        => 2.437079 usec/pass

因此您可以看到 count()exists() 慢大约 9 倍这个数据集。

[0] 也很快,但需要异常处理。

@Sam Odio's solution was a decent starting point but there's a few flaws in the methodology, namely:

  1. The random IP address could end up matching 0 or very few results
  2. An exception would skew the results, so we should aim to avoid handling exceptions

So instead of filtering something that might match, I decided to exclude something that definitely won't match, hopefully still avoiding the DB cache, but also ensuring the same number of rows.

I only tested against a local MySQL database, with the dataset:

>>> Session.objects.all().count()
40219

Timing code:

import timeit
base = """
import random
import string
from django.contrib.sessions.models import Session
never_match = ''.join(random.choice(string.ascii_uppercase) for _ in range(10))
sessions = Session.objects.exclude(session_key=never_match){}
if sessions:
    pass
"""
s = base.format('count')

query_variations = [
    "",
    ".exists()",
    ".count()",
    "[0]",
]

for variation in query_variations:
    t = timeit.Timer(stmt=base.format(variation))
    print "{} => {:02f} usec/pass".format(variation.ljust(10), 1000000 * t.timeit(number=100)/100000)

outputs:

           => 1390.177710 usec/pass
.exists()  => 2.479579 usec/pass
.count()   => 22.426991 usec/pass
[0]        => 2.437079 usec/pass

So you can see that count() is roughly 9 times slower than exists() for this dataset.

[0] is also fast, but it needs exception handling.

拥醉 2024-12-03 19:27:27

我认为第一种方法是最有效的方法(您可以轻松地用第二种方法来实现它,所以也许它们几乎是相同的)。最后一个需要实际从数据库中获取整个对象,因此它几乎肯定是最昂贵的。

但是,就像所有这些问题一样,了解特定数据库、模式和数据集的唯一方法就是亲自测试。

I would imagine that the first method is the most efficient way (you could easily implement it in terms of the second method, so perhaps they are almost identical). The last one requires actually getting a whole object from the database, so it is almost certainly the most expensive.

But, like all of these questions, the only way to know for your particular database, schema and dataset is to test it yourself.

演多会厌 2024-12-03 19:27:27

我也陷入了这个困境。是的,在大多数情况下,exists() 更快,但这在很大程度上取决于您尝试执行的查询集的类型。例如对于一个简单的查询,如:
my_objects = MyObject.objets.all() 您将使用my_objects.exists()。但是,如果您要执行如下查询: MyObject.objects.filter(some_attr='anything').exclude(something='what').distinct('key').values() 可能您需要测试哪一个更适合(exists()count()len(my_objects))。请记住,数据库引擎是执行查询的引擎,要获得良好的性能结果,很大程度上取决于数据结构和查询的形成方式。您可以做的一件事是,审核查询并针对数据库引擎自行测试它们,然后比较您的结果,您会惊讶于 django 有时是多么天真,尝试 QueryCountMiddleware 来查看执行的所有查询,你就会明白我在说什么。

I was also in this trouble. Yes exists() is faster for most cases but it depends a lot on the type of queryset you are trying to do. For example for a simple query like:
my_objects = MyObject.objets.all() you would use my_objects.exists(). But if you were to do a query like: MyObject.objects.filter(some_attr='anything').exclude(something='what').distinct('key').values() probably you need to test which one fits better (exists(), count(), len(my_objects)). Remember the DB engine is the one who will perform the query, and to get a good result in performance, it depends a lot on the data structure and how the query is formed. One thing you can do is, audit the queries and test them on your own against the DB engine and compare your results you will be surprised by how naive sometimes django is, try QueryCountMiddleware to see all the queries executed, and you will see what i am talking about.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文