Django:为什么 Foo.objects.extra(...) 比 Foo.objects.raw 快得多?

发布于 2024-12-07 16:09:49 字数 3266 浏览 1 评论 0原文

所以我试图优化一个相当奇怪的查询,但这是一个遗留数据库,所以我用我所拥有的来凑合。这些是我正在尝试的查询。此时它们提供相同的输出。 w 是我的查询集。

def future_schedule(request):

    past = datetime.date.today()-datetime.timedelta(days=730)

    extra_select = {
        'addlcomplete': 'SELECT Complete FROM tblAdditionalDates WHERE Checkin.ShortSampleID = tblAdditionalDates.ShortSampleID',
        'addldate': 'SELECT AddlDate FROM tblAdditionalDates WHERE Checkin.ShortSampleID = tblAdditionalDates.ShortSampleID'
    }
    extra_where = ['''(Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > %s AND Checkin.DateCompleted IS NULL AND Checkin.Canceled = 0) OR (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > %s AND Checkin.DateCompleted IS NOT NULL AND Checkin.DateFinalCompleted IS NULL AND Checkin.DateFinalExpected IS NOT NULL AND Checkin.Canceled = 0) '''
    ]
    extra_params = [past, past]

    w = Checkin.objects.extra(select=extra_select, where=extra_where, params=extra_params)

# OR This one

    w = Checkin.objects.raw('''SELECT Checkin.SampleID, Checkin.ShortSampleID, Checkin.Company, A.Complete, Checkin.HasDates, A.AddlDate FROM Checkin LEFT JOIN (SELECT ShortSampleID, Complete, AddlDate FROM tblAdditionalDates) A ON A.ShortSampleID = Checkin.ShortSampleID WHERE (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > "2009-01-01" AND Checkin.DateCompleted IS NULL AND Checkin.Canceled = 0) OR (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > "2009-01-01" AND Checkin.DateCompleted IS NOT NULL AND Checkin.DateFinalCompleted IS NULL AND Checkin.DateFinalExpected IS NOT NULL AND Checkin.Canceled = 0)''')

这两者都返回相同数量的记录 (322)。 .extra 在渲染 HTML 方面比 .raw 查询快大约 10 秒,并且对于所有密集目的,.raw 查询甚至稍微简单一些。有谁知道这可能是为什么?根据我的结构,.raw 可能是我获取所需数据的唯一方法(我需要 extra_select 字典中的 addlcomplete 和 addldate 并在having子句中使用它们来进一步过滤查询集),但我当然不喜欢如何这需要很长时间。是模板层慢还是实际查询层慢?我怎样才能最好地调试这个?

感谢您在糟糕的数据结构中寻求优化的帮助。

更新 1:2011-10-03

所以我安装了 django-debugtoolbar 来窥探一下,我启用了 MySQL 常规日志记录并提出了以下内容:

使用 .filter().extra() 总查询计数为 2。使用 .raw() 总查询计数为 1984!!! (不忽略幽灵般的文学参考)

我的模板使用重新组合,然后循环该重新组合。没有遵循任何关系,没有使用除内置函数之外的模板标签。 Select_lated 没有被使用,我仍然只收到 2 个查询。查看mysql日志,果然——1984条查询。

当查看执行的查询时,基本上看起来每个 {{ Modelinstance.field }} django 都在执行 SELECT pk, field FROM Model WHERE Model.pk = Modelinstance.pk如果你问我的话,这似乎是完全错误的。我在这里遗漏了一些东西还是 django 真的在查询中疯狂运行吗?

结束更新 1

更新 2 请参阅下面的答案

格雷格

So I am trying to optimize a fairly odd query, but this is a legacy database so I make do with what I have. These are the queries I am trying. They provide the same output at this point. w is my queryset.

def future_schedule(request):

    past = datetime.date.today()-datetime.timedelta(days=730)

    extra_select = {
        'addlcomplete': 'SELECT Complete FROM tblAdditionalDates WHERE Checkin.ShortSampleID = tblAdditionalDates.ShortSampleID',
        'addldate': 'SELECT AddlDate FROM tblAdditionalDates WHERE Checkin.ShortSampleID = tblAdditionalDates.ShortSampleID'
    }
    extra_where = ['''(Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > %s AND Checkin.DateCompleted IS NULL AND Checkin.Canceled = 0) OR (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > %s AND Checkin.DateCompleted IS NOT NULL AND Checkin.DateFinalCompleted IS NULL AND Checkin.DateFinalExpected IS NOT NULL AND Checkin.Canceled = 0) '''
    ]
    extra_params = [past, past]

    w = Checkin.objects.extra(select=extra_select, where=extra_where, params=extra_params)

# OR This one

    w = Checkin.objects.raw('''SELECT Checkin.SampleID, Checkin.ShortSampleID, Checkin.Company, A.Complete, Checkin.HasDates, A.AddlDate FROM Checkin LEFT JOIN (SELECT ShortSampleID, Complete, AddlDate FROM tblAdditionalDates) A ON A.ShortSampleID = Checkin.ShortSampleID WHERE (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > "2009-01-01" AND Checkin.DateCompleted IS NULL AND Checkin.Canceled = 0) OR (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > "2009-01-01" AND Checkin.DateCompleted IS NOT NULL AND Checkin.DateFinalCompleted IS NULL AND Checkin.DateFinalExpected IS NOT NULL AND Checkin.Canceled = 0)''')

Both of these return the same number of records (322). .extra is about 10 seconds faster in rendering the HTML than the .raw query and for all intensive purposes, the .raw query is mildly less complex even. Does anyone have any insight as to why this might be? Based on my structure, .raw may be the only way I get the data I need (I need the addlcomplete and addldate in the extra_select dict and use them in a Having clause to further filter the queryset) but I certainly don't like how long it is taking. Is it on the template layer that it is slower or the actual query layer? How can I best debug this?

Thank for your help in this quest for optimization amidst poor data structures.

UPDATE 1: 2011-10-03

So I installed django-debugtoolbar to snoop around a bit and I eneabled MySQL general logging and came up with the following:

using .filter() or .extra() Total Query count is 2. Using .raw() Total Query count is 1984!!! (Spooky literary reference not ignored)

My template is using a regroup and then looping through that regroup. No relations are being followed, no template tags other than builtins are being used. Select_related is NOT being used and I still only get the 2 queries. Looking at the mysql log, sure enough - 1984 queries.

When looking at the queries that were executed, basically it looks like for every {{ Modelinstance.field }} django was doing a SELECT pk, field FROM Model WHERE Model.pk = Modelinstance.pk This seems completely wrong if you ask me. Am I missing something here or is django really running wild with queries?

END UPDATE 1

UPDATE 2
See answer below

Greg

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

骄傲 2024-12-14 16:09:49

Ok. Here are my final conclusions. While Furbeenator is correct about the internal Django optimizations, turns out there is a much larger, user error that caused the slowdown and the aforementioned thousands of queries.

It is clearly documented in the Raw queryset docs that when you defer fields (i.e. not using SELECT * FROM ...) and are selecting only certain fields specifically (SELECT Checkin.Sampleid, ... the fields that you don't select can still be accessed but with another database call. So, if you are selecting a subset of fields in your raw query and you forgot a field in your query that you use in your template, Django performs a database lookup to find that field you are referencing in your template rather than complaining about it not existing or whatever. So, let's say you leave out 5 fields from your query (which is what I did) that you end up referencing in your template and you have 300 records that you are looping through. This incurs 1500 extra database hits to get those 5 fields for each record.

So, beware of hidden references and thank god for Django Debug Toolbar

蘑菇王子 2024-12-14 16:09:49

从优化部分:数据库访问优化,他们提出了优化方法,其中之一是 extra() 方法。然后他们提到了.raw()。我的假设是他们使 raw() 更加健壮和强大,因此它在优化方面提供了最大的灵活性。 执行原始 SQL 查询 可以让你做的不仅仅是额外的( )。我的直觉是,它只是更注重灵活性而不是性能,并且在可能的情况下应该使用 extra() 而不是 raw() 。

From the Optimization section: Database access optimization, they suggest ways to optimize, one of which is the extra() method. Then they mention .raw(). It is my assumption that they made raw() much more robust and powerful so it offered maximum flexibility over optimization. Performing raw SQL queries allows you to do a lot more than extra(). My hunch is that it is just geared more toward flexibility than performance and extra() should be used over raw() where possible.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文