Overhead of a round trip to MySQL?

Posted 2024-08-10 11:45:43


So I've been building Django applications for a while now, and drinking the Kool-Aid and all: only using the ORM and never writing custom SQL.

The main page of the site (the primary interface where users will spend 80%-90% of their time) was getting slow once a user had a large amount of user-specific content (i.e. photos, friends, other data, etc.).

So I popped in the SQL logger (it came pre-installed with Pinax; I just enabled it in the settings), and imagine my surprise when it reported over 500 database queries! With hand-coded SQL I hardly ever ran more than 50 on the most complex pages.

In hindsight it's not altogether surprising, but it seems that this can't be good.

...even if only a dozen or so of the queries take 1ms+

So I'm wondering, how much overhead is there on a round trip to MySQL? Django and MySQL are running on the same server, so there shouldn't be any networking-related overhead.
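For reference, a minimal sketch of counting the queries behind one request without Pinax's logger, using Django's built-in query log (connection.queries is only recorded when DEBUG = True; the URL "/" just stands in for the slow home page):

    from django.db import connection, reset_queries
    from django.test import Client

    # Run from a ./manage.py shell session with DEBUG = True.
    reset_queries()
    Client().get("/")

    total = sum(float(q["time"]) for q in connection.queries)
    print("%d queries, %.3f s of SQL" % (len(connection.queries), total))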


4 Answers

回首观望 2024-08-17 11:45:43


Just because you are using an ORM doesn't mean that you shouldn't do performance tuning.

I had - like you - a home page in one of my applications that performed poorly. I saw that I was doing hundreds of queries to display that page. I went looking at my code and realized that with some careful use of select_related() my queries would bring back more of the data I needed in each trip - I went from hundreds of queries to tens.
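As a hedged illustration (the Photo model, its owner ForeignKey, and the friend_ids list are all made up), this is the pattern select_related() fixes:

    # Without select_related(), touching photo.owner inside the loop issues one
    # extra query per photo - the classic N+1 pattern.
    photos = Photo.objects.filter(owner__in=friend_ids)            # 1 query...
    for photo in photos:
        print(photo.owner.username)                                 # ...plus 1 per photo

    # With select_related(), the owner rows come back in the same JOINed query.
    photos = Photo.objects.filter(owner__in=friend_ids).select_related("owner")
    for photo in photos:
        print(photo.owner.username)                                 # no extra queries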

You can also run a SQL profiler and see if there aren't indices that would help your most common queries - you know, standard database stuff.
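For the index side of that, a small sketch with a hypothetical model: ForeignKey columns are indexed automatically, and db_index=True asks Django to create an index for a column you constantly filter or sort on. On the MySQL side, running EXPLAIN on the generated SQL (str(queryset.query) will show it to you) confirms whether the index is actually used.

    from django.contrib.auth.models import User
    from django.db import models

    class Photo(models.Model):
        owner = models.ForeignKey(User, on_delete=models.CASCADE)   # indexed automatically
        created_at = models.DateTimeField(db_index=True)            # frequently filtered/sorted on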

Caching is also your friend, I would think. If a lot of a page is not changing, do you need to query the database every single time?
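A hedged sketch of caching just the expensive, slow-changing fragment per user (the key, the 5-minute timeout, and the friends relation are all made up):

    from django.core.cache import cache

    def friend_photos(user):
        key = "friend_photos:%d" % user.pk
        photos = cache.get(key)
        if photos is None:
            photos = list(
                Photo.objects.filter(owner__in=user.friends.all())
                .select_related("owner")
            )
            cache.set(key, photos, 300)   # re-query at most once every 5 minutes
        return photos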

If all else fails, remember: the ORM is great, and yes - you should try to use it because it is the Django philosophy; but you are not married to it.

If you really have a use case where studying and tuning the ORM navigation didn't help, and you are sure that you could do it much better with a standard query: use raw SQL for that case.
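Something like this, as a hedged sketch (app_friendship and app_photo are made-up tables; auth_user is Django's user table): one hand-written query replaces what would otherwise be a per-friend loop of ORM calls.

    from django.db import connection

    def friends_with_photo_counts(user_id):
        with connection.cursor() as cursor:
            cursor.execute(
                "SELECT u.username, COUNT(p.id) "
                "FROM app_friendship f "
                "JOIN auth_user u ON u.id = f.friend_id "
                "LEFT JOIN app_photo p ON p.owner_id = f.friend_id "
                "WHERE f.user_id = %s "
                "GROUP BY u.username",
                [user_id],
            )
            return dict(cursor.fetchall())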

維他命╮ 2024-08-17 11:45:43


The overhead of each query is only part of the picture. The actual round-trip time between your Django and MySQL servers is probably very small, since most of your queries come back in less than a millisecond. The bigger problem is that the number of queries issued to your database can quickly overwhelm it. 500 queries for a page is way too many, and even 50 seems like a lot to me. If ten users view complicated pages, you're now up to 5,000 queries.
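If you want to put a number on the local round-trip cost the question asks about, a rough measurement sketch (run it from a Django shell; the figure is specific to your setup):

    import time
    from django.db import connection

    # Time a trivial statement many times to estimate the fixed per-query cost.
    N = 1000
    cursor = connection.cursor()
    start = time.perf_counter()
    for _ in range(N):
        cursor.execute("SELECT 1")
        cursor.fetchone()
    elapsed = time.perf_counter() - start
    print("%.3f ms per round trip" % (elapsed / N * 1000))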

The round-trip time to the database server matters more when the caller is accessing the database over a wide area network, where round trips can easily take between 20 ms and 100 ms.

I would definitely look into using some kind of caching.

深陷 2024-08-17 11:45:43


There are some ways to reduce the query volume.

  1. Use .filter() and .all() to get a bunch of things; pick and choose in the view function (or template via {%if%}). Python can process a batch of rows faster than MySQL.

    "But I could send too much to the template". True, but you'll execute fewer SQL requests. Measure to see which is better.

    This is what you used to do when you wrote SQL. It's not wrong -- it doesn't break the ORM -- but it optimizes the underlying DB work and puts the processing into the view function and the template.

  2. Avoid query navigation in the template. When you do {{ foo.bar.baz.quux }}, SQL is used to get the bar associated with foo, then the baz associated with that bar, then the quux associated with baz. You may be able to reduce this query traffic with some careful .filter() and Python processing to assemble a useful tuple in the view function (see the sketch after this list).

    Again, this was something you used to do when you hand-crafted SQL. In this case, you gather larger batches of ORM-managed objects in the view function and do your filtering in Python instead of via a lot of individual ORM requests.

    This doesn't break the ORM. It changes the usage profile from lots of little queries to a few bigger queries.
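A hedged sketch of point 2, with made-up Photo/Album models and a hypothetical friend_ids() helper: fetch the related rows in a couple of batched queries and hand the template plain tuples instead of letting it navigate relations.

    from django.shortcuts import render

    def home(request):
        photos = list(Photo.objects.filter(owner_id__in=friend_ids(request.user)))
        albums = Album.objects.in_bulk([p.album_id for p in photos])   # one query
        rows = [(p, albums.get(p.album_id)) for p in photos]           # pure Python
        return render(request, "home.html", {"rows": rows})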

给不了的爱 2024-08-17 11:45:43


There is always overhead in database calls. In your case the overhead is not that bad, because the application and database are on the same machine, so there is no network latency, but there is still a significant cost.

When you make a request to the database, it has to prepare to service that request by doing a number of things, including:

  • Allocating resources (memory buffers, temp tables etc) to the database server connection/thread that will handle the request,
  • De-serializing the SQL and parameters (this is necessary even on one machine, since it is an inter-process request, unless you are using an embedded database)
  • Checking whether the query exists in the query cache; if not, optimizing it and putting it in the cache.
    • Note also that if your queries are not parametrized (that is, the values are not separated from the SQL), statements that should be identical can miss the cache, which means each request causes the query to be analyzed and optimized all over again (see the sketch at the end of this answer).
  • Processing the query.
  • Preparing and returning the results to the client.

This is just an overview of the kinds of things most database management systems do to process an SQL request. You incur this overhead 500 times, even if each query itself runs relatively quickly. Bottom line: database interactions, even with a local database, are not as cheap as you might expect.
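A small sketch of the parametrization point (the app_photo table is made up): both calls return the same rows, but only the second keeps the SQL text in one shape so the server can treat repeats as the same statement, as described above, and it is also safe against injection.

    from django.db import connection

    def photo_ids_for(owner_id):
        with connection.cursor() as cursor:
            # Not parametrized: a different SQL string for every owner_id.
            # cursor.execute("SELECT id FROM app_photo WHERE owner_id = %d" % owner_id)

            # Parametrized: one SQL shape, the value passed separately.
            cursor.execute("SELECT id FROM app_photo WHERE owner_id = %s", [owner_id])
            return [row[0] for row in cursor.fetchall()]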
