When are multiple queries better than multiple joins?

Posted on 2024-12-19 18:41:10

There are a lot of similar questions of the type "Multiple Queries vs Single Query" on SO.
But I haven't seen any with a general conclusion, so I'm still confused about this.

So I will ask it in other terms:

When is it better to run multiple queries instead of a single query with multiple joins?

I'm not asking about the trivial cases; obviously, joining two or three tables is much faster than executing three queries.

I'm thinking of cases where, for example, you have 10+ joins, and some of those joins are many-to-many relations, so your final query has GROUP_CONCAT, a mix of LEFT and INNER joins, etc.

For example, you want the product name, but also all its images, all its tags, all its videos, and all the addresses where you can buy it.
Is it better to write one very long query with complex joins and GROUP_CONCAT (which is often really difficult to manage if you can't use DISTINCT), or to execute one query for the product details, one for the images, another for the tags, etc.?
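To make the two options concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The schema (`product`, `image`, `tag`) and the helper `load_product` are hypothetical names chosen for illustration, and SQLite's `group_concat` stands in for MySQL's `GROUP_CONCAT`; the behavior on a real MySQL schema may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE image (product_id INTEGER, url TEXT);
    CREATE TABLE tag (product_id INTEGER, label TEXT);
    INSERT INTO product VALUES (1, 'Lamp');
    INSERT INTO image VALUES (1, 'a.jpg'), (1, 'b.jpg');
    INSERT INTO tag VALUES (1, 'home'), (1, 'light'), (1, 'sale');
""")

JOINS = """
    FROM product p
    LEFT JOIN image i ON i.product_id = p.id
    LEFT JOIN tag t ON t.product_id = p.id
    WHERE p.id = ?
"""

# Joining two independent one-to-many tables multiplies the rows:
# 2 images x 3 tags = 6 intermediate rows for a single product.
raw_rows = conn.execute("SELECT count(*)" + JOINS, (1,)).fetchone()[0]

# Option A: a single query. DISTINCT inside group_concat is needed
# to undo the row multiplication caused by the joins.
single = conn.execute(
    "SELECT p.name, group_concat(DISTINCT i.url), group_concat(DISTINCT t.label)"
    + JOINS + " GROUP BY p.id",
    (1,),
).fetchone()


# Option B: one small query per relation (hypothetical helper).
def load_product(conn, product_id):
    name = conn.execute(
        "SELECT name FROM product WHERE id = ?", (product_id,)
    ).fetchone()[0]
    images = [r[0] for r in conn.execute(
        "SELECT url FROM image WHERE product_id = ? ORDER BY url", (product_id,))]
    tags = [r[0] for r in conn.execute(
        "SELECT label FROM tag WHERE product_id = ? ORDER BY label", (product_id,))]
    return {"name": name, "images": images, "tags": tags}


product = load_product(conn, 1)
```

Option B costs extra round trips but returns each relation as a plain list, with no string splitting and no DISTINCT workaround.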

I can write a particular example if it helps to clarify the question, but I was hoping for a general rule for these situations.
Where is the limit? When is a single query with joins worse than multiple queries?

Also, in those cases where it is better to run multiple SELECT queries:
Is it faster to run them inside a transaction (autocommit = false)?
Is it faster to merge those multiple selects into a single query with multiple subselects?

Thanks!

Comments (5)

手长情犹 2024-12-26 18:41:10

Where is the limit? When is a single query with joins worse than multiple queries?

I don't think it is easy to draw a limit; it depends a lot on your scenario. There may be multiple factors, such as indexes, partitioning, joining columns, number of rows, query structure, etc.

A query with multiple joins, e.g. one joining 5 tables, where the joining columns are keys, the values differ for most rows (unlike, say, gender), and proper indexes exist, might be faster than a query that joins only two tables without proper indexes.

I guess one might set limits for oneself: e.g. you can decide that a particular use case (such as an insert or a select) must not take more than 1 second, and if it takes longer than that, more optimization might be required.
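The "set a time budget" idea can be sketched as a small helper. This is only an illustration using Python's `sqlite3` and `time.perf_counter`; the function name `timed_query`, the table `t`, and the 1-second default are assumptions taken from the example figure above, not a recommended threshold.

```python
import sqlite3
import time


def timed_query(conn, sql, params=(), budget_seconds=1.0):
    """Run a query and report whether it stayed within the time budget."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed, elapsed <= budget_seconds


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [(str(i),) for i in range(1000)])

rows, elapsed, within_budget = timed_query(conn, "SELECT count(*) FROM t")
if not within_budget:
    print(f"query took {elapsed:.3f}s, over budget: consider optimizing or splitting it")
```

Measuring both the single-join variant and the multi-query variant against the same budget, on realistic data, is what actually settles the question for a given use case.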

誰認得朕 2024-12-26 18:41:10

"It depends" is honestly the only valid answer. There is, and can be, no hard-and-fast "if greater than X joins then break it up" rule. (If there were, then X would have to change every few years. Stuff I write today would probably have bogged down the average server 10 years ago.)

With that said, the best tool for determining that cutoff point is experience. The more you write, test, and experiment with code, and the more familiar you are with the hardware and data sets you have to work with "now", the better you will be able to write optimal queries. This is absolutely not to say that only gurus who sneer at the extensions of the SQL-92 standard can write optimal queries. With reasonable effort, new programmers can produce code that is "Good Enough" and, as the name says, that is generally good enough for most tasks.

成熟的代价 2024-12-26 18:41:10
Where is the limit? When is a single query with joins worse than multiple queries?

That would depend on the optimiser. As the query grows more complex, the risk of the optimiser selecting a poor execution plan increases.

Just selecting the order in which to process the tables can be done in N! ways, where N is the number of tables queried. With 5 tables there are 120 ways, with 10 tables a whopping 3628800. And that is just for one of the decisions that the optimiser must make.
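The factorial growth is easy to check. The sketch below only counts orderings with `math.factorial`; the function name `join_orders` is illustrative, and real optimizers typically prune this search space with heuristics rather than enumerating every order.

```python
import math


def join_orders(table_count):
    """Number of possible orders in which N tables can be processed (N!)."""
    return math.factorial(table_count)


for n in (2, 5, 10):
    print(n, join_orders(n))
# 2 2
# 5 120
# 10 3628800
```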

风柔一江水 2024-12-26 18:41:10

I would say you would join rather than run separate selects when you need the related data all at once, or if the related data is really big (e.g. LOBs with images).

If you don't need the large related data all at once, then think "lazy initialization", where you query that large data when asked for.
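A minimal sketch of that lazy initialization, assuming a hypothetical `product` table and a `product_image` table holding a BLOB (all names are illustrative). The cheap columns are loaded up front; the large column is queried only on first access.

```python
import sqlite3


class Product:
    """Loads the small columns eagerly and the large BLOB only on demand."""

    def __init__(self, conn, product_id):
        self._conn = conn
        self.id = product_id
        self.name = conn.execute(
            "SELECT name FROM product WHERE id = ?", (product_id,)
        ).fetchone()[0]
        self._image_data = None
        self._image_loaded = False

    @property
    def image_data(self):
        # The second query runs only the first time the BLOB is requested.
        if not self._image_loaded:
            row = self._conn.execute(
                "SELECT data FROM product_image WHERE product_id = ?", (self.id,)
            ).fetchone()
            self._image_data = row[0] if row else None
            self._image_loaded = True
        return self._image_data


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE product_image (product_id INTEGER, data BLOB)")
conn.execute("INSERT INTO product VALUES (1, 'Lamp')")
conn.execute("INSERT INTO product_image VALUES (1, ?)", (b"\x89PNG-fake-bytes",))

product = Product(conn, 1)        # one small query, no BLOB transferred yet
loaded_before = product._image_loaded
blob = product.image_data         # the BLOB query runs here
```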

染墨丶若流云 2024-12-26 18:41:10

I would also say: when the data transferred by the join is orders of magnitude larger than that of the individual queries. Data duplicated on every row can be a serious killer.

I once had a query that, run as individual queries, produced about 10 MB of transferred data, but with the inner joins produced 900 MB of downloaded data because fields were repeated so many times. The software spent 80% of its time just downloading the results of the query. This is where software profiling comes into play: it tells you where your software is spending the most time.
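The duplication effect can be reproduced on a toy scale. In this sketch (hypothetical `product` and `image` tables, with character counts as a rough stand-in for bytes on the wire), one wide product row joined to 100 images repeats the wide column 100 times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE image (product_id INTEGER, url TEXT);
""")
# One product with a wide column, and 100 related image rows.
conn.execute("INSERT INTO product VALUES (1, ?)", ("x" * 10_000,))
conn.executemany(
    "INSERT INTO image VALUES (1, ?)", [(f"img{i}.jpg",) for i in range(100)]
)


def payload_size(rows):
    """Rough result-set size: total characters across all returned fields."""
    return sum(len(str(value)) for row in rows for value in row)


# Single join: the 10,000-character description is repeated on every row.
joined = conn.execute(
    "SELECT p.description, i.url FROM product p JOIN image i ON i.product_id = p.id"
).fetchall()

# Separate queries: the description is transferred exactly once.
separate = (
    conn.execute("SELECT description FROM product").fetchall()
    + conn.execute("SELECT url FROM image").fetchall()
)

joined_size = payload_size(joined)
separate_size = payload_size(separate)
print(joined_size, separate_size)
```

Profiling the real application, as described above, is still the way to confirm that transfer volume is the bottleneck before splitting a query.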
