优化多对一关系的 SQL 查询

发布于 2024-12-05 22:40:29 字数 866 浏览 0 评论 0原文

我有两个具有多对一关系的表,我将其称为 Parent_Table 和 Child_Table (即父项有零个或多个子项,但子项只有一个父项)。我需要计算至少有一个满足某些条件的孩子的父母的数量。哪个查询是最佳的?

选项 1(很确定不是这个)

SELECT COUNT(DISTINCT(pt.ID)) 
FROM PARENT_TABLE pt
JOIN CHILD_TABLE ct
ON pt.ID =  ct.PARENT_ID
WHERE <parent meets some condition>
AND <child meets some condition>

选项 2

SELECT COUNT(pt.ID)
FROM PARENT_TABLE pt
WHERE pt.ID in
(
SELECT ct.PARENT_ID
FROM CHILD_TABLE ct
WHERE <child meets condition>
)
AND <parent meets some condition>

选项 3(我猜是最快的)

SELECT COUNT(pt.ID)
FROM PARENT_TABLE pt
WHERE EXISTS
(
SELECT 1
FROM CHILD_TABLE ct
WHERE ct.PARENT_ID = pt.ID
AND <child meets condition>
)
AND <parent meets some condition>

或者完全是另一回事?是取决于每个表的大小,还是取决于两个条件的复杂程度,或者数据是否排序?

编辑:数据库是Oracle。

I've got two tables with a many to one relationship which I'll call Parent_Table and Child_Table (i.e. a parent has zero or more children, but children have exactly one parent). I need to count the number of parents who have at least one child that fulfills some condition. Which query is optimal?

Option 1 (pretty sure it's not this one)

SELECT COUNT(DISTINCT(pt.ID)) 
FROM PARENT_TABLE pt
JOIN CHILD_TABLE ct
ON pt.ID =  ct.PARENT_ID
WHERE <parent meets some condition>
AND <child meets some condition>

Option 2

SELECT COUNT(pt.ID)
FROM PARENT_TABLE pt
WHERE pt.ID in
(
SELECT ct.PARENT_ID
FROM CHILD_TABLE ct
WHERE <child meets condition>
)
AND <parent meets some condition>

Option 3 (my guess as the fastest)

SELECT COUNT(pt.ID)
FROM PARENT_TABLE pt
WHERE EXISTS
(
SELECT 1
FROM CHILD_TABLE ct
WHERE ct.PARENT_ID = pt.ID
AND <child meets condition>
)
AND <parent meets some condition>

Or is it something else entirely? Does it depend on the sizes of each table, or the complexity of the two conditions, or whether the data is sorted?

EDIT: Database is Oracle.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

恍梦境° 2024-12-12 22:40:29

第一个查询很慢,其他查询在大多数数据库上应该运行得很快。

在不了解数据库的情况下,很难说更多:

但是:count(*) 通常比 count(names_field) 更快并且永远不会慢
count(distinct (afield)) 很慢

或者完全是另外一回事?

这取决于数据库和数据库的确切版本。

是否取决于每个表的大小

是的,这起着很大的作用

或者两个条件的复杂性

可能

或者数据是否排序?

如果您想要快速选择,则必须对用于连接的所有字段建立索引。
where 子句中使用的所有字段都必须是索引的或低基数的。

The first query is slow, the others should run fast on most DB's.

Without knowing the DB it's hard to say more:

But: count(*) is often faster than count(names_field) and never slower
count(distinct (afield)) is slow

Or is it something else entirely?

That depends on the DB and the exact version of the DB.

Does it depend on the sizes of each table

Yes, that plays a big part

or the complexity of the two conditions

Possible

or whether the data is sorted?

If you want a fast select, all fields used to join must be indexed.
And all fields used in a where clause must either be indexed or low-cardinality.

就此别过 2024-12-12 22:40:29

对我来说,第一个似乎是最好的,因为它是最容易阅读的,但这显然不能回答你的问题。

您真正要做的是为每个查询生成执行计划并分析它们(我认为大多数流行的 DBMS 都有一个工具可以做到这一点)。它将为您提供每个查询的成本值。

如果您不能这样做,我想您可以多次运行查询并比较执行时间。

或者完全是另外一回事?是取决于每个表的大小,还是取决于两个条件的复杂程度,或者数据是否排序?

所有这些以及更多。

For me the first one seems the best since it's the easiest to read, but that obviously doesn't answer your question.

What you really have to do is generate execution plans for each of the queries and analyze them (I think most of the popular DBMS have a tool to do that). It will give you a cost value for each query.

If you can't do that I guess you could run the queries a bunch of times and compare the execution time.

Or is it something else entirely? Does it depend on the sizes of each table, or the complexity of the two conditions, or whether the data is sorted?

All of that and more.

空城之時有危險 2024-12-12 22:40:29

正如评论者所说,回答这个问题的最佳方法是运行查询和测量。

然而,一般来说,数据库引擎非常非常有效地优化连接 - 我很确定您会发现 3 个查询之间几乎没有区别,并且查询优化器完全有可能将它们全部转换为相同的基本查询(2 和 3)是等价的)。

到目前为止,对查询影响最大的是“子项满足某些条件”和“父项满足某些条件”子句。我会集中精力优化这一点。

Like the commenters say, the best way to answer this question is to run the queries and measure.

However, in general, database engines optimize joins very, very efficiently - I'm pretty sure you will find almost no difference between the 3 queries, and it's entirely possible the query optimizers will turn them all into the same basic query (2 and 3 are equivalent as it is).

By far the biggest impact on the query will be the "child meet some condition" and "parent meets some condition" clauses. I'd concentrate on optimizing this bit.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文