是否可以在聚合查询中向 MAX() 调用添加条件?
背景
我的典型用例:
# Table
id category dataUID
---------------------------
0 A (NULL)
1 B (NULL)
2 C text1
3 C text1
4 D text2
5 D text3
# Query
SELECT MAX(`id`) AS `id` FROM `table`
GROUP BY `category`
这很好;它会删除正在处理的记录集中的所有“重复类别”,为每个类别提供“最高”ID。
然后我可以继续使用此 ID 再次提取所有数据:
# Query
SELECT * FROM `table` JOIN (
SELECT MAX(`id`) AS `id` FROM `table`
GROUP BY `category`
) _ USING(`id`)
# Result
id category dataUID
---------------------------
0 A (NULL)
1 B (NULL)
3 C text1
5 D text3
请注意,这与以下内容不同:
SELECT MAX(`id`) AS `id`, `category`, `dataUID` FROM `table`
GROUP BY `category`
Per 文档:
在标准 SQL 中,包含 GROUP BY 子句的查询不能引用 选择列表中未命名的非聚合列 GROUP BY 子句。例如,这个查询在标准 SQL 中是非法的 因为选择列表中的名称列不会出现在 分组依据:
SELECT o.custid, c.name, MAX(o. payment) FROM 订单 AS o, 客户 AS c WHERE o.custid = c.custid GROUP BY o.custid;
为了使查询合法,必须从查询中省略名称列 选择列表或在 GROUP BY 子句中命名。
MySQL扩展了GROUP BY的使用,使得选择列表可以引用 未在 GROUP BY 子句中命名的非聚合列。 这意味着 前面的查询在 MySQL 中是合法的。您可以使用此功能 通过避免不必要的列排序来获得更好的性能 分组。 但是,这主要是当每个值中的所有值 未在 GROUP BY 中命名的非聚合列对于每个列都是相同的 组。
[..]
此扩展假定未分组的列将具有相同的分组值。否则,结果是不确定的。
因此,我会得到 dataUID
的未指定值 - 例如,text2
或 text3
表示 id
5
的结果。
在我的实际案例中,这实际上是其他领域的问题;碰巧的是,对于 dataUID
列,通常我并不真正关心我得到哪个值。
问题
然而!
如果给定 category
的任何行具有 NULL
dataUID
,并且至少另一行具有非 NULL< /code>
dataUID
,我希望 MAX
忽略 NULL
的。
所以:
id category dataUID
---------------------------
4 D text2
5 D (NULL)
目前,由于我选择了 ID 最大的行,所以我得到:
5 D (NULL)
但是,因为 dataUID
是 NULL
,所以我想要:
4 D text2
如何我明白了吗?如何在聚合 MAX
的使用中添加条件逻辑?
我想也许可以将 MAX
交给一个元组并提取 id
之后从中取出:
GET_SECOND_PART_SOMEHOW(MAX((IF(`dataUID` NOT NULL, 1, 0), `id`))) AS `id`
但我不认为 MAX 会接受这样的任意表达式,更不用说元组了,而且我不知道如何在之后检索元组的第二部分-事实。
Background
My typical use case:
# Table
id category dataUID
---------------------------
0 A (NULL)
1 B (NULL)
2 C text1
3 C text1
4 D text2
5 D text3
# Query
SELECT MAX(`id`) AS `id` FROM `table`
GROUP BY `category`
This is fine; it will strip out any "duplicate categories" in the recordset that's being worked on, giving me the "highest" ID for each category.
I can then go on use this ID to pull out all the data again:
# Query
SELECT * FROM `table` JOIN (
SELECT MAX(`id`) AS `id` FROM `table`
GROUP BY `category`
) _ USING(`id`)
# Result
id category dataUID
---------------------------
0 A (NULL)
1 B (NULL)
3 C text1
5 D text3
Note that this is not the same as:
SELECT MAX(`id`) AS `id`, `category`, `dataUID` FROM `table`
GROUP BY `category`
Per the documentation:
In standard SQL, a query that includes a GROUP BY clause cannot refer
to nonaggregated columns in the select list that are not named in the
GROUP BY clause. For example, this query is illegal in standard SQL
because the name column in the select list does not appear in the
GROUP BY:SELECT o.custid, c.name, MAX(o.payment) FROM orders AS o, customers AS c WHERE o.custid = c.custid GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the
select list or named in the GROUP BY clause.MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group.[..]
This extension assumes that the nongrouped columns will have the same group-wise values. Otherwise, the result is indeterminate.
So I'd get an unspecified value for dataUID
— as an example, either text2
or text3
for result with id
5
.
This is actually a problem for other fields in my real case; as it happens, for the dataUID
column specifically, generally I don't really care which value I get.
Problem
However!
If any of the rows for a given category
has a NULL
dataUID
, and at least one other row has a non-NULL
dataUID
, I'd like MAX
to ignore the NULL
ones.
So:
id category dataUID
---------------------------
4 D text2
5 D (NULL)
At present, since I pick out the row with the maximum ID, I get:
5 D (NULL)
But, because the dataUID
is NULL
, instead I want:
4 D text2
How can I get this? How can I add conditional logic to the use of aggregate MAX
?
I thought of maybe handing MAX
a tuple and pulling the id
out from it afterwards:
GET_SECOND_PART_SOMEHOW(MAX((IF(`dataUID` NOT NULL, 1, 0), `id`))) AS `id`
But I don't think MAX
will accept arbitrary expressions like that, let alone tuples, and I don't know how I'd retrieve the second part of the tuple after-the-fact.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
对 @ypercube 的答案稍作调整。要获取您可以使用的
id
,然后将其插入
join
slight tweak to @ypercube's answer. To get the
id
s you can useAnd then plug that into a
join
最终,这比我想象的要容易,因为事实证明 MySQL将接受
MAX
内的任意表达式。我可以通过在 id 中注入一个前导字符作为排序提示来获得我想要的排序:
演练:
所以:
然后再次删除排序提示就很简单了。
特别感谢 @JlStone 通过
COALESCE
为我设置了在MAX
调用中嵌入表达式的路径,并直接操作提供给MAX< 的值/代码>。
This was easier than I thought, in the end, because it turns out MySQL will accept an arbitrary expression inside
MAX
.I can get the ordering I want by injecting a leading character into
id
to serve as an ordering hint:Walk-through:
So:
It's then a simple matter to chop the ordering hint off again.
Thanks in particular to @JlStone for setting me, via
COALESCE
, on the path to embedding expressions inside the call toMAX
and directly manipulating the values supplied toMAX
.据我所知,您可以使用 COALESCE分组语句内部。例如。
嗯,看来我第一次读得很快。我想也许你想要这样的东西?
或者也许
From what I can remember you can use COALESCE inside of grouping statements. For example.
hm seems I read to quickly the first time. I think maybe you want something like this?
or perhaps
您需要子句 OVER
考虑按 desc 排序最后会移动空值。
you need clause OVER
Consider that sorting by desc moves null values at last.