是否可以在聚合查询中向 MAX() 调用添加条件?

发布于 2024-12-21 14:53:28 字数 2694 浏览 4 评论 0原文

背景

我的典型用例:

# Table

id     category    dataUID
---------------------------
0         A        (NULL)
1         B        (NULL)
2         C        text1
3         C        text1
4         D        text2
5         D        text3

# Query

SELECT MAX(`id`) AS `id` FROM `table`
GROUP BY `category`

这很好;它会删除正在处理的记录集中的所有“重复类别”,为每个类别提供“最高”ID。

然后我可以继续使用此 ID 再次提取所有数据:

# Query

SELECT * FROM `table` JOIN (
   SELECT MAX(`id`) AS `id` FROM `table`
   GROUP BY `category`
) _ USING(`id`)

# Result

id     category    dataUID
---------------------------
0         A        (NULL)
1         B        (NULL)
3         C        text1
5         D        text3

请注意,这与以下内容不同:

SELECT MAX(`id`) AS `id`, `category`, `dataUID` FROM `table`
GROUP BY `category`

Per 文档

在标准 SQL 中,包含 GROUP BY 子句的查询不能引用 选择列表中未命名的非聚合列 GROUP BY 子句。例如,这个查询在标准 SQL 中是非法的 因为选择列表中的名称列不会出现在 分组依据:

SELECT o.custid, c.name, MAX(o. payment) FROM 订单 AS o, 客户
AS c WHERE o.custid = c.custid GROUP BY o.custid;

为了使查询合法,必须从查询中省略名称列 选择列表或在 GROUP BY 子句中命名。

MySQL扩展了GROUP BY的使用,使得选择列表可以引用 未在 GROUP BY 子句中命名的非聚合列。 这意味着 前面的查询在 MySQL 中是合法的。您可以使用此功能 通过避免不必要的列排序来获得更好的性能 分组。 但是,这主要是当每个值中的所有值 未在 GROUP BY 中命名的非聚合列对于每个列都是相同的 组。

[..]

此扩展假定未分组的列将具有相同的分组值。否则,结果是不确定的。

因此,我会得到 dataUID 的未指定值 - 例如,text2 text3 表示 id 5 的结果。

在我的实际案例中,这实际上是其他领域的问题;碰巧的是,对于 dataUID 列,通常我并不真正关心我得到哪个值。


问题

然而!

如果给定 category 的任何行具有 NULL dataUID,并且至少另一行具有非 NULL< /code> dataUID,我希望 MAX 忽略 NULL 的。

所以:

id     category    dataUID
---------------------------
4         D        text2
5         D        (NULL)

目前,由于我选择了 ID 最大的行,所以我得到:

5         D        (NULL)

但是,因为 dataUIDNULL,所以我想要:

4         D        text2

如何我明白了吗?如何在聚合 MAX 的使用中添加条件逻辑?


我想也许可以将 MAX 交给一个元组并提取 id之后从中取出:

GET_SECOND_PART_SOMEHOW(MAX((IF(`dataUID` NOT NULL, 1, 0), `id`))) AS `id`

但我不认为 MAX 会接受这样的任意表达式,更不用说元组了,而且我不知道如何在之后检索元组的第二部分-事实。

Background

My typical use case:

# Table

id     category    dataUID
---------------------------
0         A        (NULL)
1         B        (NULL)
2         C        text1
3         C        text1
4         D        text2
5         D        text3

# Query

SELECT MAX(`id`) AS `id` FROM `table`
GROUP BY `category`

This is fine; it will strip out any "duplicate categories" in the recordset that's being worked on, giving me the "highest" ID for each category.

I can then go on use this ID to pull out all the data again:

# Query

SELECT * FROM `table` JOIN (
   SELECT MAX(`id`) AS `id` FROM `table`
   GROUP BY `category`
) _ USING(`id`)

# Result

id     category    dataUID
---------------------------
0         A        (NULL)
1         B        (NULL)
3         C        text1
5         D        text3

Note that this is not the same as:

SELECT MAX(`id`) AS `id`, `category`, `dataUID` FROM `table`
GROUP BY `category`

Per the documentation:

In standard SQL, a query that includes a GROUP BY clause cannot refer
to nonaggregated columns in the select list that are not named in the
GROUP BY clause. For example, this query is illegal in standard SQL
because the name column in the select list does not appear in the
GROUP BY:

SELECT o.custid, c.name, MAX(o.payment)   FROM orders AS o, customers
AS c   WHERE o.custid = c.custid   GROUP BY o.custid;

For the query to be legal, the name column must be omitted from the
select list or named in the GROUP BY clause.

MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause.
This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group.

[..]

This extension assumes that the nongrouped columns will have the same group-wise values. Otherwise, the result is indeterminate.

So I'd get an unspecified value for dataUID — as an example, either text2 or text3 for result with id 5.

This is actually a problem for other fields in my real case; as it happens, for the dataUID column specifically, generally I don't really care which value I get.


Problem

However!

If any of the rows for a given category has a NULL dataUID, and at least one other row has a non-NULL dataUID, I'd like MAX to ignore the NULL ones.

So:

id     category    dataUID
---------------------------
4         D        text2
5         D        (NULL)

At present, since I pick out the row with the maximum ID, I get:

5         D        (NULL)

But, because the dataUID is NULL, instead I want:

4         D        text2

How can I get this? How can I add conditional logic to the use of aggregate MAX?


I thought of maybe handing MAX a tuple and pulling the id out from it afterwards:

GET_SECOND_PART_SOMEHOW(MAX((IF(`dataUID` NOT NULL, 1, 0), `id`))) AS `id`

But I don't think MAX will accept arbitrary expressions like that, let alone tuples, and I don't know how I'd retrieve the second part of the tuple after-the-fact.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

诗酒趁年少 2024-12-28 14:53:29

@ypercube 的答案稍作调整。要获取您可以使用的id

SELECT COALESCE(MAX(CASE
                      WHEN dataUID IS NOT NULL THEN id
                    END), MAX(id)) AS id
FROM   table
GROUP  BY category  

,然后将其插入join

slight tweak to @ypercube's answer. To get the ids you can use

SELECT COALESCE(MAX(CASE
                      WHEN dataUID IS NOT NULL THEN id
                    END), MAX(id)) AS id
FROM   table
GROUP  BY category  

And then plug that into a join

女中豪杰 2024-12-28 14:53:29

最终,这比我想象的要容易,因为事实证明 MySQL接受 MAX 内的任意表达式。

我可以通过在 id 中注入一个前导字符作为排序提示来获得我想要的排序:

SUBSTRING(MAX(IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`))) FROM 2)

演练:

id     category    dataUID    IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`)
--------------------------------------------------------------------------------------
0         A        (NULL)                             a0
1         B        (NULL)                             a1
2         C        text1                              b2
3         C        text1                              b3
4         D        text2                              b4
5         D        (NULL)                             a5

所以:

SELECT
   `category`, MAX(IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`)) AS `max_id_with_hint`
FROM `table`
GROUP BY `category`

category   max_id_with_hint
------------------------------
A          a0
B          a1
C          b3
D          b4

然后再次删除排序提示就很简单了。

特别感谢 @JlStone 通过 COALESCE 为我设置了在 MAX 调用中嵌入表达式的路径,并直接操作提供给 MAX< 的值/代码>。

This was easier than I thought, in the end, because it turns out MySQL will accept an arbitrary expression inside MAX.

I can get the ordering I want by injecting a leading character into id to serve as an ordering hint:

SUBSTRING(MAX(IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`))) FROM 2)

Walk-through:

id     category    dataUID    IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`)
--------------------------------------------------------------------------------------
0         A        (NULL)                             a0
1         B        (NULL)                             a1
2         C        text1                              b2
3         C        text1                              b3
4         D        text2                              b4
5         D        (NULL)                             a5

So:

SELECT
   `category`, MAX(IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`)) AS `max_id_with_hint`
FROM `table`
GROUP BY `category`

category   max_id_with_hint
------------------------------
A          a0
B          a1
C          b3
D          b4

It's then a simple matter to chop the ordering hint off again.

Thanks in particular to @JlStone for setting me, via COALESCE, on the path to embedding expressions inside the call to MAX and directly manipulating the values supplied to MAX.

静待花开 2024-12-28 14:53:29

据我所知,您可以使用 COALESCE分组语句内部。例如。

SELECT MAX(COALESCE(`id`,1)) ...

嗯,看来我第一次读得很快。我想也许你想要这样的东西?

SELECT * FROM `table` JOIN (
   SELECT MAX(`id`) AS `id` FROM `table`
   WHERE `dataUID` IS NOT NULL
   GROUP BY `category`
) _ USING(`id`)

或者也许

SELECT MAX(`id`) AS `id`, 
  COALESCE (`dataUID`, 0) as `dataUID`
FROM `table`
GROUP BY `category`

From what I can remember you can use COALESCE inside of grouping statements. For example.

SELECT MAX(COALESCE(`id`,1)) ...

hm seems I read to quickly the first time. I think maybe you want something like this?

SELECT * FROM `table` JOIN (
   SELECT MAX(`id`) AS `id` FROM `table`
   WHERE `dataUID` IS NOT NULL
   GROUP BY `category`
) _ USING(`id`)

or perhaps

SELECT MAX(`id`) AS `id`, 
  COALESCE (`dataUID`, 0) as `dataUID`
FROM `table`
GROUP BY `category`
七色彩虹 2024-12-28 14:53:29
select * 
from t1
join (
  select max(id) as id,
    max(if(dataGUID is NULL, NULL, id)) as fallbackid,
    category
  from t1 group by category) as ids
on if(ids.id = fallbackid or fallbackid is null, id, fallbackid) = t1.id; 
select * 
from t1
join (
  select max(id) as id,
    max(if(dataGUID is NULL, NULL, id)) as fallbackid,
    category
  from t1 group by category) as ids
on if(ids.id = fallbackid or fallbackid is null, id, fallbackid) = t1.id; 
琉璃梦幻 2024-12-28 14:53:29
SELECT t.*
FROM table AS t
  JOIN
  ( SELECT DISTINCT category
    FROM table
  ) AS tdc
  ON t.id = 
  COALESCE(
    ( SELECT MAX(id) AS id 
      FROM table 
      WHERE category = tdc.category
        AND dataUID IS NOT NULL
    ) 
  , ( SELECT MAX(id) AS id 
      FROM table 
      WHERE category = tdc.category
        AND dataUID IS NULL
    ) 
  )
SELECT t.*
FROM table AS t
  JOIN
  ( SELECT DISTINCT category
    FROM table
  ) AS tdc
  ON t.id = 
  COALESCE(
    ( SELECT MAX(id) AS id 
      FROM table 
      WHERE category = tdc.category
        AND dataUID IS NOT NULL
    ) 
  , ( SELECT MAX(id) AS id 
      FROM table 
      WHERE category = tdc.category
        AND dataUID IS NULL
    ) 
  )
别忘他 2024-12-28 14:53:29

您需要子句 OVER

SELECT id, category,dataUID 
FROM
 (
    SELECT ROW_NUMBER() OVER (PARTITION BY category ORDER BY id desc, dataUID desc ) rn, 
    id, category,dataUID FROM table 
 ) q
WHERE rn=1

考虑按 desc 排序最后会移动空值。

you need clause OVER

SELECT id, category,dataUID 
FROM
 (
    SELECT ROW_NUMBER() OVER (PARTITION BY category ORDER BY id desc, dataUID desc ) rn, 
    id, category,dataUID FROM table 
 ) q
WHERE rn=1

Consider that sorting by desc moves null values at last.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文