Including non-aggregate columns in a group by clause (with a small catch)

Posted on 2024-12-26 12:32:33

I have a table that looks something like this:

timestamp                value           person
===============================================
2010-01-12 00:00:00       33              emp1
2010-01-12 11:00:00       22              emp1
2010-01-12 09:00:00       16              emp2
2010-01-12 08:00:00       16              emp2
2010-01-12 12:12:00       45              emp3
2010-01-12 13:44:00       64              emp4
2010-01-12 06:00:00       33              emp1
2010-01-12 15:00:00       12              emp5

I wanted to find the maximum value associated with each person. The obvious query was:

select person,max(value) from table group by person

Now I wanted to include the timestamp associated with each max(value). I could not simply add the timestamp column to the query above, because it is neither aggregated nor listed in the group by clause. So I wrote this instead:

select x.timestamp,x.value,x.person from table as x,
(select person,max(value) as maxvalue from table group by person order by maxvalue 
 desc) as y
where x.person = y.person
and x.value = y.maxvalue

This works -- to an extent. I now see:

timestamp                value           person
===============================================
2010-01-12 13:44:00       64              emp4
2010-01-12 12:12:00       45              emp3
2010-01-12 06:00:00       33              emp1
2010-01-12 00:00:00       33              emp1
2010-01-12 08:00:00       16              emp2
2010-01-12 09:00:00       16              emp2
2010-01-12 15:00:00       12              emp5

The problem is that I now get every entry for emp1 and emp2 that shares the same max(value).

Suppose that for emp1 and emp2 I only want to see the entry with the latest timestamp. In other words, I want this:

timestamp                value           person
===============================================
2010-01-12 13:44:00       64              emp4
2010-01-12 12:12:00       45              emp3
2010-01-12 06:00:00       33              emp1
2010-01-12 09:00:00       16              emp2
2010-01-12 15:00:00       12              emp5

What kind of query would I have to write? Is it possible to extend the nested query I wrote to achieve what I want, or do I have to rewrite everything from scratch?

In case it matters: because I am using SQLite, the timestamps are actually stored as Julian days. I use the datetime() function to convert them back to a string representation in every query.

魂牵梦绕锁你心扉 2025-01-02 12:32:33

Edit: additional aggregates were included after a request in the comments.

You were almost there:

SELECT max(x.timestamp) AS timestamp, x.value, x.person
     , y.max_value, y.ct_value, y.avg_value
FROM   table AS x
JOIN  (
    SELECT person
         , max(value) as max_value
         , count(value) as ct_value
         , avg(value) as avg_value
    FROM   table
    GROUP  BY person
    ) AS y ON (x.person, x.value) = (y.person, y.max_value)
GROUP BY x.person, x.value, y.max_value, y.ct_value, y.avg_value
-- ORDER  BY x.person, x.value

You cannot compute max(x.timestamp) in the same nested query, because you don't want each person's latest timestamp overall, but the latest timestamp among the rows that carry that person's maximum value. So you have to aggregate a second time at the outer query level.
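To make the two aggregation levels concrete, here is a minimal sketch that runs the same idea against the sample rows from the question, using Python's built-in sqlite3 module. The table name readings is my own placeholder (table itself is a reserved word in SQLite), the timestamps are stored as plain text for simplicity, and the extra aggregates are left out. I also spelled the join condition out with AND rather than the row-value comparison, on the assumption that it should then work on older SQLite versions too.

```python
import sqlite3

# Sample rows from the question. "readings" is a hypothetical table name.
ROWS = [
    ("2010-01-12 00:00:00", 33, "emp1"),
    ("2010-01-12 11:00:00", 22, "emp1"),
    ("2010-01-12 09:00:00", 16, "emp2"),
    ("2010-01-12 08:00:00", 16, "emp2"),
    ("2010-01-12 12:12:00", 45, "emp3"),
    ("2010-01-12 13:44:00", 64, "emp4"),
    ("2010-01-12 06:00:00", 33, "emp1"),
    ("2010-01-12 15:00:00", 12, "emp5"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (timestamp TEXT, value INTEGER, person TEXT)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT max(x.timestamp) AS timestamp, x.value, x.person
FROM   readings AS x
JOIN  (
    -- inner level: the maximum value per person
    SELECT person, max(value) AS max_value
    FROM   readings
    GROUP  BY person
    ) AS y ON x.person = y.person AND x.value = y.max_value
-- outer level: among the rows that hit the maximum, keep the latest timestamp
GROUP  BY x.person, x.value
ORDER  BY x.value DESC
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

This should print one row per person, with 06:00:00 for emp1 and 09:00:00 for emp2, which is the output asked for in the question.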

Compute max(x.timestamp) before you convert it to its string representation. Your string format would sort correctly, too, but aggregating the raw value should perform better.
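A small sketch of that point, assuming the Julian day is kept in a REAL column that I have called jd (both the column name and the table name readings are made up for the example). It runs the query twice, once taking max() on the stored number and converting afterwards, and once converting every row with datetime() before aggregating, and checks that both orders give the same rows. It does not measure performance.

```python
import sqlite3

# Sample rows from the question; timestamps are converted to Julian days on insert.
ROWS = [
    ("2010-01-12 00:00:00", 33, "emp1"),
    ("2010-01-12 11:00:00", 22, "emp1"),
    ("2010-01-12 09:00:00", 16, "emp2"),
    ("2010-01-12 08:00:00", 16, "emp2"),
    ("2010-01-12 12:12:00", 45, "emp3"),
    ("2010-01-12 13:44:00", 64, "emp4"),
    ("2010-01-12 06:00:00", 33, "emp1"),
    ("2010-01-12 15:00:00", 12, "emp5"),
]

conn = sqlite3.connect(":memory:")
# "readings" and "jd" are hypothetical names for this sketch.
conn.execute("CREATE TABLE readings (jd REAL, value INTEGER, person TEXT)")
conn.executemany("INSERT INTO readings VALUES (julianday(?), ?, ?)", ROWS)

TEMPLATE = """
SELECT {ts_expr} AS timestamp, x.value, x.person
FROM   readings AS x
JOIN  (SELECT person, max(value) AS max_value
       FROM   readings
       GROUP  BY person) AS y
       ON x.person = y.person AND x.value = y.max_value
GROUP  BY x.person, x.value
ORDER  BY x.value DESC
"""

# Aggregate the stored number, then convert once per result row.
numeric_first = conn.execute(TEMPLATE.format(ts_expr="datetime(max(x.jd))")).fetchall()
# Convert every joined row to a string, then aggregate the strings.
string_first = conn.execute(TEMPLATE.format(ts_expr="max(datetime(x.jd))")).fetchall()

print(numeric_first == string_first)
```

The first form calls datetime() only once per output row, which is the reason to prefer it.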

Note how I transformed your cross join with where conditions into an [inner] join with a (simplified) join condition. It does the same thing, but it is closer to the canonical form of the SQL standard and more readable.
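If you want to convince yourself that the two spellings are equivalent, a quick check along these lines should do, again with sqlite3 and a placeholder table named readings. Both queries stop at the join, so they still return the tied rows for emp1 and emp2.

```python
import sqlite3

# Sample rows from the question. "readings" is a hypothetical table name.
ROWS = [
    ("2010-01-12 00:00:00", 33, "emp1"),
    ("2010-01-12 11:00:00", 22, "emp1"),
    ("2010-01-12 09:00:00", 16, "emp2"),
    ("2010-01-12 08:00:00", 16, "emp2"),
    ("2010-01-12 12:12:00", 45, "emp3"),
    ("2010-01-12 13:44:00", 64, "emp4"),
    ("2010-01-12 06:00:00", 33, "emp1"),
    ("2010-01-12 15:00:00", 12, "emp5"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (timestamp TEXT, value INTEGER, person TEXT)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", ROWS)

# Comma (cross) join, filtered in the WHERE clause, as in the question.
COMMA_STYLE = """
SELECT x.timestamp, x.value, x.person
FROM   readings AS x,
      (SELECT person, max(value) AS max_value
       FROM   readings
       GROUP  BY person) AS y
WHERE  x.person = y.person
AND    x.value  = y.max_value
ORDER  BY x.person, x.timestamp
"""

# Explicit inner join with the same conditions in the ON clause.
JOIN_STYLE = """
SELECT x.timestamp, x.value, x.person
FROM   readings AS x
JOIN  (SELECT person, max(value) AS max_value
       FROM   readings
       GROUP  BY person) AS y
       ON x.person = y.person AND x.value = y.max_value
ORDER  BY x.person, x.timestamp
"""

comma_rows = conn.execute(COMMA_STYLE).fetchall()
join_rows = conn.execute(JOIN_STYLE).fetchall()
print(comma_rows == join_rows, len(join_rows))
```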

All of this could be done in one query level with window functions (max() and first_value()). When this answer was written, they were implemented in all the bigger RDBMS except MySQL, but not in SQLite. Both have since added them (MySQL in 8.0, SQLite in 3.25.0).
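For reference, a sketch of what the single-level version could look like with max() and first_value() as window functions. It needs SQLite 3.25.0 or newer, so it depends on which SQLite library your Python build ships with. The table name readings and the aliases latest_ts and max_value are my own choices, and DISTINCT is what collapses the per-row window results to one row per person.

```python
import sqlite3

# Window functions were added to SQLite in version 3.25.0.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"

# Sample rows from the question. "readings" is a hypothetical table name.
ROWS = [
    ("2010-01-12 00:00:00", 33, "emp1"),
    ("2010-01-12 11:00:00", 22, "emp1"),
    ("2010-01-12 09:00:00", 16, "emp2"),
    ("2010-01-12 08:00:00", 16, "emp2"),
    ("2010-01-12 12:12:00", 45, "emp3"),
    ("2010-01-12 13:44:00", 64, "emp4"),
    ("2010-01-12 06:00:00", 33, "emp1"),
    ("2010-01-12 15:00:00", 12, "emp5"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (timestamp TEXT, value INTEGER, person TEXT)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", ROWS)

WINDOW_QUERY = """
SELECT DISTINCT
       -- timestamp of the first row when sorted by highest value, then latest time
       first_value(timestamp) OVER (PARTITION BY person
                                    ORDER BY value DESC, timestamp DESC) AS latest_ts
     , max(value) OVER (PARTITION BY person) AS max_value
     , person
FROM   readings
ORDER  BY max_value DESC
"""

windowed = conn.execute(WINDOW_QUERY).fetchall()
for row in windowed:
    print(row)
```

Every row of a person gets the same three values, so DISTINCT leaves exactly one row per person without a join or a second aggregation level.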