Including a non-aggregated column in a GROUP BY clause (with a slight twist)
I have a table that looks something like this:
timestamp            value  person
===============================================
2010-01-12 00:00:00  33     emp1
2010-01-12 11:00:00  22     emp1
2010-01-12 09:00:00  16     emp2
2010-01-12 08:00:00  16     emp2
2010-01-12 12:12:00  45     emp3
2010-01-12 13:44:00  64     emp4
2010-01-12 06:00:00  33     emp1
2010-01-12 15:00:00  12     emp5
I wanted to find the maximum value associated with each person. The obvious query was:
select person,max(value) from table group by person
Now I wanted to include the timestamp associated with each max(value). I could not add the timestamp column to the above query because, as everyone knows, a non-aggregated column has to appear in the group by clause, and timestamp doesn't. So I wrote this instead:
select x.timestamp, x.value, x.person
from table as x,
     (select person, max(value) as maxvalue
      from table
      group by person
      order by maxvalue desc) as y
where x.person = y.person
  and x.value = y.maxvalue
This works -- to an extent. I now see:
timestamp            value  person
===============================================
2010-01-12 13:44:00  64     emp4
2010-01-12 12:12:00  45     emp3
2010-01-12 06:00:00  33     emp1
2010-01-12 00:00:00  33     emp1
2010-01-12 08:00:00  16     emp2
2010-01-12 09:00:00  16     emp2
2010-01-12 15:00:00  12     emp5
The problem is that I now get every entry for emp1 and emp2 that ends up with the same max(value).
Suppose that for emp1 and emp2 I only want to see the entry with the latest timestamp. In other words, I want this:
timestamp            value  person
===============================================
2010-01-12 13:44:00  64     emp4
2010-01-12 12:12:00  45     emp3
2010-01-12 06:00:00  33     emp1
2010-01-12 09:00:00  16     emp2
2010-01-12 15:00:00  12     emp5
What kind of query would I have to write? Is it possible to extend the nested query I wrote to achieve what I want, or do I have to rewrite everything from scratch?
In case it matters: because I am using SQLite, the timestamps are actually stored as Julian day numbers. I use the datetime() function to convert them back to a string representation in every query.
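The problem above can be reproduced with the following minimal sketch. It assumes Python's built-in sqlite3 module and a hypothetical table name `readings`, because the question's placeholder `table` is a reserved word in SQLite. Timestamps are stored as Julian day numbers with julianday() and converted back with datetime(), as the question describes. The subquery's ORDER BY has been moved to the outer query so that the output order is deterministic.

```python
import sqlite3

# Sample rows from the question: (timestamp string, value, person).
SAMPLE_ROWS = [
    ("2010-01-12 00:00:00", 33, "emp1"),
    ("2010-01-12 11:00:00", 22, "emp1"),
    ("2010-01-12 09:00:00", 16, "emp2"),
    ("2010-01-12 08:00:00", 16, "emp2"),
    ("2010-01-12 12:12:00", 45, "emp3"),
    ("2010-01-12 13:44:00", 64, "emp4"),
    ("2010-01-12 06:00:00", 33, "emp1"),
    ("2010-01-12 15:00:00", 12, "emp5"),
]


def build_db() -> sqlite3.Connection:
    """Load the sample rows into an in-memory table `readings` (hypothetical name).

    julianday() turns each timestamp string into a REAL Julian day number,
    matching how the question says the timestamps are stored.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (timestamp REAL, value INTEGER, person TEXT)")
    conn.executemany("INSERT INTO readings VALUES (julianday(?), ?, ?)", SAMPLE_ROWS)
    return conn


# The question's query, run against `readings`. datetime() converts the Julian
# day back to a string. A tie on max(value) yields one output row per tied row.
QUESTION_QUERY = """
SELECT datetime(x.timestamp), x.value, x.person
FROM readings AS x,
     (SELECT person, max(value) AS maxvalue
      FROM readings
      GROUP BY person) AS y
WHERE x.person = y.person
  AND x.value = y.maxvalue
ORDER BY x.value DESC, x.timestamp
"""

if __name__ == "__main__":
    conn = build_db()
    for row in conn.execute(QUESTION_QUERY):
        print(row)  # emp1 and emp2 each appear twice
```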
You were almost there.

You cannot compute max(x.timestamp) in the same nested query, because you don't want the absolute maximum per person. You want the one that accompanies the maximum value. So you have to aggregate a second time, on the next query level.

Compute max(x.timestamp) before you convert it to its string representation. Your string format would sort correctly too, but aggregating the raw value should perform better.

Note how I transformed your cross join with where conditions into an [inner] join with a (simplified) join condition. It does the same thing, but in the more canonical style of the SQL standard, and it is more readable.
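The following is a minimal sketch of the two-level aggregation described above. It is not necessarily the answer's exact query. Like the reproduction of the question, it assumes Python's sqlite3 module and a hypothetical table `readings`. The inner query finds each person's maximum value. The inner join, with USING (person, value) as the simplified join condition, keeps only the rows that carry that value. The outer GROUP BY then takes max() of the raw Julian day timestamp among the tied rows, and datetime() converts only that final result.

```python
import sqlite3

# Sample rows from the question: (timestamp string, value, person).
SAMPLE_ROWS = [
    ("2010-01-12 00:00:00", 33, "emp1"),
    ("2010-01-12 11:00:00", 22, "emp1"),
    ("2010-01-12 09:00:00", 16, "emp2"),
    ("2010-01-12 08:00:00", 16, "emp2"),
    ("2010-01-12 12:12:00", 45, "emp3"),
    ("2010-01-12 13:44:00", 64, "emp4"),
    ("2010-01-12 06:00:00", 33, "emp1"),
    ("2010-01-12 15:00:00", 12, "emp5"),
]

# First level (y): each person's maximum value.
# Second level: the join keeps only rows carrying that value, and GROUP BY
# collapses ties by taking the latest raw Julian day before datetime().
LATEST_MAX_QUERY = """
SELECT datetime(max(x.timestamp)), x.value, x.person
FROM readings AS x
JOIN (SELECT person, max(value) AS value
      FROM readings
      GROUP BY person) AS y USING (person, value)
GROUP BY x.person, x.value
ORDER BY x.value DESC
"""


def latest_max_per_person() -> list:
    """Run LATEST_MAX_QUERY against an in-memory copy of the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (timestamp REAL, value INTEGER, person TEXT)")
    conn.executemany("INSERT INTO readings VALUES (julianday(?), ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(LATEST_MAX_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_max_per_person():
        print(row)
```

With the sample data this returns one row per person, with the later timestamp for the tied emp1 and emp2 rows. That matches the desired output in the question.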
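All of this could be done in one query level with the window functions max() and first_value(), which are implemented in all the bigger RDBMS (except MySQL), but not in SQLite. (That was true when this answer was written. SQLite has supported window functions since version 3.25.0, and MySQL since 8.0.)

Edit: Included additional aggregates after a request in the comments.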
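For comparison, here is a sketch of the one-level window-function variant mentioned above. It assumes SQLite 3.25.0 or newer, Python's sqlite3 module, and the hypothetical table `readings`. max() over the whole partition gives each person's maximum value. first_value() over the partition ordered by value DESC, timestamp DESC picks the latest timestamp among the rows with that value. Every row of a partition gets identical results, so DISTINCT reduces the output to one row per person.

```python
import sqlite3

# Sample rows from the question: (timestamp string, value, person).
SAMPLE_ROWS = [
    ("2010-01-12 00:00:00", 33, "emp1"),
    ("2010-01-12 11:00:00", 22, "emp1"),
    ("2010-01-12 09:00:00", 16, "emp2"),
    ("2010-01-12 08:00:00", 16, "emp2"),
    ("2010-01-12 12:12:00", 45, "emp3"),
    ("2010-01-12 13:44:00", 64, "emp4"),
    ("2010-01-12 06:00:00", 33, "emp1"),
    ("2010-01-12 15:00:00", 12, "emp5"),
]

# One query level. The window ORDER BY sorts on the raw Julian day, and
# datetime() only formats the value that first_value() returns.
WINDOW_QUERY = """
SELECT DISTINCT
       first_value(datetime(timestamp)) OVER (
           PARTITION BY person ORDER BY value DESC, timestamp DESC
       ) AS latest_ts,
       max(value) OVER (PARTITION BY person) AS maxvalue,
       person
FROM readings
ORDER BY person
"""


def latest_max_per_person_windowed() -> list:
    """Run WINDOW_QUERY; requires an SQLite library with window functions."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite >= 3.25.0, found " + sqlite3.sqlite_version)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (timestamp REAL, value INTEGER, person TEXT)")
    conn.executemany("INSERT INTO readings VALUES (julianday(?), ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(WINDOW_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_max_per_person_windowed():
        print(row)
```

This sketch orders the result by person rather than by value only to keep the ORDER BY on a plain column. Sort by maxvalue instead if you want the question's ordering.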