Alternative to using GROUP BY without aggregates to retrieve distinct "best" results
I'm trying to retrieve the "Best" possible entry from an SQL table.
Consider a table containing TV shows with the columns:
id, title, episode, is_hidef, is_verified
e.g.:
id  title         ep  hidef  verified
1   The Simpsons  1   True   False
2   The Simpsons  1   True   True
3   The Simpsons  1   True   True
4   The Simpsons  2   False  False
5   The Simpsons  2   True   False
There may be duplicate rows for a single title and episode, which may or may not have different values for the boolean fields. There may be more columns containing additional info, but that's unimportant.
I want a result set that gives me the best row for each episode (so is_hidef and is_verified are both "true" where possible). For rows considered "equal" I want the most recent row (natural ordering, or order by an arbitrary datetime column). For the sample data above, that would be:
3   The Simpsons  1   True   True
5   The Simpsons  2   True   False
In the past I would have used the following query:
SELECT * FROM shows WHERE title='The Simpsons' GROUP BY episode ORDER BY is_hidef, is_verified
This works under MySQL and SQLite, but goes against the SQL spec (GROUP BY requires aggregates, etc.). I'm not really interested in hearing again why MySQL is so bad for allowing this, but I'm very interested in finding an alternative solution that will work on other engines too (bonus points if you can give me the Django ORM code for it).
Thanks =)
Comments (2)
In some way similar to Andomar's but this one really works.
The first-level tiebreak converts bits (SQL Server) or MySQL booleans to integer values using *1, and the columns are added to produce the "best" value. You can give them weights: e.g. if hidef matters more than verified, use hidef*2 + verified*1, which can produce 3, 2, 1 or 0.
The second level looks only among the rows that reach the "best" value and extracts the minimum ID (or some other tie-break column). This is essential to reduce a multi-match result set to just one record per episode.
In this particular case (table schema), the outer select uses the direct key to retrieve the matched records.
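The two levels described above can be sketched roughly as follows. This is only an illustration, not the query from this answer: it runs against an in-memory SQLite database through Python's sqlite3 module, the helper names (make_demo_db, best_rows_scored) are made up for the example, the booleans are assumed to be stored as 0/1 integers, and it picks MAX(id) rather than MIN(id) as the tie-break so that it matches the "most recent row" requirement from the question.

```python
import sqlite3

# Sample rows from the question: (id, title, episode, is_hidef, is_verified)
ROWS = [
    (1, "The Simpsons", 1, 1, 0),
    (2, "The Simpsons", 1, 1, 1),
    (3, "The Simpsons", 1, 1, 1),
    (4, "The Simpsons", 2, 0, 0),
    (5, "The Simpsons", 2, 1, 0),
]


def make_demo_db():
    """Build an in-memory copy of the shows table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, "
        "episode INTEGER, is_hidef INTEGER, is_verified INTEGER)"
    )
    conn.executemany("INSERT INTO shows VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


QUERY = """
SELECT s.*
FROM shows s
JOIN (
    -- Level 2: among the rows that reach the best score, keep one id.
    SELECT s2.episode, MAX(s2.id) AS id
    FROM shows s2
    JOIN (
        -- Level 1: best weighted score per episode (hidef outranks verified).
        SELECT episode, MAX(is_hidef * 2 + is_verified) AS best
        FROM shows
        WHERE title = :title
        GROUP BY episode
    ) b ON b.episode = s2.episode
       AND s2.is_hidef * 2 + s2.is_verified = b.best
    WHERE s2.title = :title
    GROUP BY s2.episode
) pick ON pick.id = s.id
ORDER BY s.episode
"""


def best_rows_scored(conn, title):
    """Return one 'best' row per episode for the given title."""
    return conn.execute(QUERY, {"title": title}).fetchall()


if __name__ == "__main__":
    for row in best_rows_scored(make_demo_db(), "The Simpsons"):
        print(row)
```

Every SELECT that uses GROUP BY here only returns grouped columns or aggregates, so the query does not rely on the MySQL/SQLite extension from the question.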
This is basically a form of the groupwise-maximum-with-ties problem. I don't think there is a SQL standard compliant solution. A solution like this would perform nicely:
But given the cost of readability, I would stick with your original query.
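For comparison, one portable way to express a groupwise maximum is an anti-join: keep a row only if no other row for the same title and episode beats it. The sketch below is not the query from this answer; it is a minimal illustration run against in-memory SQLite through Python's sqlite3 module, with made-up helper names (make_demo_db, best_rows_anti_join), booleans assumed to be stored as 0/1 integers, and the highest id assumed to mean "most recent".

```python
import sqlite3

# Sample rows from the question: (id, title, episode, is_hidef, is_verified)
ROWS = [
    (1, "The Simpsons", 1, 1, 0),
    (2, "The Simpsons", 1, 1, 1),
    (3, "The Simpsons", 1, 1, 1),
    (4, "The Simpsons", 2, 0, 0),
    (5, "The Simpsons", 2, 1, 0),
]


def make_demo_db():
    """Build an in-memory copy of the shows table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, "
        "episode INTEGER, is_hidef INTEGER, is_verified INTEGER)"
    )
    conn.executemany("INSERT INTO shows VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


QUERY = """
SELECT s.*
FROM shows s
WHERE s.title = :title
  AND NOT EXISTS (
      -- A row b "beats" s if it is better on hidef, or equal on hidef and
      -- better on verified, or equal on both and more recent (higher id).
      SELECT 1
      FROM shows b
      WHERE b.title = s.title
        AND b.episode = s.episode
        AND (b.is_hidef > s.is_hidef
             OR (b.is_hidef = s.is_hidef AND b.is_verified > s.is_verified)
             OR (b.is_hidef = s.is_hidef AND b.is_verified = s.is_verified
                 AND b.id > s.id))
  )
ORDER BY s.episode
"""


def best_rows_anti_join(conn, title):
    """Return the rows that no other row of the same episode beats."""
    return conn.execute(QUERY, {"title": title}).fetchall()


if __name__ == "__main__":
    for row in best_rows_anti_join(make_demo_db(), "The Simpsons"):
        print(row)
```

Because the final comparison is on the unique id, exactly one row per episode survives. Engines with window functions could likely get the same result with ROW_NUMBER() OVER (PARTITION BY episode ORDER BY ...), which some readers may find easier to follow.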