Alternative to using GROUP BY without aggregates to retrieve distinct "best" results

Posted 2024-10-12 05:30:26

I'm trying to retrieve the "Best" possible entry from an SQL table.

Consider a table containing TV shows:
id, title, episode, is_hidef, is_verified
e.g.:

id title         ep hidef verified
1  The Simpsons  1  True  False
2  The Simpsons  1  True  True
3  The Simpsons  1  True  True
4  The Simpsons  2  False False
5  The Simpsons  2  True  False

There may be duplicate rows for a single title and episode, which may or may not have different values for the boolean fields. There may be more columns containing additional info, but that's unimportant.

I want a result set that gives me the best row (so is_hidef and is_verified are both "true" where possible) for each episode. For rows considered "equal" I want the most recent row (natural ordering, or order by an arbitrary datetime column).

3  The Simpsons  1  True  True
5  The Simpsons  2  True  False

In the past I would have used the following query:

SELECT * FROM shows WHERE title='The Simpsons' GROUP BY episode ORDER BY is_hidef, is_verified

This works under MySQL and SQLite, but goes against the SQL spec (GROUP BY requiring aggregates, etc.). I'm not really interested in hearing again why MySQL is so bad for allowing this, but I'm very interested in finding an alternative solution that will work on other engines too (bonus points if you can give me the Django ORM code for it).

Thanks =)

日暮斜阳 2024-10-19 05:30:26

In some ways similar to Andomar's, but this one really works.

select C.*
FROM
(
    -- 2nd level: among the rows with the best score, pick one ID per episode
    select min(b.ID) minid
    from (
        -- 1st level: best score per title and episode
        select title, ep, max(hidef*1 + verified*1) ord
        from tbl
        group by title, ep) a
    inner join tbl b on b.title=a.title and b.ep=a.ep and b.hidef*1 + b.verified*1 = a.ord
    group by a.title, a.ep, a.ord
) D inner join tbl C on D.minid = C.id

The first level converts bits (SQL Server) or MySQL booleans to integer values using *1, and the columns are added to produce the "best" score. You can give them weights, e.g. if hidef matters more than verified, use hidef*2 + verified*1, which can produce 3, 2, 1 or 0.

The 2nd level looks among the rows that reach the "best" score and extracts the minimum ID (or some other tie-break column; use max(ID) instead if you want the most recent row, as the question asks). This is essential to reduce a multi-match result set to just one record per episode.

In this particular case (table schema), the outer select uses the direct key to retrieve the matched records.
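
To try the approach above without a database server, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration only, under a few assumptions that are not in the answer: the booleans are stored as 0/1 integers, the weighted score hidef*2 + verified is used, and max(id) replaces min(ID) so that ties go to the most recent row, as the question asks.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# Assumption: booleans are stored as 0/1 integers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT, ep INTEGER,"
    " hidef INTEGER, verified INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, "The Simpsons", 1, 1, 0),
        (2, "The Simpsons", 1, 1, 1),
        (3, "The Simpsons", 1, 1, 1),
        (4, "The Simpsons", 2, 0, 0),
        (5, "The Simpsons", 2, 1, 0),
    ],
)

query = """
SELECT c.id, c.title, c.ep, c.hidef, c.verified
FROM (
    -- 2nd level: newest id among the rows that reach the best score
    SELECT MAX(b.id) AS best_id
    FROM (
        -- 1st level: best weighted score per title and episode
        SELECT title, ep, MAX(hidef * 2 + verified) AS ord
        FROM tbl
        GROUP BY title, ep
    ) a
    INNER JOIN tbl b
        ON b.title = a.title AND b.ep = a.ep
       AND b.hidef * 2 + b.verified = a.ord
    GROUP BY a.title, a.ep, a.ord
) d
INNER JOIN tbl c ON c.id = d.best_id
ORDER BY c.ep
"""

best_rows = conn.execute(query).fetchall()
for row in best_rows:
    print(row)
```

With the sample data this picks rows 3 and 5, which matches the result the question asks for. Every selected column is either grouped or aggregated, so the query does not rely on the MySQL/SQLite GROUP BY extension.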

乱了心跳 2024-10-19 05:30:26

This is basically a form of the groupwise-maximum-with-ties problem. I don't think there is a SQL standard compliant solution. A solution like this would perform nicely:

SELECT  s2.id
,       s2.title
,       s2.episode
,       s2.is_hidef
,       s2.is_verified
FROM    (
        select  distinct title
        ,       episode
        from    shows
        where   title = 'The Simpsons' 
        ) s1
JOIN    shows s2
ON      s2.id = 
        (
        select  id
        from    shows s3
        where   s3.title = s1.title
                and s3.episode = s1.episode
        order by
                s3.is_hidef DESC
        ,       s3.is_verified DESC
        ,       s3.id DESC  -- tie-break: prefer the most recent row
        limit   1
        )

But given the cost of readability, I would stick with your original query.
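
One option neither answer mentions is a window function. As far as I know ROW_NUMBER() has been part of the SQL standard since SQL:2003 and is available in many current engines, but check your own database before relying on it. The sketch below again uses Python's sqlite3 module and assumes SQLite 3.25 or newer, 0/1 integer booleans, and id as the "most recent" tie-break. None of this comes from the answers above.

```python
import sqlite3

# Assumption: the bundled SQLite supports window functions (3.25+).
assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, episode INTEGER,"
    " is_hidef INTEGER, is_verified INTEGER)"
)
conn.executemany(
    "INSERT INTO shows VALUES (?, ?, ?, ?, ?)",
    [
        (1, "The Simpsons", 1, 1, 0),
        (2, "The Simpsons", 1, 1, 1),
        (3, "The Simpsons", 1, 1, 1),
        (4, "The Simpsons", 2, 0, 0),
        (5, "The Simpsons", 2, 1, 0),
    ],
)

query = """
SELECT id, title, episode, is_hidef, is_verified
FROM (
    SELECT s.*,
           -- Rank rows within each episode: hidef first, then verified,
           -- then the newest id as the final tie-break.
           ROW_NUMBER() OVER (
               PARTITION BY title, episode
               ORDER BY is_hidef DESC, is_verified DESC, id DESC
           ) AS rn
    FROM shows s
    WHERE title = ?
) ranked
WHERE rn = 1
ORDER BY episode
"""

ranked_best = conn.execute(query, ("The Simpsons",)).fetchall()
for row in ranked_best:
    print(row)
```

The ORDER BY inside OVER plays the role of the ORDER BY in the original query, but it is applied per episode, so exactly one row per (title, episode) comes out with rn = 1.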
