MySQL: Select N rows, but with only unique values in one column

Posted 2024-07-06 20:37:24

Given this data set:

ID  Name            City            Birthyear
1   Egon Spengler   New York        1957
2   Mac Taylor      New York        1955
3   Sarah Connor    Los Angeles     1959
4   Jean-Luc Picard La Barre        2305
5   Ellen Ripley    Nostromo        2092
6   James T. Kirk   Riverside       2233
7   Henry Jones     Chicago         1899

I need to find the 3 oldest people, but only one from each city.

If it were just the three oldest, it would be:

  • Henry Jones / Chicago
  • Mac Taylor / New York
  • Egon Spengler / New York

However, since both Egon Spengler and Mac Taylor are located in New York, Egon Spengler would drop out and the next one (Sarah Connor / Los Angeles) would come in instead.

Any elegant solutions?

Update:

Currently a variation of PConroy's query is the best/fastest solution:

SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT MIN(Birthyear) AS Birthyear
              FROM people 
              GROUP by City) P2 ON P2.Birthyear = P.Birthyear
   GROUP BY P.City
   ORDER BY P.Birthyear ASC 
   LIMIT 10;

His original query with "IN" is extremely slow on big datasets (I aborted it after 5 minutes), but moving the subquery into a JOIN speeds it up a lot. It took about 0.15 seconds for approx. 1 million rows in my test environment. I have an index on "City, Birthyear" and a second one just on "Birthyear".

Note: This is related to...

Comments (5)

圈圈圆圆圈圈 2024-07-13 20:37:24

Probably not the most elegant of solutions, and the performance of IN may suffer on larger tables.

The nested query gets the minimum Birthyear for each city. Only records that have one of these Birthyear values are matched in the outer query. Ordering by Birthyear and then limiting to 3 results gets you the 3 oldest people who are also the oldest in their city (Egon Spengler drops out).

SELECT Name, City, Birthyear, COUNT(*) AS ct
FROM people
WHERE Birthyear IN (SELECT MIN(Birthyear)
               FROM people
               GROUP BY City)
GROUP BY City
-- ASC, not DESC: the oldest people have the smallest Birthyear
ORDER BY Birthyear ASC LIMIT 3;

+-----------------+-------------+-----------+----+
| Name            | City        | Birthyear | ct |
+-----------------+-------------+-----------+----+
| Henry Jones     | Chicago     | 1899      | 1  |
| Mac Taylor      | New York    | 1955      | 1  |
| Sarah Connor    | Los Angeles | 1959      | 1  |
+-----------------+-------------+-----------+----+

Edit: added GROUP BY City to the outer query, because people with the same birth year would otherwise return multiple rows. Grouping in the outer query ensures that only one result is returned per city, even if more than one person has that minimum Birthyear. The ct column shows whether more than one person in the city has that Birthyear.
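
To see what the ct column reports, the IN query can be tried against a throwaway database. This is only a sketch under a few assumptions: Python's built-in sqlite3 module stands in for MySQL, the tied row ('Blah', New York, 1955) is borrowed from the test data of another answer in this thread, and the query groups by City and Birthyear without selecting Name, since which of several tied names a plain GROUP BY City returns is not defined (and MySQL rejects it when ONLY_FULL_GROUP_BY is enabled).

```python
import sqlite3

# The question's rows plus one tied row ("Blah", New York, 1955).
ROWS = [
    (1, "Egon Spengler", "New York", 1957),
    (2, "Mac Taylor", "New York", 1955),
    (3, "Sarah Connor", "Los Angeles", 1959),
    (4, "Jean-Luc Picard", "La Barre", 2305),
    (5, "Ellen Ripley", "Nostromo", 2092),
    (6, "James T. Kirk", "Riverside", 2233),
    (7, "Henry Jones", "Chicago", 1899),
    (8, "Blah", "New York", 1955),
]

# Same year-only IN filter as the answer; ct counts the people per city
# who share the matched birth year.
QUERY = """
SELECT City, Birthyear, COUNT(*) AS ct
FROM people
WHERE Birthyear IN (SELECT MIN(Birthyear)
                    FROM people
                    GROUP BY City)
GROUP BY City, Birthyear
ORDER BY Birthyear ASC
LIMIT 3
"""


def oldest_per_city_with_ties():
    """Run QUERY on an in-memory SQLite database and return the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people ("
        "ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in oldest_per_city_with_ties():
    print(row)
```

With this data, ct is 2 for New York, which flags that two people there share the minimum birth year.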

独自←快乐 2024-07-13 20:37:24

This is probably not the most elegant or the quickest solution, but it should work. I am looking forward to seeing the solutions of real database gurus.

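-- The table has no age column, so the oldest person is the one with the smallest Birthyear.
select p.* from people p,
(select city, min(Birthyear) as min_year from people group by city) t
where p.city = t.city and p.Birthyear = t.min_year
order by p.Birthyear
limit 3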
衣神在巴黎 2024-07-13 20:37:24

Something like that?

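-- TOP 1 is SQL Server syntax. MySQL uses LIMIT, and since MySQL rejects
-- LIMIT inside an IN subquery, compare with = against a scalar subquery.
SELECT
  Id, Name, City, Birthyear
FROM
  TheTable
WHERE
  Id = (SELECT i.Id FROM TheTable i WHERE i.City = TheTable.City ORDER BY i.Birthyear, i.Id LIMIT 1)
ORDER BY Birthyear
LIMIT 3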
左岸枫 2024-07-13 20:37:24

Not pretty, but it should also work when multiple people share the same dob:

Test data:

-- MySQL cannot SELECT ... INTO a new table; use CREATE TABLE ... AS SELECT
create table people as
select id, name, city, dob
from
(select 1 id,'Egon Spengler' name, 'New York' city , 1957 dob
union all select 2, 'Mac Taylor','New York', 1955
union all select 3, 'Sarah Connor','Los Angeles', 1959
union all select 4, 'Jean-Luc Picard','La Barre', 2305
union all select 5, 'Ellen Ripley','Nostromo', 2092
union all select 6, 'James T. Kirk','Riverside', 2233
union all select 7, 'Henry Jones','Chicago', 1899
union all select 8, 'Blah','New York', 1955) a

Query:

select 
    * 
from 
    people p
    left join people p1
    ON 
        p.city = p1.city
        -- parenthesize the OR so that both branches stay within the same city
        and (p.dob > p1.dob
             or (p.dob = p1.dob and p.id > p1.id))
where
    p1.id is null
order by 
    p.dob
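
The anti-join idea can be checked end to end with a short script. This is a sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL. It writes the OR branch with explicit parentheses so that both tie-break conditions apply only within the same city, and it adds ORDER BY plus LIMIT 3 to get the three rows the question asks for.

```python
import sqlite3

# Test data from this answer, including the tied row "Blah".
ROWS = [
    (1, "Egon Spengler", "New York", 1957),
    (2, "Mac Taylor", "New York", 1955),
    (3, "Sarah Connor", "Los Angeles", 1959),
    (4, "Jean-Luc Picard", "La Barre", 2305),
    (5, "Ellen Ripley", "Nostromo", 2092),
    (6, "James T. Kirk", "Riverside", 2233),
    (7, "Henry Jones", "Chicago", 1899),
    (8, "Blah", "New York", 1955),
]

# A row p survives only if no row p1 in the same city is older,
# or equally old with a smaller id (the tie-break).
ANTI_JOIN = """
SELECT p.id, p.name, p.city, p.dob
FROM people p
LEFT JOIN people p1
       ON p1.city = p.city
      AND (p1.dob < p.dob
           OR (p1.dob = p.dob AND p1.id < p.id))
WHERE p1.id IS NULL
ORDER BY p.dob
LIMIT 3
"""


def three_oldest_one_per_city():
    """Return the three oldest people, at most one per city."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people ("
        "id INTEGER PRIMARY KEY, name TEXT, city TEXT, dob INTEGER)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(ANTI_JOIN).fetchall()
    conn.close()
    return rows


for row in three_oldest_one_per_city():
    print(row)
```

Because ties are broken on id, Mac Taylor (id 2) is kept for New York and 'Blah' (id 8) is filtered out, so no GROUP BY is needed.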
椒妓 2024-07-13 20:37:24

@BlaM

UPDATED
I just found that it is good to use USING instead of ON: it removes the duplicate columns from the result.

SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT City, MIN(Birthyear) AS Birthyear
              FROM people 
              GROUP by City) P2 USING(Birthyear, City)
   GROUP BY P.City
   ORDER BY P.Birthyear ASC 
   LIMIT 10;

ORIGINAL POST

Hi, I tried your updated query, but I was getting wrong results until I added an extra condition to the join (and an extra column to the joined subquery). Applied to your query, this is what I am using:

SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT City, MIN(Birthyear) AS Birthyear
              FROM people 
              GROUP by City) P2 ON P2.Birthyear = P.Birthyear AND P2.City = P.City
   GROUP BY P.City
   ORDER BY P.Birthyear ASC 
   LIMIT 10;

In theory you should not need the last GROUP BY P.City, but I've left it there for now, just in case. I will probably remove it later.
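
Why the extra City condition matters can be shown with a small experiment. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL and adds one hypothetical row that is not part of the original data: a New Yorker born in 1959, the same year as the oldest person in Los Angeles. Both joins are run without the final GROUP BY so that every matched row is visible.

```python
import sqlite3

ROWS = [
    (1, "Egon Spengler", "New York", 1957),
    (2, "Mac Taylor", "New York", 1955),
    (3, "Sarah Connor", "Los Angeles", 1959),
    (4, "Jean-Luc Picard", "La Barre", 2305),
    (5, "Ellen Ripley", "Nostromo", 2092),
    (6, "James T. Kirk", "Riverside", 2233),
    (7, "Henry Jones", "Chicago", 1899),
    # Hypothetical row for illustration only: shares a birth year with
    # the oldest person of a different city.
    (8, "Hypothetical Person", "New York", 1959),
]

# Join on the birth year alone, as in the question's update.
YEAR_ONLY = """
SELECT P.Name
FROM people P
JOIN (SELECT MIN(Birthyear) AS Birthyear
      FROM people
      GROUP BY City) P2
  ON P2.Birthyear = P.Birthyear
ORDER BY P.Birthyear, P.ID
"""

# Join on birth year and city, as in this answer.
YEAR_AND_CITY = """
SELECT P.Name
FROM people P
JOIN (SELECT City, MIN(Birthyear) AS Birthyear
      FROM people
      GROUP BY City) P2
  ON P2.Birthyear = P.Birthyear AND P2.City = P.City
ORDER BY P.Birthyear, P.ID
"""


def run(query):
    """Run a query on a fresh in-memory database and return the names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people ("
        "ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", ROWS)
    names = [name for (name,) in conn.execute(query)]
    conn.close()
    return names


print("year only:    ", run(YEAR_ONLY))
print("year and city:", run(YEAR_AND_CITY))
```

The year-only join lets the hypothetical New Yorker through because 1959 is the minimum for Los Angeles, although that person is not the oldest in New York. With the City condition only the true per-city minimum survives.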
