Optimizing a slow ranking query

Posted on 2024-08-31 20:57:56

I need to optimize a query for a ranking that is taking forever (the query itself works, but I know it's awful and I've just tried it with a good number of records and it gives a timeout).

I'll briefly explain the model. I have 3 tables: player, team and player_team. Players can belong to a team. Obvious as it sounds, players are stored in the player table and teams in team. In my app, each player can switch teams at any time, and a log has to be maintained. However, a player is considered to belong to only one team at a given time. The current team of a player is the last one he has joined.

The structure of player and team is not relevant, I think. I have an id column PK in each. In player_team I have:

id          (PK)
player_id   (FK -> player.id)
team_id     (FK -> team.id)

Now, each team is assigned a point for each player that has joined. So, now, I want to get a ranking of the first N teams with the biggest number of players.

My first idea was to first get the current players from player_team (that is, at most one record per player; this record must be the player's current team). I failed to find a simple way to do it (I tried GROUP BY player_team.player_id HAVING player_team.id = MAX(player_team.id), but that didn't cut it).

I tried a number of queries that didn't work, but managed to get this working.

SELECT 
    COUNT(*) AS total,
    pt.team_id,
    p.facebook_uid AS owner_uid, 
    t.color 
FROM 
    player_team pt 
JOIN player p ON (p.id = pt.player_id)  
JOIN team t ON (t.id = pt.team_id) 
WHERE 
    pt.id IN (
        SELECT max(J.id) 
        FROM player_team J 
        GROUP BY J.player_id
    )  

GROUP BY 
    pt.team_id 
ORDER BY 
    total DESC 
LIMIT 50            

As I said, it works but looks very bad and performs worse, so I'm sure there must be a better way to go. Does anyone have any ideas for optimizing this?

I'm using mysql, by the way.

Thanks in advance

Adding the EXPLAIN output. (Sorry, not sure how to format it properly.)

id  select_type         table  type    possible_keys                                 key               key_len  ref           rows    Extra
1   PRIMARY             t      ALL     PRIMARY                                       NULL              NULL     NULL          5000    Using temporary; Using filesort
1   PRIMARY             pt     ref     FKplayer_pt77082,FKplayer_pt265938,new_index  FKplayer_pt77082  4        t.id          30      Using where
1   PRIMARY             p      eq_ref  PRIMARY                                       PRIMARY           4        pt.player_id  1
2   DEPENDENT SUBQUERY  J      index   NULL                                          new_index         8        NULL          150000  Using index

Comments (5)

半岛未凉 2024-09-07 20:57:56

Try this:

SELECT  t.*, cnt
FROM    (
        SELECT  team_id, COUNT(*) AS cnt
        FROM    (
                SELECT  player_id, MAX(id) AS mid
                FROM    player_team
                GROUP BY
                        player_id
                ) q
        JOIN    player_team pt
        ON      pt.id = q.mid
        GROUP BY
                team_id
        ) q2
JOIN    team t
ON      t.id = q2.team_id
ORDER BY
        cnt DESC
LIMIT 50

Create an index on player_team (player_id, id) (in this order) for this to work fast.
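
For reference, the index suggested above could be created along these lines. This is only a sketch: the index name idx_player_team_player_id is made up for illustration. The EXPLAIN output in the question already lists an index called new_index with key_len 8, but the thread does not say which columns it covers, so check that before adding another one.

```sql
-- Hypothetical index name. Column order matters: player_id first, then id,
-- so MAX(id) per player_id can be answered from the index alone.
CREATE INDEX idx_player_team_player_id ON player_team (player_id, id);

-- Check how the inner GROUP BY is executed once the index exists.
EXPLAIN
SELECT player_id, MAX(id) AS mid
FROM player_team
GROUP BY player_id;
```

If the index is being used for the grouping, the Extra column of the EXPLAIN output should mention an index-based strategy rather than a temporary table or filesort.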

放血 2024-09-07 20:57:56

It's the subquery that is killing it. If you add a current field to the player_team table, set to 1 for a player's current row and 0 for old rows, you could simplify this a lot by just doing:

SELECT 
    COUNT(*) AS total,
    pt.team_id,
    p.facebook_uid AS owner_uid, 
    t.color 
FROM 
    player_team pt 
JOIN player p ON (p.id = pt.player_id)  
JOIN team t ON (t.id = pt.team_id) 
WHERE 
    pt.current = 1 
GROUP BY 
    pt.team_id 
ORDER BY 
    total DESC 
LIMIT 50  

I think it is bad practice to have multiple entries in the player_team table for the same relationship when the only way to tell which one is the 'current' record is to compare two (or more) rows. I have been in this situation before, and the workarounds you need to make it work really kill performance. It is far better to be able to see which row is current with a simple lookup (in this case, where current=1), or to move historical data into a completely different table (depending on your situation this might be overkill).
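
A minimal sketch of how such a flag might be added and kept up to date. It assumes InnoDB, an AUTO_INCREMENT id on player_team, and a TINYINT column named current as suggested above. The index name and the ids 42 and 7 are placeholders, not taken from the question.

```sql
-- Assumed column: 1 = the player's current membership, 0 = historical row.
-- The name is backtick-quoted defensively in case it clashes with a keyword.
ALTER TABLE player_team
    ADD COLUMN `current` TINYINT(1) NOT NULL DEFAULT 0;

-- One-off backfill: the newest row per player is the current one.
UPDATE player_team pt
JOIN (
    SELECT player_id, MAX(id) AS mid
    FROM player_team
    GROUP BY player_id
) latest ON latest.mid = pt.id
SET pt.`current` = 1;

-- Hypothetical index so that WHERE current = 1 does not scan the whole log.
CREATE INDEX idx_player_team_current ON player_team (`current`, team_id);

-- When player 42 joins team 7 (placeholder ids), flip the flag atomically.
START TRANSACTION;
UPDATE player_team SET `current` = 0 WHERE player_id = 42 AND `current` = 1;
INSERT INTO player_team (player_id, team_id, `current`) VALUES (42, 7, 1);
COMMIT;
```

The trade-off is that every team switch now costs an extra UPDATE, in exchange for a ranking query that no longer needs the MAX(id) subquery.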

阳光下慵懒的猫 2024-09-07 20:57:56

I sometimes find that more complex queries in MySQL need to be broken into two pieces.

The first piece pulls the required data into a temporary table, and the second piece is the query that works on the resulting dataset. In my experience this can give a significant performance gain.
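
Applied to this question, the two-step approach might look roughly like the following. The temporary table name tmp_current_membership is invented for illustration, and the INT type is an assumption based on the key_len of 4 in the EXPLAIN output. Whether this beats a single query depends on the MySQL version and the data, so it is worth timing both.

```sql
-- Step 1: materialise the id of each player's latest player_team row.
CREATE TEMPORARY TABLE tmp_current_membership (
    player_team_id INT NOT NULL PRIMARY KEY
)
SELECT MAX(id) AS player_team_id
FROM player_team
GROUP BY player_id;

-- Step 2: rank teams using only those rows.
SELECT pt.team_id, t.color, COUNT(*) AS total
FROM tmp_current_membership cur
JOIN player_team pt ON pt.id = cur.player_team_id
JOIN team t ON t.id = pt.team_id
GROUP BY pt.team_id, t.color
ORDER BY total DESC
LIMIT 50;

-- Clean up; the table would also disappear when the session ends.
DROP TEMPORARY TABLE tmp_current_membership;
```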

扛起拖把扫天下 2024-09-07 20:57:56

This will get the current teams with colours ordered by size:

  SELECT pt.team_id, COUNT(pt.player_id) AS c, t.color 
    FROM player_team pt JOIN team t ON t.id = pt.team_id  
    WHERE pt.current = 1
    GROUP BY pt.team_id, t.color
    ORDER BY c DESC
    LIMIT 50;

But you haven't given a condition for which player should be considered the owner of the team. Your current query shows an arbitrary player as owner_uid because of the grouping, not because that player is the actual owner. If your player_team table contained an 'owner' column, you could join the above query to a query of owners. Something like:

SELECT o.facebook_uid, a.team_id, a.color, a.c
FROM player_team pt1 
  JOIN player o ON (o.id = pt1.player_id AND pt1.owner = 1)
  JOIN (...above query...) a
    ON a.team_id = pt1.team_id;

国粹 2024-09-07 20:57:56

You could add a column "last_playteam_id" to the player table, and each time a player changes team, update it with the PK of the new row in the player_team table.

Then you can do this:

SELECT 
    COUNT(*) AS total,
    pt.team_id,
    p.facebook_uid AS owner_uid, 
    t.color 
FROM 
    player_team pt 
JOIN player p ON (p.id = pt.player_id AND p.last_playteam_id = pt.id)
JOIN team t ON (t.id = pt.team_id) 
GROUP BY 
    pt.team_id 
ORDER BY 
    total DESC 
LIMIT 50   

This could be fastest because you don't have to update the old player_team rows to current=0.

Alternatively, you could add a column "last_team_id" to the player table and keep the player's current team there. That gives the fastest result for the above query, but it could be less helpful for other queries.
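
The bookkeeping for the last_playteam_id variant could be sketched as follows. It assumes InnoDB, an AUTO_INCREMENT id on player_team, and INT ids (the EXPLAIN output in the question shows a key_len of 4). The ids 42 and 7 are placeholders.

```sql
-- Assumed column from the answer above; NULL means the player has no team yet.
ALTER TABLE player
    ADD COLUMN last_playteam_id INT NULL;

-- One-off backfill from the existing log.
UPDATE player p
JOIN (
    SELECT player_id, MAX(id) AS mid
    FROM player_team
    GROUP BY player_id
) latest ON latest.player_id = p.id
SET p.last_playteam_id = latest.mid;

-- When player 42 joins team 7: append to the log, then repoint the player.
START TRANSACTION;
INSERT INTO player_team (player_id, team_id) VALUES (42, 7);
UPDATE player SET last_playteam_id = LAST_INSERT_ID() WHERE id = 42;
COMMIT;
```

Doing both statements in one transaction keeps the pointer and the log consistent if the second statement fails.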
