MySQL ranking with frequent updates and a large data set: how to get the best performance?

Posted on 2024-07-13 08:00:47

I want grouped ranking on a very large table. I've found a couple of solutions for this problem, e.g. in this post and other places on the web. I am, however, unable to figure out the worst-case complexity of these solutions. The specific problem consists of a table where each row has a name and an associated number of points. I want to be able to request rank intervals such as 1-4. Here is some example data:

name | points
Ab     14
Ac     14
B      16
C      16
Da     15
De     13

With these values the following "ranking" is created:

Query id | Rank | Name
1          1      B
2          1      C
3          3      Da
4          4      Ab
5          4      Ac
6          6      De

It should be possible to request an interval of query ids: for example, 2-5 gives the ranks 1, 3, 4 and 4.
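To make the expected output concrete, here is a minimal Python sketch (not MySQL) of the ranking rule the two tables imply: rows are ordered by points descending and then name ascending, the query id is the position in that order, and tied rows share the position of the first row in their group. The names `build_ranking` and `rank_interval` are illustrative assumptions, not part of the question.

```python
from typing import List, Tuple

Row = Tuple[str, int]  # (name, points)

def build_ranking(rows: List[Row]) -> List[Tuple[int, int, str]]:
    """Return (query_id, rank, name); tied rows share the first position of their group."""
    ordered = sorted(rows, key=lambda r: (-r[1], r[0]))  # points DESC, name ASC
    result = []
    rank = 0
    prev_points = None
    for query_id, (name, points) in enumerate(ordered, start=1):
        if points != prev_points:
            rank = query_id          # a new points value starts a new rank
            prev_points = points
        result.append((query_id, rank, name))
    return result

def rank_interval(rows: List[Row], from_id: int, till_id: int):
    """Rows whose query id lies in [from_id, till_id], both ends inclusive."""
    return build_ranking(rows)[from_id - 1:till_id]

rows = [("Ab", 14), ("Ac", 14), ("B", 16), ("C", 16), ("Da", 15), ("De", 13)]
print(rank_interval(rows, 2, 5))
```

This sorts the whole data set, so it only illustrates the desired result, not the desired complexity.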

The database holds about 3 million records, so if possible I want to avoid a solution with complexity greater than log(n). The table is constantly updated and inserted into, so those operations should preferably run in log(n) as well. I am not sure this is possible, though, and I've been trying to wrap my head around it for some time. I've come to the conclusion that a binary search should be possible, but I haven't been able to write a query that does it. I am using a MySQL server.

I will elaborate on how the pseudocode for the filtering could work. First, an index on (points, name) is needed. As input you give a fromrank and a tillrank. The total number of records in the database is n. The pseudocode should look something like this:

Find the median points value and count the rows with fewer points than this value (the count gives a rough estimate of the rank, ignoring rows with the same number of points). If the count is greater than the fromrank delimiter, subdivide the first half and find its median. Keep doing this until we have pinpointed the points value at which fromrank should start. Then do the same within that points value using the name part of the index, taking medians until we reach the correct row. Do exactly the same thing for tillrank.

The result should be log(n) subdivisions. So, given that the median and the count can each be computed in log(n) time, it should be possible to solve the problem with worst-case complexity log(n). Correct me if I am wrong.
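The points step of this idea can be sketched in Python (not MySQL). The sketch assumes what the question assumes, namely that "how many rows have more than p points" is cheap to answer; here a sorted Python list and `bisect` stand in for that count, and the helper names `count_greater`, `points_at_position` and `rank_at_position` are hypothetical. The second step, narrowing by name inside one points value, is left out.

```python
from bisect import bisect_right
from typing import List

def count_greater(sorted_points: List[int], p: int) -> int:
    """Number of rows with strictly more than p points (input is ascending)."""
    return len(sorted_points) - bisect_right(sorted_points, p)

def points_at_position(sorted_points: List[int], k: int) -> int:
    """Points value of the k-th row (1-based) in descending points order.

    Halves the value range [lo, hi] until it holds the smallest p for which
    fewer than k rows have more points than p.
    """
    lo, hi = sorted_points[0], sorted_points[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if count_greater(sorted_points, mid) < k:
            hi = mid        # the k-th row has at most mid points
        else:
            lo = mid + 1    # at least k rows beat mid, so look higher
    return lo

def rank_at_position(sorted_points: List[int], k: int) -> int:
    """Rank of the k-th row: one plus the number of rows with more points."""
    return count_greater(sorted_points, points_at_position(sorted_points, k)) + 1

points = sorted([14, 14, 16, 16, 15, 13])
print([rank_at_position(points, k) for k in range(2, 6)])  # query ids 2-5
```

Each halving costs one count, so the total work is the number of halvings multiplied by the cost of a single count; whether the database can answer that count in log(n) is exactly the open assumption in the question.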

时光暖心i 2024-07-20 08:00:47

You need a stored procedure to be able to call this with parameters:

-- rank is a reserved word in MySQL 8.0+, hence the backticks.
CREATE TABLE `rank` (name VARCHAR(20) NOT NULL, points INTEGER NOT NULL);

CREATE INDEX ix_rank_points ON `rank`(points, name);

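-- In the mysql client, change the DELIMITER first so the semicolons
-- inside the body do not end the CREATE PROCEDURE statement early.
CREATE PROCEDURE prc_ranks(fromrank INT, tillrank INT)
BEGIN
  -- EXECUTE ... USING accepts user variables only, so copy the parameters.
  SET @fromrank = fromrank;
  SET @tillrank = tillrank;
  PREPARE STMT FROM
  '
  SELECT  rn, `rank`, name, points
  FROM  (
    SELECT  CASE WHEN @cp = points THEN @rank ELSE @rank := @rn + 1 END AS `rank`,
            @rn := @rn + 1 AS rn,
            @cp := points,
            r.*
    FROM (
         SELECT @cp := -1, @rn := 0, @rank := 1
         ) var,
         (
         SELECT *
         FROM `rank`
         FORCE INDEX (ix_rank_points)
         ORDER BY
           points DESC, name DESC
         LIMIT ?
         ) r
    ) o
  WHERE rn >= ?
  ';
  -- The first ? is the LIMIT (tillrank), the second the lower bound on rn (fromrank).
  EXECUTE STMT USING @tillrank, @fromrank;
  DEALLOCATE PREPARE STMT;
END;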

CALL prc_ranks (2, 5);

If you create the index and force MySQL to use it (as in my query), the complexity of the query will not depend on the number of rows at all. It will depend only on tillrank.

The query reads the last tillrank entries from the index (the rows with the most points), performs some simple calculations on them and filters out the rows that come before fromrank.

As you can see, the time this takes depends only on tillrank, not on how many records there are.

I just checked on 400,000 rows: it selects ranks 5 to 100 in 0.004 seconds (that is, instantly).
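The single pass that the query makes over the top tillrank rows can be mirrored in Python (not MySQL) as a rough sketch. The variables `cp`, `rn` and `rank` play the roles of `@cp`, `@rn` and `@rank`; the function name `prc_ranks_sketch` and the in-memory sort that stands in for the index scan are assumptions made for illustration.

```python
from typing import List, Tuple

Row = Tuple[str, int]  # (name, points)

def prc_ranks_sketch(rows: List[Row], from_rank: int, till_rank: int):
    """Return (rn, rank, name, points) for row numbers from_rank..till_rank."""
    # Stand-in for the backward index scan: points DESC, name DESC, LIMIT till_rank.
    top = sorted(rows, key=lambda r: (r[1], r[0]), reverse=True)[:till_rank]
    out = []
    cp = None   # points of the previous row (@cp)
    rn = 0      # running row number (@rn)
    rank = 0    # current rank (@rank)
    for name, points in top:
        if points != cp:
            rank = rn + 1   # a new points value takes the next row number as its rank
        rn += 1
        cp = points
        if rn >= from_rank:  # same filter as WHERE rn >= ?
            out.append((rn, rank, name, points))
    return out

rows = [("Ab", 14), ("Ac", 14), ("B", 16), ("C", 16), ("Da", 15), ("De", 13)]
print(prc_ranks_sketch(rows, 2, 5))
```

The loop touches till_rank rows regardless of how many rows the table holds, which is the point made above. Note that names come back in descending order within a tie, as in the query.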

Important: this only works if you sort on names in DESCENDING order. MySQL versions before 8.0 do not support the DESC clause in index definitions (it is parsed but ignored), which means that points and name must be sorted in the same direction for the index to be usable for sorting (either both ASCENDING or both DESCENDING). If you want fast ASC sorting by name, you will need to store negated points in the database and change the sign in the SELECT clause.
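The negated-points trick can be illustrated with a small Python sketch (not MySQL): once the sign of points is flipped, "points descending, name ascending" becomes a plain ascending order on the pair, which a single-direction index can serve. The function name `order_with_negated_points` and the idea of a stored `neg_points` value are illustrative assumptions.

```python
from typing import List, Tuple

Row = Tuple[str, int]  # (name, points)

def order_with_negated_points(rows: List[Row]) -> List[Row]:
    """Order by points DESC, name ASC using only an ascending sort."""
    stored = [(-points, name) for name, points in rows]  # what the table would keep
    stored.sort()                                        # ascending on (neg_points, name)
    return [(name, -neg_points) for neg_points, name in stored]  # flip the sign back

rows = [("Ab", 14), ("Ac", 14), ("B", 16), ("C", 16), ("Da", 15), ("De", 13)]
print(order_with_negated_points(rows))
```

The result lists B and C first, then Da, Ab, Ac and De, which is the order shown in the question's ranking table.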

You may also remove name from the index altogether and perform the final ordering without using an index:

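-- Drop the two-column index first if it was created under the same name.
DROP INDEX ix_rank_points ON `rank`;
CREATE INDEX ix_rank_points ON `rank`(points);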

DROP PROCEDURE IF EXISTS prc_ranks;

CREATE PROCEDURE prc_ranks(fromrank INT, tillrank INT)
BEGIN
  -- EXECUTE ... USING accepts user variables only, so copy the parameters.
  SET @fromrank = fromrank;
  SET @tillrank = tillrank;
  PREPARE STMT FROM
  '
  SELECT  rn, `rank`, name, points
  FROM  (
    SELECT  CASE WHEN @cp = points THEN @rank ELSE @rank := @rn + 1 END AS `rank`,
            @rn := @rn + 1 AS rn,
            @cp := points,
            r.*
    FROM (
         SELECT @cp := -1, @rn := 0, @rank := 1
         ) var,
         (
         SELECT *
         FROM `rank`
         FORCE INDEX (ix_rank_points)
         ORDER BY
           points DESC
         LIMIT ?
         ) r
    ) o
  WHERE rn >= ?
  ORDER BY `rank`, name
  ';
  -- The final ORDER BY sorts only the rows that survive the rn filter.
  EXECUTE STMT USING @tillrank, @fromrank;
  DEALLOCATE PREPARE STMT;
END;

That will impact performance on big ranges, but you will hardly notice it on small ranges.
