Best way to get result count before LIMIT was applied
When paging through data that comes from a DB, you need to know how many pages there will be to render the page jump controls.

Currently I do that by running the query twice: once wrapped in a count() to determine the total results, and a second time with a limit applied to get back just the results I need for the current page. This seems inefficient.

Is there a better way to determine how many results would have been returned before LIMIT was applied?

I am using PHP and Postgres.
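For illustration, the two-query approach looks something like this (the table, filter, and ordering column are hypothetical stand-ins, not from the question):

```sql
-- Query 1: total number of matching rows, to compute the page count
SELECT count(*) FROM items WHERE category = 'widgets';

-- Query 2: only the rows for the current page (page 5, 10 rows per page)
SELECT *
FROM   items
WHERE  category = 'widgets'
ORDER  BY created_at
LIMIT  10 OFFSET 40;
```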
5 Answers
Pure SQL
Things have changed since 2008. You can use a window function to get the full count and the limited result in one query. Introduced with PostgreSQL 8.4 in 2009.
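The query takes roughly this shape (a sketch; tbl, col, and the filter are placeholders for your own table, ordering column, and conditions):

```sql
SELECT *
     , count(*) OVER() AS full_count   -- window function: counts all qualifying rows
FROM   tbl
WHERE  /* your conditions */ true
ORDER  BY col
LIMIT  10
OFFSET 40;
```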
Note that this can be considerably more expensive than without the total count. All rows have to be counted, and a possible shortcut taking just the top rows from a matching index may not be helpful any more.
Doesn't matter much with small tables or full_count <= OFFSET + LIMIT. Matters for a substantially bigger full_count.

Corner case: when OFFSET is at least as great as the number of rows from the base query, no row is returned, so you also get no full_count. A possible alternative is sketched below.
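That alternative could look like this (my sketch, reusing the hypothetical tbl and col): run the base query once in a CTE, take the page from it, and outer-join the page to the count, so a row carrying full_count comes back even when the page itself is empty:

```sql
WITH cte AS (
   SELECT *
   FROM   tbl
   WHERE  /* your conditions */ true
   )
SELECT sub.*, c.full_count
FROM  (
   SELECT *
   FROM   cte
   ORDER  BY col
   LIMIT  10
   OFFSET 4000              -- may point past the last row
   ) sub
RIGHT  JOIN (SELECT count(*) FROM cte) c(full_count) ON true;
```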
Sequence of events in a SELECT query

( 0. CTEs are evaluated and materialized separately. In Postgres 12 or later the planner may inline those like subqueries before going to work.) Not here.
1. The WHERE clause (and JOIN conditions, though none in your example) filters qualifying rows from the base table(s). The rest is based on the filtered subset.
( 2. GROUP BY and aggregate functions would go here.) Not here.
( 3. Other SELECT list expressions are evaluated, based on grouped / aggregated columns.) Not here.
4. Window functions are applied depending on the OVER clause and the frame specification of the function. The simple count(*) OVER() is based on all qualifying rows.
5. ORDER BY sorts the qualifying rows.
( 6. DISTINCT or DISTINCT ON would go here.) Not here.
7. LIMIT / OFFSET are applied based on the established order to select rows to return.
LIMIT / OFFSET becomes increasingly inefficient with a growing number of rows in the table. If you need better performance, consider alternative approaches such as keyset pagination; a sketch follows.
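With keyset pagination (this sketch is mine, with hypothetical tbl, col, and id), you remember the sort key of the last row of the previous page and continue strictly after it, so an index on the sort key can find the page directly instead of skipping OFFSET rows:

```sql
-- Next page: rows strictly after the last row previously seen.
-- The row comparison (col, id) > (...) matches the ORDER BY,
-- so an index on (col, id) can serve this without skipping rows.
SELECT *
FROM   tbl
WHERE  (col, id) > (:last_col, :last_id)
ORDER  BY col, id
LIMIT  10;
```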
Alternatives to get final count

There are completely different approaches to get the count of affected rows (not the full count before OFFSET & LIMIT were applied). Postgres has internal bookkeeping of how many rows were affected by the last SQL command. Some clients can access that information or count rows themselves (like psql).

For instance, you can retrieve the number of affected rows in plpgsql immediately after executing an SQL command with GET DIAGNOSTICS.
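A minimal, self-contained sketch (the DO block and the dummy command are mine for illustration; the relevant statement is GET DIAGNOSTICS ... ROW_COUNT):

```sql
DO
$$
DECLARE
   row_ct int;
BEGIN
   -- run any SQL command; PERFORM is plpgsql's way to run a SELECT and discard the result
   PERFORM * FROM generate_series(1, 42);
   -- read the row count Postgres recorded for that last command
   GET DIAGNOSTICS row_ct = ROW_COUNT;
   RAISE NOTICE 'rows processed: %', row_ct;   -- prints 42
END
$$;
```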
Details in the manual.
Or you can use pg_num_rows in PHP, or similar functions in other clients.
As I describe on my blog, MySQL has a feature called SQL_CALC_FOUND_ROWS. This removes the need to do the query twice, but it still needs to do the query in its entirety, even if the limit clause would have allowed it to stop early.

As far as I know, there is no similar feature for PostgreSQL. One thing to watch out for when doing pagination (the most common thing for which LIMIT is used, IMHO): doing an "OFFSET 1000 LIMIT 10" means that the DB has to fetch at least 1010 rows, even if it only gives you 10. A more performant way is to remember the value of the ordering column for the last row of the previous page (the 1000th in this case) and rewrite the query as "... WHERE order_row > value_of_1000th LIMIT 10"; see the sketch below. The advantage is that "order_row" is most probably indexed (if not, you've got a problem). The disadvantage is that if new elements are added between page views, this can get a little out of sync (but then again, it may not be observable by visitors and can be a big performance gain).
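As a query sketch (items and order_row are illustrative names, not from the answer):

```sql
-- Instead of: SELECT * FROM items ORDER BY order_row OFFSET 1000 LIMIT 10;
SELECT *
FROM   items
WHERE  order_row > :value_of_1000th   -- last value seen on the previous page
ORDER  BY order_row
LIMIT  10;
```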
You could mitigate the performance penalty by not running the COUNT() query every time. Cache the number of pages for, say, 5 minutes before the query is run again. Unless you're seeing a huge number of INSERTs, that should work just fine.
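One way to keep such a cache in the database itself (a sketch of mine, not from the answer; the table and key names are made up):

```sql
CREATE TABLE IF NOT EXISTS page_count_cache (
   query_key   text PRIMARY KEY,
   total_rows  bigint NOT NULL,
   computed_at timestamptz NOT NULL DEFAULT now()
);

-- Recompute the count only when the cached value is missing or older than 5 minutes
INSERT INTO page_count_cache (query_key, total_rows)
SELECT 'items_all', (SELECT count(*) FROM items)
WHERE  NOT EXISTS (
   SELECT 1 FROM page_count_cache
   WHERE  query_key = 'items_all'
   AND    computed_at > now() - interval '5 minutes'
)
ON CONFLICT (query_key)
DO UPDATE SET total_rows = EXCLUDED.total_rows, computed_at = now();

SELECT total_rows FROM page_count_cache WHERE query_key = 'items_all';
```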
Since Postgres already does a certain amount of caching, this type of method isn't as inefficient as it seems. It's definitely not doubling execution time. We have timers built into our DB layer, so I have seen the evidence.
Seeing as you need to know for the purpose of paging, I'd suggest running the full query once, writing the data to disk as a server-side cache, then feeding that through your paging mechanism.
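If that cache lives in the database rather than in the application (my interpretation; the answer may equally mean an app-level file cache), a sketch could look like this, reusing the hypothetical items table from above:

```sql
-- Materialize the full result once, numbering rows in display order
CREATE TABLE results_cache AS
SELECT row_number() OVER (ORDER BY created_at) AS rn, *
FROM   items
WHERE  category = 'widgets';

-- Serve page 5 (10 rows per page) straight from the cache
SELECT * FROM results_cache WHERE rn BETWEEN 41 AND 50 ORDER BY rn;
```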
If you're running the COUNT query for the purpose of deciding whether to provide the data to the user or not (i.e. if there are > X records, give back an error), you need to stick with the COUNT approach.