How to efficiently paginate large datasets with PHP and MySQL?
As some of you may know, use of the LIMIT keyword in MySQL does not preclude it from reading the preceding records.
For example:
SELECT * FROM my_table LIMIT 10000, 20;
This means that MySQL will still read the first 10,000 records and throw them away before producing the 20 we are after.
So, when paginating a large dataset, high page numbers mean long load times.
Does anyone know of an existing pagination class, technique, or methodology that can paginate large datasets more efficiently, i.e. one that does not rely on the MySQL LIMIT keyword?
In PHP if possible, as that is the weapon of choice at my company.
Cheers.
Comments (7)
First of all, if you want to paginate, you absolutely must have an ORDER BY clause. You can then use the column in that clause to dig deeper into your data set. For example, suppose the first page selects the first 20 rows ordered by id.
You'll get the first 20 records; let's say their ids are 5, 8, 9, ..., 55, 64. Your pagination link to page 2 will look like "list.php?page=2&id=64", and the query for that page will ask for rows with an id greater than 64, again ordered by id and limited to 20.
No offset, and only 20 records read. It doesn't let you jump to an arbitrary page, but most of the time people just browse to the next or previous page. An index on "id" will improve performance, even with big OFFSET values.
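A minimal sketch of the technique described in this answer (often called keyset or "seek" pagination). It is written in Python with an in-memory SQLite database only so that it runs stand-alone; the table my_table, the title column, and the helper fetch_page are assumptions for illustration, and the same WHERE / ORDER BY / LIMIT query should carry over to MySQL through a PHP prepared statement.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, title TEXT)")
# Ids with gaps (3, 6, 9, ...) to show the method does not need contiguous ids.
conn.executemany(
    "INSERT INTO my_table (id, title) VALUES (?, ?)",
    [(i * 3, f"row {i}") for i in range(1, 101)],
)


def fetch_page(conn, last_id=0, page_size=20):
    """Return the next page_size rows whose id is greater than last_id."""
    return conn.execute(
        "SELECT id, title FROM my_table WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()


# First page: no previous id yet, so start from 0.
page1 = fetch_page(conn)
# Next page: pass the last id of the previous page (the "id=64" in the link).
page2 = fetch_page(conn, last_id=page1[-1][0])
```

The "previous page" link can be handled the same way by reversing the comparison and the sort order (id < first id of the current page, ORDER BY id DESC) and then reversing the rows for display.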
A solution might be to skip the LIMIT clause and use a join instead, joining on a table that serves as some kind of sequence.
For more information, I found this question/answer on SO, which gives an example that might help you ;-)
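The linked example is not reproduced here, so the following is only one possible reading of "a table used as some kind of sequence", not the linked answer's code. It assumes a hypothetical helper table page_seq that maps a gap-free position (pos) to a row id; a page then becomes a range lookup on pos joined back to my_table. Python with SQLite is used purely so the sketch runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO my_table (id, title) VALUES (?, ?)",
    [(i * 3, f"row {i}") for i in range(1, 101)],
)

# Hypothetical sequence table: pos is a gap-free, 1-based display position.
conn.execute(
    "CREATE TABLE page_seq (pos INTEGER PRIMARY KEY, row_id INTEGER NOT NULL)"
)
ids = [row[0] for row in conn.execute("SELECT id FROM my_table ORDER BY id")]
conn.executemany(
    "INSERT INTO page_seq (pos, row_id) VALUES (?, ?)",
    list(enumerate(ids, start=1)),
)


def fetch_page(conn, page, page_size=20):
    """Fetch one page through a range scan on page_seq.pos plus a join."""
    first = (page - 1) * page_size + 1
    last = page * page_size
    return conn.execute(
        "SELECT t.id, t.title "
        "FROM page_seq AS s JOIN my_table AS t ON t.id = s.row_id "
        "WHERE s.pos BETWEEN ? AND ? ORDER BY s.pos",
        (first, last),
    ).fetchall()


page3 = fetch_page(conn, 3)  # positions 41..60
```

The trade-off in this sketch is that page_seq has to be rebuilt or maintained whenever rows are inserted, deleted, or reordered, so it probably suits mostly static data best.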
There are basically 3 approaches to this, each of which has its own trade-offs:
1. Send all 10000 records to the client, and handle pagination client-side via JavaScript or the like. The obvious benefit is that only a single query is necessary for all of the records; the obvious downside is that if the record size is in any way significant, the page sent to the browser grows in proportion, and the user might not actually care about the full record set.
2. Do what you're currently doing, namely SQL LIMIT, and grab only the records you need with each request, completely stateless. The benefit is that it only sends the records for the page currently requested, so responses are small. The downsides are that a) it requires a server request for each page, and b) it gets slower for later pages as the number of records/pages increases, as you mentioned. Using a JOIN or a WHERE clause on a monotonically increasing id field can sometimes help in this regard, specifically if you're requesting results from a static table as opposed to a dynamic query.
3. Maintain some sort of state object on the server which caches the query results and can be referenced in future requests for a limited period of time. The upside is that it has the best query speed, since the actual query only needs to run once; the downside is having to manage, store, and clean up those state objects (especially nasty for high-traffic websites).
That query means "show 20 records starting from record #10000" in the result set. If you are using primary keys in the WHERE clause, there will not be a heavy load on MySQL.
Any other pagination method, such as a join, will put a much heavier load on it.
I'm not aware of the performance decrease that you've mentioned, and I don't know of any other solution for pagination; however, an ORDER BY clause might help you reduce the load time.
The best way is to define an indexed field in my_table and increment it for every newly inserted row. Then you can use WHERE YOUR_INDEX_FIELD BETWEEN 10000 AND 10020 (note that BETWEEN is inclusive, so this range covers 21 values; use 10001 to 10020 for exactly 20).
It will be much faster.
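A small sketch of this range approach, with one caveat it may help to see in action: the page boundaries are computed from the field values, so pages come up short once rows are deleted and the sequence has gaps. The column name seq and the helper fetch_page are assumptions for illustration; Python with SQLite is used only so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq plays the role of YOUR_INDEX_FIELD: indexed and incremented per insert.
conn.execute("CREATE TABLE my_table (seq INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO my_table (seq, title) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 201)],
)


def fetch_page(conn, page, page_size=20):
    """Fetch a page by value range; BETWEEN includes both endpoints."""
    first = (page - 1) * page_size + 1
    last = page * page_size
    return conn.execute(
        "SELECT seq, title FROM my_table WHERE seq BETWEEN ? AND ? ORDER BY seq",
        (first, last),
    ).fetchall()


full_page = fetch_page(conn, 3)  # seq values 41..60

# Deleting rows leaves gaps, so the same range now holds fewer rows.
conn.execute("DELETE FROM my_table WHERE seq IN (45, 46)")
short_page = fetch_page(conn, 3)
```

If rows are never deleted and no filtering is applied, the range lookup stays exact; otherwise the "id greater than the last one seen" variant avoids the short-page problem.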
some other options,