Speeding up Solr pagination
I've already optimized pagination of my MySQL data by including the last ID from the previous page in the query, so instead of "LIMIT 200,20" it becomes "WHERE id < $last_id_from_previous_page LIMIT 20".
This has dramatically sped up pagination of the MySQL data.
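For readers unfamiliar with the technique, here is a minimal sketch of that "last ID" (keyset) pagination. It is only an illustration under stated assumptions: it uses an in-memory SQLite database rather than MySQL, the table name `items` is hypothetical, and it adds an explicit `ORDER BY id DESC`, which the snippet above does not show but which the `id <` comparison relies on.

```python
import sqlite3

def fetch_page(conn, last_id=None, page_size=20):
    """Return one page of ids, highest first, using keyset pagination."""
    if last_id is None:
        # First page: there is no previous page to continue from.
        sql = "SELECT id FROM items ORDER BY id DESC LIMIT ?"
        args = (page_size,)
    else:
        # Later pages: continue strictly below the last id already shown.
        sql = "SELECT id FROM items WHERE id < ? ORDER BY id DESC LIMIT ?"
        args = (last_id, page_size)
    return [row[0] for row in conn.execute(sql, args)]

def demo():
    # Hypothetical table with ids 1..100, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO items (id) VALUES (?)",
                     [(i,) for i in range(1, 101)])
    first = fetch_page(conn)
    second = fetch_page(conn, last_id=first[-1])
    return first, second
```

The database can seek straight to the boundary through the index on `id` instead of scanning and discarding the skipped rows, which is where the speedup comes from.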
Now I'm looking for something similar for my Solr queries, and I'm wondering whether that's even possible.
Using my Solr PHP library, I run a search like so:
$solr->search($search_term, $start, $limit, $additionalParameters);
Can I specify within the search term parameter itself that the ID has to be smaller than a certain number? Something like "cats AND [id < 200]"? Would this give me the same performance gain with Solr as it does with MySQL?
Comments (2)
Solr supports specifying a start row and the number of rows to return. This is what people use for pagination. See the related question: How to manage "paging" with Solr?
If your Solr search library doesn't support this, you should go straight to the HTTP search interface and talk to Solr directly.
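As a rough sketch of what talking to the HTTP interface looks like, the helper below builds a select URL using Solr's `q`, `start`, and `rows` parameters. The function name `page_url`, the host, and the core name `mycore` in the usage note are assumptions made for illustration, not part of the original answer.

```python
from urllib.parse import urlencode

def page_url(base_url, query, page, page_size=20):
    """Build a Solr select URL for a zero-based page number."""
    params = {
        "q": query,
        "start": page * page_size,  # offset of the first row to return
        "rows": page_size,          # number of rows to return
        "wt": "json",               # ask for a JSON response
    }
    return base_url + "?" + urlencode(params)
```

For example, `page_url("http://localhost:8983/solr/mycore/select", "cats", 10)` requests rows 200 to 219, the equivalent of "LIMIT 200,20".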
Only if the field you are filtering by is also the field you are sorting by, and that field supports a "<" comparison.
For example, if you sorted alphabetically by last name, it may be difficult to filter out the first 20, whereas if you sorted by a number that represented a date/time, you may be able to pull this off.
Realistically, there is no numeric field that I'm aware of that is associated with the document for just that one search.
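If the results are sorted by an indexed numeric field rather than by relevance, the "smaller than the last value" condition can be sketched as a filter query with an exclusive range. This is only an illustration: the helper name `keyset_params` is hypothetical, and it assumes `id` is indexed as a numeric type, since a string field would be compared lexicographically.

```python
def keyset_params(query, last_value=None, page_size=20, field="id"):
    """Build Solr parameters for 'everything below the last value seen'."""
    params = {
        "q": query,
        "sort": f"{field} desc",  # must sort by the same field we filter on
        "rows": page_size,
    }
    if last_value is not None:
        # Curly braces make the range exclusive, i.e. field < last_value.
        params["fq"] = f"{field}:{{* TO {last_value}}}"
    return params
```

Calling `keyset_params("cats", 200)` produces the filter `id:{* TO 200}`, the closest Solr equivalent to the "cats AND [id < 200]" idea in the question. Note that it gives up relevance ordering, which is the limitation described above.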
EDIT: I would ask a deeper question: are you sure you need to optimize pagination this heavily? If your searches are well tuned for your users, they will rarely need to go past the first or second page of results to find what they are looking for. Solr already keeps the document IDs from the initial query in its cache, so this should already perform fairly well.