Frequent updates to Solr documents - efficiency/scalability concerns
I have a Solr index with document fields something like:
id, body_text, date, num_upvotes, num_downvotes
In my application, a document is created with some integer id and some body_text (500 chars max). The date is set to the time of input, and num_upvotes and num_downvotes begin at 0.
My application gives users the ability to upvote and downvote the content mentioned above. The reason I want to keep track of this in Solr instead of just the DB is that I want to factor the number of upvotes and downvotes into my search.
This is a problem because you can't simply update a Solr document in place (e.g. increment num_upvotes); you must replace the entire document, which is probably fairly inefficient considering it would require hitting my DB to grab all the relevant data again.
I realize the solution may require a different layout of data, or possibly multiple indexes (although I don't know if you can query/score across Solr cores).
Is anyone able to offer any recommendations on how to tackle this?
A solution I use for a similar problem is to update that information in the database and run SOLR updates/inserts every ten minutes, using the documents that were modified since the last update.
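The ten-minute delta sync can be sketched in Python with the standard library only. The `posts` table, its `modified_at` column and both function names are hypothetical, and the payload assumes a Solr version whose JSON update handler accepts a plain array of documents; actually sending it to Solr is left out.

```python
import json
import sqlite3

# Hypothetical table: vote counts live in the database, and modified_at is
# bumped (ISO-8601 text, so string comparison orders it correctly) whenever
# a row or its vote counts change.
SCHEMA = """
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    body_text TEXT NOT NULL,
    date TEXT NOT NULL,
    num_upvotes INTEGER NOT NULL DEFAULT 0,
    num_downvotes INTEGER NOT NULL DEFAULT 0,
    modified_at TEXT NOT NULL
)
"""


def changed_since(conn, last_run):
    """Return every row modified after last_run as a full Solr document dict."""
    cur = conn.execute(
        "SELECT id, body_text, date, num_upvotes, num_downvotes "
        "FROM posts WHERE modified_at > ? ORDER BY id",
        (last_run,),
    )
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def solr_add_payload(docs):
    """Serialize full documents as a JSON array for Solr's update handler.

    Every document carries the unique key (id), so Solr replaces an existing
    copy with the same id instead of adding a duplicate.
    """
    return json.dumps(docs)
```

Each run would remember when it started, call `changed_since` with the previous run's start time, and POST the payload to the core's `/update` handler.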
Also, every night, when I don't have much traffic, I optimize the index.
I also have some warm-up queries configured in the SOLR config that run after each import.
In my SOLR index I have around 1.5 million documents; each document has 24 fields and around 2000 characters in total.
Every 10 minutes I update around 500 documents in the index (without optimizing it), and I run around 50 warm-up queries made up of the most common facets, the most used filter queries and free-text searches.
I don't see a negative impact on performance (at least not a visible one): my queries run in 0.1 seconds on average (before I started updating every 10 minutes, the average was 0.09 seconds).
LATER EDIT:
I didn't encounter any problems during these updates. I always take the documents from the database and insert them into SOLR with a unique key. If the document already exists in SOLR, it is replaced (this is what I mean by update).
It never takes more than 3 minutes to update SOLR. In fact, I take a 10-minute break after each update: I start the index update, wait for it to finish, and then wait another 10 minutes before starting again.
I have not looked at performance overnight, but that is not relevant for me, as I want fresh data during the users' peak visiting hours.
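The "update, wait for it to finish, then wait another 10 minutes" schedule is a fixed-delay loop rather than a fixed-rate timer. A minimal Python sketch, where `job` is a hypothetical callable that performs one complete SOLR update (for example, posting the changed documents) and `sleep` is injectable only so the loop can be exercised without really waiting:

```python
import time


def run_with_fixed_delay(job, delay_seconds, iterations=None, sleep=time.sleep):
    """Run job, let it finish, then pause delay_seconds before the next run.

    iterations=None loops forever; passing a number is handy for testing.
    Returns the number of completed runs.
    """
    completed = 0
    while iterations is None or completed < iterations:
        job()  # blocking call: returns only once the update has finished
        completed += 1
        # Skip the pause after the final run when a run count was given.
        if iterations is None or completed < iterations:
            sleep(delay_seconds)
    return completed
```

With a fixed delay, a run that takes 3 minutes simply pushes the next run back, so two updates never overlap; a plain "every 10 minutes" timer would not guarantee that by itself.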
The Join feature would help you here. Then you could store the up/down votes in a separate document.
The bad news is that you need to wait until Solr 4 unless you're comfortable running with a trunk build.
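With a join, the vote counts can live in small separate vote documents, so a vote only rewrites that small document. A minimal Python sketch of the request parameters, assuming (hypothetically) that each vote document has its own unique id, a `doc_id` field holding the `id` of the content document it belongs to, and `num_upvotes`/`num_downvotes` fields; it needs a Solr version with the `{!join}` query parser. Because the join is passed as a filter query (`fq`), it only restricts which documents match and does not affect relevance scoring.

```python
from urllib.parse import urlencode


def vote_join_params(user_query, min_upvotes):
    """Build /select parameters that keep only content documents whose
    separate vote document has at least min_upvotes upvotes.

    from=doc_id is read from the matching vote documents; to=id is the
    field on the content documents that those values are joined against.
    """
    join_filter = "{!join from=doc_id to=id}num_upvotes:[%d TO *]" % min_upvotes
    return {"q": user_query, "fq": join_filter, "wt": "json"}


params = vote_join_params("body_text:solr", 10)
query_string = urlencode(params)  # append to the core's /select URL
```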
If you are only going to be updating the up/down votes, then instead of going back to the database, just use the appropriate Solr client for your application: pull the document from the index, set the up/down values as needed, and reinsert the document into the index.
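The read-modify-write step can be sketched in Python as a pure function over the document returned by a fetch-by-id query; the HTTP calls to Solr are left out and the function name is hypothetical. It assumes every field is stored in the schema, because Solr can only return stored fields and anything unstored would be lost when the document is reinserted.

```python
import copy


def bump_votes(stored_doc, up=0, down=0):
    """Return a full replacement document with adjusted vote counts.

    stored_doc is the document as fetched from Solr by its id; it is not
    modified. The result keeps every other field so it can be reinserted
    under the same unique key.
    """
    doc = copy.deepcopy(stored_doc)
    doc["num_upvotes"] = doc.get("num_upvotes", 0) + up
    doc["num_downvotes"] = doc.get("num_downvotes", 0) + down
    return doc
```

Two votes on the same document processed at the same time can each read the old counts and overwrite one another, so updates to a single document should be serialized (for example, through one worker queue).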
There is no solution to your problem within SOLR. You have a database problem and you are trying to solve it with a search engine.
The best way to deal with this is to keep a Redis database that records the document id from SOLR and the up/down vote counts. Then your app can merge the data from both sources before displaying.
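The Redis approach can be sketched in Python with an in-memory class standing in for Redis, so the example runs on its own. The class and method names are hypothetical; against a real Redis server, `record_vote` would correspond to `HINCRBY` on a per-document hash and `counts` to `HGETALL`.

```python
from collections import defaultdict


class VoteStore:
    """In-memory stand-in for one Redis hash of vote counts per document id."""

    def __init__(self):
        self._hashes = defaultdict(lambda: {"up": 0, "down": 0})

    def record_vote(self, doc_id, upvote):
        # Equivalent in spirit to HINCRBY on the document's hash.
        field = "up" if upvote else "down"
        self._hashes[doc_id][field] += 1

    def counts(self, doc_id):
        # .get does not create an entry for documents nobody has voted on.
        return dict(self._hashes.get(doc_id, {"up": 0, "down": 0}))


def merge_votes(solr_docs, store):
    """Attach current vote counts to each Solr result before display."""
    merged = []
    for doc in solr_docs:
        votes = store.counts(doc["id"])
        merged.append({**doc, "upvotes": votes["up"], "downvotes": votes["down"]})
    return merged
```

Because the counts are merged after Solr has ranked the results, they affect what is displayed (or an application-side re-sort of the returned page), not Solr's own scoring.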