Frequent updates to Solr documents - efficiency/scalability concerns

Posted on 2024-12-15 23:21:28

I have a Solr index with document fields something like:

id, body_text, date, num_upvotes, num_downvotes

In my application, a document is created with some integer id and some body_text (500 chars max). The date is set to the time of input, and num_upvotes and num_downvotes begin at 0.

My application lets users upvote and downvote the content described above. I want to track this in Solr rather than only in the DB because I want to factor the number of upvotes and downvotes into my search.

This is a problem because you can't simply update a Solr document in place (e.g. increment num_upvotes); you must replace the entire document, which is probably fairly inefficient considering it would require hitting my DB to grab all the relevant data again.

I realize the solution may require a different data layout, or possibly multiple indexes (although I don't know whether you can query/score across Solr cores).

Is anyone able to offer any recommendations on how to tackle this?


Comments (4)

波浪屿的海角声 2024-12-22 23:21:28

A solution I use for a similar problem is to update that information in the database and run SOLR updates/inserts every ten minutes for the documents modified since the last update.

Also, every night, when I don't have much traffic, I optimize the index.
I have also set up some warm-up queries in the SOLR config, which run after each import.

In my SOLR index I have around 1.5 million documents; each document has 24 fields and around 2000 characters in total.
Every 10 minutes I update around 500 documents in the index (without optimizing it), and I run around 50 warm-up queries made up of the most common facets, the most used filter queries and free-text searches.

I don't see a negative impact on performance (at least not a visible one): my queries average 0.1 seconds. Before I started updating every 10 minutes, the average was 0.09 seconds.

LATER EDIT:

I didn't encounter any problems during these updates. I always take the documents from the database and insert them into SOLR with a unique key. If the document already exists in SOLR, it is replaced (this is what I mean by update).

It never takes more than 3 minutes to update SOLR. Strictly speaking, I take a 10-minute break after each update: I start the index update, wait for it to finish, and then wait another 10 minutes before starting again.

I have not looked at performance overnight, but that is not relevant for me, since what I want is fresh data during peaks in user visits.
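
The "re-index whatever changed since the last run" step could look roughly like the sketch below. It is only an illustration under stated assumptions: the `documents` table, its `modified_at` column and the in-memory SQLite database are hypothetical stand-ins for the real application database, and the payload assumes a Solr version whose update handler accepts a JSON array of documents (older versions would need the XML update format instead). The actual HTTP POST to Solr is left out.

```python
import json
import sqlite3


def fetch_modified(conn, since):
    """Return rows changed after `since` as dicts keyed by Solr field name."""
    cur = conn.execute(
        "SELECT id, body_text, date, num_upvotes, num_downvotes "
        "FROM documents WHERE modified_at > ? ORDER BY id",
        (since,),
    )
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur]


def build_update_payload(docs):
    """Serialize the documents as a JSON array (assumed update format)."""
    return json.dumps(docs)


# Hypothetical demo data; modified_at is a plain integer timestamp here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, body_text TEXT, "
    "date TEXT, num_upvotes INTEGER, num_downvotes INTEGER, "
    "modified_at INTEGER)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "first", "2024-01-01T00:00:00Z", 3, 0, 100),
        (2, "second", "2024-01-02T00:00:00Z", 0, 1, 250),
        (3, "third", "2024-01-03T00:00:00Z", 5, 2, 300),
    ],
)

last_sync = 200  # timestamp of the previous run
changed = fetch_modified(conn, last_sync)
payload = build_update_payload(changed)
print(len(changed), "documents to re-index")
```

Because each document is sent with its unique key, re-sending it replaces the existing copy in the index, which matches the "update" described above.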

绻影浮沉 2024-12-22 23:21:28

The Join feature would help you here. Then you could store the up/down votes in a separate document.

The bad news is that you need to wait until Solr 4 unless you're comfortable running with a trunk build.
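
As a rough idea of what such a query might look like, the sketch below builds request parameters using Solr's join query parser syntax, `{!join from=... to=...}`. The layout is an assumption made for illustration: vote counts live in separate documents carrying a hypothetical `doc_id` field that points at the content document's `id`. Note that a join used this way filters results by vote count; it does not by itself feed the counts into relevance scoring.

```python
from urllib.parse import parse_qs, urlencode


def build_join_query(text, min_upvotes):
    """Build query-string parameters for a text search filtered via a join.

    `doc_id` is a hypothetical field on the separate vote documents.
    """
    return urlencode(
        {
            "q": f"body_text:({text})",
            "fq": f"{{!join from=doc_id to=id}}num_upvotes:[{min_upvotes} TO *]",
        }
    )


query_string = build_join_query("solr", 10)
# Decode again to show the filter that would be sent.
print(parse_qs(query_string)["fq"][0])
```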

孤者何惧 2024-12-22 23:21:28

If you are only going to be updating the up/down votes, then instead of going back to the database, just use the appropriate Solr client for your application: pull the document from the index, set the up/down values as needed, and reinsert the document into the index.
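
A minimal sketch of the "set the values and reinsert" step, independent of any particular Solr client: `apply_vote` is a hypothetical helper that takes a document as it came back from the index and returns the copy to send back. One caveat worth stating: this only round-trips fields that are stored in the index, so any field that is indexed but not stored would be lost on reinsert.

```python
def apply_vote(doc, up=0, down=0):
    """Return a copy of a fetched document with adjusted vote counts.

    The internal `_version_` field, which newer Solr versions add to
    returned documents, is dropped if present so it is not sent back.
    """
    updated = {k: v for k, v in doc.items() if k != "_version_"}
    updated["num_upvotes"] = doc.get("num_upvotes", 0) + up
    updated["num_downvotes"] = doc.get("num_downvotes", 0) + down
    return updated


# Hypothetical document as it might be returned by a search.
stored = {
    "id": 42,
    "body_text": "hello",
    "date": "2024-01-01T00:00:00Z",
    "num_upvotes": 3,
    "num_downvotes": 1,
    "_version_": 123,
}
new_doc = apply_vote(stored, up=1)
print(new_doc)
```

The returned dict would then be handed to whatever add/update call the chosen client provides.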

醉城メ夜风 2024-12-22 23:21:28

There is no solution to your problem within SOLR. You have a database problem and you are trying to solve it with a search engine.

The best way to deal with this is to keep a Redis database that records each document id from SOLR along with its up/down vote counts. Your app can then merge the data from both sources before displaying it.
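
The merge step might be sketched as follows. To keep the example self-contained, a plain dict stands in for the Redis store; the key format and the `up`/`down` field names are assumptions made for illustration, not anything prescribed by Redis or Solr.

```python
def merge_votes(search_hits, vote_store):
    """Attach vote counts from a key-value store to each search hit.

    `vote_store` maps a document id (as a string) to a mapping with
    hypothetical "up" and "down" entries; missing ids count as zero.
    """
    merged = []
    for hit in search_hits:
        counts = vote_store.get(str(hit["id"]), {})
        merged.append(
            {
                **hit,
                "num_upvotes": int(counts.get("up", 0)),
                "num_downvotes": int(counts.get("down", 0)),
            }
        )
    return merged


# Hypothetical search results and vote store contents.
hits = [{"id": 1, "body_text": "first"}, {"id": 2, "body_text": "second"}]
votes = {"1": {"up": "7", "down": "2"}}
merged = merge_votes(hits, votes)
print(merged)
```

With this layout the counts are available for display, but they are not visible to Solr, so they cannot influence filtering or ranking inside the search itself.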
