Solr for a constantly updating index

Posted on 2024-09-09 02:29:12

I have a news site with 150,000 news articles. About 250 new articles are added to the database each day, at intervals of 5-15 minutes. I understand that Solr is optimized for millions of records, so my 150K won't be a problem for it. But I am worried that the frequent updates will be a problem, since the cache gets invalidated with every update. On my dev server, a cold load of a page takes 5-7 seconds (since every page runs a few MLT queries).

Would it help if I split my index into two: an archive index and a latest index? The archive index would be updated once a day.

Can anyone suggest any ways to optimize my installation for a constantly updating index?

Thanks

Comments (2)

因为看清所以看轻 2024-09-16 02:29:12

My answer is: test it! Don't try to optimize yet if you don't know how it performs. As you said, 150K is not a lot, so it should be quick to build an index of that size for your tests. After that, run a few MLT queries from several concurrent threads (to simulate users) while you index more documents, and see how it behaves.
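A test like that could be scripted along these lines. This is only a minimal sketch: the core URL `http://localhost:8983/solr/news`, the field names `title` and `body`, and the helper names are all assumptions for illustration, not details from the question. The `fetch` argument is injectable so the harness can be tried without a running Solr.

```python
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: hypothetical core URL and field names; adjust to your own setup.
SOLR_SELECT = "http://localhost:8983/solr/news/select"


def mlt_url(doc_id, base=SOLR_SELECT):
    """Build a query that fetches one article plus its MoreLikeThis matches."""
    params = {"q": f"id:{doc_id}", "mlt": "true", "mlt.fl": "title,body", "wt": "json"}
    return base + "?" + urllib.parse.urlencode(params)


def http_fetch(url):
    """Default fetcher: a plain blocking HTTP GET."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def timed_query(doc_id, fetch=http_fetch):
    """Return the wall-clock seconds that one MLT query takes."""
    start = time.perf_counter()
    fetch(mlt_url(doc_id))
    return time.perf_counter() - start


def run_load(doc_ids, workers=8, fetch=http_fetch):
    """Run the queries from `workers` concurrent threads and summarize latency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        timings = sorted(pool.map(lambda d: timed_query(d, fetch), doc_ids))
    if not timings:
        return {"count": 0, "median": None, "max": None}
    return {"count": len(timings), "median": timings[len(timings) // 2], "max": timings[-1]}
```

Running `run_load(...)` over a few hundred article ids while a separate process keeps indexing new documents should give a rough picture of how query latency behaves under updates.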

One setting that you should keep an eye on is auto-commit. Since you are indexing constantly, you can't commit after every document (you would bring Solr down). The value you choose for this setting lets you tune the latency of the system (how long it takes for new documents to show up in results) while keeping the system responsive.
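As an illustration of bounding commit latency from the client side, the sketch below uses Solr's per-request `commitWithin` parameter (in milliseconds) instead of the auto-commit setting in solrconfig.xml that the answer refers to; both approaches avoid an explicit commit per document. The core URL, the 60-second default, and the function names are assumptions chosen for the example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: hypothetical core URL; adjust to your own setup.
SOLR_UPDATE = "http://localhost:8983/solr/news/update"


def build_update_request(docs, commit_within_ms=60_000, base=SOLR_UPDATE):
    """Build a POST that adds `docs` and asks Solr to commit within the given
    delay, rather than issuing an explicit commit after each document."""
    url = base + "?" + urllib.parse.urlencode({"commitWithin": commit_within_ms})
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def send(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return resp.status
```

A larger `commit_within_ms` means fewer commits and fewer cache invalidations, at the cost of new articles taking longer to become searchable; the right value is something to find through testing.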

梦境 2024-09-16 02:29:12

Consider using mlt=true in the main query instead of issuing a separate MoreLikeThis query for each result. You'll save the round trips, so it will be faster.
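A single request of that kind might be built as follows. This is a sketch under assumptions: the core URL, the `title` and `body` fields, and the function name are hypothetical, while `mlt`, `mlt.fl` and `mlt.count` are the MoreLikeThis component's request parameters.

```python
import urllib.parse

# Assumption: hypothetical core URL and field names; adjust to your schema.
SOLR_SELECT = "http://localhost:8983/solr/news/select"


def search_with_mlt_url(query, mlt_fields=("title", "body"), mlt_count=3,
                        rows=10, base=SOLR_SELECT):
    """Build one select URL that returns the main results together with
    similar documents for each result, avoiding one MLT request per result."""
    params = {
        "q": query,
        "rows": rows,
        "mlt": "true",                   # enable the MoreLikeThis component
        "mlt.fl": ",".join(mlt_fields),  # fields used to judge similarity
        "mlt.count": mlt_count,          # similar documents per result
        "wt": "json",
    }
    return base + "?" + urllib.parse.urlencode(params)
```

For example, `search_with_mlt_url("category:politics", mlt_count=5)` yields one URL whose response carries both the article list and the related articles, so the page needs a single round trip.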
