What are some strategies for updating volatile data in Solr?

Posted on 2024-12-06 12:52:46 · Word count: 238 · Views: 0 · Comments: 0

What are some strategies for updating volatile data in Solr? Imagine if you needed to model YouTube video data in a Solr index: how would you keep the "views" data fresh without swamping Solr in updates?

I would imagine that storing the "views" data in a different data store (something like MongoDB or Redis) that is better at handling rapid updates would be the best idea.

But what is the best way to update the index periodically with that data? Would a delta-import make sense in this context? What does a delta-import do to Solr in terms of performance for running queries?

Comments (3)

鹿童谣 2024-12-13 12:52:46

First you need to define "fresh".

Is "fresh" 1ms? If so, by the time the value (the rendered html) gets to the browser, it's not fresh anymore, due to network latency. Does that really matter? For the vast majority of cases, no, true real-time results are not needed.

A more common limit is 1s. In that case, Solr can handle it with RankingAlgorithm (a plugin) or soft commits (currently available in Solr 4.0 trunk only).

"Delta-import" is a term from DataImportHandler that doesn't have much intrinsic meaning. From the point of view of a Solr server, there are only document additions; it doesn't matter where they come from or whether a set of documents represents the "whole" dataset.

If you want an item indexed within 1s of its creation/modification, then do just that: add it to Solr right after it's created/modified (for example, with a hook in your DAL). This should be done asynchronously, using RankingAlgorithm or soft commits.
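
As a rough sketch of the asynchronous approach above, the snippet below keeps only the latest view count per video in memory and builds one batched request for Solr's JSON update handler with `softCommit=true`. The base URL, the core name `videos`, the field name `views`, and the helper names are assumptions for illustration. The `{"set": ...}` form is Solr's atomic update syntax, which needs a Solr version and schema that support it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SOLR_BASE = "http://localhost:8983/solr"  # assumed Solr location
CORE = "videos"                           # hypothetical core name


class ViewCountBuffer:
    """Keeps only the latest view count per video between flushes."""

    def __init__(self):
        self._latest = {}

    def record(self, video_id, views):
        # Later values overwrite earlier ones, so many page views
        # collapse into a single Solr update per video.
        self._latest[video_id] = views

    def drain(self):
        # Build atomic-update documents and empty the buffer.
        docs = [
            {"id": vid, "views": {"set": count}}
            for vid, count in sorted(self._latest.items())
        ]
        self._latest.clear()
        return docs


def build_update_request(docs, soft_commit=True):
    """Build (but do not send) a POST to the core's /update handler."""
    url = f"{SOLR_BASE}/{CORE}/update"
    if soft_commit:
        url += "?" + urlencode({"softCommit": "true"})
    return Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A background worker could call `drain()` every second or so and pass the result of `build_update_request` to `urllib.request.urlopen`, skipping the call when the drained list is empty.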

酒中人 2024-12-13 12:52:46

You might be interested in so-called "near-realtime search", or NRT, now available on Solr's trunk, which is designed to deal with exactly this problem. See http://wiki.apache.org/solr/NearRealtimeSearch for more info and links.

筱果果 2024-12-13 12:52:46

How about using an external file field (ExternalFileField)?
It lets you maintain data outside of your index in a separate file, which you can refresh periodically without any changes to the index.

For fast-changing data such as downloads, views, or rank, this can be a good option.
More info: http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

It has some limitations, so check whether it fits your needs.
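
A minimal sketch of refreshing such a file, assuming a field named `views` and the usual ExternalFileField layout: a file called `external_<fieldname>` in the index data directory, with one `key=value` line per document. The function name and the write-then-rename step are illustrative choices, not part of Solr; check the linked documentation for when Solr actually reloads the file.

```python
import os
import tempfile


def write_external_file(data_dir, field_name, values):
    """Write key=value lines to external_<field_name>, swapping it in atomically."""
    target = os.path.join(data_dir, f"external_{field_name}")
    # Write to a temporary file in the same directory first, so a reader
    # never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=data_dir, prefix=".tmp_external_")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for key, value in sorted(values.items()):
                f.write(f"{key}={value}\n")
        os.replace(tmp_path, target)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target
```

A periodic job could read the current counts from the fast store (Redis, MongoDB, or similar) and call `write_external_file(data_dir, "views", counts)`, leaving the index itself untouched.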
