Solr only vs. a Solr/MySQL solution

Posted 2024-12-07 17:29:49

Currently I have a system that is based solely on Solr: I store all data in Solr (using SolrJ), with no other datastore involved. The problem is that I am now running into performance issues. I thought it might make sense to store the data in MySQL and then synchronize it with Solr, e.g. with the DataImportHandler. That way, reads would go to the Solr index and the main writes would go to MySQL, with Solr writes happening only occasionally, during synchronization.

The thing is, I expect to store hundreds of millions of documents, and I don't really know whether the MySQL/Solr combination makes sense at that scale.

Is there a better solution? Maybe a master Solr for writing and Solr slaves for reading?

Update: What I forgot to say is that the "storing data in MySQL" solution could also be useful when schema.xml changes, because then I can re-index all the data from MySQL without depending on the data stored in Solr itself.

Comments (2)

挽你眉间 2024-12-14 17:29:49

It's not preferable to use the same Solr instance for both reading and writing, because the activity on Solr during writes (including commits and optimizes) would heavily impact read operations.

A master-slave configuration would be a nicer approach, with the master primarily for writes and the slaves for read-only purposes.
The slaves are periodically refreshed with the contents of the master, so there is some delay before new data becomes searchable.
You can always scale reads by adding more slaves.
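
To make the read/write split concrete, here is a minimal sketch of how a client could pick the target instance. The class name SolrRouter, the host names and the core name "docs" are made up for illustration; only the /update and /select handler paths are standard Solr. It just builds URLs and does not talk to Solr.

```python
from itertools import cycle


class SolrRouter:
    """Send updates to the master and spread queries across the slaves."""

    def __init__(self, master_url, slave_urls):
        if not slave_urls:
            raise ValueError("at least one slave URL is required")
        self.master_url = master_url.rstrip("/")
        # cycle() gives a simple round-robin over the slave list.
        self._slaves = cycle([url.rstrip("/") for url in slave_urls])

    def update_url(self):
        # All writes go to the single master.
        return self.master_url + "/update"

    def select_url(self):
        # Each read goes to the next slave in turn.
        return next(self._slaves) + "/select"


# Hypothetical hosts and core name, for illustration only.
router = SolrRouter(
    "http://solr-master:8983/solr/docs",
    ["http://solr-slave1:8983/solr/docs", "http://solr-slave2:8983/solr/docs"],
)
```

Adding a slave then only means adding one more URL to the list; in practice a load balancer in front of the slaves would probably do the same job.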

Using MySQL as a persistent store together with master-slave Solr would be the best approach.
MySQL provides a stable data store and guards you against index corruption or other issues that would result in data loss.
Using the DataImportHandler you can do this easily with incremental (delta) updates, but there will be a longer time lag before the latest data appears on the slaves.
With this setup you can also use index swapping for full refreshes.
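
As a rough illustration of the incremental vs. full refresh, this sketch builds the DataImportHandler request URLs. It assumes the handler is registered under /dataimport in solrconfig.xml (the usual name, but configurable) and that the data-config defines a delta query; the host and core name are hypothetical. The command, clean and commit parameters are the ones the handler accepts.

```python
from urllib.parse import urlencode


def dih_url(core_url, command, clean, commit=True):
    """Build a DataImportHandler request URL for the given command."""
    params = {
        "command": command,
        "clean": "true" if clean else "false",
        "commit": "true" if commit else "false",
    }
    return core_url.rstrip("/") + "/dataimport?" + urlencode(params)


def delta_import_url(core_url):
    # Incremental update: keep the existing index, import only changed rows.
    return dih_url(core_url, "delta-import", clean=False)


def full_import_url(core_url):
    # Full refresh: wipe the index and rebuild it from MySQL.
    return dih_url(core_url, "full-import", clean=True)


# Hypothetical master URL, for illustration only.
print(delta_import_url("http://solr-master:8983/solr/docs"))
```

A scheduler (cron or similar) on the master side could call the delta URL every few minutes and the full URL only after a schema change.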

If the index grows too huge to be maintainable and performance suffers, you may want to look at Solr shards.
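
For sharding, a query in the classic (non-SolrCloud) distributed setup is sent to one instance with a shards parameter listing every shard as host:port/path, without the http:// scheme. A small sketch of building such a request, with hypothetical hosts and core name:

```python
from urllib.parse import urlencode


def sharded_query_url(entry_url, shard_hosts, query):
    """Build a /select URL that fans the query out over several shards."""
    params = {
        "q": query,
        # Comma-separated shard list, each entry as host:port/path.
        "shards": ",".join(shard_hosts),
    }
    return entry_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical shard hosts, for illustration only.
url = sharded_query_url(
    "http://solr-slave1:8983/solr/docs",
    ["solr-shard1:8983/solr/docs", "solr-shard2:8983/solr/docs"],
    "title:solr",
)
```

The receiving instance merges the per-shard results, so the client still talks to a single URL.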

若水般的淡然安静女子 2024-12-14 17:29:49

I also thought about the same issue: storing everything in Solr, or storing in MySQL and indexing in Solr.

I decided to go the second way: store in MySQL and index in Solr.

The reason: handling data (reading and writing) is much better in MySQL than in Solr. Also, importing data into MySQL and exporting it again is supported out of the box by lots of tools.
Next point: backup. There are many more established ways of backing up a MySQL DB than a Solr index.

Of course, for full-text search Solr is much better than MySQL. So I decided that each system should do the job it does best.
For your information: I'm talking about a medium-sized index, 4 GB for a few million documents.

Edit: don't forget that some features, like highlighting, require data that is stored in Lucene (not only indexed). If you need them, you have to store the documents in Solr in addition. An alternative is to implement those features on the client side. (I did it this way.)
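
To show what client-side highlighting can look like, here is a minimal sketch (not the code from this answer): the text comes from MySQL, and the query terms are wrapped in tags on the client. The function name and the <em> markers are assumptions. It does plain case-insensitive whole-word matching, so unlike Solr's highlighter it knows nothing about stemming or analyzers.

```python
import html
import re


def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap whole-word, case-insensitive matches of terms in pre/post tags."""
    words = [re.escape(t) for t in terms if t]
    if not words:
        return html.escape(text)
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches so the output is safe as HTML.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(pre + html.escape(match.group(0)) + post)
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight("Solr is fast & solr scales", ["solr"]))
# prints: <em>Solr</em> is fast &amp; <em>solr</em> scales
```

With this, the Solr fields can stay indexed-only and the index stays smaller, at the cost of slightly different matches than the search itself.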
