Why does my Solr slave index keep growing?

Posted 2024-09-08 06:25:13

I have a 5-core Solr 1.4 master that is replicated to another 5-core Solr instance using Solr replication as described here. All writes are done against the master and replicated to the slave intermittently, using the following sequence (sketched as HTTP calls after the list):

  1. Commit on each master core
  2. Replicate on each slave core
  3. Optimize on each slave core
  4. Commit on each slave core
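
For reference, this sequence maps onto Solr 1.4's standard update and ReplicationHandler HTTP APIs roughly as follows. This is a minimal sketch for a single core; the host and core names are hypothetical, and the same calls are repeated for each of the five cores:

# 1. commit on the master core
$ curl 'http://master:8983/solr/core0/update' -H 'Content-Type: text/xml' -d '<commit/>'
# 2. trigger replication (fetchindex) on the slave core
$ curl 'http://slave:8983/solr/core0/replication?command=fetchindex'
# 3. optimize on the slave core
$ curl 'http://slave:8983/solr/core0/update' -H 'Content-Type: text/xml' -d '<optimize/>'
# 4. commit on the slave core
$ curl 'http://slave:8983/solr/core0/update' -H 'Content-Type: text/xml' -d '<commit/>'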

The problem I am having is that the slave seems to be keeping around old index files and taking up ever more disk space. For example, after 3 replications, the master core data directory looks like this:

$ du -sh *
145M    index

But the data directory for the same core on the slave looks like this:

$ du -sh *
300M    index
144M    index.20100621042048
145M    index.20100629035801
4.0K    index.properties
4.0K    replication.properties

Here's the contents of index.properties:

#index properties
#Tue Jun 29 15:58:13 CDT 2010
index=index.20100629035801

And replication.properties:

#Replication details
#Tue Jun 29 15:58:13 CDT 2010
replicationFailedAtList=1277155032914
previousCycleTimeInSeconds=12
timesFailed=1
indexReplicatedAtList=1277845093709,1277155253911,1277155032914
indexReplicatedAt=1277845093709
replicationFailedAt=1277155032914
lastCycleBytesDownloaded=150616512
timesIndexReplicated=3
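
Incidentally, the long values in replication.properties are Unix timestamps in milliseconds; dropping the last three digits lets you decode them, e.g. with GNU date:

$ date -ud @1277845093
Tue Jun 29 20:58:13 UTC 2010

which is 15:58:13 CDT, consistent with the header comment above (indexReplicatedAt=1277845093709).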

The solrconfig.xml for this slave contains the default deletion policy:

[...]
<mainIndex>
    <unlockOnStartup>false</unlockOnStartup>
    <reopenReaders>true</reopenReaders>
    <deletionPolicy class="solr.SolrDeletionPolicy">
        <str name="maxCommitsToKeep">1</str>
        <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>
</mainIndex>
[...]

What am I missing?

Comments (3)

小耗子 2024-09-15 06:25:14

It is useless to commit and optimize on the slaves. Since all the write operations are done on the master, it is the only place where those operations should occur.

This may be the cause of the problem: since you do an additional commit and optimize on the slaves, more commit points are kept on the slaves. But this is only a guess; it would be easier to understand what is happening with your full solrconfig.xml from both the master and the slaves.
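
A minimal sketch of the corrected flow, using the same hypothetical host and core names as above: all index-modifying operations stay on the master, and the slave only fetches.

# all commits and optimizes happen on the master only
$ curl 'http://master:8983/solr/core0/update' -H 'Content-Type: text/xml' -d '<commit/>'
$ curl 'http://master:8983/solr/core0/update' -H 'Content-Type: text/xml' -d '<optimize/>'
# the slave only pulls the new index; no commit or optimize on it
$ curl 'http://slave:8983/solr/core0/replication?command=fetchindex'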

孤单情人 2024-09-15 06:25:14

The optimize that's done on the slave is causing the index to double in size. On optimize, new index segments are created to rewrite the original index down to the number of segments specified in the optimize call (the default is 1).
Best practice is to optimize only once in a while rather than on every event (run a cron job or something), and to optimize only on the master, never on the slave. The slaves will pick up the new segments through replication.
You shouldn't commit on the slave either; the index reload after replication takes care of making new documents visible on the slave.
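
As a sketch of that best practice (the script name, path, and host are hypothetical), a small shell script driven by cron on the master could issue the optimize for all five cores:

#!/bin/sh
# optimize_master.sh -- issue an optimize against each master core
for core in core0 core1 core2 core3 core4; do
    curl -s "http://master:8983/solr/$core/update" \
         -H 'Content-Type: text/xml' -d '<optimize/>'
done

A crontab entry like 0 3 * * 0 /usr/local/bin/optimize_master.sh would then run it weekly, and the slaves pick up the optimized segments on their next replication.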

谁的年少不轻狂 2024-09-15 06:25:14

I determined that the extra index.* directories seem to be left behind when I replicate after completely reloading the master. What I mean by "completely reloading" is stopping the master, deleting everything under [core]/data/*, restarting (at which point Solr creates a new index), indexing all of our docs, then replicating.

Based on some additional testing, I have found that it seems to be safe to remove the other index* directories (other than the one specified in [core]/data/index.properties). If I decide I'm not comfortable with that workaround, I may instead empty the slave index (stop; delete data/*; start) before replicating for the first time after completely reloading the master.
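
A minimal sketch of that cleanup, assuming the data-directory layout shown in the question (the path is hypothetical, and the slave should be stopped, or replication at least quiesced, before deleting anything):

#!/bin/sh
# Remove index directories that are not the live one named in index.properties.
DATA_DIR=/path/to/core/data   # hypothetical slave core data directory
cd "$DATA_DIR" || exit 1

# index.properties records which directory the slave is actually serving from
LIVE=$(grep '^index=' index.properties | cut -d= -f2)
[ -n "$LIVE" ] || exit 1      # bail out if the live index can't be determined

for d in index index.*; do
    [ -d "$d" ] || continue         # skips plain files such as index.properties
    [ "$d" = "$LIVE" ] && continue  # never touch the live index
    echo "removing stale $d"
    rm -rf "$d"
done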
