How to optimize a Solr index

Published 2024-11-27 20:18:09

I want to optimize my Solr index. I tried changing settings in solrconfig.xml and documents are getting indexed, but how do I verify that the index is actually optimized, and what is involved in index optimization?

Answers (8)

悲歌长辞 2024-12-04 20:18:10


Check the size of the respective core before you start.

Open Terminal 1:

watch -n 10 "du -sh /path/to/core/data/*"

Open Terminal 2 and execute:

curl "http://hostname:8980/solr/<core>/update?optimize=true"

Replace "<core>" with the name of your respective core.

You will see the size of the core grow gradually to roughly double the size of your indexed data, then shrink suddenly. How long this takes depends on your Solr data.

For instance, 50 GB of indexed data peaks near 90 GB and then drops to an optimized 25 GB. This amount of data normally takes 30 to 45 minutes.

See also: Why doesn't my index directory get smaller (immediately) when I delete documents? Force a merge? Optimize?
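If watch/du are not available, or you want to log the growth curve during the optimize, the size check can be scripted. A minimal Python sketch of the du step (the data path below is a placeholder; point it at your core's data directory):

```python
import os

def dir_size_bytes(path):
    """Total size in bytes of all regular files under path,
    a rough equivalent of `du -sb path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

# Example: poll the core's data directory every 10 seconds while
# the optimize runs, e.g.
#   while True:
#       print(dir_size_bytes("/path/to/core/data"))
#       time.sleep(10)
```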

梦年海沫深 2024-12-04 20:18:10


I find this to be the easiest way to optimize a Solr index. In my context, "optimize" means merging all index segments.

curl http://localhost:8983/solr/<core_name>/update -F stream.body='<optimize/>'

很酷不放纵 2024-12-04 20:18:10


You need to pass optimize=true with the Solr update request to trigger an optimize.

http://[HostName]:[port]/solr/update?optimize=true

北方的韩爷 2024-12-04 20:18:10


There are different ways to optimize an index.
You could trigger one of the Solr basic scripts:
http://wiki.apache.org/solr/SolrOperationsTools#optimize

You can also set optimize=true on a (full) import or while adding new data...
...or simply trigger a commit with optimize=true.

This may also be interesting for your needs:
http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22

情魔剑神 2024-12-04 20:18:10


By "optimize" I am assuming you mean forceMerge. The optimize operation reorganizes all the segments in a core (or per shard) and merges them into a single segment (the default is 1 segment).

To optimize: you can specify a MergePolicy in solrconfig.xml so that Solr merges segments by itself, or manually trigger an optimize: http://hostname:port/solr/<COLLECTION_NAME>/update?optimize=true&maxSegments=1
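As a sketch, a merge-policy configuration in solrconfig.xml looks roughly like this (the values shown are illustrative, not recommendations; check the reference guide for your Solr version):

```xml
<indexConfig>
  <!-- Let Solr merge segments in the background according to this policy. -->
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicyFactory>
</indexConfig>
```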

To answer your next question (how to verify whether the optimize is done): check the Core/Shard Overview tab in the Solr UI, which shows the segment count. You can also compare the size of the segments in the /data/index folder before and after the optimize.

[Screenshot: segment count in the statistics of the Solr overview]
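If you would rather script the verification than read it off the UI, the segment count is also reported by Solr's Luke request handler (/admin/luke). A minimal parsing sketch, assuming the usual JSON response shape with an index.segmentCount field (fetching the URL itself is left to curl or urllib):

```python
import json

def segment_count(luke_json):
    """Extract the segment count from a /solr/<core>/admin/luke?wt=json
    response body. An index is fully optimized when this reaches 1
    (or whatever maxSegments you requested)."""
    return json.loads(luke_json)["index"]["segmentCount"]
```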

Optimize/forceMerge are better behaved, but still expensive operations.

https://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations:

“Optimizing is very expensive, and if the index is constantly changing, the slight performance boost will not last long.”

锦上情书 2024-12-04 20:18:10


To test how much a change improves indexing, just write a custom indexer and add randomly generated content. Add a large number of documents (500,000 or 1,000,000) and measure the time it takes.

Following the articles shared above, I built myself a custom indexer and managed to cut the time it took to index documents by 80%.
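A minimal sketch of such a timing harness; send_batch is a stand-in for whatever client you use to post documents to Solr (pysolr, a raw HTTP POST to /update, etc.), and the field names are hypothetical:

```python
import random
import string
import time

def random_doc(i):
    """Generate one document with random text content."""
    text = " ".join(
        "".join(random.choices(string.ascii_lowercase, k=8))
        for _ in range(20)
    )
    return {"id": str(i), "content_t": text}

def index_benchmark(num_docs, send_batch, batch_size=1000):
    """Time how long it takes to index num_docs random documents.
    send_batch is a callable that posts a list of docs to Solr."""
    start = time.monotonic()
    batch = []
    for i in range(num_docs):
        batch.append(random_doc(i))
        if len(batch) == batch_size:
            send_batch(batch)
            batch = []
    if batch:
        send_batch(batch)  # flush the final partial batch
    return time.monotonic() - start
```

Run it once before and once after a solrconfig.xml change (against a throwaway core) and compare the elapsed times.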

我的影子我的梦 2024-12-04 20:18:10


When it comes to optimization of Solr core/shard data it is as easy as running a command like this:

curl "http://hostname:8980/solr/<COLLECTION_NAME>/update?optimize=true"

But be aware that this doesn't come for free - if you have a lot of data you may end up with quite a lot of I/O on Solr nodes and the process itself taking a lot of time. In most cases, you want to start with tuning the merge process, not force merging the index itself.

I did a talk on that topic during Lucene/Solr Revolution; if you would like to have a look at the slides and the video, here is a link: https://sematext.com/blog/solr-optimize-is-not-bad-for-you-lucene-solr-revolution/

四叶草在未来唯美盛开 2024-12-04 20:18:10


If you have access to the Solr web-based UI, this can be done there by navigating to the core you want to optimize, then:

  1. Open the "Documents" page/menu/whatever
  2. Set the Request-Handler to /update (which is the default) and the document type to XML (this may be possible with JSON, but...)
  3. Enter <optimize/> into the "Document(s)" text area
  4. Submit the document

This will kick off the optimization process.
