How to optimize a Solr index.
I want to optimize my Solr index. I tried changing settings in solrconfig.xml and documents are getting indexed, but I want to know how to verify that the index is optimized, and what is involved in index optimization.
Check the size of the respective core before you start.
Open Terminal 1:
Open Terminal 2 and execute:
Instead of "core", use your respective core name.
You will see the size of the core gradually increase to about double the size of your indexed data, then suddenly shrink. How long this takes depends on your Solr data.
For instance, 50 GB of indexed data spikes to nearly 90 GB and then drops to 25 GB of optimized data. It normally takes 30-45 minutes for this amount of data.
Why doesn't my index directory get smaller (immediately) when I delete documents? force a merge? optimize?
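A sketch of what the two terminals above might run (the data directory path and the core name "core" are assumptions; adjust them for your install):

```shell
# Terminal 1: watch the index directory size while the optimize runs
watch -n 5 du -sh /var/solr/data/core/data/index

# Terminal 2: trigger the optimize (replace "core" with your core name)
curl 'http://localhost:8983/solr/core/update?optimize=true'
```

Because the merge writes the new segment files before deleting the old ones, Terminal 1 will show the size roughly double before it drops.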
I find this to be the easiest way to optimize a Solr index. In my context "optimize" means to merge all index segments.
You need to pass optimize=true in the update request to Solr to optimize the index.
There are different ways to optimize an index.
You could trigger one of the solr basic scripts:
http://wiki.apache.org/solr/SolrOperationsTools#optimize
You could also set optimize=true at a (full) import or while adding new data... or simply trigger a commit with optimize=true.
Maybe this could also be interesting for your needs:
http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
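Both variants can be sent over plain HTTP; for example (the host, port, and core name "mycore" are assumptions, adjust them for your setup):

```shell
# Trigger a commit with optimize=true via the update handler
curl 'http://localhost:8983/solr/mycore/update?optimize=true'

# Equivalent explicit XML update message
curl 'http://localhost:8983/solr/mycore/update' \
  -H 'Content-Type: text/xml' --data-binary '<optimize/>'
```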
By Optimize, consider it to be a forceMerge. The Optimize operation re-organizes all the segments in a core (or per shard) and merges them into one single segment (the default is 1 segment).
To optimize: you could specify a MergePolicy in solrconfig.xml so that Solr merges segments by itself. To manually trigger the optimize: http://hostname:port/solr/<COLLECTION_NAME>/update?optimize=true&maxSegments=1
To answer your next question - how to verify whether the optimize is done: you can check the Core/Shard Overview tab in the Solr UI, which shows the segment count. You can also compare the size of the segments in the /data/index folder before and after the optimize.
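The same check can be scripted; a sketch, assuming a local node and a core named "mycore" (the Luke request handler's response includes a segmentCount field):

```shell
# Segment count before/after the optimize, via the Luke request handler
curl 'http://localhost:8983/solr/mycore/admin/luke?numTerms=0&wt=json'

# On-disk size of the index directory (path depends on your install)
du -sh /var/solr/data/mycore/data/index
```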
Optimize/forceMerge are better behaved, but still expensive operations.
https://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations:
“Optimizing is very expensive, and if the index is constantly changing, the slight performance boost will not last long.”
To test how much a change of yours optimizes the indexing, just write a custom indexer and add randomly generated content. Add a big number of documents (500,000 or 1,000,000) and measure the time it takes.
Following the articles shared above, I made myself a custom indexer and managed to reduce the time it took to index documents by 80%.
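A hypothetical version of such a benchmark (the core name "mycore" and the field name "text_s" are assumptions): index 100,000 randomly generated documents in batches of 1,000 and measure the total time.

```shell
time for b in $(seq 1 100); do
  # Build one JSON batch of 1,000 random documents
  docs=$(for i in $(seq 1 1000); do
    printf '{"id":"%s-%s","text_s":"doc %s"},' "$b" "$i" "$RANDOM"
  done)
  # Send the batch without committing ("${docs%,}" strips the trailing comma)
  curl -s -H 'Content-Type: application/json' \
    --data-binary "[${docs%,}]" \
    'http://localhost:8983/solr/mycore/update?commit=false' >/dev/null
done
# Commit once at the end
curl -s 'http://localhost:8983/solr/mycore/update?commit=true'
```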
When it comes to optimization of Solr core/shard data it is as easy as running a command like this:
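For example (the host, port, and core name "mycore" are assumptions; adjust them for your setup):

```shell
# Force-merge the core's index down to a single segment
curl 'http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=1'
```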
But be aware that this doesn't come for free - if you have a lot of data you may end up with quite a lot of I/O on Solr nodes and the process itself taking a lot of time. In most cases, you want to start with tuning the merge process, not force merging the index itself.
I did a talk on that topic during Lucene/Solr Revolution - if you would like to have a look at the slides and the video, here is a link: https://sematext.com/blog/solr-optimize-is-not-bad-for-you-lucene-solr-revolution/
If you have access to the Solr web-based UI, this can be done there by navigating to the core you want to optimize, then:
- Set the Request-Handler to /update (which is the default) and the document type to XML (this may be possible with JSON, but...)
- Enter <optimize/> into the "Document(s)" text area
This will kick off the optimization process.