What are segments in Lucene?
What are the benefits of segments?
The Lucene index is split into smaller chunks called segments. Each segment is its own index. Lucene searches all of them in sequence.
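To make "searches all of them in sequence" concrete, here is a minimal toy sketch in Python. It is not Lucene's API: `Segment`, `postings` and `search` are hypothetical names, and each segment is modelled as nothing more than a small term-to-document map.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for a segment: a small inverted index of its own."""
    postings: dict = field(default_factory=dict)  # term -> list of local doc ids


def search(segments, term):
    """Visit every segment in order and collect (segment number, local doc id) hits."""
    hits = []
    for seg_no, seg in enumerate(segments):
        for doc_id in seg.postings.get(term, []):
            hits.append((seg_no, doc_id))
    return hits


segments = [
    Segment({"lucene": [0, 2], "index": [1]}),
    Segment({"lucene": [0]}),
]
print(search(segments, "lucene"))  # [(0, 0), (0, 2), (1, 0)]
```

Each segment answers the query on its own, and the per-segment results are simply concatenated.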
A new segment is created when a new writer is opened and when a writer commits or is closed.
The advantage of this system is that you never have to modify the files of a segment once it is created. When you add new documents to your index, they go into the next segment. Previous segments are never modified.
Deleting a document is done by simply recording in a file which document of a segment is deleted; physically, the document stays in the segment. Documents in Lucene aren't really updated either. What happens is that the previous version of the document is marked as deleted in its original segment and the new version is added to the current segment. Because existing content does not have to be constantly modified when there are changes, the chances of corrupting the index are minimized. It also allows for easy backup and synchronization of the index across different machines.
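The delete and update behaviour described above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not Lucene code: `ToyIndex`, `Segment` and their methods are hypothetical names, and every `add` call creates a new segment.

```python
class Segment:
    """Toy segment: documents are fixed at creation; only the delete markers change."""

    def __init__(self, docs):
        self.docs = tuple(docs)  # (key, text) pairs, never modified afterwards
        self.deleted = set()     # positions of documents marked as deleted


class ToyIndex:
    def __init__(self):
        self.segments = []

    def add(self, docs):
        """New documents always go into a new segment."""
        self.segments.append(Segment(docs))

    def delete(self, key):
        """Mark matching documents as deleted; they stay in their segment."""
        for seg in self.segments:
            for pos, (k, _) in enumerate(seg.docs):
                if k == key:
                    seg.deleted.add(pos)

    def update(self, key, text):
        """An update is a delete of the old version plus an add of the new one."""
        self.delete(key)
        self.add([(key, text)])

    def live_docs(self):
        """Return only the documents that are not marked as deleted."""
        return [doc
                for seg in self.segments
                for pos, doc in enumerate(seg.docs)
                if pos not in seg.deleted]


index = ToyIndex()
index.add([("a", "first draft"), ("b", "other doc")])
index.update("a", "second draft")
print(index.live_docs())          # [('b', 'other doc'), ('a', 'second draft')]
print(index.segments[0].docs[0])  # ('a', 'first draft'), still physically present
```

The old version of document "a" is hidden from readers but remains in the first segment, whose document list was never rewritten.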
However, at some point, Lucene may decide to merge some segments. This operation can also be triggered with an optimize (renamed forceMerge in Lucene 4.0 and later).
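As a rough sketch of what a merge achieves, the toy function below combines several segments into a new one and leaves out the documents marked as deleted. It is an assumption-laden illustration, not Lucene's merge code: `merge` is a hypothetical helper, a segment is modelled as a `(docs, deleted)` pair, and the question of which segments to merge, and when, is ignored entirely.

```python
def merge(segments):
    """Combine segments into one, keeping only documents not marked as deleted.

    Each segment is a (docs, deleted) pair: a list of documents and a set of
    deleted positions. The inputs are left untouched; a new segment is returned.
    """
    merged_docs = [doc
                   for docs, deleted in segments
                   for pos, doc in enumerate(docs)
                   if pos not in deleted]
    return (merged_docs, set())


seg0 = (["a v1", "b"], {0})  # "a v1" was marked as deleted
seg1 = (["a v2"], set())
merged = merge([seg0, seg1])
print(merged)  # (['b', 'a v2'], set())
```

In this model, merging is the point at which a deleted document finally disappears: the new segment is written without it, and the old segments can then be discarded.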
A segment is very simply a section of the index. The idea is that you can add documents to the index that's currently being served by creating a new segment with only new documents in it. This way, you don't have to go to the expensive trouble of rebuilding your entire index frequently in order to add new documents to the index.
The benefits of segments have already been covered by others. I will include an ASCII diagram of a Lucene index.
Lucene Segment
A Lucene segment is part of an index. Each segment is composed of several index files. If you look inside any of these files, you will see that it holds one or more Lucene documents.
Reference
Lucene in Action, Second Edition. Manning Publications, July 2010.