Avoid deleting the current Lucene.NET index during a rebuild

Posted on 2024-10-11 04:58:00

I'm new to Lucene.NET, but I'm using an open source tool built for Sitecore CMS (http://www.sitecore.net) that uses Lucene.NET to index lots of content from the CMS. I confirmed yesterday that when I rebuild my indexes, the current index files are wiped clean, so anything that relies on the index gets no data for about 30-60 seconds (the time a full index rebuild takes). Is there a best practice or a way to make Lucene.NET not overwrite the current index files until the new index is completely rebuilt? I'm basically thinking I'd like it to write to new temp index files and, when the rebuild is done, have those files overwrite the current index.

Example of what I'm talking about:

  • Build fresh index (~30 seconds)
  • Index has about 500 documents
  • Use code to access data in index and display on website
  • Rebuild index (~30 seconds)
    • Any code that now reads the index for data returns nothing because the index files are being overwritten; results in website not showing any data
  • Rebuild complete: data now available again, data back on website

Thanks in advance

Comments (2)

隐诗 2024-10-18 04:58:00

I have no experience with "Sitecore" itself but here's my story.

We've recently incorporated index-based search (using Lucene.Net) into our eCommerce sub-system. In our case the index update can take about half an hour (~50,000 products plus lots of related information). To prevent "denial of service" responses during the update, we first create a "backup" version of the index (simply copying the index directory to another location), and all further requests are redirected to this "backup" version. When the index update is completed, we delete the backup so that clients start using the updated (or "live") version of the index. This also helps if an unhandled exception occurs during the update: otherwise you might end up with no index at all, whereas in our case clients can always use the "backup" version.
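
The backup-and-redirect flow described above could be sketched roughly as follows. This is only an illustration in plain Java using the standard library, with no Lucene calls; the class name `IndexBackupSwitch`, the `RebuildStep` callback and the `activeDir()` accessor are hypothetical names made up for this sketch and are not part of Lucene.Net, Sitecore, or the setup described here. It assumes search code asks `activeDir()` which directory to open each time it creates a reader.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: keeps searches on a copy of the index while the
// live index directory is being rebuilt in place.
final class IndexBackupSwitch {
    // Hypothetical callback that performs the actual (re)indexing.
    interface RebuildStep { void run(Path indexDir) throws IOException; }

    private final Path liveDir;
    private final Path backupDir;
    // Directory that search code should open right now.
    private final AtomicReference<Path> activeDir;

    IndexBackupSwitch(Path liveDir, Path backupDir) {
        this.liveDir = liveDir;
        this.backupDir = backupDir;
        this.activeDir = new AtomicReference<>(liveDir);
    }

    Path activeDir() { return activeDir.get(); }

    void rebuild(RebuildStep step) throws IOException {
        deleteTree(backupDir);          // drop any stale backup
        copyTree(liveDir, backupDir);   // 1. copy the current index
        activeDir.set(backupDir);       // 2. redirect searches to the copy
        step.run(liveDir);              // 3. rebuild the live index; if this
                                        //    throws, searches stay on the backup
        activeDir.set(liveDir);         // 4. switch searches back
        // 5. remove the backup. Assumption: no reader still has these files
        //    open; a real setup would wait for in-flight searches first.
        deleteTree(backupDir);
    }

    private static void copyTree(Path src, Path dst) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(src)) {
            paths = walk.collect(Collectors.toList()); // parents before children
        }
        for (Path p : paths) {
            Path target = dst.resolve(src.relativize(p));
            if (Files.isDirectory(p)) {
                Files.createDirectories(target);
            } else {
                Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    private static void deleteTree(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Because the switch back to the live directory only happens after the rebuild step returns normally, a failed rebuild leaves searches pointed at the backup copy, which matches the fallback behaviour mentioned above.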

The API reference (Lucene 2.4) of the Lucene.Net.Index.IndexWriter object states the following:

Note that you can open an index with
create=true even while readers are
using the index. The old readers will
continue to search the "point in time"
snapshot they had opened, and won't
see the newly created index until they
re-open.

So at least you shouldn't worry about the clients that are currently searching within your index.

Hope this helps you make the right decision.

画骨成沙 2024-10-18 04:58:00

I'm not familiar with that Sitecore tool, but I can answer how you would do it with pure Lucene.Net: you should use an NRT (near-real-time) setup, which means "have one index writer and never close it."

Basically, index writers have a "virtual" index in memory until it gets flushed to disk. So as long as you get your readers from the writer, you'll always see the latest stuff, even if it hasn't been flushed to disk yet.
