A few questions about Lucene

Posted on 2024-08-03 19:23:31

I've been using Zend and need to add search. The Zend docs aren't great, so I have a few questions that are probably easy to answer but not obvious. I'm using Lucene to search an SQL database.

  1. How do I associate the index (id) of my item with the text of that item? If a user searches and finds the item, how do I get its id returned? As far as I can tell, you can only get back the text that was searched.

  2. When I add an item to the document (index) that holds all the data, and that document has already been created, is it simply open('document'), $doc = new Doc(), $doc->addDocument(), commit()?

  3. I understand that I should update the Lucene document every time I add something to the database. Should I also re-optimize every time I add something? Is that inefficient? Should I do it once a week instead?

Sorry for asking what seem like obvious questions, and thanks in advance for your help.

Comments (2)

你丑哭了我 2024-08-10 19:23:32
  1. 'Index, and thou shalt retrieve' - You have to put into the index whatever you eventually want returned. That is, if you want to be able to return record id 1389 when searching for its text "Flux Capacitator", you should store a document that has the text in one field and the id in another field. The id field does not have to be indexed, but it has to be stored so you can get it back.
  2. What you are looking for is an 'update document' action. Lucene does not really have one. You should delete the document first, and then add a new document containing the updated information. Now go back to item 1, take the id field you added there and make it indexed (say, as a Keyword field), because you will need it as the unique identifier of the document in order to find and delete it.
  3. Great question. This depends very much on your use case. Do you have a daily "dead time" when your site/database is relatively idle? That would be the time to optimize. If you have no such time, you can forgo optimizing and take a small (say 5-10%) performance penalty, which can also be mitigated by tuning the Merge Factor.
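
Items 1 and 2 can be sketched with Zend_Search_Lucene roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes Zend Framework 1 is on the include path, and the field names `db_id` and `title`, the helpers `makeDoc()` and `replaceDoc()`, and the temporary index path are all hypothetical names chosen for illustration.

```php
<?php
// Assumes Zend Framework 1 is available on the include_path.
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';
require_once 'Zend/Search/Lucene/Field.php';
require_once 'Zend/Search/Lucene/Index/Term.php';
require_once 'Zend/Search/Lucene/Search/Query/Term.php';

// Hypothetical location; use Zend_Search_Lucene::open() for an existing index.
$indexPath = sys_get_temp_dir() . '/lucene_demo_' . uniqid();
$index = Zend_Search_Lucene::create($indexPath);

// Pair the database id with the searchable text in one Lucene document.
function makeDoc($dbId, $title)
{
    $doc = new Zend_Search_Lucene_Document();
    // keyword: stored and indexed as one untokenized term, so it comes back
    // with each hit and can also be used to look the document up later.
    $doc->addField(Zend_Search_Lucene_Field::keyword('db_id', (string) $dbId));
    // text: tokenized, indexed, and stored.
    $doc->addField(Zend_Search_Lucene_Field::text('title', $title));
    return $doc;
}

// "Update" = delete every document carrying this db_id, then add the new one.
function replaceDoc($index, $dbId, $title)
{
    // A term query is built directly so the id is not passed through the
    // text analyzer used by the query-string parser.
    $term  = new Zend_Search_Lucene_Index_Term((string) $dbId, 'db_id');
    $query = new Zend_Search_Lucene_Search_Query_Term($term);
    foreach ($index->find($query) as $hit) {
        $index->delete($hit->id); // $hit->id is Lucene's internal document number
    }
    $index->addDocument(makeDoc($dbId, $title));
    $index->commit();
}

replaceDoc($index, 1389, 'Flux Capacitator');           // first insert
replaceDoc($index, 1389, 'Flux Capacitator mark two');  // later update

// Search the text and read the stored database id back from each hit.
$ids = array();
foreach ($index->find('title:flux') as $hit) {
    $ids[] = $hit->db_id;
}
```

The stored field is deliberately not named `id`, because `$hit->id` already refers to Lucene's internal document number. If the id only ever needs to be returned and never looked up, `Zend_Search_Lucene_Field::unIndexed()` stores it without indexing it.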

I hope this makes sense. If it does not, please ask in the comments.
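
For item 3, a rough sketch of how merge tuning and a scheduled optimize might look with Zend_Search_Lucene. It assumes Zend Framework 1 is on the include path; the index path, the field name `title`, and the values shown are illustrative assumptions rather than recommendations.

```php
<?php
// Assumes Zend Framework 1 is available on the include_path.
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';
require_once 'Zend/Search/Lucene/Field.php';

// Hypothetical location; a real job would open() the existing index instead.
$indexPath = sys_get_temp_dir() . '/lucene_optimize_demo_' . uniqid();
$index = Zend_Search_Lucene::create($indexPath);

// Auto-merge tuning. Both values shown are the library defaults; they are
// set explicitly here only to show where the knobs live.
$index->setMaxBufferedDocs(10); // documents buffered in memory before a segment is written
$index->setMergeFactor(10);     // how many segments accumulate before they are merged

// Normal operation: add documents as rows are inserted into the database.
for ($i = 0; $i < 25; $i++) {
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::text('title', "item $i"));
    $index->addDocument($doc);
}
$index->commit();

// Idle-time maintenance (for example from a nightly cron job):
// merge all segments into one.
$index->optimize();
$docCount = $index->numDocs();
```

Calling `optimize()` after every single insert rewrites the whole index each time, which is why it is usually left to a scheduled job.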

惜醉颜 2024-08-10 19:23:32

Point 3 is addressed in Lucene 2.9 by NRT (NearRealtimeSearch), which is implemented by means of SegmentReader plus internal RAMDirectory usage.

Check the OtisGospodnetic wiki entry.
