Fuzzy indexing in Hibernate Search

Posted on 2024-11-27 18:36:01

I understand fuzzy searches well, but in my application they are very slow with lots of terms (~500 ms). I came across a suggested fix for slow fuzzy searches: instead of running fuzzy queries, index the terms using the Levenshtein algorithm, so that a regular keyword search yields fuzzy results.

Is there any way to do this with Hibernate Search, preferably using annotations?


Comments (2)

菊凝晚露 2024-12-04 18:36:01

I am not quite sure what you want to do here. Do you want to insert words within a given Levenshtein distance into the index at indexing time, similar to synonym search, where synonym tokens are inserted into the index? If so, you could write your own token filter (and filter factory) and then use the @AnalyzerDef framework to build your custom analyzer. Look at the source code to see how this is done.
Mind you, I see several issues with this approach: indexing becomes expensive and the index will become very large. Of course, I don't know much about your use case.
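
To make the index-size concern concrete, here is a minimal plain-Java sketch of the expansion step such a token filter would have to perform for each term. It is not Lucene or Hibernate Search code: the class name `EditDistanceExpansion`, the method `variantsWithinOneEdit`, and the lowercase a-z alphabet are all assumptions made for illustration, and it covers only edit distance 1.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene or Hibernate Search.
final class EditDistanceExpansion {

    // Assumption: terms are lowercase ASCII words.
    private static final char[] ALPHABET = "abcdefghijklmnopqrstuvwxyz".toCharArray();

    /** Returns the term itself plus every string within Levenshtein distance 1 of it. */
    static Set<String> variantsWithinOneEdit(String term) {
        Set<String> variants = new LinkedHashSet<>();
        variants.add(term);
        for (int i = 0; i <= term.length(); i++) {
            String head = term.substring(0, i);
            String tail = term.substring(i);
            // Insertion: put one extra character at position i.
            for (char c : ALPHABET) {
                variants.add(head + c + tail);
            }
            if (i < term.length()) {
                String rest = term.substring(i + 1);
                // Deletion: drop the character at position i.
                variants.add(head + rest);
                // Substitution: replace the character at position i.
                for (char c : ALPHABET) {
                    variants.add(head + c + rest);
                }
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        Set<String> variants = variantsWithinOneEdit("cat");
        System.out.println("contains 'cart': " + variants.contains("cart"));
        System.out.println("contains 'cot': " + variants.contains("cot"));
        System.out.println("number of index terms for 'cat': " + variants.size());
    }
}
```

Even a three-letter word expands to well over a hundred index terms at distance 1, and the count grows with word length and much faster with distance 2, which is why this approach makes indexing expensive and the index large.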

绅士风度i 2024-12-04 18:36:01

I would try the following options, in order:

  1. Are you just trying to correct spelling errors in user queries? Maybe you should use a spellchecker/autosuggest up front for this, rather than slower fuzzy queries with hard-to-tune relevance.
  2. Is this not really a full-text search, but rather some type of 'matching' procedure? In that case, an alternative is to index character n-grams, e.g. with Lucene's ngram TokenFilters, so that you run a boolean query on the field instead of a slow fuzzy query. This is how Lucene's spellchecker works behind the scenes anyway!
  3. If the above don't apply, you really do need fuzzy search, and there is no alternative, you could try a nightly build of Lucene's trunk. It uses a totally different algorithm, so these queries are much faster [1]. However, I don't think you will be able to easily integrate unreleased Lucene trunk with Hibernate.

    [1]: http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html Blog about fuzzy improvements.
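
The n-gram idea in option 2 can be sketched in plain Java as follows. This is only an illustration of the principle, not Lucene's ngram TokenFilters: the names `NGramMatching`, `charNGrams`, and `sharedGrams` are made up here, and counting shared grams stands in for the scoring a real boolean query over an n-gram field would do.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene or Hibernate Search.
final class NGramMatching {

    /** Splits a term into its distinct lowercase character n-grams, e.g. "search" -> sea, ear, arc, rch. */
    static Set<String> charNGrams(String term, int n) {
        Set<String> grams = new LinkedHashSet<>();
        String lower = term.toLowerCase(Locale.ROOT);
        for (int i = 0; i + n <= lower.length(); i++) {
            grams.add(lower.substring(i, i + n));
        }
        return grams;
    }

    /** Counts the n-grams two terms have in common; a misspelling still shares several with the original. */
    static int sharedGrams(String a, String b, int n) {
        Set<String> shared = new HashSet<>(charNGrams(a, n));
        shared.retainAll(charNGrams(b, n));
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(charNGrams("search", 3));
        System.out.println("hibernate vs hibrenate: " + sharedGrams("hibernate", "hibrenate", 3));
        System.out.println("hibernate vs lucene: " + sharedGrams("hibernate", "lucene", 3));
    }
}
```

Because each gram is an ordinary indexed term, the lookup is an exact-match query over a handful of terms rather than an edit-distance computation against the whole term dictionary, which is where the speed-up comes from.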
