Fuzzy indexing in Hibernate Search
I understand fuzzy searches well enough, but in my application they are very slow when there are many terms (~500 ms). I came across a suggested fix for slow fuzzy searches: instead of running fuzzy queries, index the terms using the Levenshtein algorithm, so that a regular keyword search yields fuzzy results.
Is there any way of doing this with Hibernate Search, preferably using annotations?
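For reference, the Levenshtein distance mentioned above is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a minimal sketch of the standard dynamic-programming computation in plain Java. The class name `Levenshtein` is hypothetical, and this is neither Hibernate Search nor Lucene code.

```java
// Hypothetical helper for illustration; not part of Hibernate Search or Lucene.
public final class Levenshtein {

    private Levenshtein() {
    }

    // Returns the edit distance between a and b, counting insertions,
    // deletions, and substitutions as one edit each.
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning a[0..i) into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // The row just computed becomes the previous row for the next i.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, `Levenshtein.distance("kitten", "sitting")` is 3. A fuzzy query has to find every indexed term within some such distance of the query term, which is where the cost comes from.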
Comments (2)
I am not quite sure what you want to do here. Do you want to insert words within a given Levenshtein distance into the index at indexing time, similar to synonym search, where synonym tokens are inserted into the index? If so, you could write your own token filter (and filter factory) and then use the @AnalyzerDef framework to build a custom analyzer. Look at the source code to see how this is done.
Mind you, I see several issues with this approach. Indexing becomes expensive and the index will become very large. Of course, I don't know much about your use case.
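To give a feel for the index growth, here is a rough sketch in plain Java of the variants such a token filter might emit for a single term at Levenshtein distance 1. It assumes a lowercase a-z alphabet, and the class name `EditDistanceOneVariants` is hypothetical. It deliberately leaves out the Lucene token filter and the @AnalyzerDef wiring and shows only the expansion step.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; a real token filter would emit
// each of these variants as an extra token for the same field.
public final class EditDistanceOneVariants {

    // Assumption: terms are lowercase ASCII letters only.
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";

    private EditDistanceOneVariants() {
    }

    // Returns every distinct string reachable from term by exactly one
    // deletion, insertion, or substitution over ALPHABET.
    public static Set<String> of(String term) {
        Set<String> variants = new LinkedHashSet<>();
        for (int i = 0; i <= term.length(); i++) {
            String head = term.substring(0, i);
            String tail = term.substring(i);
            if (!tail.isEmpty()) {
                // Deletion of the character at position i.
                variants.add(head + tail.substring(1));
            }
            for (int k = 0; k < ALPHABET.length(); k++) {
                char c = ALPHABET.charAt(k);
                // Insertion of c at position i.
                variants.add(head + c + tail);
                if (!tail.isEmpty()) {
                    // Substitution of the character at position i with c.
                    variants.add(head + c + tail.substring(1));
                }
            }
        }
        // Substituting a letter with itself reproduces the term; drop it.
        variants.remove(term);
        return variants;
    }
}
```

Under these assumptions, `EditDistanceOneVariants.of("cat")` contains 179 variants (3 deletions, 75 substitutions, and 101 distinct insertions), among them "at", "cut", and "cart". Every indexed term would carry a comparable number of extra tokens, and distance 2 multiplies that again, which is the indexing cost and index size concern raised above.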
I would try the following options, in order:
If none of the above apply, and you decide that you really do need fuzzy search and there is no alternative, you could try a nightly build of Lucene's trunk instead. It uses a totally different algorithm, which makes these queries much faster [1]. However, I don't think you will be able to easily integrate an unreleased Lucene trunk with Hibernate.
[1]: http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html Blog about fuzzy improvements.