Lucene: fuzzy match on a phrase instead of a single word
I'm trying to do a fuzzy match on the phrase "Grand Prarie" (deliberately misspelled) using Apache Lucene. Part of my problem is that the ~ operator only does fuzzy matching on single-word terms; when applied to a phrase, it behaves as a proximity match.
Is there a way to do a fuzzy match on a phrase with Lucene?
Comments (3)
Lucene 3.0 has ComplexPhraseQueryParser, which supports fuzzy phrase queries. It is in the contrib package.
I came across this question through Google and felt the solutions were not what I was after.
In my case, the solution was simply to repeat the search sequence against the Solr API.
So, for example, if I wanted title_t to include a match for "dog~" and "cat~", I added some manual code to generate the query as:
That might be just what the queries above are about, but the links seem to be dead.
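As a minimal sketch of the "repeat the field for each fuzzy word" idea, the helper below builds a query string from a field name and a list of words. The class name `FuzzyFieldQuery`, the `build` method, and the choice of `AND` as the joining operator are assumptions made for illustration, not something taken from the answer. The words are assumed to be plain alphanumeric tokens that need no query-syntax escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: not part of Lucene or Solr.
final class FuzzyFieldQuery {
    // Repeats the field for every word and appends the fuzzy operator (~),
    // joining the clauses with AND (an assumed choice of operator).
    static String build(String field, List<String> words) {
        return words.stream()
                .map(word -> field + ":" + word + "~")
                .collect(Collectors.joining(" AND "));
    }
}
```

For instance, `FuzzyFieldQuery.build("title_t", java.util.Arrays.asList("dog", "cat"))` yields `title_t:dog~ AND title_t:cat~`. A query of this shape requires every word to match fuzzily, but it does not enforce word order or adjacency, so it is looser than a true phrase match.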
There's no direct support for a fuzzy phrase, but you can simulate it by explicitly enumerating the fuzzy terms and then adding them to a MultiPhraseQuery. The resulting query would look like:
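A minimal sketch of the enumeration step, written in plain Java so that it runs without Lucene on the classpath. It assumes a small in-memory vocabulary standing in for the index's term dictionary and uses Levenshtein distance as the fuzziness measure. The class `FuzzyPhraseSketch`, the `expand` method, and the `maxEdits` parameter are hypothetical names. In a real index the candidate terms would come from Lucene's own fuzzy term enumeration, and each per-position list would be passed to the MultiPhraseQuery as an array of terms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: mimics the shape of a MultiPhraseQuery
// (one list of alternative terms per phrase position) without using Lucene.
final class FuzzyPhraseSketch {
    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // For each word of the phrase, collects every vocabulary term
    // within maxEdits edits, preserving vocabulary order.
    static List<List<String>> expand(List<String> phrase,
                                     List<String> vocabulary,
                                     int maxEdits) {
        List<List<String>> positions = new ArrayList<>();
        for (String word : phrase) {
            List<String> alternatives = new ArrayList<>();
            for (String term : vocabulary) {
                if (editDistance(word, term) <= maxEdits) {
                    alternatives.add(term);
                }
            }
            positions.add(alternatives);
        }
        return positions;
    }
}
```

With the phrase `["grand", "prarie"]`, a vocabulary of `["grand", "grant", "prairie", "prarie", "dallas"]`, and `maxEdits` of 1, `expand` returns `[[grand, grant], [prairie, prarie]]`: one set of alternatives per phrase position, which is the structure a MultiPhraseQuery matches against while still requiring the positions to be adjacent and in order.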