How to perform a fuzzy search on all words provided in a Lucene.Net search
I am trying to teach myself Lucene.Net so I can use it on my site. I understand how to do almost everything I need except for one issue: I am trying to figure out how to allow a fuzzy search for all search terms in a search string.
For example, if I have a document containing the string "The big red fox", I want the query "bag fix" to match it.
The problem is that, in order to perform fuzzy searches, it seems I have to add ~ to every search term the user enters. I am unsure of the best way to go about this. Right now I am attempting it like this:
string queryString = "bag rad";
queryString = queryString.Replace("~", string.Empty).Replace(" ", "~ ") + "~";
The first Replace is there because Lucene.Net throws an exception if the search string already contains a ~; apparently it can't handle ~~ in a phrase. This method works, but it seems like it will get messy if I start adding fuzzy weight values.
Is there a better way to make all words fuzzy by default?
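One way to keep the string approach from getting messy is to handle each term separately instead of chaining Replace calls. The following is only a sketch: `FuzzyQueryText` and `MakeFuzzy` are hypothetical names, not part of Lucene.Net, and the optional `similarity` argument is assumed to stand in for the "fuzzy weight value" written after the tilde. Which values the query parser accepts there depends on the Lucene.Net version.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class FuzzyQueryText
{
    // Hypothetical helper (not a Lucene.Net API): appends the fuzzy operator
    // to every whitespace-separated term of the user's input.
    // When similarity is given, it is written after the tilde, e.g. "bag~0.5".
    public static string MakeFuzzy(string input, float? similarity = null)
    {
        string suffix = similarity.HasValue
            ? "~" + similarity.Value.ToString("0.0#", CultureInfo.InvariantCulture)
            : "~";

        var terms = (input ?? string.Empty)
            // A null separator array splits on any whitespace.
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            // Drop any tilde the user typed, together with whatever follows it,
            // so the result never contains "~~".
            .Select(t => { int i = t.IndexOf('~'); return i >= 0 ? t.Substring(0, i) : t; })
            .Where(t => t.Length > 0)
            .Select(t => t + suffix);

        return string.Join(" ", terms);
    }
}
```

Under these assumptions, `FuzzyQueryText.MakeFuzzy("bag rad")` yields the same string as the Replace chain above, and passing a second argument adds the same weight to every term. The sketch does not escape other query syntax such as quotes, field prefixes, or boolean operators, so user input containing those would still need separate handling.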
Comments (1)
You might want to index your documents as bi-grams or tri-grams. Take a look at the CJKAnalyzer to see how they do it; you will need to download the source and read through it.
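To illustrate what indexing as bi-grams means, here is a minimal sketch of splitting a word into character n-grams. It is not taken from the CJKAnalyzer source: `NGrams.Of` is a hypothetical helper, and padding each word with `_` is an assumption added here so that very short words still share grams at their edges.

```csharp
using System;
using System.Collections.Generic;

public static class NGrams
{
    // Hypothetical helper: returns the character n-grams of a single word.
    // The word is lowercased and padded on both sides with the pad character
    // (an assumption of this sketch, not something CJKAnalyzer is known to do).
    public static List<string> Of(string word, int n, char pad = '_')
    {
        var grams = new List<string>();
        if (string.IsNullOrEmpty(word) || n <= 0) return grams;

        string s = pad + word.ToLowerInvariant() + pad;
        // Slide a window of width n across the padded word.
        for (int i = 0; i + n <= s.Length; i++)
        {
            grams.Add(s.Substring(i, n));
        }
        return grams;
    }
}
```

With n = 2, "fox" becomes `_f`, `fo`, `ox`, `x_` and "fix" becomes `_f`, `fi`, `ix`, `x_`, so the two words share two grams even though they differ in the middle letter. Without the padding, three-letter words that differ in the middle share no bi-grams at all, which is worth keeping in mind when choosing the gram size.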