Exact phrase with special characters in Lucene.net
I have a problem doing a full-text search in Lucene.net where the search result contains special Lucene characters.
I have a field named "content" in my Lucene documents. This field is created as follows and contains the content of the indexed documents:
document.Add(new Field("content", fulltext, Field.Store.YES, Field.Index.ANALYZED));
To create the index I'm using the StandardAnalyzer.
To query the index I'm using the following code:
var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "content", analyzer);
queryParser.SetAllowLeadingWildcard(true);
queryParser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
Query fullTextQuery = queryParser.Parse(queryString);
The query is then added to a BooleanQuery, which is used to get the results from an IndexSearcher. I think the rest of the code is not that important, because it works as it should for 99% of the queries. I'm also using the StandardAnalyzer for querying the index.
Now here is the problem.
Sometimes the "content" field of a document contains text in which words are separated by "-":
some text some text selector-lever some text some text
When I do a full-text search for the exact phrase "selector lever", the query looks like this:
content:"selector lever"
The problem is that the document containing the above text is also found, although it shouldn't be, because the two words are separated by "-" and not by a blank.
I think it has something to do with the analyzer and the fact that "-" is a special character in Lucene.
Maybe someone can help me solve this problem.
Thanks in advance,
Martin
Comments (1)
You are right in thinking that the problem is the analyzer that you are using at index time.
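According to the Lucene javadocs, the StandardTokenizer used by the StandardAnalyzer splits words at hyphens, unless there is a number in the token. "selector-lever" is therefore indexed as the two adjacent terms "selector" and "lever", which is exactly what the phrase query "selector lever" matches.
Therefore, in your case you would need to index your documents with a stricter Analyzer such as the WhitespaceAnalyzer, which only splits on whitespace.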
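A minimal sketch of that suggestion, assuming a Lucene.Net 2.9.x assembly is referenced (member names such as totalHits use the Java-style casing of that release and may differ in later versions). The in-memory RAMDirectory, the sample text, and the CountHits helper are illustrative assumptions, not part of the original code:

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

class WhitespaceAnalyzerSketch
{
    static void Main()
    {
        // Assumption: the same analyzer is used for indexing and for querying.
        Analyzer analyzer = new WhitespaceAnalyzer();
        Directory directory = new RAMDirectory(); // in-memory index, for illustration only

        var writer = new IndexWriter(directory, analyzer, true,
                                     IndexWriter.MaxFieldLength.UNLIMITED);
        var document = new Document();
        document.Add(new Field("content",
                               "some text some text selector-lever some text some text",
                               Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(document);
        writer.Close();

        var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29,
                                          "content", analyzer);
        var searcher = new IndexSearcher(directory, true); // read-only searcher

        // "selector-lever" is indexed as one token, so the two-term phrase
        // should no longer match this document.
        Console.WriteLine(CountHits(searcher, queryParser, "\"selector lever\""));

        // The hyphenated form is a single term and should still be found.
        Console.WriteLine(CountHits(searcher, queryParser, "\"selector-lever\""));

        searcher.Close();
    }

    // Hypothetical helper: parses the query string and returns the hit count.
    static int CountHits(IndexSearcher searcher, QueryParser parser, string queryString)
    {
        Query query = parser.Parse(queryString);
        TopDocs topDocs = searcher.Search(query, 10);
        return topDocs.totalHits;
    }
}
```

Note that the WhitespaceAnalyzer does not lowercase terms or remove stop words as the StandardAnalyzer does, so searches become case-sensitive, and an existing index has to be rebuilt with the new analyzer before the change takes effect.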