Luke Lucene BooleanQuery
In Luke, the following search expression returns 23 results:
docurl:www.siteurl.com docfile:Tomatoes*
If I pass this same expression into my C# Lucene.NET app with the following implementation:
IndexReader reader = IndexReader.Open(indexName);
Searcher searcher = new IndexSearcher(reader);
try
{
    QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
    BooleanQuery bquery = new BooleanQuery();
    Query parsedQuery = parser.Parse(query);
    bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.MUST);
    int _max = searcher.MaxDoc();
    BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
    TopDocs hits = searcher.Search(parsedQuery, _max);
    ...
}
I get 0 results
Luke is using StandardAnalyzer and this is what the Explain Structure window looks like:
Must I manually create BooleanClause objects for each field I search on, specifying Should for each one, then add them to the BooleanQuery object with .Add()? I thought the QueryParser would do this for me. What am I missing?
Edit:
Simplifying a tad, docfile:Tomatoes* returns 23 docs in Luke, yet 0 in my app. Per Gene's suggestion, I've changed from MUST to SHOULD:
QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
BooleanQuery bquery = new BooleanQuery();
Query parsedQuery = parser.Parse(query);
bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.SHOULD);
int _max = searcher.MaxDoc();
BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
TopDocs hits = searcher.Search(parsedQuery, _max);
parsedQuery is simply docfile:tomatoes*
Edit2:
I think I've finally gotten to the root problem:
QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
Query parsedQuery = parser.Parse(query);
In the second line, query is "docfile:Tomatoes*", but parsedQuery is {docfile:tomatoes*}. Notice the difference? Lower case 't' in the parsed query. I never noticed this before. If I change the value in the IDE to 'T', 23 results return.
I've verified that StandardAnalyzer is being used when indexing and reading the index. How do I force queryParser to keep the case of the value of query?
Edit3:
Wow, how frustrating. According to the documentation, I can accomplish this with:
parser.setLowercaseExpandedTerms(false);
Whether terms of wildcard, prefix,
fuzzy and range queries are to be
automatically lower-cased or not.
Default is true.
I won't argue whether that's a sensible default or not. I suppose SimpleAnalyzer should have been used to lowercase everything in and out of the index. The frustrating part is, at least with the version I'm using, Luke defaults the other way! At least I learned a bit more about Lucene.
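For anyone on Lucene.NET rather than Java Lucene, here is a minimal sketch of the Edit3 fix. It assumes a 2.x-era Lucene.NET release, where the setter is spelled SetLowercaseExpandedTerms. As far as I know, the 3.0.3 release exposes the same switch as a LowercaseExpandedTerms property instead, so check your version. The field name and query string come from the question; the class name KeepWildcardCase is made up for illustration. The switch has to be set before Parse is called.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Hypothetical demo class; requires a reference to the Lucene.Net assembly.
public static class KeepWildcardCase
{
    public static void Main()
    {
        // Same parser setup as in the question (the parameterless
        // StandardAnalyzer constructor is the 2.x-style one).
        QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());

        // Assumption: 2.x-style setter. On Lucene.NET 3.0.3 this would be
        // parser.LowercaseExpandedTerms = false; instead.
        parser.SetLowercaseExpandedTerms(false);

        Query parsedQuery = parser.Parse("docfile:Tomatoes*");

        // Print the parsed form to check whether the capital 'T' survived.
        Console.WriteLine(parsedQuery.ToString());
    }
}
```

Only wildcard, prefix, fuzzy and range terms are affected by this switch; ordinary terms still go through the analyzer passed to the parser.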
Comments (2)
Using Occur.MUST is equivalent to using the + operator with the standard query parser. Thus your code is evaluating +docurl:www.siteurl.com +docfile:Tomatoes* rather than the expression you typed into Luke. To get that behavior, try Occur.SHOULD when adding your clauses.
QueryParser will indeed take a query like "docurl:www.siteurl.com docfile:Tomatoes*" and build a proper query out of it (boolean query, range query, etc.) depending on the query given (see query syntax). Your first step should be to attach a debugger and inspect the value and type of parsedQuery.
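If attaching a debugger is inconvenient, the same inspection can be roughed out in code. This is only a sketch, assuming a 2.x-era Lucene.NET API as used in the question; the class name InspectParsedQuery is made up for illustration. It prints the runtime type and string form of the parsed query and, if the parser produced a BooleanQuery, each clause with its occur prefix.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Hypothetical demo class; requires a reference to the Lucene.Net assembly.
public static class InspectParsedQuery
{
    public static void Main()
    {
        // Same parser setup as in the question.
        QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
        Query parsedQuery = parser.Parse("docurl:www.siteurl.com docfile:Tomatoes*");

        // Runtime type of the parsed query (e.g. whether it is a BooleanQuery).
        Console.WriteLine(parsedQuery.GetType().Name);

        // String form of the query, which shows any case changes the parser made.
        Console.WriteLine(parsedQuery.ToString());

        // If the parser built a BooleanQuery, list its clauses one by one.
        BooleanQuery boolQuery = parsedQuery as BooleanQuery;
        if (boolQuery != null)
        {
            foreach (BooleanClause clause in boolQuery.GetClauses())
            {
                // BooleanClause.ToString() includes the occur prefix
                // ("+" for MUST, "-" for MUST_NOT, nothing for SHOULD).
                Console.WriteLine(clause.ToString());
            }
        }
    }
}
```

Comparing this output with what Luke shows for the same expression is a quick way to spot differences such as a lowercased prefix term.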