Luke Lucene Boolean Query

Posted 2024-11-06 18:41:38

In Luke, the following search expression returns 23 results:

docurl:www.siteurl.com  docfile:Tomatoes*

If I pass this same expression into my C# Lucene.NET app with the following implementation:

        IndexReader reader = IndexReader.Open(indexName);
        Searcher searcher = new IndexSearcher(reader);
        try
        {
            QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
            BooleanQuery bquery = new BooleanQuery();
            Query parsedQuery = parser.Parse(query);
            bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.MUST);
            int _max = searcher.MaxDoc();
            BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
            TopDocs hits = searcher.Search(parsedQuery, _max);
            ...
        }

I get 0 results.

Luke is using StandardAnalyzer, and this is what the Explain Structure window looks like:
[Image: Luke Query Structure]

Must I manually create a BooleanClause object for each field I search on, specify SHOULD for each one, and then add them to the BooleanQuery object with .Add()? I thought the QueryParser would do this for me. What am I missing?

Edit:
Simplifying a tad, docfile:Tomatoes* returns 23 docs in Luke, yet 0 in my app. Per Gene's suggestion, I've changed from MUST to SHOULD:

            QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
            BooleanQuery bquery = new BooleanQuery();
            Query parsedQuery = parser.Parse(query);
            bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.SHOULD);
            int _max = searcher.MaxDoc();
            BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
            TopDocs hits = searcher.Search(parsedQuery, _max);

parsedQuery is simply docfile:tomatoes*

Edit2:

I think I've finally gotten to the root problem:

            QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
            Query parsedQuery = parser.Parse(query);

In the second line, query is "docfile:Tomatoes*", but parsedQuery is {docfile:tomatoes*}. Notice the difference? The parsed query has a lowercase 't'. I never noticed this before. If I change the value in the IDE back to 'T', 23 results are returned.

I've verified that StandardAnalyzer is being used when indexing and reading the index. How do I force queryParser to keep the case of the value of query?
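
To illustrate the mismatch, here is a minimal sketch in plain C# with no Lucene involved. It assumes the docfile terms were stored with their original case (for example because the field was indexed without analysis, which is a guess on my part), and the file names and the `CountPrefixMatches` helper are made up for the example. A case-sensitive prefix lookup finds the terms when the prefix is passed as typed and finds nothing once the prefix has been lowercased:

```csharp
using System;
using System.Linq;

static class PrefixCaseDemo
{
    // Hypothetical stand-in for the terms stored in the docfile field.
    // Assumption: the terms kept their original case when they were indexed.
    static readonly string[] IndexedTerms =
    {
        "Tomatoes_2009.pdf",
        "Tomatoes_recipes.doc",
        "potatoes.txt"
    };

    // Counts the stored terms that start with the prefix, comparing
    // case-sensitively, the way an exact term lookup would.
    public static int CountPrefixMatches(string prefix) =>
        IndexedTerms.Count(t => t.StartsWith(prefix, StringComparison.Ordinal));

    static void Main()
    {
        string typedPrefix = "Tomatoes";

        // Prefix exactly as typed in the query string.
        Console.WriteLine(CountPrefixMatches(typedPrefix));                    // prints 2

        // Prefix after lowercasing, which is what the parsed query contained.
        Console.WriteLine(CountPrefixMatches(typedPrefix.ToLowerInvariant())); // prints 0
    }
}
```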

Edit3:
Wow, how frustrating. According to the documentation, I can accomplish this with:

parser.setLowercaseExpandedTerms(false);

"Whether terms of wildcard, prefix, fuzzy and range queries are to be automatically lower-cased or not. Default is true."

I won't argue whether that's a sensible default or not. I suppose SimpleAnalyzer should have been used to lowercase everything in and out of the index. The frustrating part is, at least with the version I'm using, Luke defaults the other way! At least I learned a bit more about Lucene.
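
The setter quoted above is the Java spelling. For Lucene.NET, a sketch of the equivalent might look like the following. It assumes a 2.x-era build in which the Java setter is ported as `SetLowercaseExpandedTerms` (newer builds may expose it as a `LowercaseExpandedTerms` property instead, so check the version in use), and the `KeepWildcardCase` class and `ParseKeepingCase` helper are names invented for the example:

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

static class KeepWildcardCase
{
    // Parses the query text without lowercasing wildcard, prefix,
    // fuzzy and range terms.
    public static Query ParseKeepingCase(string queryText)
    {
        // Same default field and analyzer as in the question.
        QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());

        // Assumption: 2.x-style method name; the default is true.
        parser.SetLowercaseExpandedTerms(false);

        return parser.Parse(queryText);
    }

    static void Main()
    {
        Query parsedQuery = ParseKeepingCase("docfile:Tomatoes*");

        // Inspect the parsed query to confirm the capital 'T' survived.
        Console.WriteLine(parsedQuery.ToString());
    }
}
```

This only changes how the expanded terms are parsed. If the field is analyzed at index time, lowercasing the indexed terms consistently is the other way to make the two sides agree.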
