Luke Lucene BooleanQuery

Posted on 2024-11-06 18:41:38

In Luke, the following search expression returns 23 results:

docurl:www.siteurl.com  docfile:Tomatoes*

If I pass this same expression into my C# Lucene.NET app with the following implementation:

        IndexReader reader = IndexReader.Open(indexName);
        Searcher searcher = new IndexSearcher(reader);
        try
        {
            QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
            BooleanQuery bquery = new BooleanQuery();
            Query parsedQuery = parser.Parse(query);
            bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.MUST);
            int _max = searcher.MaxDoc();
            BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
            TopDocs hits = searcher.Search(parsedQuery, _max);
            ...
        }

I get 0 results.

Luke is using StandardAnalyzer, and this is what the Explain Structure window looks like:
[Image: Luke Query Structure]

Must I manually create BooleanClause objects for each field I search on, specifying SHOULD for each one, and then add them to the BooleanQuery object with .Add()? I thought the QueryParser would do this for me. What am I missing?

Edit:
Simplifying a tad, docfile:Tomatoes* returns 23 docs in Luke, yet 0 in my app. Per Gene's suggestion, I've changed from MUST to SHOULD:

            QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
            BooleanQuery bquery = new BooleanQuery();
            Query parsedQuery = parser.Parse(query);
            bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.SHOULD);
            int _max = searcher.MaxDoc();
            BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
            TopDocs hits = searcher.Search(parsedQuery, _max);

parsedQuery is simply docfile:tomatoes*

Edit2:

I think I've finally gotten to the root problem:

            QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
            Query parsedQuery = parser.Parse(query);

In the second line, query is "docfile:Tomatoes*", but parsedQuery is {docfile:tomatoes*}. Notice the difference? Lower case 't' in the parsed query. I never noticed this before. If I change the value in the IDE to 'T', 23 results return.

I've verified that StandardAnalyzer is being used when indexing and reading the index. How do I force queryParser to keep the case of the value of query?

Edit3:
Wow, how frustrating. According to the documentation, I can accomplish this with:

parser.setLowercaseExpandedTerms(false);

Whether terms of wildcard, prefix, fuzzy and range queries are to be automatically lower-cased or not. Default is true.

I won't argue whether that's a sensible default or not. I suppose SimpleAnalyzer should have been used to lowercase everything in and out of the index. The frustrating part is, at least with the version I'm using, Luke defaults the other way! At least I learned a bit more about Lucene.
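
The mismatch described in Edit2 can be reproduced without Lucene at all. The following is only a minimal sketch, assuming the docfile terms were stored in the index with their original case (the question does not confirm how the field was indexed). The class `PrefixCaseDemo`, its methods, and the sample file names are hypothetical stand-ins, not part of Lucene or Lucene.NET.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an index's term dictionary; this is not Lucene code.
class PrefixCaseDemo {

    // Mimics the parser option: lower-case the prefix of a wildcard term when enabled.
    static String expandPrefix(String prefix, boolean lowercaseExpandedTerms) {
        return lowercaseExpandedTerms ? prefix.toLowerCase(Locale.ROOT) : prefix;
    }

    // Counts indexed terms that start with the prefix; the comparison is case-sensitive.
    static long countMatches(List<String> indexedTerms, String prefix) {
        return indexedTerms.stream().filter(t -> t.startsWith(prefix)).count();
    }

    public static void main(String[] args) {
        // Assumed index content: terms stored with their original case.
        List<String> terms = List.of("Tomatoes.pdf", "Tomatoes2009.doc", "Potatoes.pdf");

        String lowered = expandPrefix("Tomatoes", true);  // prefix becomes "tomatoes"
        String kept = expandPrefix("Tomatoes", false);    // prefix stays "Tomatoes"

        System.out.println(lowered + "* -> " + countMatches(terms, lowered));
        System.out.println(kept + "* -> " + countMatches(terms, kept));
    }
}
```

With lower-casing enabled, the prefix no longer matches any of the case-preserved terms. With it disabled, the two "Tomatoes" entries match. Depending on the Lucene.NET version, the setter quoted above may be exposed under a .NET-style name, so check the QueryParser class in your release.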

Comments (2)

丶情人眼里出诗心の 2024-11-13 18:41:38

Using Occur.MUST is equivalent to using the + operator with the standard query parser. Thus your code is evaluating +docurl:www.siteurl.com +docfile:Tomatoes* rather than the expression you typed into Luke. To get that behavior, try Occur.SHOULD when adding your clauses.

尤怨 2024-11-13 18:41:38

QueryParser will indeed take a query like "docurl:www.siteurl.com docfile:Tomatoes*" and build a proper query out of it (boolean query, range query, etc.) depending on the query given (see query syntax).

Your first step should be to attach a debugger and inspect the value and type of parsedQuery.
