Why does this Lucene.Net query fail?

Posted on 2024-11-10 00:09:22

I am trying to convert my search functionality to allow for fuzzy searches involving multiple words. My existing search code looks like:

        // Split the search into separate queries per word, and combine them into one major query
        var finalQuery = new BooleanQuery();

        string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string term in terms)
        {
            // Setup the fields to search
            string[] searchfields = new string[] 
            {
                // Various strings denoting the document fields available
            };

            var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, searchfields, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
            finalQuery.Add(parser.Parse(term), BooleanClause.Occur.MUST);
        }

        // Perform the search
        var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
        var searcher = new IndexSearcher(directory, true);
        var hits = searcher.Search(finalQuery, MAX_RESULTS);

This works correctly: if I have an entity whose name field is "My name is Andrew" and I search for "Andrew Name", Lucene finds the right document. Now I want to enable fuzzy searching, so that "Anderw Name" also finds it. I changed my method to use the following code:

        const int MAX_RESULTS = 10000;
        const float MIN_SIMILARITY = 0.5f;
        const int PREFIX_LENGTH = 3;

        if (string.IsNullOrWhiteSpace(searchString))
            throw new ArgumentException("Provided search string is empty");

        // Split the search into separate queries per word, and combine them into one major query
        var finalQuery = new BooleanQuery();

        string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string term in terms)
        {
            // Setup the fields to search
            string[] searchfields = new string[] 
            {
                // Strings denoting document field names here
            };

            // Create a subquery where the term must match at least one of the fields
            var subquery = new BooleanQuery();
            foreach (string field in searchfields)
            {
                var queryTerm = new Term(field, term);
                var fuzzyQuery = new FuzzyQuery(queryTerm, MIN_SIMILARITY, PREFIX_LENGTH);
                subquery.Add(fuzzyQuery, BooleanClause.Occur.SHOULD);
            }

            // Add the subquery to the final query; every term's subquery must match
            finalQuery.Add(subquery, BooleanClause.Occur.MUST);
        }

        // Perform the search
        var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
        var searcher = new IndexSearcher(directory, true);
        var hits = searcher.Search(finalQuery, MAX_RESULTS);

Unfortunately, with this code, if I submit the search query "Andrew Name" (same as before), I get zero results back.

The core idea is that every term must be found in at least one document field, but different terms may match in different fields. Does anyone have any idea why my rewritten query fails?


Final Edit: OK, it turns out I was overcomplicating this by a LOT, and there was no need to change from my first approach. After reverting to the first code snippet, I enabled fuzzy searching by changing

        finalQuery.Add(parser.Parse(term), BooleanClause.Occur.MUST);

to

        finalQuery.Add(parser.Parse(term.Replace("~", "") + "~"), BooleanClause.Occur.MUST);
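
The one-line change works because a trailing `~` on a term is the query parser's fuzzy operator. Below is a minimal sketch of that string handling pulled out on its own. `FuzzyTermBuilder` and `ToFuzzyTerm` are hypothetical names used for illustration, not part of Lucene.Net, and the guard against terms made up only of `~` is an added assumption that the original post does not include. Other query-syntax characters are left untouched here.

```csharp
using System;

public static class FuzzyTermBuilder
{
    // Hypothetical helper mirroring the one-line fix: strip any '~' the user
    // typed, then append a single '~' so the parser treats the term as fuzzy.
    public static string ToFuzzyTerm(string term)
    {
        if (term == null)
            throw new ArgumentNullException("term");

        string stripped = term.Replace("~", "");

        // Assumption: a term with nothing left after stripping is rejected
        // rather than handed to the parser as a bare "~".
        if (string.IsNullOrWhiteSpace(stripped))
            throw new ArgumentException("Term has no searchable characters", "term");

        return stripped + "~";
    }
}
```

In the first snippet's loop this would be used as `finalQuery.Add(parser.Parse(FuzzyTermBuilder.ToFuzzyTerm(term)), BooleanClause.Occur.MUST);`.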

Comments (2)

铃予 2024-11-17 00:09:22

Your code works for me if I rewrite the searchString to lower-case. I'm assuming that you're using the StandardAnalyzer when indexing, and it will generate lower-case terms.

You need to 1) pass your tokens through the same analyzer (to enable identical processing), 2) apply the same logic as the analyzer or 3) use an analyzer which matches the processing you do (WhitespaceAnalyzer).
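
A minimal sketch of option 2, assuming the index was built with StandardAnalyzer and that lower-casing is the only normalization the search terms need. StandardAnalyzer also drops stop words and strips punctuation, which this sketch does not reproduce. `SearchTermNormalizer` and `NormalizeTerms` are hypothetical names, not part of Lucene.Net.

```csharp
using System;
using System.Linq;

public static class SearchTermNormalizer
{
    // Hypothetical helper: splits on spaces the same way the question's code
    // does, then lower-cases each token so it can match terms that were
    // lower-cased by the analyzer at index time.
    public static string[] NormalizeTerms(string searchString)
    {
        if (string.IsNullOrWhiteSpace(searchString))
            throw new ArgumentException("Provided search string is empty");

        return searchString
            .Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => t.ToLowerInvariant())
            .ToArray();
    }
}
```

Each returned token would then be passed to `new Term(field, term)` in the question's loop in place of the raw split output.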

梦幻的味道 2024-11-17 00:09:22

You want this line:

var queryTerm = new Term(term);

to look like this:

var queryTerm = new Term(field, term);

Right now you're searching a field named after the term (which probably doesn't exist) for the empty string (which will never be found).
