Lucene.net multi-field search

Posted on 2024-10-18 23:08:45

In an attempt to get more contextually relevant search results, I've decided to have a play with Lucene.net. I'm very new to it, and I've found it not to be the most intuitive library I've come across. The lack of relevant examples out there doesn't help.

I'm using Simple Lucene to build my index, and that seems to be working perfectly:

Field f = null;
Document document = new Document();

document.Add(new Field("id", dl.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));

f = new Field("category", dl.CategoryName.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
f.SetBoost(5);
document.Add(f);

f = new Field("company_name", dl.CompanyName.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
f.SetBoost(2);
document.Add(f);

document.Add(new Field("description", dl.Description.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
document.Add(new Field("meta_keywords", dl.Meta_Keywords.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
document.Add(new Field("meta_description", dl.Meta_Description.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));

//And a few more fields

Based on this index I first tried a query along these lines:

var whatParser = new MultiFieldQueryParser(
    global::Lucene.Net.Util.Version.LUCENE_29,
    new string[] { "company_name", "description", "meta_keywords", "meta_description", "category" },
    analyzer);

whatQuery = whatParser.Parse("search".ToLowerInvariant());

This worked great up until the search term became more than 1 word. Next up was a phrase query.

whatQuery = new PhraseQuery();
whatQuery.Add(new Term("company_name", what));
whatQuery.Add(new Term("description", what));
whatQuery.Add(new Term("meta_keywords", what));
whatQuery.Add(new Term("meta_description", what));
whatQuery.Add(new Term("category", what));

This threw the error: All phrase terms must be in the same field

So, where am I going wrong? Do you have any suggestions on how to fix it? I'm open to changing the search technology entirely if there are better suggestions out there.

Some additional information which may be useful:

  • All results are sorted in the end via new Sort(new SortField[] {new SortField("is_featured", SortField.STRING, true),SortField.FIELD_SCORE})
  • There are some additional search criteria so each query is added to a Boolean query with occur set to SHOULD

Thanks for your help.

Comments (1)

獨角戲 2024-10-25 23:08:45

I think the BooleanClause.Occur.SHOULD is the issue. We use it like this:

string[] fieldList = { "field1", "field2", "field3" };

// For us the field list varies; there are other ways to create this array, of course.
List<BooleanClause.Occur> occurs = new List<BooleanClause.Occur>();
foreach (string field in fieldList)
    occurs.Add(BooleanClause.Occur.SHOULD);

if(!string.IsNullOrEmpty(multiWordPhrase))
{
    Query q = MultiFieldQueryParser.Parse(multiWordPhrase, fieldList, occurs.ToArray(), new StandardAnalyzer());
    return q;
}
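
For comparison, the error quoted in the question ("All phrase terms must be in the same field") arises because a single PhraseQuery only matches consecutive terms within one field. Below is a minimal sketch of one way around it, assuming the Lucene.Net 2.9 API used elsewhere on this page: build one PhraseQuery per field and combine them in a BooleanQuery with SHOULD. The class name MultiFieldPhrase and the method Build are hypothetical helpers, not part of Lucene.Net. The whitespace split is also an assumption: it only works if it yields the same tokens the index-time analyzer produced.

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class MultiFieldPhrase
{
    // Hypothetical helper (not part of Lucene.Net): builds one PhraseQuery
    // per field and ORs them together, so no PhraseQuery mixes fields.
    public static Query Build(string[] fields, string phrase)
    {
        // Assumption: lowercasing and splitting on whitespace yields the same
        // tokens the index-time analyzer produced. If the analyzer drops stop
        // words or stems terms, run the text through that analyzer instead.
        string[] words = phrase.ToLowerInvariant()
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

        var combined = new BooleanQuery();
        foreach (string field in fields)
        {
            // Every term added to this PhraseQuery uses the same field.
            var perField = new PhraseQuery();
            foreach (string word in words)
                perField.Add(new Term(field, word));

            // SHOULD: a document can match the phrase in any one of the fields.
            combined.Add(perField, BooleanClause.Occur.SHOULD);
        }
        return combined;
    }
}
```

With the field names from the question, this could be called as `Query whatQuery = MultiFieldPhrase.Build(new[] { "company_name", "description", "meta_keywords", "meta_description", "category" }, what);`. On Lucene.Net 3.x the occur flag is, as far as I know, written `Occur.SHOULD` rather than `BooleanClause.Occur.SHOULD`. Another option that may be worth trying is wrapping the search text in double quotes before passing it to MultiFieldQueryParser, since the query parser syntax treats quoted text as a phrase.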