How to store a field's boost factor in the index in Lucene

Posted on 2024-12-02 23:19:33

I am using Lucene to search in an address-book-like product. I want to boost the search results according to some specific criteria (e.g. a match in the location field should have greater relevance than a match in the entity's name). These criteria are fixed for my case.

I am trying to store the boost factor with the Field by calling the SetBoost() method while indexing. Even so, the scores of the results are not what I expect: every field seems to be treated as if it had the same boost value.

Can anybody suggest where I am going wrong?

Here is the code I am using to build the index:

// Open (or create) the index directory on disk.
Directory objIndexDirectory =
  FSDirectory.Open(new System.IO.DirectoryInfo(<PathOfIndexFolder>));
StandardAnalyzer objAnalyzer =
  new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
// 'true' creates a new index, overwriting any existing one.
IndexWriter objWriter = new IndexWriter(
  objIndexDirectory, objAnalyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
Document objDocument = new Document();
Field objName =
  new Field("Name", "John Doe", Field.Store.YES, Field.Index.ANALYZED);
// NOT_ANALYZED indexes the value as a single, case-sensitive term ("NY").
Field objLocation =
  new Field("Location", "NY", Field.Store.YES, Field.Index.NOT_ANALYZED);
// Index-time boost: a match in Location should weigh more than a match in Name.
objLocation.SetBoost(2f);
objDocument.Add(objName);
objDocument.Add(objLocation);
objWriter.AddDocument(objDocument);

What I am trying to achieve is the following. Assume there are three entries in the index:

  1. John Doe, NY
  2. John Foo, New Jersey
  3. XYZ, NY

In this case, if the search query is "John NY", the results should be ranked like this:

  1. John Doe, NY
  2. XYZ, NY
  3. John Foo, New Jersey

Comments (2)

清泪尽 2024-12-09 23:19:33

I can't figure out what you think is wrong with your approach, but here is the code I used to test it:

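using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

class Program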
{
    static void Main(string[] args)
    {
        RAMDirectory dir = new RAMDirectory();

        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer());

        AddDocument(writer, "John Doe", "NY");
        AddDocument(writer, "John Foo", "New Jersey");
        AddDocument(writer, "XYZ", "NY");

        writer.Commit();

        BooleanQuery query = new BooleanQuery();
        query.Add(new TermQuery(new Term("Name", "john")), BooleanClause.Occur.SHOULD);
        query.Add(new TermQuery(new Term("Location", "NY")), BooleanClause.Occur.SHOULD);

        IndexReader reader = writer.GetReader();

        IndexSearcher searcher = new IndexSearcher(reader);
        var hits = searcher.Search(query, null, 10);

        // Iterate over the returned hits; totalHits can exceed the 10 requested.
        for (int i = 0; i < hits.scoreDocs.Length; i++)
        {
            Document doc = searcher.Doc(hits.scoreDocs[i].doc);
            var explain = searcher.Explain(query, hits.scoreDocs[i].doc);
            Console.WriteLine("{0} - {1} - {2}", hits.scoreDocs[i].score, doc.ToString(), explain.ToString());
        }
    }

    private static void AddDocument(IndexWriter writer, string name, string address)
    {
        Document objDocument = new Document();
        Field objName = new Field("Name", name, Field.Store.YES, Field.Index.ANALYZED);
        Field objLocation = new Field("Location", address, Field.Store.YES, Field.Index.NOT_ANALYZED);
        objLocation.SetBoost(2f);
        objDocument.Add(objName);
        objDocument.Add(objLocation);
        writer.AddDocument(objDocument);
    }
}

This code does return the results in the order you want. In fact, for this data set it returns them in that order even if you remove the boost. I'm not an expert on Lucene scoring, but I believe this is because "NY" matches the Location field of "XYZ, NY" exactly, whereas "John" matches only part of the Name field. You can read the details in the output of the Explain call.
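
To make the "exact versus partial match" intuition concrete, here is a rough, library-free sketch of the arithmetic. It assumes Lucene's classic TF-IDF scoring (DefaultSimilarity), where each matching term contributes roughly tf * idf^2 * boost * lengthNorm and the sum is scaled by a coord factor. The class name BoostSketch and its helper methods are made up for illustration. The sketch also ignores queryNorm and the lossy one-byte encoding of norms, so its numbers will not equal what Explain prints; only the ordering is meant to carry over.

```csharp
using System;

// Hypothetical, simplified model of Lucene's classic scoring for the
// three-document example. Not Lucene code; names are illustrative only.
static class BoostSketch
{
    // Length normalization: shorter fields score higher. 1 / sqrt(terms in field).
    public static double LengthNorm(int numTerms)
    {
        return 1.0 / Math.Sqrt(numTerms);
    }

    // Inverse document frequency: 1 + ln(numDocs / (docFreq + 1)).
    public static double Idf(int numDocs, int docFreq)
    {
        return 1.0 + Math.Log((double)numDocs / (docFreq + 1));
    }

    // Score of one document for the query Name:john OR Location:NY.
    // nameTerms is the number of terms in the Name field (2 for "John Doe").
    // The Location field is NOT_ANALYZED, so it always holds exactly one term.
    public static double Score(bool nameMatches, int nameTerms,
                               bool locationMatches, double locationBoost)
    {
        const int numDocs = 3;
        double idfJohn = Idf(numDocs, 2); // "john" occurs in 2 documents
        double idfNy = Idf(numDocs, 2);   // "NY" occurs in 2 documents
        double sum = 0.0;
        int matchedClauses = 0;
        if (nameMatches)
        {
            sum += idfJohn * idfJohn * LengthNorm(nameTerms); // tf = 1
            matchedClauses++;
        }
        if (locationMatches)
        {
            sum += idfNy * idfNy * locationBoost * LengthNorm(1); // tf = 1
            matchedClauses++;
        }
        // coord rewards documents that match more of the query's clauses.
        double coord = matchedClauses / 2.0;
        return coord * sum;
    }

    static void Main()
    {
        foreach (double boost in new double[] { 1.0, 2.0 })
        {
            Console.WriteLine("Location boost = {0}", boost);
            Console.WriteLine("  John Doe, NY         : {0:F3}", Score(true, 2, true, boost));
            Console.WriteLine("  XYZ, NY              : {0:F3}", Score(false, 1, true, boost));
            Console.WriteLine("  John Foo, New Jersey : {0:F3}", Score(true, 2, false, boost));
        }
    }
}
```

Under these assumptions the unboosted scores come out at roughly 1.71, 0.50 and 0.35, so "XYZ, NY" already beats "John Foo, New Jersey": its one-term Location field gets a length norm of 1, while "john" is one of two terms in the Name field. A boost of 2 on Location only widens that gap.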

肩上的翅膀 2024-12-09 23:19:33

Have you tried MultiFieldQueryParser?
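
For reference, a rough sketch of what that could look like, with the boost applied at query time instead of at index time. It assumes Lucene.Net 2.9.x, where, as far as I recall, MultiFieldQueryParser has a constructor taking a non-generic IDictionary that maps field names to float boosts; later versions changed this, so please check the signature against your build. The class name MultiFieldSketch and the method BuildQuery are made up for illustration. The PerFieldAnalyzerWrapper is also my own assumption: Location is indexed NOT_ANALYZED, so the query text for that field must stay as "NY", whereas StandardAnalyzer would lowercase it to "ny" and miss the indexed term.

```csharp
using System.Collections;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

// Hypothetical helper; assumes the Lucene.Net 2.9.x API.
static class MultiFieldSketch
{
    public static Query BuildQuery(string userInput)
    {
        // Analyze Name with StandardAnalyzer, but keep Location terms verbatim
        // so they match the NOT_ANALYZED values stored in the index.
        PerFieldAnalyzerWrapper analyzer =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_29));
        analyzer.AddAnalyzer("Location", new KeywordAnalyzer());

        // Query-time boosts per field: Location counts twice as much as Name.
        Hashtable boosts = new Hashtable();
        boosts["Name"] = 1f;
        boosts["Location"] = 2f;

        MultiFieldQueryParser parser = new MultiFieldQueryParser(
            Version.LUCENE_29, new string[] { "Name", "Location" }, analyzer, boosts);

        // Each word of the input is searched in every listed field.
        return parser.Parse(userInput);
    }
}
```

The resulting query can be passed to the searcher from the other answer, e.g. searcher.Search(MultiFieldSketch.BuildQuery("John NY"), null, 10). One advantage of query-time boosts is that the weights can be changed without re-indexing.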
