How can I search in Lucene.NET without specifying a "top n" limit?

Posted on 2024-10-13 04:22:02

There are several overloads of the IndexSearcher.Search method in Lucene. Some of them require a "top n hits" argument; others don't (the latter are obsolete and will be removed in Lucene.NET 3.0).

The overloads that require a "top n" argument actually preallocate memory for the entire possible range of results. So when you can't even roughly estimate how many results will be returned, the only option is to pass an arbitrarily large number to ensure that all query results are returned. This causes severe memory pressure and leaks due to LOH (large object heap) fragmentation.

Is there an official, non-obsolete way to search without passing a "top n" argument?

Thanks in advance, guys.

番薯 2024-10-20 04:22:02

I'm using Lucene.NET 2.9.2 as the reference point for this answer.

You could build a custom collector and pass it to one of the Search overloads that accept a Collector.

using System;
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

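// Collects the ID of every matching document instead of a fixed "top n".
public class AwesomeCollector : Collector {
    private readonly List<Int32> _docIds = new List<Int32>();
    private readonly Single _lowerInclusiveScore;
    private Scorer _scorer;
    private Int32 _docBase;

    // Assumption: the minimum score is supplied by the caller.
    // Hits scoring below lowerInclusiveScore are skipped.
    public AwesomeCollector(Single lowerInclusiveScore) {
        _lowerInclusiveScore = lowerInclusiveScore;
    }

    public IEnumerable<Int32> DocumentIds {
        get { return _docIds; }
    }

    public override void SetScorer(Scorer scorer) {
        _scorer = scorer;
    }

    public override void Collect(Int32 doc) {
        // doc is relative to the current segment reader;
        // adding _docBase turns it into an index-wide document ID.
        var score = _scorer.Score();
        if (_lowerInclusiveScore <= score)
            _docIds.Add(_docBase + doc);
    }

    public override void SetNextReader(IndexReader reader, Int32 docBase) {
        _docBase = docBase;
    }

    // Hits are stored unsorted, so the order in which they arrive does not matter.
    public override bool AcceptsDocsOutOfOrder() {
        return true;
    }
}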
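
To show how the collector might be used, here is a minimal sketch against the Lucene.NET 2.9.x API. The CollectorUsage class and LoadAllHits method are hypothetical names chosen for illustration; the searcher, the query and an already constructed AwesomeCollector are assumed to be supplied by the caller.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Search;

public static class CollectorUsage {
    // Hypothetical helper: runs the query through the custom collector
    // and loads the stored fields of every collected hit.
    public static List<Document> LoadAllHits(
            IndexSearcher searcher, Query query, AwesomeCollector collector) {
        // Search(Query, Collector) takes no "top n" argument, so no
        // result queue is preallocated; the collector sees each hit once.
        searcher.Search(query, collector);

        var documents = new List<Document>();
        foreach (Int32 docId in collector.DocumentIds) {
            // The collected IDs are already index-wide (docBase + doc).
            documents.Add(searcher.Doc(docId));
        }
        return documents;
    }
}
```

The hits come back in collection order rather than ranked by score, and the ID list still grows with the number of matches, so for very large result sets it may be better to process each hit inside Collect instead of storing it.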