Custom Lucene HitCollector in C#

Posted on 2024-10-03 14:14:04

Does anyone have C# sample code for implementing a custom Lucene hit collector?

I am trying to get a summary of hits by document type from my indexes. I could iterate through the Hits object, but given the potential number of hits I want to avoid that overhead.

I have found an example in Java but am having difficulty implementing it in C#, e.g. "Lucene - using the HitCollector".

As always, any pointers would be helpful.

Comments (1)

谜兔 2024-10-10 14:14:04

I had problems with this myself, so I looked at one of the collectors in the source and modified it. Hope it helps.

using System;
using IndexReader = Lucene.Net.Index.IndexReader;

namespace Lucene.Net.Search
{ 

public abstract class RestrictedScoreDocCollector : TopDocsCollector
{

    // Assumes docs are scored in order.
    private class InOrderTopScoreDocCollector : RestrictedScoreDocCollector
    {
        private Predicate<int> filter;
        private bool hasFilter = false;
        internal InOrderTopScoreDocCollector(int numHits, Predicate<int> filter)
            : base(numHits)
        {
            this.filter = filter;
            this.hasFilter = (filter != null);
        }

        public override void Collect(int doc)
        {
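            // Skip documents rejected by the filter. Note that doc is relative to the
            // current segment reader here; docBase has not been added yet.
            if (this.hasFilter && !this.filter(doc))
            {
                return;
            }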
            float score = scorer.Score();

            // This collector cannot handle these scores:
            System.Diagnostics.Debug.Assert(score != float.NegativeInfinity);
            System.Diagnostics.Debug.Assert(!float.IsNaN(score));

            totalHits++;

            if (score <= pqTop.score)
            {
                // Since docs are returned in-order (i.e., increasing doc Id), a document
                // with equal score to pqTop.score cannot compete since HitQueue favors
                // documents with lower doc Ids. Therefore reject those docs too.
                return;
            }
            pqTop.doc = doc + docBase;
            pqTop.score = score;
            pqTop = (ScoreDoc)pq.UpdateTop();
        }

        public override bool AcceptsDocsOutOfOrder()
        {
            return false;
        }
    }

    // Assumes docs are scored out of order.
    private class OutOfOrderTopScoreDocCollector : RestrictedScoreDocCollector
    {
        private Predicate<int> filter;
        private bool hasFilter = false;

        internal OutOfOrderTopScoreDocCollector(int numHits, Predicate<int> filter)
            : base(numHits)
        {
            this.filter = filter;
            this.hasFilter = (filter != null);
        }

        public override void Collect(int doc)
        {
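            // Skip documents rejected by the filter. Note that doc is relative to the
            // current segment reader here; docBase is only added further down.
            if (this.hasFilter && !this.filter(doc))
            {
                return;
            }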

            float score = scorer.Score();

            // This collector cannot handle NaN
            System.Diagnostics.Debug.Assert(!float.IsNaN(score));

            totalHits++;
            doc += docBase;
            if (score < pqTop.score || (score == pqTop.score && doc > pqTop.doc))
            {
                return;
            }
            pqTop.doc = doc;
            pqTop.score = score;
            pqTop = (ScoreDoc)pq.UpdateTop();
        }

        public override bool AcceptsDocsOutOfOrder()
        {
            return true;
        }
    }

    /// <summary>
    /// Creates a new <see cref="RestrictedScoreDocCollector"/> given the number of hits to
    /// collect, whether documents are scored in order by the input
    /// <see cref="Scorer"/> passed to <see cref="SetScorer(Scorer)"/>, and an optional
    /// filter predicate (null accepts every document).
    /// <para><b>NOTE</b>: The instances returned by this method
    /// pre-allocate a full array of length
    /// <c>numHits</c>, and fill the array with sentinel
    /// objects.</para>
    /// </summary>
    public static RestrictedScoreDocCollector create(int numHits, bool docsScoredInOrder, Predicate<int> filter)
    {

        if (docsScoredInOrder)
        {
            return new InOrderTopScoreDocCollector(numHits,filter);
        }
        else
        {
            return new OutOfOrderTopScoreDocCollector(numHits,filter);
        }
    }

    internal ScoreDoc pqTop;
    internal int docBase = 0;
    internal Scorer scorer;

    // prevents instantiation
    private RestrictedScoreDocCollector(int numHits)
        : base(new HitQueue(numHits, true))
    {
        // HitQueue implements getSentinelObject to return a ScoreDoc, so we know
        // that at this point top() is already initialized.
        pqTop = (ScoreDoc)pq.Top();
    }

    public /*protected internal*/ override TopDocs NewTopDocs(ScoreDoc[] results, int start)
    {
        if (results == null)
        {
            return EMPTY_TOPDOCS;
        }

        // We need to compute maxScore in order to set it in TopDocs. If start == 0,
        // it means the largest element is already in results, use its score as
        // maxScore. Otherwise pop everything else, until the largest element is
        // extracted and use its score as maxScore.
        float maxScore = System.Single.NaN;
        if (start == 0)
        {
            maxScore = results[0].score;
        }
        else
        {
            for (int i = pq.Size(); i > 1; i--)
            {
                pq.Pop();
            }
            maxScore = ((ScoreDoc)pq.Pop()).score;
        }

        return new TopDocs(totalHits, results, maxScore);
    }

    public override void SetNextReader(IndexReader reader, int base_Renamed)
    {
        docBase = base_Renamed;
    }

    public override void SetScorer(Scorer scorer)
    {
        this.scorer = scorer;
    }
}

}
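
The collector above filters and ranks hits. For the per-document-type summary asked about in the question, a ranking queue may not be needed at all. The following is only a minimal sketch, not part of the original answer. It assumes the same method-style Collector API that the code above overrides, and it assumes that `FieldCache_Fields.DEFAULT.GetStrings` is available in your Lucene.Net version. The class name `TypeCountCollector` and the idea of a single-valued, untokenized type field are hypothetical.

```csharp
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical collector: counts hits per value of one field instead of ranking them.
public class TypeCountCollector : Collector
{
    private readonly string field;
    private readonly Dictionary<string, int> counts = new Dictionary<string, int>();
    private string[] valuesForSegment;

    // field is assumed to be single-valued and indexed untokenized.
    public TypeCountCollector(string field)
    {
        this.field = field;
    }

    public IDictionary<string, int> Counts
    {
        get { return counts; }
    }

    public override void SetScorer(Scorer scorer)
    {
        // Scores are not needed for counting.
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        // Assumption: the field cache returns one value per document of this
        // segment, indexed by segment-relative doc id.
        valuesForSegment = FieldCache_Fields.DEFAULT.GetStrings(reader, field);
    }

    public override void Collect(int doc)
    {
        // doc is relative to the current segment, matching valuesForSegment.
        string value = valuesForSegment[doc] ?? "(none)";
        int n;
        counts.TryGetValue(value, out n); // n stays 0 when the key is missing
        counts[value] = n + 1;
    }

    public override bool AcceptsDocsOutOfOrder()
    {
        return true; // counting does not depend on document order
    }
}
```

Under those assumptions, you would pass an instance to `searcher.Search(query, collector)` and read `collector.Counts` afterwards. No documents are loaded and no Hits object is iterated, which is the overhead the question wants to avoid.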
