Best way to find an item's position in Lucene search results

Posted on 2025-01-04 11:56:19

I am using Lucene.NET and can run a search and get the hits back as a ScoreDoc[].

I need to know the position of a specific item in the ScoreDoc[]. All items in the ScoreDoc[] are unique.

Sample code:

    luceneSearcher.Search(query, collector);
    ScoreDoc[] scores = collector.TopDocs().scoreDocs;

For example, I need to find the position in the ScoreDoc[] of the item whose custom ID property has a given value, such as '99999'.

I could iterate through the items in scores[], check for the one whose ID property matches '99999', and return its position, but this could hurt performance because scores[] can contain thousands of items.

Is there any better technique?

Thanks

山色无中 2025-01-11 11:56:20

I came up with a new ExtendedCollector that stores CollectedDocument instances.

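    using System.Collections.Generic;
    using System.Linq;
    using Lucene.Net.Index;
    using Lucene.Net.Search;

    public class ExtendedCollector : Collector
    {
        private Scorer _scorer;
        private int _docBase;
        private readonly List<CollectedDocument> _documents;
        // Lookup by doc ID, so the duplicate check in Collect does not scan the whole list.
        private readonly Dictionary<int, CollectedDocument> _byDocId;

        public ExtendedCollector()
        {
            _documents = new List<CollectedDocument>();
            _byDocId = new Dictionary<int, CollectedDocument>();
        }

        public override void SetScorer(Scorer scorer)
        {
            _scorer = scorer;
        }

        public override void Collect(int doc)
        {
            // doc is relative to the current reader; add the base to get the index-wide ID.
            var docId = _docBase + doc;
            var score = _scorer.Score();

            CollectedDocument currentDoc;
            if (_byDocId.TryGetValue(docId, out currentDoc))
            {
                // Already collected: only refresh the score.
                currentDoc.Score = score;
                return;
            }

            currentDoc = new CollectedDocument
            {
                DocId = docId,
                Score = score,
                OriginalIndex = _documents.Count,
                Index = _documents.Count
            };
            _documents.Add(currentDoc);
            _byDocId.Add(docId, currentDoc);
        }

        public override void SetNextReader(IndexReader reader, int docBase)
        {
            _docBase = docBase;
        }

        public override bool AcceptsDocsOutOfOrder()
        {
            return false;
        }

        // Documents in the order they were collected.
        public List<CollectedDocument> Documents
        {
            get { return _documents; }
        }

        // Documents sorted by descending score. Each access re-sorts the list
        // and rewrites Index to the zero-based rank in the sorted order.
        public List<CollectedDocument> DocumentsByScore
        {
            get
            {
                var result = _documents.OrderByDescending(d => d.Score).ToList();
                for (var i = 0; i < result.Count; i++)
                {
                    result[i].Index = i;
                }

                return result;
            }
        }
    }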

CollectedDocument looks like this:

    public class CollectedDocument
    {
        public Int32 DocId { get; set; }
        public float Score { get; set; }
        public int OriginalIndex { get; set; }
        public int Index { get; set; }
    }

Whenever you want to get results, you would do this:

        var myCollector = new ExtendedCollector();
        searcher.Search(searchQuery, myCollector);

        foreach (var doc in myCollector.Documents)
        {
            var docIndex = doc.Index; // position in the list; rewritten when DocumentsByScore is read
            var originalIndex = doc.OriginalIndex; // position assigned when the doc was collected
        }

You can also get the documents ordered by score using

        myCollector.DocumentsByScore

This might not be the easiest solution, but it works. If anyone has a better solution, please post it as I'd like to know that as well.
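
If the position has to be looked up more than once for the same result set, one possible approach is to build a map from the custom ID to the position in a single pass and then query the map. The sketch below is only an illustration under assumptions: `RankedHit`, `HitPositions`, `BuildIndex` and `PositionOf` are hypothetical names and not part of Lucene.NET, and the custom ID is assumed to have already been read from each hit's document. For a single lookup, building the map costs about the same as one linear scan, so it only helps when there are repeated lookups.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one ranked hit; not a Lucene.NET type.
public sealed class RankedHit
{
    public string CustomId { get; set; }
    public float Score { get; set; }
}

public static class HitPositions
{
    // Builds a map from custom ID to zero-based position in one pass.
    // Assumes the IDs are unique, as stated in the question.
    public static Dictionary<string, int> BuildIndex(IList<RankedHit> hits)
    {
        var positions = new Dictionary<string, int>(hits.Count);
        for (var i = 0; i < hits.Count; i++)
        {
            positions[hits[i].CustomId] = i;
        }
        return positions;
    }

    // Returns the position of the given ID, or -1 if it is not in the results.
    public static int PositionOf(Dictionary<string, int> positions, string customId)
    {
        int position;
        return positions.TryGetValue(customId, out position) ? position : -1;
    }
}
```

With the hits already sorted by score, `HitPositions.BuildIndex(hits)` is called once per search, and each later `HitPositions.PositionOf(index, "99999")` call is a dictionary lookup rather than a scan.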
