“Shuffling” a Lucene Hits result set

Posted 2024-12-07 10:22:10


I have the following program:

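    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Windows.Forms;
    using Lucene.Net.Search;

    public class Hit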
    {
        readonly Hits _hits;
        readonly int _index;

        public Hit(Hits hits, int index)
        {
            this._hits = hits;
            this._index = index;
        }

        public int id { get { return _hits.Id(_index); } }
        public float score { get { return _hits.Score(_index); } }
        public string this[string key] { get { return _hits.Doc(_index).Get(key); } }
    }

    class HitList : IList<Hit>
    {
        protected Hits hits;

        public HitList(Hits hits)
        {
            this.hits = hits;
        }

        #region IList Members
        public int Add(object value) { throw new NotImplementedException(); }
        public void Clear() { throw new NotImplementedException(); }
        public bool Contains(object value) { throw new NotImplementedException(); }
        public int IndexOf(object value) { throw new NotImplementedException(); }
        public void Insert(int index, object value) { throw new NotImplementedException(); }
        public bool IsFixedSize { get { throw new NotImplementedException(); } }
        public bool IsReadOnly { get { throw new NotImplementedException(); } }
        public void Remove(object value) { throw new NotImplementedException(); }
        public void RemoveAt(int index) { throw new NotImplementedException(); }
        public object this[int index] { get { return new Hit(hits, index); } set { throw new NotImplementedException(); } }
        #endregion

        #region ICollection Members
        public void CopyTo(Array array, int index) { throw new NotImplementedException(); }
        public int Count { get { return hits.Length(); } }
        public bool IsSynchronized { get { throw new NotImplementedException(); } }
        public object SyncRoot { get { throw new NotImplementedException(); } }
        #endregion

        #region IEnumerable Members
        public System.Collections.IEnumerator GetEnumerator() { throw new NotImplementedException(); }
        #endregion

        #region IList<Hit> Members
        public int IndexOf(Hit item) { throw new NotImplementedException(); }
        public void Insert(int index, Hit item) { throw new NotImplementedException(); }
        Hit IList<Hit>.this[int index] { get { return new Hit(hits, index); } set { throw new NotImplementedException(); } }
        #endregion

        #region ICollection<Hit> Members
        public void Add(Hit item) { throw new NotImplementedException(); }
        public bool Contains(Hit item) { throw new NotImplementedException(); }
        public void CopyTo(Hit[] array, int arrayIndex) { throw new NotImplementedException(); }
        public bool Remove(Hit item) { throw new NotImplementedException(); }
        #endregion

        #region IEnumerable<Hit> Members
        IEnumerator<Hit> IEnumerable<Hit>.GetEnumerator() { throw new NotImplementedException(); }
        #endregion
    }
    public partial class Form1 : Form
    {
        private const string IndexFileLocation = @"C:\Users\Public\Index";
        private IList<Hit> _hits;

        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory(IndexFileLocation, true);

            Lucene.Net.Analysis.Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer();
            var indexWriter = new Lucene.Net.Index.IndexWriter(dir, analyzer, true);

            // Index ten small documents: "test 0" .. "test 9"
            for (var i = 0; i < 10; i++)
            {
                var doc = new Lucene.Net.Documents.Document();
                var fldContent = new Lucene.Net.Documents.Field("content", "test " + i,
                                                                 Lucene.Net.Documents.Field.Store.YES,
                                                                 Lucene.Net.Documents.Field.Index.TOKENIZED,
                                                                 Lucene.Net.Documents.Field.TermVector.YES);
                doc.Add(fldContent);
                indexWriter.AddDocument(doc);
            }
            indexWriter.Optimize();
            indexWriter.Close();

            var searcher = new Lucene.Net.Search.IndexSearcher(dir);
            var searchTerm = new Lucene.Net.Index.Term("content", "test");
            Lucene.Net.Search.Query query = new Lucene.Net.Search.TermQuery(searchTerm);

            Lucene.Net.Search.Hits hits = searcher.Search(query);
            for (var i = 0; i < hits.Length(); i++)
            {
                Lucene.Net.Documents.Document doc = hits.Doc(i);
                string contentValue = doc.Get("content");

                Debug.WriteLine(contentValue);
            }
            HitList h = new HitList(hits);

            h.Shuffle(); // throws NotImplementedException: HitList's IList<Hit> indexer setter is not implemented

            for (var i = 0; i < h.Count; i++)
            {
                var z = (Hit)h[i];
                string contentValue = z.id.ToString();

                Debug.WriteLine(contentValue);
            }
        }
    }

    public static class SiteItemExtensions
    {
        public static void Shuffle<T>(this IList<T> list)
        {
            var rng = new Random();
            int n = list.Count;
            while (n > 1)
            {
                n--;
                int k = rng.Next(n + 1);
                T value = list[k];
                list[k] = list[n];
                list[n] = value;
            }
        }
    }

What I am trying to do is "shuffle" the results I get back from the Hits collection. When I run this program as is, it bombs when it reaches the h.Shuffle(); line. I understand why it's bombing: it executes my Shuffle extension method, which in turn tries to do a set operation on a list element, and my indexer has no working set implementation. Both public object this[int index] and the explicit Hit IList<Hit>.this[int index] (the one Shuffle<T> actually calls through IList<Hit>) just throw NotImplementedException.

My problem is that I can't implement set, because the Lucene id and score values are read-only, and it makes sense that Apache made them read-only. My question is: how can I "shuffle" or randomize the Hits that I'm getting back? Any help would be appreciated.
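The failure is not specific to Lucene: any IList<T> whose indexer setter throws will break an in-place shuffle. As a minimal sketch, the BCL's ReadOnlyCollection<T> stands in for HitList here, because its IList<T> indexer setter throws NotSupportedException much as HitList's throws NotImplementedException. The Shuffle<T> body is copied from the question; the names ShuffleFailureDemo and ShuffleOfReadOnlyListThrows are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ShuffleFailureDemo
{
    // In-place Fisher-Yates shuffle, copied from SiteItemExtensions.Shuffle<T> in the question.
    public static void Shuffle<T>(this IList<T> list)
    {
        var rng = new Random();
        int n = list.Count;
        while (n > 1)
        {
            n--;
            int k = rng.Next(n + 1);
            T value = list[k];
            list[k] = list[n]; // writes through the indexer setter
            list[n] = value;
        }
    }

    // Shuffles a read-only list and reports whether its indexer setter threw,
    // mirroring what happens when Shuffle<T> is called on HitList.
    public static bool ShuffleOfReadOnlyListThrows()
    {
        IList<int> readOnly = new ReadOnlyCollection<int>(new[] { 0, 1, 2, 3 });
        try
        {
            readOnly.Shuffle();
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }
}
```

On a writable List<int> the same method shuffles without error; the only difference is whether the indexer accepts writes.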


Comments (2)

抚你发端 2024-12-14 10:22:10


There may be some performance problems with the approach in the question for shuffling search results.

First, if I recall correctly, the Hits class caches documents locally and repeats the search for every 100 documents, so enumerating all search results would require about HitCount/100 searches.

Second, loading a document is one of the most costly operations in Lucene.Net, so loading every search result just to be able to shuffle them may not be a good choice.

I would prefer a "random scoring" approach as below:

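    using System;
    using Lucene.Net.Index;
    using Lucene.Net.Search;
    using Lucene.Net.Search.Function;

    // Wraps another query and replaces each matching document's score with a random value,
    // so the top hits by score come back in random order.
    public class RandomScoreQuery : CustomScoreQuery
    {
        Random r = new Random((int)(DateTime.Now.Ticks & 0x7fffffff));

        public RandomScoreQuery(Query q) : base(q)
        {
        }

        public override float CustomScore(int doc, float subQueryScore, float valSrcScore)
        {
            return r.Next(10000) / 1000.0f; // random score in [0, 10)
        }
    }

    Query q1 = new TermQuery(new Term("content", "test"));
    Query q2 = new RandomScoreQuery(q1);
    TopDocs td = searcher.Search(q2, 100); // searcher: the IndexSearcher from the question
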
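The random-scoring idea can be illustrated without Lucene. The sketch below is a stand-in under stated assumptions, not Lucene.Net code: it gives each matching document id an independent random score (the same formula as CustomScore above) and keeps the n highest, which is roughly what collecting the top n hits of RandomScoreQuery amounts to. Only ids are touched; no documents are loaded. RandomScoringSketch and RandomTopN are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RandomScoringSketch
{
    // Hypothetical helper: assigns each id a random score in [0, 10) and returns
    // the ids of the n highest-scoring entries, like a top-n collector would.
    public static List<int> RandomTopN(IEnumerable<int> matchingDocIds, int n, Random rng)
    {
        return matchingDocIds
            .Select(id => new { Id = id, Score = rng.Next(10000) / 1000.0f })
            .OrderByDescending(x => x.Score) // keep the highest random scores first
            .Take(n)
            .Select(x => x.Id)
            .ToList();
    }
}
```

In Lucene.Net the ids would come from the ScoreDoc entries of the returned TopDocs, and only the documents actually displayed would need to be loaded. With scores drawn from 10,000 values ties are possible; LINQ's ordering is stable, so tied ids keep their input order, a slight bias that a wider score range reduces.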
人│生佛魔见 2024-12-14 10:22:10


You need to copy your hits to an appropriate data structure and do your sorting there; the underlying problem is that the Hits type is not intended for modification.

For the shuffling, I believe this should do the trick:

    // requires using System.Linq;
    var rng = new Random();
    var shuffledHits = hits.Cast<Hit>().OrderBy(hit => rng.Next());
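Following this answer's suggestion to copy the hits into a data structure that allows modification, here is a minimal sketch: copy any sequence into a new List<T> and run the same Fisher-Yates swap loop as the question's Shuffle<T> on the copy. ShuffledCopy is a hypothetical name, and a Random is passed in so callers can reuse one instance.

```csharp
using System;
using System.Collections.Generic;

public static class ShuffleCopyExtensions
{
    // Copies the source into a new List<T> and shuffles the copy in place with
    // Fisher-Yates, leaving the (possibly read-only) source untouched.
    public static List<T> ShuffledCopy<T>(this IEnumerable<T> source, Random rng)
    {
        var copy = new List<T>(source);
        for (int n = copy.Count - 1; n > 0; n--)
        {
            int k = rng.Next(n + 1); // 0 <= k <= n
            T tmp = copy[k];
            copy[k] = copy[n];
            copy[n] = tmp;
        }
        return copy;
    }
}
```

With the question's types, the source would be the Hit wrappers themselves, e.g. Enumerable.Range(0, hits.Length()).Select(i => new Hit(hits, i)).ShuffledCopy(new Random()) (with using System.Linq), rather than the HitList: new List<T>(source) would call HitList's CopyTo or GetEnumerator, both of which currently throw NotImplementedException. The Hit wrappers only load a document when their string indexer is read.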