Adjusting the chance of an item being selected from a list

Posted 2024-08-08 10:06:36

I have a list of items. When I create the list, each item has an equal chance of being selected. But as an item is selected, its chance goes down while the others' chances go up. If a new item is added during the process, it should have the highest chance of being selected, with its chances falling off as it is selected. I am looking for a good algorithm that can accomplish this in C#.

Generalized idea:
I have 5 items; over time, all 5 items will be selected 20% of the time. I am trying to keep the selections as close to that 20% as possible, cutting down on outliers. If one exists, it will be selected more/less to bring it back in line.

Comments (6)

孤者何惧 2024-08-15 10:06:36

Use a bucket weighted queue: Instead of using a list, divide your collection into buckets - each bucket has an associated frequency of retrieval. Items move from higher frequency buckets to lower frequency ones as they are selected.

An easy way to implement this is to assign each bucket a range of values, and generate a random number within the combined range to pick a bucket to choose from. You'll probably want to abstract this collection into some kind of class so that you don't expose consumers to the details.

Algorithm:

Initially, all items start in the same (top-level) bucket.

When you select an item, move it from the bucket it's in, to the next lower bucket. If necessary, create the next level bucket.

When a new item is added, add it to the topmost (most frequently used) bucket.

To randomly select an item, first pick a bucket, then pick an item within the bucket. Move the selected item down into the next level bucket. Note that moving an item down to a lower frequency bucket is optional - you can set some cutoff point.

When creating a new bucket, update the retrieval range associated with all buckets to give the new set the frequency distribution characteristic you want.

C# implementation (first cut) of a generic bucketed weighted queue:

using System;
using System.Collections.Generic;
using System.Linq;

namespace Utility
{
    public class BucketWeightedQueue<T>
    {
        private int m_MaxFallOff;
        private readonly double m_FallOffRate;
        private readonly List<List<T>> m_Buckets;
        private readonly List<int> m_FallOffFactors;
        private readonly Random m_Rand = new Random();

        public BucketWeightedQueue(IEnumerable<T> items, double fallOffRate )
        {
            if( fallOffRate < 0 ) 
                throw new ArgumentOutOfRangeException("fallOffRate");

            m_MaxFallOff = 1;
            m_FallOffRate = fallOffRate;
            m_Buckets = new List<List<T>> { items.ToList() };
            m_FallOffFactors = new List<int> { m_MaxFallOff };
        }

        public T GetRandomItem()
        {
            var bucketIndex = GetRandomBucketIndex();
            var bucket = m_Buckets[bucketIndex];
            var itemIndex = m_Rand.Next( bucket.Count );
            var selectedItem = bucket[itemIndex];

            ReduceItemProbability( bucketIndex, itemIndex );
            return selectedItem;
        }

        private int GetRandomBucketIndex()
        {
            // Pick a random value in [0, m_MaxFallOff) and return the first bucket
            // whose fall-off factor does not exceed it; otherwise fall back to the
            // top (most frequent) bucket.
            var randFallOffValue = m_Rand.Next( m_MaxFallOff );
            for (int i = 0; i < m_FallOffFactors.Count; i++ )
                if( m_FallOffFactors[i] <= randFallOffValue )
                    return i;
            return 0;
        }

        private void ReduceItemProbability( int bucketIndex, int itemIndex )
        {
            if( m_FallOffRate <= 0 )
                return; // do nothing if there is no fall off rate...

            var item = m_Buckets[bucketIndex][itemIndex];
            m_Buckets[bucketIndex].RemoveAt( itemIndex );

            if( bucketIndex == m_Buckets.Count - 1 )
            {
                // the item was in the lowest-frequency bucket, so create a new one below it...
                m_Buckets.Add( new List<T>() );
                m_MaxFallOff = (int)(m_FallOffFactors[bucketIndex] / m_FallOffRate);
                m_FallOffFactors.Add( m_MaxFallOff );
            }

            // move the selected item down one bucket
            var nextBucket = m_Buckets[bucketIndex + 1];
            nextBucket.Add( item );

            if (m_Buckets[bucketIndex].Count == 0) // drop empty buckets (and their factors)
            {
                m_Buckets.RemoveAt( bucketIndex );
                m_FallOffFactors.RemoveAt( bucketIndex );
            }
        }
    }
}
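
For illustration, usage of the class above might look something like this (the item set and the fall-off rate are arbitrary values chosen for the example):

using System;

class Demo
{
    static void Main()
    {
        var queue = new Utility.BucketWeightedQueue<string>(
            new[] { "a", "b", "c", "d", "e" }, 0.5 );

        for (int i = 0; i < 20; i++)
            Console.WriteLine( queue.GetRandomItem() );
    }
}
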
冬天旳寂寞 2024-08-15 10:06:36

Here we will engineer a random number generator whose distribution favors low values. You can use it to prefer items at the beginning of a list. To decrease the odds of something being selected, move that item down the list. You have a few options for how you want to move the item down the list. Let's review the random variable transformation first.

By applying the following function to a uniform random variable between 0 and 1:

index = Int(l*(1-r^(0.5)))  # l=length, r=uniform random var between 0 and 1

You get a cool distribution that drastically reduces the odds of a larger index

p(0)=0.09751
p(1)=0.09246
p(2)=0.08769
p(3)=0.08211
p(4)=0.07636
p(5)=0.07325
p(6)=0.06772
p(7)=0.06309
p(8)=0.05813
p(9)=0.05274
p(10)=0.04808
p(11)=0.04205
p(12)=0.03691
p(13)=0.03268
p(14)=0.02708
p(15)=0.02292
p(16)=0.01727
p(17)=0.01211
p(18)=0.00736
p(19)=0.00249

Here is the distribution for a list of size 2

0.75139
0.24862

Size 3

0.55699
0.33306
0.10996

Size 4

0.43916
0.31018
0.18836
0.06231
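
A minimal C# sketch of this index transform (the helper names are illustrative, not from the answer); the result is clamped so that a random value of exactly 0 cannot produce an out-of-range index:

using System;
using System.Collections.Generic;

static class BiasedIndex
{
    static readonly Random Rand = new Random();

    // index = floor(count * (1 - sqrt(r))), biased toward low indexes.
    public static int Next(int count)
    {
        double r = Rand.NextDouble();                      // uniform in [0, 1)
        int index = (int)(count * (1 - Math.Sqrt(r)));
        return Math.Min(index, count - 1);                 // clamp the r == 0 edge case
    }

    public static T Pick<T>(IList<T> items)
    {
        return items[Next(items.Count)];
    }
}
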

Now let's discuss the two options for moving the items down the list. I tested two:

  • ToEnd - Move most recently selected item to end of list

  • Sort - Keep an associated array of the number of times each item has been selected and sort the list from least to most selected.

I created a simulation to pick from a list and examine the standard deviation of the number of times each item was selected. The lower the standard deviation, the better. For example, one simulation for a list of 10 items, where 50 selections were made, created the spread:

{"a"=>5, "b"=>5, "c"=>6, "d"=>5, "e"=>4, "f"=>4, "g"=>5, "h"=>5, "i"=>6, "j"=>5}

The standard deviation for this simulation was

0.63

With the ability to run a simulation, I then computed some meta-statistics by running the simulation 500 times and reporting the average standard deviation for each method: ToEnd and Sort. I expected the standard deviation to be high for a low number of picks, but in fact for the ToEnd algorithm the standard deviation increased with the number of picks. The Sort method fixed this.

Testing ["a", "b", "c", "d", "e"]
-------------------------
Picks    ToEnd (StdDev) Sort (StdDev)
5       0.59        0.57
10      0.76        0.68
15      0.93        0.74
20      1.04        0.74
25      1.20        0.73
30      1.28        0.73
35      1.34        0.74
40      1.50        0.75
45      1.53        0.75
45      1.56        0.77
80      2.04        0.79
125     2.52        0.74
180     3.11        0.77
245     3.53        0.79
320     4.05        0.75
405     4.52        0.76
500     5.00        0.78

Here are some test results for a larger set.

Testing ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
-------------------------
Picks  ToEnd (StdDev)  Sort (StdDev)
10          0.68        0.65
20          0.87        0.77
30          1.04        0.80
40          1.18        0.82
50          1.30        0.85
60          1.43        0.84
70          1.57        0.87
80          1.65        0.88
90          1.73        0.87
90          1.71        0.87
160         2.30        0.89
250         2.83        0.88
360         3.44        0.88
490         3.89        0.88
640         4.54        0.87
810         5.12        0.88
1000        5.66        0.85

With a good test framework, I couldn't resist trying a different random number transformation. My assumption was that if I took the cube root of r instead of the square root, my standard deviation would decrease. It did, in fact, but I was worried that this would decrease the randomness. Here you can observe a few simulations when the formula is changed to

index = Int(l*(1-r^(0.33)))  # l=length, r=uniform random var between 0 and 1

Now inspect the actual picks. As I thought, it's heavily weighted toward the beginning of the list at first. If you want to weight it this heavily, you should probably randomize your list before you start.

StdDev = 0.632455532033676
{"a"=>10, "b"=>10, "c"=>11, "d"=>9, "e"=>10}
a d e c b c e b a d b e c a d d e b a e e c c b d a d c b c e b a a d d b e a e a b c b d c a c e c

StdDev = 0.0
{"a"=>10, "b"=>10, "c"=>10, "d"=>10, "e"=>10}
b c d a a d b c b a d e c d e c b e b a e e d c c a b a d a e e b d b a e c c e b a c c d d d a b e

StdDev = 0.632455532033676
{"a"=>9, "b"=>10, "c"=>10, "d"=>10, "e"=>11}
b d a e b c a d c e e b a c a d d c b c e d a e b b a c d c d a e a e e b d c b e a b c b c d d e e

StdDev = 0.0
{"a"=>10, "b"=>10, "c"=>10, "d"=>10, "e"=>10}
a b e d c e a b d c b c c a d b e e b e a d d c e a d b b c c a a a b e d d e c c a b b e a c d e d

StdDev = 0.632455532033676
{"a"=>11, "b"=>10, "c"=>9, "d"=>10, "e"=>10}
b a d c d c a e b e a e b c d b c a a d e e d c d e c b a b b e d c d b c e a a a d b c e b e a d a
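
For completeness, here is a rough C# sketch of the Sort strategy described above: keep a selection count per item, keep the list ordered from least to most selected, and reuse the square-root index transform. The class and member names are illustrative, and the items are assumed to be distinct:

using System;
using System.Collections.Generic;
using System.Linq;

class SortStrategyPicker<T>
{
    private readonly List<T> m_Items;
    private readonly Dictionary<T, int> m_Counts;
    private readonly Random m_Rand = new Random();

    public SortStrategyPicker(IEnumerable<T> items)
    {
        m_Items = items.ToList();
        m_Counts = m_Items.ToDictionary(item => item, item => 0);
    }

    public T Pick()
    {
        // Least-selected items first, so the low-index bias favors them.
        m_Items.Sort((a, b) => m_Counts[a].CompareTo(m_Counts[b]));

        double r = m_Rand.NextDouble();
        int index = Math.Min(m_Items.Count - 1,
                             (int)(m_Items.Count * (1 - Math.Sqrt(r))));

        var picked = m_Items[index];
        m_Counts[picked]++;
        return picked;
    }

    // A new item starts with a count of zero, so it sorts to the front
    // and is initially the most likely to be picked.
    public void Add(T item)
    {
        m_Items.Add(item);
        m_Counts[item] = 0;
    }
}
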
强辩 2024-08-15 10:06:36

It all depends on how the probability of being selected should vary when a given item gets selected.

A simple way to achieve this is with double drawing, whereby you start with the input list (which never changes) and an empty list to which you add the items selected at random. After each normal draw (from the first list), you draw one item out of the second list. If the same item (or rather, item value) comes up again, it is not selected (i.e. invalid draw, start anew); otherwise the draw counts and the corresponding value is added to the second list.
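
A rough C# sketch of this double-drawing idea (class and member names are illustrative, not from the answer):

using System;
using System.Collections.Generic;

class DoubleDrawPicker<T>
{
    private readonly List<T> m_Input;                   // never changes
    private readonly List<T> m_Drawn = new List<T>();   // item values drawn so far
    private readonly Random m_Rand = new Random();

    public DoubleDrawPicker(IEnumerable<T> input)
    {
        m_Input = new List<T>(input);
    }

    public T Pick()
    {
        while (true)
        {
            // Normal draw from the (unchanging) input list.
            var candidate = m_Input[m_Rand.Next(m_Input.Count)];

            // Second draw from the history list: if it produces the same
            // value, the draw is invalid and we start over.
            if (m_Drawn.Count > 0)
            {
                var check = m_Drawn[m_Rand.Next(m_Drawn.Count)];
                if (EqualityComparer<T>.Default.Equals(check, candidate))
                    continue;
            }

            m_Drawn.Add(candidate);
            return candidate;
        }
    }
}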

The general concept seems related to ergodic sources.

DigitalJoe commented on an apparent drawback to this approach. In a nutshell, Joe noted that the probability of preventing the repeating (not necessarily consecutive repeats, just "repeating") of a previously drawn item value (a value found in the second list of the algorithm) fluctuates greatly in the first few hundred drawings. Another implicit commentary was that after the second list contains a few dozen values, the probability of preventing such duplication is extremely low. These are both valid points, and require qualification.

This reasoning matches our intuitive understanding of the way the second list works: the more item values are in there, the fewer chances we have to "double draw", hence to prevent a repeat. This is true, but it only focuses on the second list. The probability of drawing [from the first list] an item previously seen needs to be taken into account. We can use two examples to intuit this idea, but this is of course a gradient:

  • Case "A" :the probability of drawing a given item value is be relatively low compared with the probability of drawing something else. The combined probability of drawing this item multiple times during the first few drawings is therefore low, and the probability of doing so and then of discarding it due to the drawing(s) in list two is even lower (even though the latter probability may be high, due to the small number of item in the second list).
  • Case "B": the probability of drawing a given item is high relatively to the probability of drawing other items. In that case, the probability of having repeats in the first few draws is high, and since the probability of preventing an actual repeat (with the drawing in second list) is high as well (a certainty for the 2nd drawing, a 50% chance on second drawing... a 10% chance on the 11th drawing), the overall probability of "punishing" a highly probable item is high. This is however not a problem, because, the relative probability to drawing the item from the first list will ensure that there will be, statistically, many other opportunities to produce duplicates that will not be so "forcefully" suppressed as the number of drawings progresses.

In both cases, what matters is that the overall distribution of the actual drawings matches that of the distribution in the input list. This is particularly true as the number of drawings becomes more statistically significant.

On the question of the repeat filter possibly being too weak, that too becomes less relevant as the second list reflects more and more of the distribution of the first list. Maybe a way to get a feel for all this is to consider alternative ways of achieving the goals described in the OP's question.

Alternative algorithms:
Algo 1: Drawing without replacement from the first list. That is, we'd use a copy of the original list to start, and each time a given item is drawn, we'd remove it from that list, hence making it overall less probable that the same value would show up again. By the time we'd drawn all of the items, we'd have reproduced exactly the distribution of the original list.
Algo 2: Drawing without replacement, from a list where the original list has been replicated a given number of times. This is similar to the above, but introduces a bit more randomness, i.e. it requires a bigger number of drawings for the drawing distribution to approach and match that of the original list.

In a way, the two-list algorithm I suggested originally (draw with replacement from the original list and manage a second list to sometimes prevent repeats) is similar to Algo 2, in the way it makes the drawing distribution converge towards that of the original list. The advantage of the original algorithm, however, is that it makes the management of the lists easier (although in fairness, a simple way to do so is to replace drawn items with a "null" value of sorts and to re-draw when hitting such an "empty cell", which is effectively the same thing, in reverse, as redrawing when the draw from the second list produces the same item value).

温柔少女心 2024-08-15 10:06:36

You could do something such as making a custom class (for your list) that stores the item plus a weighting.

Default the weighting to 1 when you add an item, and store (in the "list") the total of all of the weightings of all items.

You could then, when you select an item at random, just pick a number between 0 and the total weighting of all items in the list, and walk through the list to find the item at that "position" (by weighting). Decrement the weighting of that item (this can be some falloff, i.e. multiply its weighting by 0.8, or 0.5 - you'd have a lot of control over how fast its probability of selection falls off), and return it.

The downside here, if you have lots of items, is that your selection is O(n) (since you have to walk through the list). Because of this, I'd probably use a linked list for the storage (you're going to have to walk for retrieval anyway, so this gives you faster insertion/removal).

If you're not storing a huge number of options, though, this would be very easy to implement, and give you a lot of control over probabilities plus probability reduction at selection time.
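
A minimal sketch of this in C# (the 0.5 fall-off constant and the names are placeholders; a linked list is used for storage as suggested above):

using System;
using System.Collections.Generic;

class DecayingWeightedList<T>
{
    private class Entry { public T Item; public double Weight; }

    private readonly LinkedList<Entry> m_Entries = new LinkedList<Entry>();
    private readonly Random m_Rand = new Random();
    private double m_TotalWeight;        // running total of all weightings
    private const double FallOff = 0.5;  // multiply a picked item's weighting by this

    public void Add(T item)
    {
        m_Entries.AddLast(new Entry { Item = item, Weight = 1.0 });
        m_TotalWeight += 1.0;
    }

    public T Pick()
    {
        if (m_Entries.Count == 0)
            throw new InvalidOperationException("The list is empty.");

        // Pick a point in [0, total weight) and walk the list to find it.
        double target = m_Rand.NextDouble() * m_TotalWeight;
        Entry chosen = m_Entries.Last.Value;   // fallback for rounding edge cases
        foreach (var entry in m_Entries)
        {
            target -= entry.Weight;
            if (target < 0) { chosen = entry; break; }
        }

        // Decay the chosen item's weighting so it is less likely next time.
        double reduced = chosen.Weight * FallOff;
        m_TotalWeight -= chosen.Weight - reduced;
        chosen.Weight = reduced;
        return chosen.Item;
    }
}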

苏大泽ㄣ 2024-08-15 10:06:36

Use the elapsed time since the item was inserted or last selected as the priority modifier... Set each item's priority = the amount of time since it was inserted or last selected, and then adjust its chance of being selected by multiplying by this priority.

Once you have all the items' chances, normalize them (adjust them all by the same calculated ratio to get them all to add up to 1.000), then use those values as their probabilities of being selected.
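
A hedged C# sketch of this idea, where an item's priority is the number of seconds since it was inserted or last selected (names and details are illustrative assumptions, not from the answer):

using System;
using System.Collections.Generic;
using System.Linq;

class TimePriorityPicker<T>
{
    private readonly Dictionary<T, DateTime> m_LastPicked = new Dictionary<T, DateTime>();
    private readonly Random m_Rand = new Random();

    public void Add(T item)
    {
        m_LastPicked[item] = DateTime.UtcNow;
    }

    public T Pick()
    {
        var now = DateTime.UtcNow;

        // Priority = elapsed time since insert or last selection.
        var priorities = m_LastPicked.ToDictionary(
            kv => kv.Key,
            kv => (now - kv.Value).TotalSeconds);

        // Drawing a point in [0, total) is equivalent to normalizing the
        // priorities and using them as selection probabilities.
        double total = priorities.Values.Sum();
        double target = m_Rand.NextDouble() * total;

        foreach (var kv in priorities)
        {
            target -= kv.Value;
            if (target < 0)
            {
                m_LastPicked[kv.Key] = now;   // reset this item's clock
                return kv.Key;
            }
        }

        // All priorities are zero (or a rounding edge case): pick the first item.
        var fallback = priorities.Keys.First();
        m_LastPicked[fallback] = now;
        return fallback;
    }
}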

小猫一只 2024-08-15 10:06:36

The general strategy for picking a weighted random element from a list is this: give each item a weight. Normalise, so that the total weight is 1 (so to start with, every item has weight 1/n). Sort your list by weight. Now pick a random number between 0 and 1, and go down the list accumulating totals until you cross your number. E.g. if your weights are [0.4, 0.3, 0.2, 0.1] and your random number is 0.63215, your first two steps have total = 0.4, total = 0.7, and then you notice that 0.7 is greater than 0.63215, so you return the second element.

Once you pick an element, you adjust its weighting downwards (you'll need to experiment with downweighting formulas until you find one that works for you; the simplest is just to multiply it by some constant fraction each time), and then renormalise and repeat.

Note that this is fairly inefficient if you have a lot of elements, since it is O(n) in the length of the list, but it shouldn't matter in practice unless you're doing this in an inner loop that needs to be heavily optimised, or similar. If it turns out to be an issue, you could look into geometric data structures like range trees to optimise the lookup.
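
A short C# sketch of this strategy (the 0.5 downweighting constant is just an example; the sort step is omitted because the cumulative walk works on an unsorted list as well):

using System;
using System.Collections.Generic;
using System.Linq;

class WeightedPicker<T>
{
    private readonly List<T> m_Items;
    private readonly List<double> m_Weights;
    private readonly Random m_Rand = new Random();

    public WeightedPicker(IEnumerable<T> items)
    {
        m_Items = items.ToList();
        // Start with equal, normalised weights (1/n each).
        m_Weights = m_Items.Select(_ => 1.0 / m_Items.Count).ToList();
    }

    public T Pick()
    {
        // Accumulate weights until the running total crosses a uniform
        // random number in [0, 1).
        double r = m_Rand.NextDouble();
        double total = 0;
        int chosen = m_Items.Count - 1;   // fallback for rounding edge cases
        for (int i = 0; i < m_Weights.Count; i++)
        {
            total += m_Weights[i];
            if (r < total) { chosen = i; break; }
        }

        // Downweight the selected item, then renormalise so the weights sum to 1.
        m_Weights[chosen] *= 0.5;
        double sum = m_Weights.Sum();
        for (int i = 0; i < m_Weights.Count; i++)
            m_Weights[i] /= sum;

        return m_Items[chosen];
    }
}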

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文