Finding the least-used permutation
I need to distribute a set of data evenly over time based on historical data such that each digit appears an equal (or close to equal) number of times in each position over time. The problem is, given a list of orderings used in the past, that look like this (but could have any number of elements):
1,2,5,3,4
4,1,5,2,3
1,3,5,2,4
4,1,2,3,5
2,4,1,3,5
5,1,4,3,2
1,5,3,2,4
5,1,3,2,4
3,2,5,4,1
4,3,1,5,2
How can I find an ordering of the values that is the least used and will lead to a "more balanced" set of orderings? The obvious answer is to group by ordering, count, and pick the least used one, but the problem is that the least used permutation may never have been used at all. For example, here the ordering "1,2,3,4,5" is a candidate for least used because it doesn't appear at all.
The simple answer seems to be to identify the position in which "1" appears least frequently, set that position to "1", and so on for each digit. I suspect that works, but I feel like there's a more elegant solution that I haven't considered, potentially with cross joins so that all possible combinations are included.
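A minimal sketch of that digit-by-digit idea, to make it concrete. The names `position_counts` and `digit_by_digit` are made up for illustration, positions are 0-based, and ties are assumed to go to the leftmost free position:

```python
# Orderings copied from the question.
HISTORY = [
    [1, 2, 5, 3, 4],
    [4, 1, 5, 2, 3],
    [1, 3, 5, 2, 4],
    [4, 1, 2, 3, 5],
    [2, 4, 1, 3, 5],
    [5, 1, 4, 3, 2],
    [1, 5, 3, 2, 4],
    [5, 1, 3, 2, 4],
    [3, 2, 5, 4, 1],
    [4, 3, 1, 5, 2],
]


def position_counts(history):
    """Return counts[value][pos]: how often `value` appeared at index `pos`."""
    n = len(history[0])
    counts = {value: [0] * n for value in sorted(history[0])}
    for ordering in history:
        for pos, value in enumerate(ordering):
            counts[value][pos] += 1
    return counts


def digit_by_digit(history):
    """For each digit in turn, take its least-used position that is still free."""
    counts = position_counts(history)
    n = len(history[0])
    free = list(range(n))
    result = [None] * n
    for value in sorted(counts):
        # min() returns the first minimum, so ties go to the leftmost free slot.
        pos = min(free, key=lambda p: counts[value][p])
        result[pos] = value
        free.remove(pos)
    return result


print(position_counts(HISTORY)[1])  # [3, 4, 2, 0, 1]
print(digit_by_digit(HISTORY))      # [2, 4, 5, 1, 3]
```

On this data the digit 5 is handled last and ends up in the third position, where it has already appeared 4 times, so the order in which digits are processed matters.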
Any ideas?
Comments (3)
What you have here is a histogram leveling problem.
Consider the problem from this perspective: you have a set of N histograms, one per position, each representing the frequency of occurrence of the N values over the discrete range {1..N}. What you want to do is add a new set of values to your population of data that shifts all the histograms closer to being level. Given the nature of your problem, we know that each value will, overall, appear the same number of times as every other value.
One way to do so is to find which value has the lowest frequency of occurrence in any position, and assign it to that position. Next, in the remaining histograms, find the next value with the lowest frequency of occurrence in any position, and assign that value to that position. Continue repeating this process until all values have been assigned a unique position. This gives you your next set of values. You can now iteratively repeat this operation to continue generating new value sets that will attempt to re-balance the distribution of values with each iteration.
If you maintain the histograms as you distribute values, this becomes a relatively efficient operation (you don't have to constantly re-scan the data set).
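The greedy procedure described above could be sketched roughly as follows. This is only one reading of it: `build_counts` and `next_ordering` are hypothetical names, positions are 0-based, and ties are assumed to be broken by smallest value and then leftmost position. The histograms are updated in place after each new ordering, so the history is scanned only once.

```python
# Orderings copied from the question.
HISTORY = [
    [1, 2, 5, 3, 4], [4, 1, 5, 2, 3], [1, 3, 5, 2, 4], [4, 1, 2, 3, 5],
    [2, 4, 1, 3, 5], [5, 1, 4, 3, 2], [1, 5, 3, 2, 4], [5, 1, 3, 2, 4],
    [3, 2, 5, 4, 1], [4, 3, 1, 5, 2],
]


def build_counts(history):
    """Scan the history once: counts[value][pos] = occurrences so far."""
    n = len(history[0])
    counts = {value: [0] * n for value in history[0]}
    for ordering in history:
        for pos, value in enumerate(ordering):
            counts[value][pos] += 1
    return counts


def next_ordering(counts):
    """Greedily fill the lowest-count (value, position) cell, then repeat
    among the values and positions that are still unassigned."""
    n = len(counts)
    values = set(counts)
    free = set(range(n))
    result = [None] * n
    while values:
        # Tuple comparison gives a deterministic tie-break (assumption):
        # lowest count, then smallest value, then leftmost position.
        _, value, pos = min(
            (counts[v][p], v, p) for v in values for p in free
        )
        result[pos] = value
        values.remove(value)
        free.remove(pos)
    # Keep the histograms current so the history is never re-scanned.
    for pos, value in enumerate(result):
        counts[value][pos] += 1
    return result


counts = build_counts(HISTORY)
print(next_ordering(counts))  # [2, 4, 5, 1, 3]
print(counts[1])              # [3, 4, 2, 1, 1]
```

Calling `next_ordering(counts)` repeatedly yields further orderings. Note that the result depends on how ties are broken: with the tie-break assumed here, the last value assigned (5) has only one position left and lands where it has already appeared 4 times.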
Keep in mind, however, that for any sufficiently small population of values, you will always be "out of balance" to some degree. There's no way around this.
I presume that you have a way to generate a random permutation (e.g. see the question "Most efficient way to randomly "sort" (Shuffle) a list of integers in C#"). Given that, one suggestion to generate a single new ordering is as follows:
1) Generate two random permutations
2) Keep whichever one of them would even out the imbalance the most.
One measure of balance would be to think of the list of all of the counts of digit frequencies at each position as a vector, which, in the case of perfect balance, would have each element the same. The imbalance would then be the length of the vector you get by subtracting off that perfect vector. By choosing between two random permutations you will pick a permutation from a distribution whose mean vector points in a direction opposite to the current imbalance, so you should tend to correct it while still producing a random-ish sequence of permutations.
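A rough sketch of this best-of-two selection, under a few assumptions: the "perfect vector" is taken to have every entry equal to (number of orderings) / N, the length is the Euclidean norm, and the function names (`imbalance_after`, `pick_more_balanced`, `best_of_two`) are invented for illustration.

```python
import math
import random

# Orderings copied from the question.
HISTORY = [
    [1, 2, 5, 3, 4], [4, 1, 5, 2, 3], [1, 3, 5, 2, 4], [4, 1, 2, 3, 5],
    [2, 4, 1, 3, 5], [5, 1, 4, 3, 2], [1, 5, 3, 2, 4], [5, 1, 3, 2, 4],
    [3, 2, 5, 4, 1], [4, 3, 1, 5, 2],
]


def position_counts(history):
    """counts[value][pos] = how often `value` appeared at index `pos`."""
    n = len(history[0])
    counts = {value: [0] * n for value in history[0]}
    for ordering in history:
        for pos, value in enumerate(ordering):
            counts[value][pos] += 1
    return counts


def imbalance_after(counts, n_orderings, candidate):
    """Length of (counts + candidate) minus the perfectly level vector."""
    # With one more ordering, every (value, position) cell would ideally
    # hold (n_orderings + 1) / N occurrences.
    target = (n_orderings + 1) / len(candidate)
    total = 0.0
    for value, row in counts.items():
        for pos, count in enumerate(row):
            if candidate[pos] == value:
                count += 1
            total += (count - target) ** 2
    return math.sqrt(total)


def pick_more_balanced(counts, n_orderings, a, b):
    """Return whichever candidate leaves the smaller imbalance."""
    score_a = imbalance_after(counts, n_orderings, a)
    score_b = imbalance_after(counts, n_orderings, b)
    return a if score_a <= score_b else b


def best_of_two(history, rng=random):
    """Draw two random permutations and keep the better-balancing one."""
    counts = position_counts(history)
    values = sorted(counts)
    a = rng.sample(values, len(values))
    b = rng.sample(values, len(values))
    return pick_more_balanced(counts, len(history), a, b)


print(best_of_two(HISTORY))  # a random permutation, biased toward balance
```

Drawing more than two candidates per step would correct the imbalance faster at the cost of less random-looking output; two is simply the number suggested above.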
If the total number of combinations is small enough there's an approach I used on a similar problem long ago:
Maintain a pool of choices that is periodically replenished.
In your example you have 120 possible permutations. Make an array of 120 elements and assign each an initial value of, say, 5. When you need a random value, you pick from this pool, with the number in each bin being the weight given to that bin. (At the start the bins sum to 600. Pick a random number from 1 to 600 and subtract bins from it one at a time until the result is <= 0. The bin you just subtracted is your result.) When an entry is picked, decrement that bin by one. Once you've made 120 draws from the pool, add 1 to every bin.
Obviously this becomes impractical if the total number of possibilities is too high.
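For illustration, the pool could look something like the sketch below. The class name `PermutationPool` and its parameters are hypothetical; the initial weight of 5 and the replenishment after one full round of draws follow the description above.

```python
import itertools
import random


class PermutationPool:
    """Weighted pool over every permutation, replenished periodically."""

    def __init__(self, items, initial_weight=5, rng=None):
        self.perms = list(itertools.permutations(items))  # 120 for 5 items
        self.weights = [initial_weight] * len(self.perms)
        self.rng = rng or random.Random()
        self.draws = 0

    def draw(self):
        # Pick a ticket from 1 to the sum of all bins, then subtract bins
        # until the ticket is <= 0; the last bin subtracted is the result.
        ticket = self.rng.randint(1, sum(self.weights))
        for index, weight in enumerate(self.weights):
            ticket -= weight
            if ticket <= 0:
                break
        # A bin that was just picked becomes less likely next time.
        self.weights[index] -= 1
        self.draws += 1
        # After one draw per possible permutation, top up every bin.
        if self.draws % len(self.perms) == 0:
            self.weights = [w + 1 for w in self.weights]
        return self.perms[index]


pool = PermutationPool([1, 2, 3, 4, 5])
print(pool.draw())  # a tuple such as (3, 1, 5, 2, 4); varies per run
```

A bin whose weight has dropped to 0 can never be selected, because subtracting 0 cannot push the ticket to <= 0, so no weight ever goes negative. The total weight also never reaches zero, since each round of 120 draws removes 120 and the top-up restores 120.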