Dividing a number into three buckets with constraints

Posted 2024-12-15 09:24:44 · 723 words · 1 view · 0 comments

Is there a good algorithm to split a randomly generated number into three buckets, each with constraints on how much of the total it may contain?

For example, say my randomly generated number is 1,000 and I need to split it into buckets a, b, and c.

These ranges are only an example. See my edit for possible ranges.
Bucket a may only be between 10% - 70% of the number (100 - 700)
Bucket b may only be between 10% - 50% of the number (100 - 500)
Bucket c may only be between 5% - 25% of the number (50 - 250)
a + b + c must equal the randomly generated number 

The amounts assigned should be completely random: bucket a should be just as likely to hit its max as bucket c, and all three buckets should be equally likely to land near the mean of their percentage range.

EDIT: The following will most likely always be true: low end of a + b + c < 100%, high end of a + b + c > 100%. These percentages are only to indicate acceptable values of a, b, and c. In a case where a is 10% while b and c are their max (50% and 25% respectively) the numbers would have to be reassigned since the total would not equal 100%. This is the exact case I'm trying to avoid by finding a way to assign these numbers in one pass.

I'd like to find a way to pick these numbers randomly within their ranges in one pass.

Comments (2)

九八野马 2024-12-22 09:24:45

Update: Yes, you're right, the result is not uniformly distributed.

Let's say your percent values are natural numbers. (If this assumption is wrong, you don't have to read further :) In that case I don't have a solution.)

Let's define an event e as a tuple of 3 values (the percentage of each bucket): e = (p_a, p_b, p_c). Next, enumerate all possible events e_n. What you have here is a tuple space consisting of a discrete number of events. All of the possible events should have the same probability of occurring.

Let's say we have a function f(n) => e_n. Then all we have to do is take a random number n and return e_n in a single pass.

Now, the problem remains to create such a function f :)

In pseudo code, a very slow method (just for illustration):

function f(n) {
    int c = 0
    for i in [10..70] {
        for j in [10..50] {
            for k in [5..25] {
                if(i + j + k == 100) {
                    if(n == c) {
                        return (i, j, k) // found event!
                    } else {
                        c = c + 1
                    }
                }
            }
        }
    }
}
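
As a rough, runnable counterpart to the pseudocode, here is a minimal Python sketch. The names `enumerate_events` and the default ranges are illustrative assumptions taken from the example in the question, and the innermost loop is replaced by computing c directly from a and b, which yields the same tuples in the same order.

```python
import random


def enumerate_events(total=100, a_rng=(10, 70), b_rng=(10, 50), c_rng=(5, 25)):
    """List every integer-percentage tuple (a, b, c) that sums to `total`."""
    events = []
    for a in range(a_rng[0], a_rng[1] + 1):
        for b in range(b_rng[0], b_rng[1] + 1):
            c = total - a - b  # c is fully determined by a and b
            if c_rng[0] <= c <= c_rng[1]:
                events.append((a, b, c))
    return events


def f(n, events):
    """Return the n-th event; every valid tuple has exactly one index."""
    return events[n]


events = enumerate_events()
# One uniform draw over the index space gives a uniform draw over the tuples.
a, b, c = f(random.randrange(len(events)), events)
print(len(events), (a, b, c))
```

Building the list once and indexing into it keeps each later draw at a single random number, at the cost of storing every valid tuple.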

What you have now is a single-pass solution, but the problem has only been moved: the function f is very slow. You can do better, though. I think everything can be computed faster if you set your ranges correctly and calculate offsets instead of iterating through the ranges.
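
One possible reading of "calculate offsets" is sketched below in Python, under the assumption of integer percentages and the example ranges from the question. For a fixed a, the valid values of b form one contiguous interval, so the index n can be mapped to a tuple by skipping whole intervals instead of testing every combination. The names `b_bounds` and `f_fast` are hypothetical.

```python
def b_bounds(a, total=100, b_rng=(10, 50), c_rng=(5, 25)):
    """Interval of b values for which c = total - a - b stays in its range."""
    lo = max(b_rng[0], total - a - c_rng[1])
    hi = min(b_rng[1], total - a - c_rng[0])
    return lo, hi


def f_fast(n, total=100, a_rng=(10, 70), b_rng=(10, 50), c_rng=(5, 25)):
    """Map index n to the n-th valid (a, b, c) without a triple loop."""
    for a in range(a_rng[0], a_rng[1] + 1):
        lo, hi = b_bounds(a, total, b_rng, c_rng)
        count = max(0, hi - lo + 1)  # number of valid tuples with this a
        if n < count:
            b = lo + n  # offset inside the interval for this a
            return a, b, total - a - b
        n -= count  # skip all tuples belonging to this a
    raise IndexError("n is larger than the number of valid events")


print(f_fast(0))  # first valid tuple for the example ranges
```

The loop now runs over the range of a only; a prefix-sum table of the counts plus a binary search would remove even that loop, if it matters.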

Is this clear enough?


First of all, you probably have to adjust your ranges. 10% in bucket a is not possible: b and c together can reach at most 75%, so the condition a + b + c = number cannot hold.
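
The adjustment can be done mechanically. The following Python sketch (the helper name `tighten` is an assumption, not something from the question) raises each lower bound to what is needed when the other buckets are at their maximum, and lowers each upper bound to what is left when the others are at their minimum.

```python
def tighten(ranges, total=100):
    """Narrow each (low, high) range so that every value in it is reachable."""
    lows = [lo for lo, _ in ranges]
    highs = [hi for _, hi in ranges]
    tightened = []
    for lo, hi in ranges:
        others_hi = sum(highs) - hi  # the most the other buckets can take
        others_lo = sum(lows) - lo   # the least the other buckets must take
        tightened.append((max(lo, total - others_hi), min(hi, total - others_lo)))
    return tightened


# Ranges for a, b, c from the question, in percent.
print(tighten([(10, 70), (10, 50), (5, 25)]))  # [(25, 70), (10, 50), (5, 25)]
```

For the example ranges only bucket a changes: its effective minimum becomes 25%.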

Concerning your question: (1) pick a random number for bucket a inside its range, then (2) update the range for bucket b so that c can still fall between its minimum and maximum percentage (this should only narrow the range). Then (3) pick a random number for bucket b. Finally, (4) compute c so that your condition holds.

Example:

    n = 1000
(1) a = 40%
(2) range b [35,50], because 40+35+25 = 100%
(3) b = 45%
(4) c = 100-40-45 = 15%

Or:

    n = 1000
(1) a = 70%
(2) range b [10,25], because 70+25+5 = 100%
(3) b = 20%
(4) c = 100-70-20 = 10%

It remains to check whether all the events are uniformly distributed. If they are not, you might want to randomize the range update in step 2.
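
Steps (1) to (4) can be sketched in Python as follows. This is only an illustration under the assumption of integer percentages; `split_sequential` and its parameters are hypothetical names. Note that this procedure is not uniform over the valid tuples: with the example ranges, a = 25 leaves exactly one choice for b, so the tuple (25, 50, 25) comes up far more often than any single tuple with a = 50.

```python
import random


def split_sequential(total=100, a_rng=(10, 70), b_rng=(10, 50), c_rng=(5, 25),
                     rng=random):
    """Pick a, narrow the range of b, pick b, then derive c."""
    # Restrict a to values for which some valid b and c exist.
    a_lo = max(a_rng[0], total - b_rng[1] - c_rng[1])
    a_hi = min(a_rng[1], total - b_rng[0] - c_rng[0])
    a = rng.randint(a_lo, a_hi)                    # step (1)
    b_lo = max(b_rng[0], total - a - c_rng[1])     # step (2): c must not exceed its max
    b_hi = min(b_rng[1], total - a - c_rng[0])     #           c must not drop below its min
    b = rng.randint(b_lo, b_hi)                    # step (3)
    c = total - a - b                              # step (4)
    return a, b, c


a, b, c = split_sequential()
number = 1000
# Convert percentages to amounts of the generated number.
print(a * number // 100, b * number // 100, c * number // 100)
```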

流星番茄 2024-12-22 09:24:44

The problem is equivalent to selecting a random point in an N-dimensional object (in your example N=3), the object being defined by the following constraints (in your example):

0.1  <= x  <= 0.7
0.1  <= y  <= 0.5
0.05 <= z  <= 0.25
x + y + z   = 1 (*)

Clearly, because of the last equation (*), one of the coordinates is redundant, i.e. picking values for x and y dictates z.

Eliminating (*) and one of the other constraints leaves us with an (N-1)-dimensional box, e.g.

0.1 <= x  <= 0.7
0.1 <= y  <= 0.5

that is cut by the inequality

0.05 <= (1 - x - y) <= 0.25 (**)

that is derived from (*) and the constraint on z. This is basically a diagonal stripe through the box.

In order for the results to be uniform, I would just repeatedly sample the (N-1)-dimensional box, and accept the first sampled point that fulfills (**). Single-pass solutions might end up having biased distributions.
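
A minimal Python sketch of this accept/reject loop, assuming continuous fractions and the example ranges (the function name `split_rejection` and the `max_tries` safety limit are assumptions added for illustration):

```python
import random


def split_rejection(number, a_rng=(0.10, 0.70), b_rng=(0.10, 0.50),
                    c_rng=(0.05, 0.25), rng=random, max_tries=10_000):
    """Sample (x, y) uniformly from the box and accept once z satisfies (**)."""
    for _ in range(max_tries):
        x = rng.uniform(*a_rng)
        y = rng.uniform(*b_rng)
        z = 1.0 - x - y  # dictated by equation (*)
        if c_rng[0] <= z <= c_rng[1]:  # inequality (**)
            return x * number, y * number, z * number
    raise RuntimeError("no valid point found; the ranges may be infeasible")


a, b, c = split_rejection(1000)
print(a, b, c)
```

For the example ranges the stripe covers roughly 29% of the box (area 0.06875 out of 0.24), so about three to four samples are needed per accepted point on average. The amounts are floats; rounding them to integers that still sum to the number would need an extra step.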
