Weighting different outcomes when pseudo-randomly selecting from an arbitrarily large sample

Posted 2024-09-19 17:13:17

So, I was sitting in my backyard thinking about Pokemon, as we're all wont to do, and it got me thinking: when you encounter a 'random' Pokemon, some specimens appear a lot more often than others, which means that they're weighted differently from the ones that appear less often.

Now, were I to approach the problem of getting the different Pokemon to appear with a certain probability, I would most likely do so by simply increasing the number of entries that certain Pokemon have in the pool of choices (like so),

Pool:
C1 C1 C1 C1
C2 C2
C3 C3 C3 C3 C3
C4

so C1 has a 1/3 chance of being pulled, C2 has a 1/6 chance, etc., but I understand that this may be a very simple and naive approach, and is unlikely to scale well with a large number of choices.

So, my question is this, S/O: given an arbitrarily large sample size, how would you go about weighting the chance of one outcome as greater than another? And, as a follow-up question, what if you want the probabilities of certain options to occur in a ratio with floating-point precision rather than in whole-number ratios?

剑心龙吟 2024-09-26 17:13:17

If you know the probability of each event happening, you need to map these probabilities onto the range 0-100 (or 0 to 1 if you want to work with real-valued probabilities).

So in the example above there are 12 Cs. C1 is 4/12 or ~33%, C2 is 2/12 or ~17%, C3 is 5/12 or ~42%, and C4 is 1/12 or ~8%.

Notice that these all add up to 100%. So if we choose a random number between 0 and 100, we can map C1 to 0-33, C2 to 33-50 (17 more than C1's upper bound), C3 to 50-92, and C4 to 92-100.
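The range boundaries above are just running totals of the weights, scaled to 100. A minimal sketch of that arithmetic, assuming the weights are held in a Hash (the names `weights` and `boundaries` are illustrative, not from the answer):

```ruby
# Weights taken from the pool in the question: 4 + 2 + 5 + 1 = 12 entries.
weights = { "C1" => 4, "C2" => 2, "C3" => 5, "C4" => 1 }
total = weights.values.sum

# Keep a running total and record each outcome's upper boundary
# on a 0-100 scale, rounded to the nearest whole percent.
running = 0
boundaries = weights.map do |name, w|
  running += w
  [name, (100.0 * running / total).round]
end

p boundaries # => [["C1", 33], ["C2", 50], ["C3", 92], ["C4", 100]]
```

Each outcome then owns the half-open range from the previous boundary up to, but not including, its own.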

An if statement could make the choice:

def pick
  r = rand(100) # random integer from 0 to 99
  if r < 33
    "C1" # 0-32
  elsif r < 50
    "C2" # 33-49
  elsif r < 92
    "C3" # 50-91
  else
    "C4" # 92-99
  end
end

If you want finer resolution than 1 in 100, just use 0-1000 or whatever range you want. It's probably better form to use integers and scale them than to use floating-point numbers, as floating point can behave oddly when the values differ widely in magnitude.
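If you would rather keep real-valued weights, which is what the follow-up question asks about, the same cumulative idea still applies: draw a number in the range from 0 to the total weight and walk the outcomes until it is used up. A minimal sketch, assuming a hypothetical helper named `weighted_pick` (not a Ruby built-in) and weights supplied as a Hash:

```ruby
# weighted_pick is a hypothetical helper for illustration.
# weights: Hash of outcome => non-negative weight (Integer or Float).
# rng:     a Random instance, injectable so runs can be reproduced.
def weighted_pick(weights, rng = Random.new)
  total = weights.values.sum
  target = rng.rand * total # float in [0, total)
  weights.each do |name, w|
    return name if target < w # target landed inside this outcome's slice
    target -= w               # otherwise skip past the slice
  end
  weights.keys.last # fallback in case rounding pushes target past the end
end
```

For example, `weighted_pick({ "C1" => 0.335, "C2" => 0.165, "C3" => 0.42, "C4" => 0.08 })` returns one of the four names. The weights do not need to sum to 1, because the draw is scaled by their total.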

If you wanted to go the binning route like you show above, you could try something like this (in Ruby, though the idea is more general):

a = ["C1"]*4 + ["C2"]*2 + ["C3"]*5 + ["C4"]
# ["C1", "C1", "C1", "C1", "C2", "C2", 
#  "C3", "C3", "C3", "C3", "C3", "C4"]
a[rand(a.length)] # => "C1" with probability 4/12

Binning is slower, since you need to build the array, but it makes adding alternatives easier, since you don't have to recalculate the probabilities each time.
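One way to sanity-check a pool like this is to draw from it many times and count the results. A rough sketch, assuming Ruby 2.7+ for `tally`, and using `Array#sample` with a seeded `Random` in place of `a[rand(a.length)]` so the run can be repeated:

```ruby
a = ["C1"] * 4 + ["C2"] * 2 + ["C3"] * 5 + ["C4"]

rng = Random.new(1) # fixed seed, chosen arbitrarily for repeatability
draws = Array.new(12_000) { a.sample(random: rng) }

# Count how often each outcome was drawn. With 12,000 draws the counts
# should land near 4000, 2000, 5000 and 1000 respectively.
counts = draws.tally
counts.sort.each { |name, n| puts "#{name}: #{n}" }
```

The exact counts vary with the seed, but their ratios should stay close to 4:2:5:1.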

You could also generate the if code above from the array representation, so you'd take the pre-processing hit only once, when the code is generated, and then get fast answers from the generated code.
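A related way to get the same pre-process-once behavior, without generating source code, is to turn the array into a table of cumulative bounds and search it on each draw. This is a swapped-in technique rather than the code generation described above, and the names `table` and `pick` are illustrative. The sketch assumes Ruby 2.7+ for `tally`:

```ruby
a = ["C1"] * 4 + ["C2"] * 2 + ["C3"] * 5 + ["C4"]

# One-time pre-processing: count each outcome, then convert the counts
# into cumulative upper bounds.
running = 0
table = a.tally.map { |name, count| [name, running += count] }
total = running
# table is now [["C1", 4], ["C2", 6], ["C3", 11], ["C4", 12]]

# Per-draw lookup: binary-search for the first bound greater than r,
# where r is a random integer from 0 to total - 1.
pick = ->(r = rand(total)) { table.bsearch { |_, bound| r < bound }.first }

puts pick.call
```

The lookup takes a number of steps logarithmic in the number of distinct outcomes, and the table's size no longer depends on how large the weights are.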
