费希尔耶茨变体

发布于 2024-12-27 03:33:28 字数 496 浏览 1 评论 0原文

经典的 Fisher Yates 看起来像这样：

void shuffle1(std::vector<int>& vec)
{
    int n = vec.size();
    for (int i = n - 1; i > 0; --i)
    {
        std::swap(vec[i], vec[rand() % (i + 1)]);
    }
}

昨天，我错误地“向后”实现了迭代：

void shuffle2(std::vector<int>& vec)
{
    int n = vec.size();
    for (int i = 1; i < n; ++i)
    {
        std::swap(vec[i], vec[rand() % (i + 1)]);
    }
}

这个版本是否比第一个版本更差（或更好）？它会扭曲结果概率吗？

原文

The classic Fisher Yates looks something like this:

void shuffle1(std::vector<int>& vec)
{
    int n = vec.size();
    for (int i = n - 1; i > 0; --i)
    {
        std::swap(vec[i], vec[rand() % (i + 1)]);
    }
}

Yesterday, I implemented the iteration "backwards" by mistake:

void shuffle2(std::vector<int>& vec)
{
    int n = vec.size();
    for (int i = 1; i < n; ++i)
    {
        std::swap(vec[i], vec[rand() % (i + 1)]);
    }
}

Is this version in any way worse (or better) than the first? Does it skew the resulting probabilities?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

尛丟丟 2025-01-03 03:33:28

是的，假设 rand() 是均匀分布。我们将通过证明每个输入可以以相等的概率生成每个排列来证明这一点。

N=2很容易证明。
我们将其绘制为一棵树，其中子级代表通过将逗号后的字符插入最左边的字符串可以获得的每个字符串。

  0,1   //input where 0,1 represent indices
01  10  //output. Represents permutations of 01. It is clear that each one has equal probability

对于 N，我们将拥有 N-1 的所有排列，并随机交换 N 的最后一个字符。

    (N-1 0th permutation),N     .....          (N-1 Ith permutation),N ________________________  
      /              \                       /                   \                             \ 
0th permutation of N  1st permutation....   (I*N)th permutation   ((I*N)+1)th permutation .... (I*N)+(I-1)th permutation

这个糟糕的归纳应该会导致它具有均匀分布。

示例：

N=2：

  0,1
01  10 // these are the permutations. Each one has equal probability

N=3：

           0,1|2           // the | is used to separate characters that we will insert later
    01,2           10,2    // 01, 10 are permutations from N-1, 2 is the new value
 210 021 012   201 120 102 // these are the permutations, still equal probability

N=4：（弯曲以帮助阅读）

                                                           0,1|23

                                                       01,2|3  10,2|3

                                           012,3 021,3 210,3    102,3 120,3 201,3

0123 0132 0321 3230                                                                                  2013 2031 2310 3012
                    0213 0231 0312 3210                                          1203 1230 1302 3201
                                        2103 2130 2301 3102  1023 1032 1320 3021

等

Yes it's even distribution assuming rand() is. We will prove this by showing that each input can generate each permutation with equal probability.

N=2 can be easily proven.
We will draw it as a tree where the the children represent each string you can get by inserting the character after comma into the leftmost string.

  0,1   //input where 0,1 represent indices
01  10  //output. Represents permutations of 01. It is clear that each one has equal probability

For N, we will have every permutations for N-1, and randomly swapping the last character for N

    (N-1 0th permutation),N     .....          (N-1 Ith permutation),N ________________________  
      /              \                       /                   \                             \ 
0th permutation of N  1st permutation....   (I*N)th permutation   ((I*N)+1)th permutation .... (I*N)+(I-1)th permutation

This shitty induction should lead you to it having even distribution.

Example:

N=2:

  0,1
01  10 // these are the permutations. Each one has equal probability

N=3:

           0,1|2           // the | is used to separate characters that we will insert later
    01,2           10,2    // 01, 10 are permutations from N-1, 2 is the new value
 210 021 012   201 120 102 // these are the permutations, still equal probability

N=4: (curved to aid reading)

                                                           0,1|23

                                                       01,2|3  10,2|3

                                           012,3 021,3 210,3    102,3 120,3 201,3

0123 0132 0321 3230                                                                                  2013 2031 2310 3012
                    0213 0231 0312 3210                                          1203 1230 1302 3201
                                        2103 2130 2301 3102  1023 1032 1320 3021

etc

回复收藏 0 原文

故事未完 2025-01-03 03:33:28

对我来说看起来不错（假设 rand() % N 是无偏的，但事实并非如此）。似乎应该可以证明输入的每个排列都是由 1 个随机选择序列产生的，其中每个随机选择都是平衡的。

将此与错误的实现进行比较，例如

for (int i = 0; i < v.size(); ++i) {
  swap(v[i], v[rand() % v.size()]);
}

Here 你可以看到有 nⁿ 同等可能的方式来产生 n！排列，并且由于 nⁿ 不能被 n 整除！其中n> 2、其中一些排列必须比其他排列更频繁地产生。

Looks OK to me (assuming rand() % N is unbiased, which it isn't). It seems it should be possible to demonstrate that every permutation of input is produced by exactly 1 sequence of random choices, where each random choice is balanced.

Compare this with a faulty implementation, such as

for (int i = 0; i < v.size(); ++i) {
  swap(v[i], v[rand() % v.size()]);
}

Here you can see that there are nⁿ equally likely ways to produce n! permutations, and since nⁿ is not evenly divisible by n! where n > 2, some of those permutations have to be produced more often than others.

回复收藏 0 原文

~没有更多了~