Verifying that a Knuth shuffle is as unbiased as possible

Posted on 2024-08-10 10:11:26

I'm implementing a Knuth shuffle for a C++ project I'm working on. I'm trying to get the most unbiased results from my shuffle (and I'm not an expert on (pseudo)random number generation). I just want to make sure this is the most unbiased shuffle implementation.

draw_t is a byte type (typedef'd to unsigned char). items is the count of items in the list. I've included the code for random::get( draw_t max ) below.

for( draw_t pull_index = (items - 1); pull_index > 1; pull_index-- )
{
    draw_t push_index = random::get( pull_index );

    draw_t push_item = this->_list[push_index];
    draw_t pull_item = this->_list[pull_index];

    this->_list[push_index] = pull_item;
    this->_list[pull_index] = push_item;
}

The random function I'm using has been modified to eliminate modulo bias. RAND_MAX is assigned to random::_internal_max.

draw_t random::get( draw_t max )
{
    if( random::_is_seeded == false )
    {
        random::seed( );
    }

    int rand_value = random::_internal_max;
    int max_rand_value = random::_internal_max - ( max - ( random::_internal_max % max ) );

    do
    {
        rand_value = ::rand( );
    } while( rand_value >= max_rand_value );

    return static_cast< draw_t >( rand_value % max );
}

Comments (5)

仅一夜美梦 2024-08-17 10:11:26

Well, one thing you could do as a black-box test is take some relatively small array size, perform a large number of shuffles on it, count how many times you observe each permutation, and then perform Pearson's Chi-square test to determine whether the results are uniformly distributed over the permutation space.

On the other hand, the Knuth shuffle, AKA the Fisher-Yates shuffle, is proven to be unbiased as long as the random number generator that the indices are coming from is unbiased.
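The black-box test described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `shuffle_under_test` and `chi_square_of_shuffle` are hypothetical names, the list is fixed at 4 items (4! = 24 permutations, so 23 degrees of freedom), and the shuffle being tested is a textbook Fisher-Yates built on `<random>` rather than the asker's `random::get`. To test another shuffle, replace the body of `shuffle_under_test`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <utility>

// Hypothetical shuffle under test: a textbook Fisher-Yates using <random>.
template <class Rng>
void shuffle_under_test(std::array<int, 4>& a, Rng& rng) {
    for (std::size_t i = a.size() - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i); // bounds are inclusive
        std::swap(a[i], a[pick(rng)]);
    }
}

// Returns Pearson's chi-square statistic for `trials` shuffles of 4 items.
double chi_square_of_shuffle(std::size_t trials, unsigned seed) {
    assert(trials > 0);
    std::mt19937 rng(seed);
    std::map<std::array<int, 4>, std::size_t> counts;
    for (std::size_t t = 0; t < trials; ++t) {
        std::array<int, 4> a{{0, 1, 2, 3}};
        shuffle_under_test(a, rng);
        ++counts[a];
    }
    // Under the uniform hypothesis every permutation is expected trials/24 times.
    const double expected = static_cast<double>(trials) / 24.0;
    double chi2 = 0.0;
    // Walk all 24 permutations so that never-observed ones contribute too.
    std::array<int, 4> p{{0, 1, 2, 3}};
    do {
        const auto it = counts.find(p);
        const double observed =
            (it == counts.end()) ? 0.0 : static_cast<double>(it->second);
        chi2 += (observed - expected) * (observed - expected) / expected;
    } while (std::next_permutation(p.begin(), p.end()));
    return chi2;
}
```

For 23 degrees of freedom the 5% critical value is roughly 35.2, so a statistic far above that over many trials suggests the permutations are not uniformly distributed. A single run slightly above the threshold is not conclusive on its own.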

2024-08-17 10:11:26

If I see that right, your random::get (max) doesn't include max.

This line:

draw_t push_index = random::get( pull_index );

then produces a classic off-by-one error: your pull_index and push_index can never be the same. This introduces a subtle bias, because no item can ever end up where it was before the shuffle. In an extreme example, a two-item list under this "shuffle" would always be reversed.
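A minimal sketch of a loop with an inclusive upper bound, so that an element may stay in place, might look like this. It assumes `<random>` is available and uses hypothetical names (`knuth_shuffle`, `count_unchanged_pairs`) instead of the asker's `random::get`. Note also that the question's loop condition `pull_index > 1` appears to stop one step before index 1 is processed, so this sketch runs down to `i > 0`.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Fisher-Yates with an inclusive upper bound: j may equal i,
// so an element is allowed to stay where it is.
template <class T, class Rng>
void knuth_shuffle(std::vector<T>& v, Rng& rng) {
    if (v.size() < 2) return;                          // nothing to shuffle
    for (std::size_t i = v.size() - 1; i > 0; --i) {   // i > 0, not i > 1
        std::uniform_int_distribution<std::size_t> pick(0, i); // 0..i inclusive
        std::swap(v[i], v[pick(rng)]);
    }
}

// Counts how often a two-item list comes back in its original order.
// With an unbiased shuffle this should be close to half of `trials`.
std::size_t count_unchanged_pairs(std::size_t trials, unsigned seed) {
    assert(trials > 0);
    std::mt19937 rng(seed);
    std::size_t unchanged = 0;
    for (std::size_t t = 0; t < trials; ++t) {
        std::vector<int> v{0, 1};
        knuth_shuffle(v, rng);
        if (v[0] == 0) ++unchanged;
    }
    return unchanged;
}
```

With an exclusive bound the two-item case would never come back unchanged, which is the bias described above. With the inclusive bound roughly half of the trials should.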

空城旧梦 2024-08-17 10:11:26

Have a look at this article from Jeff Atwood:

Shuffling
http://www.codinghorror.com/blog/archives/001008.html

See also:

The Danger of Naïveté
http://www.codinghorror.com/blog/archives/001015.html

莫言歌 2024-08-17 10:11:26

The Knuth shuffle itself is provably unbiased: There exists exactly one series of operations that yields each possible shuffle. It's unlikely your PRNG has enough bits of state to express every possible shuffle, however, so the real question is if your PRNG is 'random enough' with regards to the set of shuffles it will actually produce, and whether your seeding strategy is secure enough.
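To put a number on the "bits of state" point: a generator can reach every ordering of n items only if it has at least log2(n!) bits of state. The following sketch computes that figure; the function names are hypothetical and the 32-bit and 64-bit inputs in the note below are only example state sizes, not a statement about any particular `rand()` implementation.

```cpp
#include <cassert>
#include <cmath>

// Bits needed to index every permutation of n items: log2(n!).
// lgamma(n + 1) is ln(n!), which avoids overflowing a factorial.
double bits_for_all_permutations(unsigned n) {
    return std::lgamma(static_cast<double>(n) + 1.0) / std::log(2.0);
}

// Largest n whose n! permutations fit within `state_bits` bits of state.
unsigned max_fully_coverable_items(double state_bits) {
    assert(state_bits >= 0.0);
    unsigned n = 1;
    while (bits_for_all_permutations(n + 1) <= state_bits) ++n;
    return n;
}
```

For example, 32 bits of state can cover every ordering of at most 12 items (12! < 2^32 < 13!), 64 bits cover at most 20 items, and a 52-card deck needs about 226 bits.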

Only you can decide this, as it depends on the consequences of a shuffle that isn't random enough. If you're dealing with real money, for example, I would suggest switching to a cryptographically secure PRNG and improving your seeding strategy. Although most built in PRNGs generate good randomness, they're also quite easy to reverse engineer, and calling seed() with no arguments is likely seeding based on the current time, which is pretty easy to predict.
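As one possible way to move away from time-based seeding, the sketch below seeds `std::mt19937` from `std::random_device` and shuffles with `std::shuffle`. The helper names are hypothetical. It is not a cryptographically secure setup: `std::mt19937` is not a CSPRNG, and the standard does not guarantee that `std::random_device` is non-deterministic on every platform, so anything involving real money would need a vetted CSPRNG instead.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// Builds an engine seeded with several words from std::random_device
// rather than a single value derived from the current time.
std::mt19937 make_seeded_engine() {
    std::random_device rd;
    std::array<std::uint32_t, 8> seed_words{};
    std::generate(seed_words.begin(), seed_words.end(), std::ref(rd));
    std::seed_seq seq(seed_words.begin(), seed_words.end());
    return std::mt19937(seq);
}

// Shuffles a list of byte-sized items in place using the standard algorithm.
void shuffle_in_place(std::vector<unsigned char>& items, std::mt19937& rng) {
    assert(items.size() <= items.max_size());
    std::shuffle(items.begin(), items.end(), rng);
}
```

A caller would create the engine once with `make_seeded_engine()` and reuse it for every shuffle, rather than reseeding per call.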

别念他 2024-08-17 10:11:26
#include <cstdlib> // std::srand() and std::rand()

/** Shuffle the first 'dim' values in array 'V[]'.
    - Implements the Fisher–Yates shuffle.
    - Uses the standard function 'rand()' for randomness.
    - Initializes the random sequence using 'seed'.
    - Uses at most 'dim - 1' swaps.
    \see http://stackoverflow.com/questions/1685339/
    \see http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm
*/
template <class T>
void Fisher_Yates_shuffle( T* V, unsigned dim , unsigned seed ) {
    if ( dim < 2 ) return; // nothing to shuffle; also avoids wrap-around of dim-1
    std::srand(seed);
    T temp;
    unsigned i,iPP;

    i   = dim-1;
    iPP = dim;     // iPP == i+1: number of candidate positions 0..i
    while ( i>0 ) {
        unsigned j = std::rand() % iPP; // j in [0, i], so j may equal i
        if ( i!=j ) { // swap
            temp = V[i]; V[i] = V[j]; V[j] = temp;
        }
        iPP = i;
        --i;
    }
/*
    This implementation depends on the randomness of the random number
    generator used ['rand()' in this case].
*/
}
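One caveat with the `% iPP` reduction above is modulo bias: unless the number of distinct `rand()` outputs is a multiple of the bound, the low results come up slightly more often. A hedged sketch of the usual rejection-sampling fix follows. `uniform_below` is a hypothetical helper, not part of the answer's code, and it only removes the bias of the reduction step; it cannot improve the quality of `rand()` itself.

```cpp
#include <cassert>
#include <cstdlib>

// Returns a value in [0, bound) without modulo bias, assuming rand()
// itself is uniform over [0, RAND_MAX].
unsigned uniform_below(unsigned bound) {
    // rand() produces RAND_MAX + 1 distinct values.
    const unsigned range = static_cast<unsigned>(RAND_MAX) + 1u;
    assert(bound > 0 && bound <= range);
    // Largest multiple of bound that fits in the range; values at or
    // above it would make some remainders more likely, so redraw them.
    const unsigned limit = range - (range % bound);
    unsigned r;
    do {
        r = static_cast<unsigned>(std::rand());
    } while (r >= limit);
    return r % bound;
}
```

In the shuffle above, `std::rand() % iPP` could then be replaced by `uniform_below(iPP)`.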