Efficiently computing the nth item of a random permutation

Posted 2024-12-14 12:35:13

Imagine I were able to shuffle all the integers from 0 to 2^32 - 1 using something like the Knuth shuffle and a random number generator seeded with a key.

Conceptually, I would need two arrays (using Z₅ instead of Z_2³² for brevity):

[2, 0, 1, 4, 3] // perm
[1, 2, 0, 4, 3] // inv === p^-1

If I had these arrays, I could efficiently look up the nth element of the permutation, as well as find out which position in the permutation holds the value v:

v = perm[n];
n == inv[v]; // true
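
For illustration, here is a minimal Python sketch of how the two arrays relate. The helper name `invert_permutation` is made up for this example; it only shows the lookup semantics and does not address the memory problem described in the question.

```python
def invert_permutation(perm):
    """Return inv such that inv[perm[n]] == n for every index n."""
    inv = [0] * len(perm)
    for n, v in enumerate(perm):
        inv[v] = n  # value v sits at position n, so inv maps v back to n
    return inv


perm = [2, 0, 1, 4, 3]
inv = invert_permutation(perm)
print(inv)  # [1, 2, 0, 4, 3]
```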

I don't want to store two 16 GB arrays of uint representing this shuffled set because I am never interested in the entire shuffled sequence at any time. I am only ever interested in the value of the nth element.

I ideally want to write two pure functions that work like this:

uint nthShuffled = permutate<uint>(key, n); // O(log n)
uint m = invert<uint>(key, nthShuffled); // O(log n), m == n

Requirements:

Every 32-bit value maps to a unique 32-bit value.
Knowledge of the first 100 elements of the permutation provides no information about what the 101st element might be.

I understand that in theory there must be at least (2³²)! unique keys in order to represent every possible permutation, but I believe I can hide that problem in practice behind a good hashing function.
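
To put a number on that key-space remark, the following Python sketch estimates how many bits a key would need in order to index every permutation of 2^32 items, i.e. log2((2^32)!). It uses the log-gamma function from the standard library; the result is a floating-point estimate, on the order of 10^11 bits.

```python
import math

N = 2 ** 32

# ln(N!) == lgamma(N + 1); divide by ln(2) to convert to bits.
bits = math.lgamma(N + 1) / math.log(2)

print(f"{bits:.3e} bits, about {bits / 8 / 2**30:.1f} GiB per key")
```

In other words, a key able to select any permutation would be about as large as the 16 GB table itself, which is why a short key can only ever reach a tiny subset of all permutations.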

Is there anything out there that comes close to this?

記憶穿過時間隧道 2024-12-21 12:35:13

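Any block cipher is in fact a pseudorandom permutation. A 32-bit block cipher permutes the integers between 0 and 2^32 - 1.

Given a secret key, encrypting N with that key gives you the Nth pseudorandom number.

The only problem is finding a good 32-bit block cipher. The only one I know of is SKIP32, but I know nothing about its strength.

SKIP32 has a key size of 80 bits. If it is a good cipher, that will be enough.

But again, I don't know the cipher.

If increasing the range to 2^64 - 1 integers is an option for you, you can simply use a well-known 64-bit block cipher such as Triple-DES or Blowfish.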
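
To make the block-cipher idea concrete, here is a minimal Python sketch of a keyed 32-bit permutation built as a balanced Feistel network. This is not SKIP32 and has not been vetted as a cipher. The round count, the HMAC-SHA256 round function and the helper name `_round` are all assumptions chosen for illustration. The names `permutate` and `invert` follow the question.

```python
import hashlib
import hmac

ROUNDS = 8  # assumption: picked for illustration, not from a security analysis


def _round(key: bytes, i: int, half: int) -> int:
    """Keyed round function: 16 bits taken from HMAC-SHA256(key, round || half)."""
    msg = bytes([i]) + half.to_bytes(2, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "big")


def permutate(key: bytes, n: int) -> int:
    """Map a 32-bit integer n to its position-n value under the keyed permutation."""
    left, right = n >> 16, n & 0xFFFF
    for i in range(ROUNDS):
        left, right = right, left ^ _round(key, i, right)
    return (left << 16) | right


def invert(key: bytes, v: int) -> int:
    """Undo permutate by running the Feistel rounds in reverse order."""
    left, right = v >> 16, v & 0xFFFF
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, i, left), left
    return (left << 16) | right


key = b"example key"
v = permutate(key, 12345)
print(invert(key, v) == 12345)  # True
```

A Feistel network is a bijection whatever the round function is, so every 32-bit input maps to a distinct 32-bit output. How unpredictable the outputs are depends entirely on the round function and the number of rounds.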


寂寞美少年 2024-12-21 12:35:13

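"Knowledge of the first 100 elements of the permutation provides no information about what the 101st element might be."

You will need to store the whole array in memory. I suggest stxxl, which is designed for large data sets and keeps most of the container on disk.
By the very nature of a random permutation, you cannot deduce the value at [n-1] or [n+1] from the value at a given [n]. So it looks like the space cannot be optimized.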


暖伴 2024-12-21 12:35:13

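From a cryptographic point of view, what you want is a block cipher with 32-bit blocks.

The problem of encrypting over an arbitrary (usually small) domain, also known as a "keyed permutation", is what Format-Preserving Encryption is about.

For that specific problem there is a generic "perfect" solution, but the computation involves sampling from a hypergeometric distribution, which means a lot of floating-point and arbitrary-precision arithmetic, and that is expensive.

There are also "approximate" solutions in which, strictly speaking, the permutation is not chosen uniformly among all possible permutations, but the difference can be made arbitrarily small, to the point that the implemented permutation cannot be distinguished from a truly randomly chosen one. See in particular the Thorp shuffle.

There is no standard, secure 32-bit block cipher, because 32 bits are not enough to ensure security in the situations where block ciphers are commonly used (encryption of long data streams, e.g. as part of SSL); 64-bit blocks are already frowned upon. So you are a bit on your own here.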
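
As a rough sketch of what following a single "card" through a Thorp shuffle can look like, the Python below works on any even domain size. The coin flips come from HMAC-SHA256, which is an assumption of this example, as are the function names and the round count used in the demo. The number of rounds needed for a given security level is not addressed here.

```python
import hashlib
import hmac


def _coin(key: bytes, rnd: int, pair: int) -> int:
    """One keyed pseudorandom bit per (round, pair of cards)."""
    msg = rnd.to_bytes(4, "big") + pair.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[0] & 1


def thorp_forward(key: bytes, x: int, size: int, rounds: int) -> int:
    """Follow the card at position x through `rounds` Thorp rounds (size must be even)."""
    half = size // 2
    for r in range(rounds):
        in_top = int(x >= half)          # which half of the deck the card is in
        pair = x - half if in_top else x  # cards `pair` and `pair + half` are riffled together
        x = 2 * pair + (_coin(key, r, pair) ^ in_top)  # the coin decides which lands first
    return x


def thorp_inverse(key: bytes, y: int, size: int, rounds: int) -> int:
    """Undo thorp_forward by replaying the rounds backwards."""
    half = size // 2
    for r in reversed(range(rounds)):
        pair, slot = divmod(y, 2)
        in_top = slot ^ _coin(key, r, pair)
        y = pair + half * in_top
    return y


key = b"example key"
print(sorted(thorp_forward(key, x, 10, 20) for x in range(10)) == list(range(10)))  # True
```

Each round only needs the coin for the one pair the card belongs to, so neither direction ever materializes the whole deck.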


メ斷腸人バ 2024-12-21 12:35:13

Hashing isn't going to solve random number sequences.

Store 2^32 bits. That's .5 GB.

Run the Fisher-Yates shuffle and "cross off" bits as you go along. If you want to know the content of the 5th element, then you'll cross out 4 and the 5th random value will be your number.

To get the nth permutation then you need to backtrack. Run the algorithm n times and get numbers like:

Find 5th index after 4 permutations:

First iteration:
1st : skip (run through the RNG)
2nd : skip
3rd : skip
4th : 7th index to 5th index
Second iteration: (run using same seed as 1st iteration)
1st : skip
2nd : skip
3rd : 3rd index to 7th index
4th : 7th index to 5th index
Third iteration:
1st : skip
2nd : 4th index to 7th index
3rd : 3rd index to 7th index
4th : 7th index to 5th index
Fourth iteration:
1st : 8th index to 4th index
2nd : 4th index to 7th index
3rd : 3rd index to 7th index
4th : 7th index to 5th index

By the last iteration, you know that the 8th index becomes the 5th index.

EDIT: I wrote a quick program to test the speed. It's taking a few minutes per permutation. It's slow, but still usable.
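
The program mentioned in the edit is not shown, so the following is only a simplified Python sketch of the general idea: replay the seeded Fisher-Yates swaps and follow a single element instead of storing the array. It uses Python's `random.Random`, which is not a cryptographic generator, and the function names are made up for this example. It answers the inverse question (where does value v end up); the forward lookup would need the same swaps replayed in reverse order.

```python
import random


def shuffled(seed: int, size: int) -> list:
    """Reference implementation: full Fisher-Yates shuffle, O(size) memory."""
    rng = random.Random(seed)
    a = list(range(size))
    for i in range(size - 1, 0, -1):
        j = rng.randrange(i + 1)
        a[i], a[j] = a[j], a[i]
    return a


def final_index_of(seed: int, size: int, value: int) -> int:
    """Replay the same swaps but track one value only: O(1) memory, O(size) time."""
    rng = random.Random(seed)
    pos = value  # value v starts at index v
    for i in range(size - 1, 0, -1):
        j = rng.randrange(i + 1)  # must consume the RNG exactly as the full shuffle does
        if pos == i:
            pos = j
        elif pos == j:
            pos = i
    return pos


a = shuffled(42, 50)
print(a[final_index_of(42, 50, 7)] == 7)  # True
```

Every lookup still walks through all the swaps, which fits the running time of minutes per lookup reported above for the full 2^32 range.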
