Efficiently computing the nth element of a random permutation
Imagine I could shuffle all integers from 0 to 2^32 - 1 using something like the Knuth shuffle and a random number generator seeded with a key.
Conceptually, I would need two arrays (using Z_5 instead of Z_(2^32) for brevity):
[2, 0, 1, 4, 3] // perm
[1, 2, 0, 4, 3] // inv === p^-1
If I had these arrays, I could efficiently look up the nth element of the permutation, and also find which position in the permutation holds a given value v:
v = perm[n];
n == inv[v]; // true
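To make the relationship between the two arrays concrete, here is a minimal Python sketch that derives inv from perm for the Z_5 example. The function name invert_permutation is only an illustrative choice, and this explicit-array approach is exactly what becomes impractical at 2^32 entries.

```python
def invert_permutation(perm):
    """Return inv such that inv[perm[n]] == n for every index n."""
    inv = [0] * len(perm)
    for n, v in enumerate(perm):
        inv[v] = n  # the value v sits at position n, so inv maps v back to n
    return inv


perm = [2, 0, 1, 4, 3]
inv = invert_permutation(perm)

# Round trip for every position: v = perm[n], then inv[v] gives n back.
for n, v in enumerate(perm):
    assert inv[v] == n
```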
I don't want to store two 16 GB arrays of uint representing this shuffled set because I am never interested in the entire shuffled sequence at any time. I am only ever interested in the value of the nth element.
Ideally, I want to write two pure functions that work like this:
uint nthShuffled = permutate<uint>(key, n); // O(log n)
uint n = invert<uint>(key, nthShuffled); // O(log n)
Requirements:
- Every 32-bit value maps to a distinct 32-bit value, so the mapping is a bijection.
- Knowledge of the first 100 elements in the permutation provides no information on what might be the 101st element in the permutation.
I understand that in theory there must be at least (2^32)! unique keys in order to represent every possible permutation, but I believe I can hide that problem in practice behind a good hashing function.
Is there anything out there that comes close to this?
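As a rough check on the (2^32)! figure, the number of key bits needed to index every possible permutation is log2((2^32)!). The Python sketch below estimates it with the log-gamma function. The helper name is hypothetical, and the result is a floating-point approximation, not an exact count.

```python
import math


def bits_to_index_all_permutations(size: int) -> float:
    """Approximate log2(size!) via log-gamma, avoiding a huge exact factorial."""
    return math.lgamma(size + 1) / math.log(2)


bits = bits_to_index_all_permutations(2**32)
gib = bits / 8 / 2**30  # the same quantity expressed in GiB

print(f"about {bits:.3e} bits, or {gib:.1f} GiB of key material")
```

The estimate comes out on the order of 10^11 bits, which is roughly the size of one of the 16 GB arrays. Any practical scheme with a short key can therefore reach only a tiny fraction of all permutations.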
Comments (4)
Any block cipher is actually a pseudo-random permutation. A 32-bit block cipher permutes the integers between 0 and 2^32 - 1.
Given a secret key, encrypting N with this key gives the N-th pseudo-random number.
The only problem would be finding a good 32-bit block cipher. The only one I know is SKIP32, but I do not know anything about its strength.
SKIP32's key size is 80 bits. If it is a good cipher, that would be enough.
But again, I do not know the cipher.
If increasing the range to the integers from 0 to 2^64 - 1 is an option for you, you could simply use a well-known 64-bit block cipher like Triple-DES or Blowfish.
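To illustrate the "block cipher as keyed permutation" idea, here is a minimal Python sketch of a balanced Feistel network over 32-bit values. It is not SKIP32 and has not been analysed for security. The round function (truncated HMAC-SHA256), the round count, and the names permute and invert are all assumptions made for illustration. What the Feistel structure does guarantee is that the mapping is a bijection and that it can be inverted with the same key.

```python
import hashlib
import hmac

ROUNDS = 8  # assumption: arbitrary illustrative value, not a security recommendation


def _round(key: bytes, rnd: int, half: int) -> int:
    """Hypothetical round function: 16 bits taken from HMAC-SHA256(key, round || half)."""
    msg = bytes([rnd]) + half.to_bytes(2, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "big")


def permute(key: bytes, n: int) -> int:
    """Map a 32-bit n to the n-th element of the keyed permutation."""
    left, right = n >> 16, n & 0xFFFF
    for rnd in range(ROUNDS):
        # One Feistel round: (L, R) -> (R, L xor F(R))
        left, right = right, left ^ _round(key, rnd, right)
    return (left << 16) | right


def invert(key: bytes, v: int) -> int:
    """Recover n from v = permute(key, n) by undoing the rounds in reverse order."""
    left, right = v >> 16, v & 0xFFFF
    for rnd in reversed(range(ROUNDS)):
        # Undo one round: (L', R') -> (R' xor F(L'), L')
        left, right = right ^ _round(key, rnd, left), left
    return (left << 16) | right


key = b"example key"
v = permute(key, 12345)
assert invert(key, v) == 12345
```

Each lookup costs a fixed number of hash calls and needs no stored table, which matches the shape of the two pure functions asked for in the question.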
"
Knowledge of the first 100 elements in the permutation provides no information on what might be the 101st element in the permutation.
"
You need to store the whole array in memory. I suggest using stxxl, which is designed for very large data sets and keeps the bulk of the container on disk.
By the very nature of random permutation, you can't extrapolate the value of [n-1] or [n+1] given [n]. So it doesn't look like space can be optimized.
From a cryptographic point of view, you want a block cipher with 32-bit blocks.
The problem of encryption (aka "keyed permutation") over arbitrary (and often small) domains is what Format-Preserving Encryption is about.
There is a generic "perfect" solution for that specific problem -- but the computation involves sampling through a hypergeometric distribution, which implies a lot of mucking with floating point and arbitrary precision numbers, which is expensive.
There are also "approximate" solutions in which the permutation is not, strictly speaking, uniformly chosen among all possible permutations, but the difference can be made arbitrarily small, to the point that it is not possible to distinguish between the implemented permutation and a really randomly chosen permutation. See in particular the Thorp shuffle.
There is no standard and secure 32-bit block cipher because 32 bits are not enough to ensure security in situations where block ciphers are commonly used (encryption of long streams of data, e.g. as part of SSL); 64-bit blocks are already frowned upon. So you are a bit on your own here.
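The Thorp shuffle mentioned above can be sketched in Python for a domain of 2^bits values. In each round, the value x and its partner x + 2^(bits-1) are sent to the adjacent positions 2r and 2r + 1, where r is x with its top bit removed, and a keyed coin flip decides which of the two goes first. The coin source (one bit of HMAC-SHA256), the default round count, and the function names are assumptions for illustration only. A real deployment would need a round count backed by a proper analysis.

```python
import hashlib
import hmac


def _coin(key: bytes, rnd: int, r: int) -> int:
    """Hypothetical keyed coin flip for round rnd and pair index r."""
    msg = rnd.to_bytes(2, "big") + r.to_bytes(4, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[0] & 1


def thorp_permute(key: bytes, x: int, bits: int = 32, rounds: int = 128) -> int:
    """Apply `rounds` Thorp shuffle rounds to x in [0, 2**bits)."""
    half_bits = bits - 1
    mask = (1 << half_bits) - 1
    for rnd in range(rounds):
        top, r = x >> half_bits, x & mask  # top bit says which half of the deck x is in
        x = (r << 1) | (top ^ _coin(key, rnd, r))
    return x


def thorp_invert(key: bytes, y: int, bits: int = 32, rounds: int = 128) -> int:
    """Undo thorp_permute by reversing the rounds."""
    half_bits = bits - 1
    for rnd in reversed(range(rounds)):
        r, low = y >> 1, y & 1
        top = low ^ _coin(key, rnd, r)  # recover which half the value came from
        y = (top << half_bits) | r
    return y


key = b"example key"
assert thorp_invert(key, thorp_permute(key, 2024)) == 2024
```

Because each round is itself a bijection, the composition is a permutation for any round count. More rounds only affect how close the result is to a uniformly chosen permutation.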
Hashing isn't going to solve random number sequences.
Store 2^32 bits. That's .5 GB.
Run the Fisher-Yates shuffle and "cross off" bits as you go along. If you want to know the content of the 5th element then you'll cross out 4 and the 5th random value will be your number.
To get the nth permutation then you need to backtrack. Run the algorithm n times and get numbers like:
By the last iteration, you know that the 8th index becomes the 5th index.
EDIT: I wrote a quick program to test the speed. It's taking a few minutes per permutation. It's slow, but still usable.
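One way to read "run the algorithm n times" is to replay a seeded forward Fisher-Yates shuffle only up to step n, at which point position n is final. The Python sketch below does that with a sparse dictionary instead of the 2^32-bit bitmap described above, so it is a swapped-in variant and not the answerer's program. The name nth_shuffled and the use of random.Random as the seeded generator are assumptions. Note that random.Random is not cryptographically secure, and each lookup costs O(n) time and memory.

```python
import random


def nth_shuffled(key, n, size=2**32):
    """Return element n of the Fisher-Yates shuffle of range(size) seeded with key."""
    rng = random.Random(key)  # same key replays the same sequence of draws
    moved = {}  # sparse array: index -> value; untouched indices hold their own index
    for i in range(n + 1):
        j = rng.randrange(i, size)  # pick a partner from the not-yet-fixed tail
        vi, vj = moved.get(i, i), moved.get(j, j)
        moved[i], moved[j] = vj, vi  # swap; position i is now final
    return moved[n]


# Small-domain check: the ten lookups together form a permutation of 0..9.
shuffled = [nth_shuffled(42, n, size=10) for n in range(10)]
assert sorted(shuffled) == list(range(10))
```

This gives the forward lookup only. Finding the position of a given value would still require replaying the shuffle until that value appears.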