Shuffle the lines of a file with a fixed seed?
I want to shuffle the lines of a file with a fixed seed so that I always get the same random order. The command I am using is as follows:
sort -R file.txt | head -200 > file.sff
What change could I make so that it sorts with a fixed random seed?
Comments (4)
The GNU implementation of sort has a --random-source argument. Passing this argument the name of a file with known contents makes the output reproducible.
See the "Random sources" section of the GNU coreutils manual, which contains a sample implementation of a seeded random source and an example of its use.
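A sketch along the lines of the manual's sample is shown here. The exact openssl flags are an approximation, and the snippet assumes Bash (for process substitution) and an installed openssl; the seed 42 is arbitrary.

```bash
# Emit an endless, seed-dependent byte stream by encrypting zeros with
# AES in counter mode. The seed is used as the passphrase.
get_seeded_random() {
  local seed="$1"
  openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt \
    </dev/zero 2>/dev/null
}

# Shuffle the integers 1..100. The same seed yields the same order each time.
shuf -i1-100 --random-source=<(get_seeded_random 42)
```

Changing the seed argument gives a different, but equally repeatable, order.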
Since GNU sort is also part of coreutils, the same documentation applies to it, so a seeded random source can be passed to sort -R as well.
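Applied to the command from the question, this might look like the sketch below. The wrapper name seeded_sample is hypothetical, and get_seeded_random is repeated (with the same assumptions about openssl and Bash) so the snippet runs on its own.

```bash
# Seed-dependent byte stream, as in the coreutils manual's sample.
get_seeded_random() {
  local seed="$1"
  openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt \
    </dev/zero 2>/dev/null
}

# seeded_sample SEED FILE COUNT (hypothetical helper):
# print the first COUNT lines of FILE after a seeded random sort.
seeded_sample() {
  local seed=$1 file=$2 count=$3
  sort -R --random-source=<(get_seeded_random "$seed") "$file" | head -n "$count"
}

# Usage with the names from the question:
#   seeded_sample 42 file.txt 200 > file.sff
```

Note that sort -R orders lines by a random hash of their keys, so identical lines end up next to each other; shuf does not have that property.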
Linux's shuf command can take a file as a fixed source of randomness through the --random-source parameter.
If you don't want to bother with providing a full file, you can pipe one in on the fly.
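Both variants can be sketched as follows. The input and seed files are hypothetical temporary files created only for illustration, and using yes as the byte stream is one possible choice of constant source, not the only one.

```bash
# Hypothetical input file, created here so the snippet runs on its own.
input=$(mktemp)
seq 1 20 > "$input"

# 1) A regular file with fixed contents as the randomness source.
#    It must hold enough bytes for shuf to consume; 4 KiB is ample here.
seed_file=$(mktemp)
yes | head -c 4096 > "$seed_file"
shuf --random-source="$seed_file" "$input"

# 2) No file on disk: stream constant bytes through process substitution.
shuf --random-source=<(yes) "$input"
```

A constant stream like this makes the order repeatable, but it is a poor source of randomness, so the resulting permutation should not be relied on for anything statistical.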
You may not need external tools like sort, whose options and usage can vary with your operating system. Bash has an internal random number generator, accessible through the $RANDOM variable. It's common practice to seed the generator by assigning a value to the variable. But of course, you can also use a predictable seed in order to get predictable, not-so-random results.
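A minimal sketch of seeding with a fixed value; 12345 is an arbitrary seed chosen for illustration.

```bash
# Seed Bash's generator with a fixed value.
RANDOM=12345
first_run="$RANDOM $RANDOM $RANDOM"

# Re-seeding with the same value replays the same sequence.
RANDOM=12345
second_run="$RANDOM $RANDOM $RANDOM"

echo "$first_run"
echo "$second_run"   # same three numbers as the line above
```

The actual numbers can differ between Bash versions, so a seed is only reproducible on the same version of the shell.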
To reorder the lines of a file randomly, first read the file into an array using mapfile.
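For example, assuming Bash 4 or later (where mapfile is available) and a hypothetical temporary input file:

```bash
# Hypothetical input file, created here for illustration.
src=$(mktemp)
printf '%s\n' alpha beta gamma delta > "$src"

# -t strips the trailing newline from each line as it is stored.
mapfile -t a < "$src"

echo "${#a[@]}"   # prints 4
echo "${a[0]}"    # prints alpha
```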
Then simply rewrite the array indices, moving each line to a randomly chosen index.
When a non-associative array is expanded, bash naturally orders the elements in ascending order of index value.
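The index rewrite can be sketched as below. The function name shuffle_basic and the seed value are hypothetical, and collisions between random indices are deliberately not handled here, so a line can occasionally be lost.

```bash
# shuffle_basic SEED FILE: print FILE's lines in a seed-dependent order.
shuffle_basic() {
  local seed=$1 file=$2 i n
  local -a a
  mapfile -t a < "$file"
  n=${#a[@]}
  RANDOM=$seed                          # fixed seed gives a repeatable order
  for i in "${!a[@]}"; do               # index list is expanded once, up front
    a[$(( RANDOM + n ))]=${a[$i]}       # new index is always >= n
    unset "a[$i]"
  done
  printf '%s\n' "${a[@]}"               # expands in ascending index order
}

# Hypothetical input file for illustration.
src=$(mktemp)
printf '%s\n' one two three four five > "$src"
shuffle_basic 42 "$src"
```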
Note that the new index for each line has the number of array elements added to it, to avoid collisions with the original indices in that range. This solution is still fallible: there's no guarantee that $RANDOM will produce unique numbers. You can mitigate that risk with extra code that checks for prior use of each index, or reduce it with bit-shifting, which turns your index values into 30-bit unsigned integers instead of 15-bit ones.
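Both mitigations can be combined, as in this sketch. The function name shuffle_wide is hypothetical, and the way the two draws are combined is one possible reading of "bit-shifting".

```bash
# shuffle_wide SEED FILE: like the basic index rewrite, but with 30-bit
# indices and a redraw whenever the chosen slot is already in use.
shuffle_wide() {
  local seed=$1 file=$2 i n new
  local -a a
  mapfile -t a < "$file"
  n=${#a[@]}
  RANDOM=$seed
  for i in "${!a[@]}"; do
    # Two 15-bit draws combined into one 30-bit value, offset past the
    # original indices 0..n-1.
    new=$(( (RANDOM << 15 | RANDOM) + n ))
    # ${a[new]+set} expands to "set" only if that slot already holds a line.
    while [[ -n ${a[new]+set} ]]; do
      new=$(( (RANDOM << 15 | RANDOM) + n ))
    done
    a[new]=${a[$i]}
    unset "a[$i]"
  done
  printf '%s\n' "${a[@]}"
}

# Hypothetical input file for illustration.
src=$(mktemp)
seq 1 50 > "$src"
shuffle_wide 42 "$src" | head -n 5
```

With the redraw in place no line is lost, and the output for a given seed stays repeatable on the same Bash version.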
If you're randomly shuffling lines, you're not sorting. I haven't seen a sort with a --random-source option before. It'd be interesting if it does exist. However, that's not sorting the lines in a fixed order. I believe you'll have to write a program to do that, and I don't think Bash can quite do it.
Actually, it might. The $RANDOM shell variable yields a random number from 0 to 32767. You can assign a seed to RANDOM, and the same random number sequence will then appear over and over. You can use a card dealing algorithm: read each line into a Bash array, then use the card dealing algorithm to pick each line. I'm not going to write a test program, especially in Bash, but you should get the idea.
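One way to read the "card dealing" idea is the sketch below: repeatedly pick a random remaining line, print it, and fill the gap with the last line. The function name deal_lines is hypothetical, and this is an interpretation rather than the answerer's own code.

```bash
# deal_lines SEED FILE: deal FILE's lines one at a time in a seeded order.
deal_lines() {
  local seed=$1 file=$2 pick last
  local -a deck
  mapfile -t deck < "$file"
  RANDOM=$seed
  while (( ${#deck[@]} > 0 )); do
    # RANDOM tops out at 32767, so this has a slight modulo bias and is
    # only suitable as written for files of up to 32768 lines.
    pick=$(( RANDOM % ${#deck[@]} ))
    printf '%s\n' "${deck[pick]}"
    last=$(( ${#deck[@]} - 1 ))
    deck[pick]=${deck[last]}     # move the last card into the gap
    unset "deck[$last]"          # the deck stays densely indexed
  done
}

# Hypothetical input file for illustration.
deck_src=$(mktemp)
seq 1 30 > "$deck_src"
deal_lines 7 "$deck_src" | head -n 5
```

For the command in the question this would be used as deal_lines 7 file.txt | head -200 > file.sff, with 7 standing in for whatever fixed seed is chosen.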