Shuffle the lines of a file with a fixed seed?

Posted on 2024-11-05 20:24:00

I want to shuffle the lines of a file with a fixed seed so that I always get the same random order. The command I am using is as follows:

sort -R file.txt | head -200 > file.sff

What could I change so that it sorts with a fixed random seed?

Comments (4)

爱人如己 2024-11-12 20:24:00

The GNU implementation of sort has a --random-source argument. Passing this argument with the name of a file with known contents will result in a reliable set of output.

See the Random sources documentation in the GNU coreutils manual, which contains the following sample implementation and example:

get_seeded_random()
{
  seed="$1"
  openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt \
    </dev/zero 2>/dev/null
}

shuf -i1-100 --random-source=<(get_seeded_random 42)

Since GNU sort is also part of coreutils, the relevant documentation applies there as well:

sort --random-source=<(get_seeded_random 42) -R file.txt | head -200 > file.sff
时光暖心i 2024-11-12 20:24:00

GNU shuf (part of coreutils, so available on most Linux systems) can take a file as a fixed source of randomness through its --random-source option:

shuf --random-source=some_file.txt file.txt | head -n200 > file.sff

If you don't want to maintain a separate file (which also has to contain enough bytes for the shuffle), you can generate an endless one on the fly with process substitution:

shuf --random-source=<(yes 42) file.txt | head -n200 > file.sff
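
As a rough sketch of how the command above could be wrapped for reuse, the function below takes the seed, the number of lines, and the file as arguments. The name seeded_sample is hypothetical, and the sketch assumes bash and GNU shuf. It uses shuf's own -n option instead of piping through head, so the selected lines may differ from the pipeline above, but they should be the same on every run with the same seed and input. The repeated seed text is a low-quality byte stream; the get_seeded_random function from the earlier answer could be substituted if better mixing matters.

```shell
#!/usr/bin/env bash
# seeded_sample SEED COUNT FILE (hypothetical helper)
# Print COUNT lines of FILE in a random order that is repeatable for SEED.
seeded_sample() {
  local seed=$1 count=$2 file=$3
  # `yes "$seed"` repeats the seed text forever, so shuf never hits end of file
  # on its random source. The same bytes in give the same shuffle out.
  shuf --random-source=<(yes "$seed") -n "$count" -- "$file"
}
```

For the original question this would be called as: seeded_sample 42 200 file.txt > file.sff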
浅浅 2024-11-12 20:24:00

You may not need to use external tools like sort, whose options and usage may vary depending on your operating system. Bash has an internal random number generator accessible through the $RANDOM variable. It's common practice to seed the generator by setting the variable, like so:

RANDOM=$$

or

RANDOM=$(date '+%s')

etc. But of course, you can also use a predictable seed in order to get predictable not-so-random results:

$ RANDOM=12345; echo $RANDOM
28207
$ RANDOM=12345; echo $RANDOM
28207

To reorder the lines of a file randomly, first read the file into an array using mapfile:

$ mapfile -t a < source.txt

Then simply rewrite the array indices:

$ for i in ${!a[@]}; do a[$((RANDOM+${#a[@]}))]="${a[$i]}"; unset a[$i]; done

When reading a non-associative array, bash naturally orders elements in ascending order of index value.

Note that the new index for each line has the number of array elements added to it to avoid collisions within that range. This solution is still fallible -- there's no guarantee that $RANDOM will produce unique numbers, and a repeated index silently overwrites an earlier line. You can mitigate that risk with extra code that checks for prior use of each index, or reduce the risk with bit-shifting:

... a[$(( (RANDOM<<15)+RANDOM+${#a[@]} ))]= ...

This makes your index values 30-bit unsigned integers instead of 15-bit unsigned integers.
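
The "extra code that checks for prior use of each index" could look roughly like the sketch below. It is not the exact loop from this answer: it writes into a second array (dst, a name chosen here) so that no offset is needed, and it redraws whenever a slot is already taken. The function name shuffle_by_index is hypothetical, and bash 4 or later is assumed for mapfile.

```shell
#!/usr/bin/env bash
# shuffle_by_index SEED FILE (hypothetical helper)
# Give every line a unique random 30-bit index, then print the array, which
# bash expands in ascending index order.
shuffle_by_index() {
  local seed=$1 file=$2
  local -a src dst
  local line idx
  mapfile -t src < "$file"
  RANDOM=$seed                     # fixed seed, so the draws repeat
  for line in "${src[@]}"; do
    idx=$(( (RANDOM << 15) + RANDOM ))
    # ${dst[idx]+set} is non-empty only if that slot is already in use,
    # so keep drawing until a free slot turns up.
    while [[ -n ${dst[idx]+set} ]]; do
      idx=$(( (RANDOM << 15) + RANDOM ))
    done
    dst[idx]=$line
  done
  # Print nothing at all for an empty file.
  if (( ${#dst[@]} > 0 )); then
    printf '%s\n' "${dst[@]}"
  fi
}
```

Usage would be along the lines of: shuffle_by_index 12345 source.txt | head -n 200 > file.sff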

你好,陌生人 2024-11-12 20:24:00

If you're randomly shuffling lines, you're not sorting. I haven't seen a sort with a --random-source option before. It'd be interesting if it does exist. However, that's not sorting the lines in a fixed order.

I believe you'll have to write a program to do that, and I don't think Bash can quite do it.

Actually, it might. The $RANDOM shell variable yields a random number from 0 to 32767 each time it is read. You can assign a seed to RANDOM and the same random number sequence will appear over and over. You can use a card dealing algorithm. Read each line into a Bash array, then use the card dealing algorithm to pick each line.

I'm not going to write a test program -- especially in Bash, but you should get the idea.
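
For readers who do want to try it, here is one possible sketch of that idea, reading "card dealing algorithm" as a Fisher-Yates shuffle (an interpretation, not something this answer specifies). The function name seeded_shuffle is hypothetical and bash 4 or later is assumed. Two caveats: the modulo step introduces a slight bias, and the sequence bash produces for a given seed can differ between bash versions, so results are only expected to repeat on the same version.

```shell
#!/usr/bin/env bash
# seeded_shuffle SEED FILE (hypothetical helper)
# Print FILE's lines in an order determined by SEED.
seeded_shuffle() {
  local seed=$1 file=$2
  local -a lines
  local i j tmp
  mapfile -t lines < "$file"
  RANDOM=$seed                     # reseed bash's generator
  # Fisher-Yates: walk from the last slot down to the second, swapping each
  # slot with a randomly chosen slot at or below it.
  for (( i = ${#lines[@]} - 1; i > 0; i-- )); do
    # Two 15-bit draws combine into a 30-bit value, so files longer than
    # 32768 lines can still reach every position.
    j=$(( ((RANDOM << 15) | RANDOM) % (i + 1) ))
    tmp=${lines[i]}
    lines[i]=${lines[j]}
    lines[j]=$tmp
  done
  # Print nothing at all for an empty file.
  if (( ${#lines[@]} > 0 )); then
    printf '%s\n' "${lines[@]}"
  fi
}
```

For the original question: seeded_shuffle 42 file.txt | head -n 200 > file.sff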
