Random number generation with awk in a BASH shell

Posted on 2024-09-29 11:23:45

I want to shuffle the lines (rows) of a file at random and write the result to five different files.

But I keep getting exactly the same order of lines in file1 to file5, so the random generation does not seem to work properly. I would be grateful for any advice.

#!/bin/bash
for i in $(seq 1 5)
do
  awk 'BEGIN{srand();}  {print rand()"\t"$0}' shuffling.txt  | sort -k2 -k1 -n | cut -f2-  > file$i.txt
done

Input shuffling.txt

111 1032192
111 2323476
111 1698881
111 2451712
111 2013780
111  888105
112 2331004
112 1886376
112 1189765
112 1877267
112 1772972
112  574631


Comments (3)

记忆消瘦 2024-10-06 11:23:45

If you don't provide a seed to srand, it will use either the current date and time or a fixed starting seed (this may vary with the implementation). In the former case, the clock is typically read with one-second resolution, so if your processes run fast enough, they'll all use the same seed and generate the same sequence.

In the latter case, it doesn't matter how long you wait: you'll get the same sequence every time you run.

You can get around either of these by using a different seed, provided by the shell.

awk -v seed=$RANDOM 'BEGIN{srand(seed);}{print rand()" "$0}' ...

The number provided by $RANDOM changes in each iteration so each run of the awk program gets a different seed.

You can see this in action in the following transcript:

pax> for i in $(seq 1 5) ; do
...> awk 'BEGIN{srand();print rand()}'
...> done
0.0435039
0.0435039
0.0435039
0.0435039
0.0435039

pax> for i in $(seq 1 5) ; do
...> awk -v seed=$RANDOM 'BEGIN{srand(seed);print rand()}'
...> done
0.283898
0.0895895
0.841535
0.249817
0.398753
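
Applied to the loop from the question, the same idea might look like the sketch below. The function name shuffle_copies is hypothetical, and sorting on the random key alone (-k1,1n) is this sketch's choice rather than something taken from the original script; the output names file1.txt to file5.txt follow the question.

```shell
#!/bin/bash
# Sketch: write several independently shuffled copies of a file,
# seeding awk from $RANDOM on each run.
# shuffle_copies is a hypothetical helper name, not from the original post.
shuffle_copies() {
    local input=$1 count=$2 i
    for i in $(seq 1 "$count"); do
        # $RANDOM is re-evaluated on every iteration, so each awk run gets its own seed.
        # Each line is prefixed with a random key, sorted on that key only,
        # and the key is cut off again (cut splits on the tab awk inserted).
        awk -v seed="$RANDOM" 'BEGIN { srand(seed) } { print rand() "\t" $0 }' "$input" |
            sort -k1,1n |
            cut -f2- > "file$i.txt"
    done
}
```

Calling shuffle_copies shuffling.txt 5 in the directory that holds shuffling.txt should produce five files containing the same lines in (usually) different orders.
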
晨与橙与城 2024-10-06 11:23:45

awk's pseudo-random output is only as varied as its seed, so you need to reseed on every invocation. Seeding from the nanoseconds reported by date +%N (a GNU date extension) should be enough for most situations; otherwise you may want to look into Bash's ${RANDOM} or read from /dev/urandom directly:

awk 'BEGIN{"date +%N"|getline rseed;srand(rseed);close("date +%N");print rand()}'

for((i=1;i<=5;i++));do awk 'BEGIN{"date +%N"|getline rseed;srand(rseed);close("date +%N");print rand()}';done
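
The /dev/urandom route is mentioned above but not shown. A minimal sketch, assuming a system that provides /dev/urandom and a POSIX od, could read four random bytes as one unsigned integer and pass it to awk as the seed. The helper name urandom_seed is hypothetical.

```shell
#!/bin/bash
# Sketch: seed awk from /dev/urandom instead of the clock.
# urandom_seed is a hypothetical helper name, not from the original post.
urandom_seed() {
    # -An: no address column, -N4: read 4 bytes, -tu4: print them as one unsigned 32-bit integer
    od -An -N4 -tu4 /dev/urandom | tr -d ' \n'
}

for i in 1 2 3 4 5; do
    awk -v seed="$(urandom_seed)" 'BEGIN { srand(seed); print rand() }'
done
```

Unlike a clock-based seed, this does not depend on how quickly the loop iterations follow one another.
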
緦唸λ蓇 2024-10-06 11:23:45
#!/bin/bash
for i in {1..5}
do
    shuf -o "file$i.txt" shuffling.txt
done