Is there a more efficient way to create a file of repeated text than a while loop? (1 million+ lines)

Posted 2025-01-19 15:59:27

I need to create a text file that contains just the dot symbol "." on every line, repeated until a specific number of lines, stored in a variable, is reached. I'm currently using a while loop, but these files need to be around 0.5 to 5 million lines long, so it takes longer than I would like. Below is my current code:

j=0
while [[ $j != $length ]] 
do
  echo "." >> $file
  ((j++))
done

So my question is: other than using a while loop, is there a more efficient way to create a file with x lines that each contain the same character (or string)?

Thanks,

Comments (5)

情域 2025-01-26 15:59:27

You can use yes and head:

yes . | head -n "$length" > "$file"

This should be dramatically faster than repeatedly opening and closing the file to write two bytes at a time.
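As a rough sketch of the same pipeline with a line count check, assuming placeholder values for `length`, `text`, and `file` (the question keeps these in variables but does not show their values):

```shell
#!/usr/bin/env bash
# Placeholder values for illustration only.
length=1000000
text="."
file=$(mktemp)

# yes prints its argument once per line, forever;
# head stops reading after $length lines, which ends the pipeline.
yes "$text" | head -n "$length" > "$file"

# Sanity check: print the number of lines written.
wc -l < "$file"
```

Because `yes` accepts any string, the same command should work for a longer repeated line as well as a single dot.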

╰◇生如夏花灿烂 2025-01-26 15:59:27

Using dd to write to the output file (took less than 2 secs)

time yes . | dd of=dotbig.txt count=1024 bs=1048576 iflag=fullblock
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.76116 s, 610 MB/s

real    0m1.814s
user    0m0.076s
sys     0m0.686s

Count of lines

wc -l dotbig.txt
536870912 dotbig.txt

Contents sample:

head -n 3 dotbig.txt ; tail -n 3 dotbig.txt
.
.
.
.
.
.
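Note that the dd command above fixes the number of bytes, not the number of lines. A minimal sketch of getting an exact line count with a byte limit, assuming each line is a single-byte-per-character string plus a newline, and assuming a `head` that supports `-c` (GNU and BSD versions do); the variable names are placeholders:

```shell
#!/usr/bin/env bash
# Placeholder values for illustration only.
length=1000000
line="."
file=$(mktemp)

# Each output line is the string plus one newline byte.
# ${#line} counts characters, so this assumes one byte per character.
bytes=$(( length * (${#line} + 1) ))

# Cut the endless stream from yes at exactly that many bytes.
yes "$line" | head -c "$bytes" > "$file"

wc -l < "$file"
```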

活雷疯 2025-01-26 15:59:27

The most resource-intensive part of this code is the redirection (echo "." >> $file), which reopens the file on every iteration. To get around this, "build" a string and redirect to $file only once rather than $length times.

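j=0
builder=
while [[ $j != $length ]]
do
    builder+=$'.\n'   # append one line, newline included
    ((j++))           # advance the counter so the loop terminates
done
printf '%s' "$builder" > "$file"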

However, you are still in a loop, which probably isn't the best use of resources. To get around this, let's take inspiration from this answer:

printf '.\n%.0s' $(seq "$length") > "$file"

Note that here we use $(seq $length) rather than {1..$length}, because bash performs brace expansion before variable expansion and so does not expand {1..$length} to 1 2 3 4 5 6 7 8 9 10 when length is 10 (see this question).
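To make the brace expansion point concrete, here is a small sketch with a placeholder `length` of 3. The variable names `literal`, `expanded`, and `file` are only for illustration:

```shell
#!/usr/bin/env bash
# Placeholder value for illustration only.
length=3
file=$(mktemp)

# Brace expansion runs before $length is substituted, so no sequence is produced.
literal=$(echo {1..$length})
echo "$literal"     # prints {1..3}

# seq receives the already-substituted value and generates the numbers.
expanded=$(echo $(seq "$length"))
echo "$expanded"    # prints 1 2 3

# The format is reused once per argument; %.0s consumes an argument and prints nothing.
printf '.\n%.0s' $(seq "$length") > "$file"
wc -l < "$file"
```

With several million arguments, the whole `seq` output is expanded in memory before `printf` runs, so this approach may use noticeably more memory than a streaming pipeline.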

自演自醉 2025-01-26 15:59:27

Repetitive actions in bash (e.g., via a loop) are always going to be slow, if only because of the overhead of spinning up a new OS process (on each pass through the loop) for each external command within the loop. In this case there is additional overhead from opening and closing the output file on each pass through the loop.

You want to look for a solution that limits the number of OS processes that need to be created/closed (and in this case limit the number of times you open/close the output file). There are going to be a lot of options depending on what software/tool/binary you want to use.

One awk idea:

awk -v len="${length}" 'BEGIN {for (i=1;i<=len;i++) print "."}' > newfile

While this does use a 'loop' within awk, we're only looking at a single OS process at the bash level, and we're only opening/closing the output file once.
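The "open the output file once" idea can also be applied to a plain shell loop by moving the redirection onto the loop itself. This is only a sketch with placeholder values for `length` and `file`; it is still expected to be slower than awk or yes for millions of lines, since the shell interprets every iteration:

```shell
#!/usr/bin/env bash
# Placeholder values for illustration only.
length=100000
file=$(mktemp)

j=0
while [ "$j" -lt "$length" ]; do
  echo "."            # writes to the loop's stdout, not directly to the file
  j=$((j + 1))
done > "$file"        # the file is opened once for the whole loop

wc -l < "$file"
```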

︶葆Ⅱㄣ 2025-01-26 15:59:27

This should double the size of the file each time. Maybe it's more efficient than some of the other solutions, maybe not. File "b" will keep doubling in size until a doubling would take it over the size of length. When length is a power of 2, I think this would be pretty efficient.

let n=2
let length=1000000
echo '.' > a
cat a a > b
rm a
while [[ $((n*2)) -le $length ]]; do
  mv b a
  cat a a > b
  rm a 
  let n=n*2
done
# do something here to fill out the remaining length-n lines
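One possible way to fill in the remaining lines, sketched under the assumption that `length` is at least 1: once doubling would overshoot, the remainder is always smaller than the current file, so it can be copied from the file's own first lines. The names `out`, `tmp`, and `rem` are placeholders, not part of the answer above.

```shell
#!/usr/bin/env bash
# Placeholder values for illustration only; assumes length >= 1.
length=1000000
out=$(mktemp)
tmp=$(mktemp)

n=1
echo "." > "$out"

# Double the file while doubling does not overshoot the target.
while [ $((n * 2)) -le "$length" ]; do
  cat "$out" "$out" > "$tmp"
  mv "$tmp" "$out"
  n=$((n * 2))
done

# The remainder is smaller than n, so take it from the existing lines.
rem=$((length - n))
if [ "$rem" -gt 0 ]; then
  head -n "$rem" "$out" > "$tmp"
  cat "$tmp" >> "$out"
fi
rm -f "$tmp"

wc -l < "$out"
```

For a length of 1000000, the loop runs 19 times and the final step appends the last 475712 lines.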