Is there a more efficient way than a while loop to create a file with repeated text? (1 million+ lines)
I need to create a text file that contains just the dot symbol "." on every line, repeated until a specific number of lines, stored in a variable, is reached. I'm currently using a while loop, but these files need to be around 0.5-5 million lines long, so it takes longer than I would like. Below is my current code:
j=0
while [[ $j != $length ]]
do
    # appends one line per iteration, reopening the file every time
    echo "." >> "$file"
    ((j++))
done
So my question is: Is there a more efficient way of creating a file with x number of lines that each contain the same character (or string) repeating, other than using a while loop?
Thanks,
Comments (5)
You can use yes and head.
This should be dramatically faster than repeatedly opening and closing the file to write two bytes at a time.
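The answer's original command did not survive on this page, so the following is only a minimal sketch of the yes/head idea. The values of length and file are placeholders chosen for illustration, not taken from the question.

```shell
# Hypothetical values for illustration; in the question these already exist.
length=100000
file=$(mktemp)

# yes prints "." on its own line forever; head keeps the first $length lines.
# The output file is opened once, by the shell, for the whole pipeline.
yes . | head -n "$length" > "$file"
```

Because the redirection applies to the pipeline as a whole, the file is opened a single time instead of once per line.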
Using dd to write to the output file (took less than 2 secs).
Count of lines:
Contents sample:
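The dd command used in this answer is not preserved here. One way it might look, assuming GNU dd (iflag=fullblock is a GNU extension) and placeholder values for length and file, is to feed dd from yes and copy exactly one two-byte record (the dot plus a newline) per line:

```shell
# Hypothetical values for illustration.
length=5000
file=$(mktemp)

# Each record is 2 bytes: "." and a newline. count limits the number of records.
# iflag=fullblock (GNU dd) guards against short reads from the pipe.
yes . | dd of="$file" bs=2 count="$length" iflag=fullblock 2>/dev/null

# Inspect the result: line count and a sample of the contents.
wc -l < "$file"
head -n 3 "$file"
```

A larger block size would cut the number of system calls, at the cost of having to compute the byte count separately.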
The most resource-intensive part of this code is the redirection (echo "." >> $file). To get around this, you will want to "build" a string and redirect to $file only once rather than $length times.
However, you are still in a loop, which probably isn't the best use of resources. To get around this, let's take inspiration from this answer:
Note that here we use $(seq $length) rather than {1..$length}, since bash does not expand {1..$length} to 1 2 3 4 5 6 7 8 9 10 if length is 10 (see this question).
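The code from this answer is missing on this page. As a rough sketch of the two ideas it describes, the first command keeps the loop but redirects only once, and the second drops the loop by letting printf reuse its format for every argument produced by seq. The variable names and values are placeholders.

```shell
# Hypothetical values for illustration.
length=1000
file_loop=$(mktemp)
file_printf=$(mktemp)

# Idea 1: keep the loop, but attach the redirection to the loop as a whole,
# so the file is opened once instead of $length times.
j=0
while [ "$j" -lt "$length" ]; do
    echo "."
    j=$((j + 1))
done > "$file_loop"

# Idea 2: no shell loop. printf repeats the format once per argument;
# %.0s consumes each number from seq but prints nothing for it.
printf '.\n%.0s' $(seq "$length") > "$file_printf"
```

The seq output is left unquoted on purpose so that each number becomes a separate argument to printf.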
Repetitive actions in bash (e.g., via a loop) are always going to be slow, if only because of the overhead of spinning up a new OS process (on each pass through the loop) for each external command within the loop. In this case there is additional overhead from opening and closing the output file on each pass through the loop.
You want to look for a solution that limits the number of OS processes that need to be created (and, in this case, limits the number of times you open/close the output file). There are going to be a lot of options depending on what software/tool/binary you want to use.
One awk idea:
While this does use a 'loop' within awk, we're only looking at a single OS process at the bash level, and we're only opening/closing the output file once.
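The awk command itself is not preserved on this page. A minimal sketch of what such a one-process approach could look like, with placeholder values for length and file:

```shell
# Hypothetical values for illustration.
length=100000
file=$(mktemp)

# A single awk process does all the looping; the shell opens the file once.
# -v passes the shell variable into awk as n.
awk -v n="$length" 'BEGIN { for (i = 0; i < n; i++) print "." }' > "$file"
```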
This should double the size of the file each time. Maybe it's more efficient than some of the other solutions, maybe not. File "b" will keep doubling in size until a doubling would take it over the size of length. When length is a power of 2, I think this would be pretty efficient.
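The doubling code is also missing here. The sketch below is one possible reading of the description, not the answer's actual script: the file names, the temporary file, and the final top-up step for lengths that are not a power of 2 are all assumptions.

```shell
# Hypothetical values for illustration.
length=100000
file=$(mktemp)
tmp=$(mktemp)

# Start with a single line, then double the file while it still fits.
printf '.\n' > "$file"
lines=1
while [ $((lines * 2)) -le "$length" ]; do
    cat "$file" "$file" > "$tmp"
    mv "$tmp" "$file"
    lines=$((lines * 2))
done

# Assumed extra step: top up the remainder when length is not a power of 2.
if [ "$lines" -lt "$length" ]; then
    head -n $((length - lines)) "$file" > "$tmp"
    cat "$tmp" >> "$file"
fi
rm -f "$tmp"
```

This needs only about log2(length) passes, but each pass rewrites the whole file, so it is not obviously faster than the single-pass approaches.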