Looping through the content of a file in Bash
How do I iterate through each line of a text file with Bash?
With this script:
echo "Start!"
for p in (peptides.txt)
do
echo "${p}"
done
I get this output on the screen:
Start!
./runPep.sh: line 3: syntax error near unexpected token `('
./runPep.sh: line 3: `for p in (peptides.txt)'
(Later I want to do something more complicated with $p than just output to the screen.)
The environment variable SHELL is (from env):
SHELL=/bin/bash
/bin/bash --version
output:
GNU bash, version 3.1.17(1)-release (x86_64-suse-linux-gnu)
Copyright (C) 2005 Free Software Foundation, Inc.
cat /proc/version
output:
Linux version 2.6.18.2-34-default (geeko@buildhost) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Mon Nov 27 11:46:27 UTC 2006
The file peptides.txt contains:
RKEKNVQ
IPKKLLQK
QYFHQLEKMNVK
IPKKLLQK
GDLSTALEVAIDCYEK
QYFHQLEKMNVKIPENIYR
RKEKNVQ
VLAKHGKLQDAIN
ILGFMK
LEDVALQILL
One way to do it is:
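A minimal sketch of a plain while read loop of the kind discussed here. The temporary file is only a stand-in for peptides.txt so the sketch runs on its own, and the counter is added purely for illustration:

```bash
#!/usr/bin/env bash
# Stand-in for peptides.txt (assumption: one peptide per line).
file=$(mktemp)
printf '%s\n' RKEKNVQ IPKKLLQK QYFHQLEKMNVK > "$file"

count=0
# read puts one line into p per iteration; the loop's stdin is the file.
while read p; do
  echo "$p"
  count=$((count + 1))
done < "$file"

rm -f "$file"
```

With the real file, the last line would simply be done < peptides.txt.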
As pointed out in the comments, this has the side effects of trimming leading whitespace, interpreting backslash sequences, and skipping the last line if it's missing a terminating linefeed. If these are concerns, you can do:
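One way to avoid all three side effects, as a sketch. The input file below is a made-up stand-in containing leading spaces, a backslash, and no final newline:

```bash
#!/usr/bin/env bash
# Stand-in input: leading spaces, a backslash, and no trailing newline.
file=$(mktemp)
printf '  indented\nback\\slash\nlast' > "$file"

lines=()
# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal,
# and the || test runs the body for a final line that has no newline.
while IFS= read -r p || [ -n "$p" ]; do
  printf '%s\n' "$p"
  lines+=("$p")
done < "$file"

rm -f "$file"
```

The lines array is only there so the result can be inspected afterwards; the loop itself does not need it.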
Exceptionally, if the loop body may read from standard input, you can open the file using a different file descriptor:
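A sketch of that form, using a temporary stand-in file instead of peptides.txt:

```bash
#!/usr/bin/env bash
file=$(mktemp)
printf '%s\n' RKEKNVQ IPKKLLQK > "$file"

seen=()
# The file is opened on descriptor 10, so standard input stays free
# for any command in the loop body that wants to read from it.
while read -r -u 10 p; do
  seen+=("$p")
  echo "$p"
done 10< "$file"

rm -f "$file"
```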
Here, 10 is just an arbitrary number (different from 0, 1, 2).
and the one-liner variant:
These options will skip the last line of the file if there is no trailing line feed.
You can avoid this by the following:
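A sketch of the piped form with the extra test, assuming a stand-in file whose last line has no trailing newline:

```bash
#!/usr/bin/env bash
file=$(mktemp)
printf 'RKEKNVQ\nIPKKLLQK' > "$file"   # note: no trailing newline

# read returns non-zero on the unterminated last line but still fills
# $line, so the [[ -n $line ]] test lets the body process it.
result=$(
  cat "$file" | while read -r line || [[ -n $line ]]; do
    echo "$line"
  done
)

echo "$result"
rm -f "$file"
```

Because the loop is on the right side of a pipe, it runs in a subshell; the output is captured here rather than stored in variables set inside the loop.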
Option 1a: While loop: Single line at a time: Input redirection
Option 1b: While loop: Single line at a time:
Open the file, read from a file descriptor (in this case file descriptor #4).
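A sketch of Option 1b, assuming a temporary stand-in for the input file:

```bash
#!/usr/bin/env bash
filename=$(mktemp)
printf '%s\n' RKEKNVQ IPKKLLQK QYFHQLEKMNVK > "$filename"

exec 4< "$filename"        # open the file for reading on descriptor 4
count=0
while read -r -u 4 p; do   # read one line at a time from descriptor 4
  echo "$p"
  count=$((count + 1))
done
exec 4<&-                  # close descriptor 4 when finished

rm -f "$filename"
```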
Note: Option 2 was removed. Ignore comments referring to Option 2 as it was rightly removed as an anti-pattern (see edits if you really need to know)
This is no better than other answers, but is one more way to get the job done in a file without spaces (see comments). I find that I often need one-liners to dig through lists in text files without the extra step of using separate script files.
This format allows me to put it all in one command-line. Change the "echo $word" portion to whatever you want and you can issue multiple commands separated by semicolons. The following example uses the file's contents as arguments into two other scripts you may have written.
Or if you intend to use this like a stream editor (learn sed) you can dump the output to another file as follows.
I've used these as written above because I have used text files where I've created them with one word per line. (See comments) If you have spaces that you don't want splitting your words/lines, it gets a little uglier, but the same command still works as follows:
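A sketch of the newline-only variant, written over several lines for readability. The input file and the counter are illustrative stand-ins:

```bash
#!/usr/bin/env bash
file=$(mktemp)
printf '%s\n' 'two words' 'three more words' > "$file"

OLDIFS=$IFS
IFS=$'\n'                  # split the command substitution on newlines only
count=0
for line in $(cat "$file"); do
  echo "$line"
  count=$((count + 1))
done
IFS=$OLDIFS                # restore the previous splitting behaviour

rm -f "$file"
```

Note that an unquoted $(cat ...) is still subject to pathname expansion, so lines containing * or ? may expand to file names.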
This just tells the shell to split on newlines only, not spaces, then returns the environment back to what it was previously. At this point, you may want to consider putting it all into a shell script rather than squeezing it all into a single line, though.
Best of luck!
A few more things not covered by other answers:
Reading from a delimited file
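A sketch, assuming a colon-delimited file with up to three fields per line. The file contents and field names are made up for illustration:

```bash
#!/usr/bin/env bash
input=$(mktemp)
printf '%s\n' 'alice:30:admin' 'bob:25' 'carol:41:dev:extra' > "$input"

summary=()
# IFS=: applies only to this read command; other code is unaffected.
while IFS=: read -r field1 field2 field3; do
  # Missing fields are empty; surplus fields (with their delimiters)
  # all end up in the last variable, field3.
  summary+=("$field1|$field2|$field3")
done < "$input"

printf '%s\n' "${summary[@]}"
rm -f "$input"
```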
Reading from the output of another command, using process substitution
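A sketch, with printf standing in for whatever command produces the lines:

```bash
#!/usr/bin/env bash
count=0
# < <(...) feeds the command's output to the loop, and the loop itself
# keeps running in the current shell, so count survives after done.
while read -r line; do
  count=$((count + 1))
done < <(printf '%s\n' one two three)

echo "lines seen: $count"
```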
This approach is better than command ... | while read -r line; do ... because the while loop here runs in the current shell rather than a subshell as in the case of the latter. See the related post A variable modified inside a while loop is not remembered.
Reading from a null delimited input, for example find ... -print0
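A sketch, assuming a scratch directory created just for the example:

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt"

names=()
# -d '' makes read stop at NUL bytes, so names containing spaces or
# newlines arrive intact; IFS= keeps any leading/trailing whitespace.
while IFS= read -r -d '' path; do
  names+=("${path##*/}")      # keep only the file name part
done < <(find "$dir" -type f -print0)

printf '%s\n' "${names[@]}"
rm -rf "$dir"
```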
Related read: BashFAQ/020 - How can I find and safely handle file names containing newlines, spaces or both?
Reading from more than one file at a time
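A sketch with two stand-in files of different lengths:

```bash
#!/usr/bin/env bash
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' a1 a2 a3 > "$f1"
printf '%s\n' b1 b2 > "$f2"

pairs=()
# Descriptor 3 reads the first file, descriptor 4 the second.
# Because of &&, the loop ends at EOF of whichever file is shorter.
while read -r -u 3 line1 && read -r -u 4 line2; do
  pairs+=("$line1,$line2")
done 3< "$f1" 4< "$f2"

printf '%s\n' "${pairs[@]}"
rm -f "$f1" "$f2"
```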
Based on @chepner's answer here: -u is a bash extension. For POSIX compatibility, each call would look something like read -r X <&3.
Reading a whole file into an array (Bash versions earlier than 4)
If the file ends with an incomplete line (newline missing at the end), then:
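A sketch of the array-filling loop with the guard for an unterminated last line. The file is a stand-in, and my_array is just an illustrative name:

```bash
#!/usr/bin/env bash
my_file=$(mktemp)
printf 'first\nsecond\nlast without newline' > "$my_file"

my_array=()
# || [[ -n $line ]] keeps a final line that lacks its newline.
while IFS= read -r line || [[ -n $line ]]; do
  my_array+=("$line")
done < "$my_file"

echo "${#my_array[@]} lines read"
rm -f "$my_file"
```

If the file is known to end with a newline, the || [[ -n $line ]] part can be dropped.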
Reading a whole file into an array (Bash versions 4.x and later)
or
And then
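A sketch for Bash 4 or later, where mapfile and readarray are two names for the same builtin. The file is a stand-in:

```bash
#!/usr/bin/env bash
# Requires Bash 4 or later.
my_file=$(mktemp)
printf '%s\n' first second third > "$my_file"

mapfile -t my_array < "$my_file"   # -t strips the trailing newline of each line

# Process the lines afterwards, one array element per line.
for line in "${my_array[@]}"; do
  echo "$line"
done
rm -f "$my_file"
```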
More about the shell builtins read and readarray commands - GNU
More about IFS - Wikipedia
Related posts:
Use a while loop, like this:
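A sketch of such a loop, with a second loop added only to show what happens to indentation when IFS is left at its default. The file is a stand-in:

```bash
#!/usr/bin/env bash
file=$(mktemp)
printf '%s\n' 'top' '    indented' > "$file"

kept=()
# IFS= (empty, for this read only) preserves leading whitespace;
# -r stops read from treating backslashes as escapes.
while IFS= read -r line; do
  kept+=("$line")
  echo "$line"
done < "$file"

stripped=()
# Same loop without IFS=: read trims the leading spaces.
while read -r line; do
  stripped+=("$line")
done < "$file"

rm -f "$file"
```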
Notes:
If you don't set the IFS properly, you will lose indentation.
You should almost always use the -r option with read.
Don't read lines with for.
If you don't want your read to be broken by newline character, use -
Then run the script with file name as parameter.
Suppose you have this file:
There are four elements that will alter the meaning of the file output read by many Bash solutions:
If you want the text file line by line, including blank lines and a final line without a trailing newline, you must use a while loop and you must have an alternate test for the final line.
Here are the methods that may change the file (in comparison to what cat returns):
1) Lose the last line and leading and trailing spaces:
(If you do while IFS= read -r p; do printf "%s\n" "'$p'"; done </tmp/test.txt instead, you preserve the leading and trailing spaces but still lose the last line if it is not terminated with a newline.)
2) Using command substitution with cat reads the entire file in one gulp and loses the meaning of individual lines:
(If you remove the " from "$(cat /tmp/test.txt)" you read the file word by word rather than in one gulp. Also probably not what is intended...)
The most robust and simplest way to read a file line-by-line and preserve all spacing is:
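A sketch of that loop, with the two lossy approaches run alongside it so the iteration counts can be compared. The test file here is a made-up stand-in (a blank line, leading spaces, no final newline), not the file from this answer:

```bash
#!/usr/bin/env bash
# Hypothetical test file: leading spaces, a blank line, no final newline.
f=$(mktemp)
printf 'Line 1\n  Line 2\n\nLine 4 no newline' > "$f"

gulp=0
for p in "$(cat "$f")"; do gulp=$((gulp + 1)); done     # whole file as one item

words=0
for p in $(cat "$f"); do words=$((words + 1)); done     # split into words

lines=0
while IFS= read -r line || [[ -n $line ]]; do           # one record per line
  printf "'%s'\n" "$line"
  lines=$((lines + 1))
done < "$f"

echo "gulp=$gulp words=$words lines=$lines"
rm -f "$f"
```

Only the while loop sees four records, including the blank line and the unterminated last line.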
If you want to strip leading and trailing spaces, remove the IFS= part:
(A text file without a terminating \n, while fairly common, is considered broken under POSIX. If you can count on the trailing \n, you do not need || [[ -n $line ]] in the while loop.)
More at the BASH FAQ
This might be the simplest answer and maybe it doesn't work in all cases, but it is working great for me:
if you need to enclose in parenthesis for spaces:
Ahhh this is pretty much the same as the answer that got upvoted most, but it's all on one line.
I like to use xargs instead of while. xargs is powerful and command line friendly
cat peptides.txt | xargs -I % sh -c "echo %"
With xargs, you can also add verbosity with -t and validation with -p
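A sketch of the same idea with one change that is my own assumption rather than part of this answer: the line is passed to sh -c as a positional argument instead of being pasted into the script text, which avoids surprises when a line contains shell metacharacters. The file is a stand-in:

```bash
#!/usr/bin/env bash
file=$(mktemp)
printf '%s\n' RKEKNVQ IPKKLLQK > "$file"

# -I % runs the command once per input line, replacing % with that line.
# Here % becomes $1 of the small sh script (the _ fills $0).
result=$(xargs -I % sh -c 'echo "peptide: $1"' _ % < "$file")

echo "$result"
rm -f "$file"
```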
Here is my real-life example of how to loop over the lines of another program's output, check for substrings, drop double quotes from a variable, and use that variable outside of the loop. I guess quite a few people ask these questions sooner or later.
Declaring a variable outside of the loop, setting its value inside, and using it after the loop requires the done <<< "$(...)" syntax. The application needs to be run within the context of the current console. Quotes around the command keep the newlines of the output stream.
The loop matches substrings, then reads a name=value pair, takes the part to the right of the last = character, drops the first quote, drops the last quote, and we have a clean value to be used elsewhere.
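A sketch of the pattern described above. The list_settings function and its name=value lines are hypothetical stand-ins for the other program's output:

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for "another program" printing name=value lines.
list_settings() {
  printf '%s\n' 'user="alice"' 'java.home="/opt/jdk"' 'mode="fast"'
}

java_home=""                       # declared outside the loop
while IFS= read -r line; do
  if [[ $line == *java.home* ]]; then   # substring check
    value=${line##*=}              # keep the text right of the last '='
    value=${value#\"}              # drop the first double quote
    value=${value%\"}              # drop the last double quote
    java_home=$value
  fi
done <<< "$(list_settings)"        # here-string: loop runs in this shell

echo "java_home=$java_home"
```

Because the loop is fed by a here-string rather than a pipe, java_home is still set after done.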
This is coming rather late, but with the thought that it may help someone, I am adding the answer. Also, this may not be the best way.
The head command can be used with the -n argument to read n lines from the start of a file, and likewise the tail command can be used to read from the bottom. Now, to fetch the nth line from a file, we take the first n lines with head and pipe them to tail, which keeps only the last of those lines.
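A sketch of the head and tail combination, assuming a stand-in file and an illustrative line number n:

```bash
#!/usr/bin/env bash
file=$(mktemp)
printf '%s\n' RKEKNVQ IPKKLLQK QYFHQLEKMNVK > "$file"

n=2
# head keeps the first n lines; tail then keeps only the last of those.
nth=$(head -n "$n" "$file" | tail -n 1)
echo "line $n: $nth"

# Visiting every line this way re-reads the file on each pass,
# so the total work grows quadratically with the number of lines.
total=$(( $(wc -l < "$file") ))
for (( i = 1; i <= total; i++ )); do
  head -n "$i" "$file" | tail -n 1
done
rm -f "$file"
```

Note that wc -l counts newline characters, so an unterminated last line would not be visited by the loop.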
@Peter: This could work out for you-
This would return the output-
Another way to go about using xargs
echo can be replaced with other commands or piped further.
for p in `cat peptides.txt`
do
echo "${p}"
done