Unix - Need to cut a file which has multiple blanks as delimiter - awk or cut?
I need to get the records from a text file in Unix. The delimiter is multiple blanks. For example:
2U2133   1239
1290fsdsf   3234
From this, I need to extract
1239
3234
The delimiter for all records will be always 3 blanks.
I need to do this in a Unix script (.scr) and either write the output to another file or use it as the input to a do-while loop. I tried the below:
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then
int_1=0
else
int_2=0
fi
done < awk -F' ' '{ print $2 }' ${Directoty path}/test_file.txt
test_file.txt is the input file and file1.txt is a lookup file. But the above way is not working and giving me syntax errors near awk -F
I tried writing the output to a file. The following worked in command line:
more test_file.txt | awk -F' ' '{ print $2 }' > output.txt
This is working and writing the records to output.txt in command line. But the same command does not work in the unix script (It is a .scr file)
Please let me know where I am going wrong and how I can resolve this.
Thanks,
Visakh
Comments (8)
The job of replacing multiple delimiters with just one is left to tr.
tr translates or deletes characters, and is perfectly suited to preparing your data for cut to work properly.
The manual states:
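A minimal sketch of such a pipeline, assuming the sample data from the question with three blanks between the fields. The file name test_file.txt comes from the question; the scratch directory and the variable names are only for illustration. `tr -s ' '` squeezes each run of blanks into a single blank, after which `cut -d ' '` can treat one blank as the delimiter:

```shell
# Recreate the sample input from the question in a scratch directory.
workdir=$(mktemp -d)
printf '2U2133   1239\n1290fsdsf   3234\n' > "$workdir/test_file.txt"

# Squeeze runs of blanks into one blank, then take the second field.
second_column=$(tr -s ' ' < "$workdir/test_file.txt" | cut -d ' ' -f 2)
printf '%s\n' "$second_column"
```

Without the `tr -s` step, `cut -d ' '` would count every single blank as a separator and report empty fields between the two columns.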
It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means 'ignore blank fields' or, equivalently, allow multiple separators between fields. If that's supported, use:
If not (and it is not universal, and maybe not even widespread, since neither GNU nor Mac OS X cut has the option), then using awk is better and more portable.
You need to pipe the output of awk into your loop, though:
The only residual issue is whether the while loop runs in a sub-shell and therefore modifies only its own copy of your main shell script's variables, not the originals.
With bash, you can use process substitution:
This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.
The blank in ${Directory path} is not normally legal, unless it is another Bash feature I've missed out on; you also had a typo (Directoty) in one place.
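A sketch of the process-substitution variant, assuming bash and made-up file contents. The names workdir, value, found and missing are hypothetical stand-ins for the question's directory variable, readline, int_1 and int_2. Because the loop body runs in the current shell, the counters keep their values after the loop ends:

```shell
#!/usr/bin/env bash
# File names follow the question; the contents are illustrative only.
workdir=$(mktemp -d)
printf '2U2133   1239\n1290fsdsf   3234\n' > "$workdir/test_file.txt"
printf '1239\n' > "$workdir/file1.txt"

found=0
missing=0
# "< <(command)" feeds the command's output to the loop as if it were a file,
# so the while loop is not pushed into a sub-shell.
while read -r value
do
    if grep -q -- "$value" "$workdir/file1.txt"
    then
        found=$((found + 1))
    else
        missing=$((missing + 1))
    fi
done < <(awk '{ print $2 }' "$workdir/test_file.txt")

echo "found=$found missing=$missing"
```

With a plain `awk ... | while ...` pipeline, many shells run the loop in a sub-shell, and both counters would still read 0 after `done`.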
Other ways of doing the same thing aside, the error in your program is this: you cannot redirect from (<) the output of another program, because < expects a file name. Turn your script around and use a pipe like this:
etc.
Besides, the use of "readline" as a variable name may or may not get you into problems.
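A sketch of the turned-around script, assuming made-up file contents and the hypothetical names workdir, value and count. awk goes first and its output is piped into the loop; `grep -c` replaces the question's `grep | wc -l`, and the results are written to a file so they survive even if the loop runs in a sub-shell:

```shell
# File names follow the question; the contents are illustrative only.
workdir=$(mktemp -d)
printf '2U2133   1239\n1290fsdsf   3234\n' > "$workdir/test_file.txt"
printf '1239\n' > "$workdir/file1.txt"

# awk produces the second column; the pipe hands it to the loop line by line.
awk '{ print $2 }' "$workdir/test_file.txt" |
while read -r value
do
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, hence the "|| true".
    count=$(grep -c -- "$value" "$workdir/file1.txt" || true)
    printf '%s %s\n' "$value" "$count"
done > "$workdir/output.txt"

cat "$workdir/output.txt"
```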
In this particular case, you can use the following line
to get your second column.
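One line that does this, not necessarily the answerer's exact command, relies on awk's default field splitting. When no -F is given, awk treats any run of blanks or tabs as a single separator, so the number of blanks between the columns does not matter. The scratch directory below is an assumption for illustration:

```shell
# Recreate the sample input from the question in a scratch directory.
workdir=$(mktemp -d)
printf '2U2133   1239\n1290fsdsf   3234\n' > "$workdir/test_file.txt"

# No -F: awk splits on runs of blanks and tabs, so $2 is the second column.
awk '{ print $2 }' "$workdir/test_file.txt" > "$workdir/output.txt"
cat "$workdir/output.txt"
```

awk reads the file directly, so the `more test_file.txt |` from the question is not needed.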
In bash you can start from something like this:
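A possible starting point, offered as an assumption rather than the answerer's original code: `read` can split each line itself when given more than one variable name, because it breaks the line on runs of whitespace. The names first and second are hypothetical:

```shell
# Recreate the sample input from the question in a scratch directory.
workdir=$(mktemp -d)
printf '2U2133   1239\n1290fsdsf   3234\n' > "$workdir/test_file.txt"

# read puts the first word in "first" and the rest of the line in "second".
while read -r first second
do
    printf '%s\n' "$second"
done < "$workdir/test_file.txt" > "$workdir/output.txt"

cat "$workdir/output.txt"
```

Here the redirection after `done` is valid because `<` is given a file name, not a command.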
This should have been a comment, but since I cannot comment yet, I am adding it here.
This is from an excellent answer here: https://stackoverflow.com/a/4483833/3138875
tr -s '<character>' squeezes multiple repeated instances of <character> into one.
It's not working in the script because of the typo in "Directo*t*y path" (last line of your script).
Cut isn't flexible enough. I usually use Perl for that:
Instead of a triple space after -F you can put any Perl regular expression. You access fields as $F[n], where n is the field number (counting starts at zero). This way there is no need for sed or tr.