Unix - Need to cut a file which has multiple blanks as delimiter - awk or cut?

Posted 2024-10-06 03:13:38

I need to get the records from a text file in Unix. The delimiter is multiple blanks. For example:

2U2133   1239  
1290fsdsf   3234

From this, I need to extract

1239  
3234

The delimiter for all records will be always 3 blanks.

I need to do this in a Unix script (.scr) and write the output to another file or use it as an input to a do-while loop. I tried the below:

while read readline  
do  
        read_int=`echo "$readline"`  
        cnt_exc=`grep "$read_int" ${Directory path}/file1.txt| wc -l`  
if [ $cnt_exc -gt 0 ]  
then  
  int_1=0  
else  
  int_2=0  
fi  
done < awk -F'  ' '{ print $2 }' ${Directoty path}/test_file.txt  

test_file.txt is the input file and file1.txt is a lookup file. But the above way is not working and giving me syntax errors near awk -F

I tried writing the output to a file. The following worked in command line:

more test_file.txt | awk -F'   ' '{ print $2 }' > output.txt

This is working and writing the records to output.txt in command line. But the same command does not work in the unix script (It is a .scr file)

Please let me know where I am going wrong and how I can resolve this.

Thanks,
Visakh


Comments (8)

初懵 2024-10-13 03:13:38

The job of replacing multiple delimiters with just one is left to tr:

cat <file_name> | tr -s ' ' | cut -d ' ' -f 2

tr translates or deletes characters, and is perfectly suited to prepare your data for cut to work properly.

The manual states:

-s, --squeeze-repeats
          replace each sequence  of  a  repeated  character  that  is
          listed  in the last specified SET, with a single occurrence
          of that character
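
As a quick check of the pipeline above, here is a minimal sketch run against the two sample records from the question. The temporary file stands in for <file_name> and is an assumption made for illustration, as is the variable name second_col.

```shell
# Sample data from the question: two columns separated by three blanks.
# The file is created with mktemp purely for illustration.
sample=$(mktemp)
printf '2U2133   1239\n1290fsdsf   3234\n' > "$sample"

# tr -s ' ' collapses each run of blanks into a single blank,
# so cut finds the wanted value in field 2.
second_col=$(tr -s ' ' < "$sample" | cut -d ' ' -f 2)
printf '%s\n' "$second_col"

rm -f "$sample"
```

If a record ends with trailing blanks, they are squeezed to one blank and end up as an empty third field, so field 2 is unaffected.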

够钟 2024-10-13 03:13:38

It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means 'ignore blank fields' or, equivalently, allow multiple separators between fields. If that's supported, use:

cut -i -d' ' -f 2 data.file

If not (and it is not universal, and maybe not even widespread, since neither the GNU nor the Mac OS X version of cut has the option), then using awk is better and more portable.

You need to pipe the output of awk into your loop, though:

awk -F' ' '{print $2}' ${Directory_path}/test_file.txt |
while read readline  
do  
    read_int=`echo "$readline"`  
    cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`  
    if [ $cnt_exc -gt 0 ]  
    then int_1=0  
    else int_2=0
    fi  
done

The only residual issue is whether the while loop runs in a sub-shell, in which case it modifies only its own copy of your main shell script's variables rather than the variables themselves.

With bash, you can use process substitution:

while read readline  
do  
    read_int=`echo "$readline"`  
    cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`  
    if [ $cnt_exc -gt 0 ]  
    then int_1=0  
    else int_2=0
    fi  
done < <(awk -F' ' '{print $2}' ${Directory_path}/test_file.txt)

This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.

The blank in ${Directory path} is not normally legal, unless it is another Bash feature I've missed out on; you also had a typo (Directoty) in one place.
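
For shells without process substitution, one alternative is to stage the awk output in a temporary file and redirect the loop from that file, which also keeps the loop in the current shell. The sketch below is not from the answer above. The sample data, the lookup contents, the keys.txt file and the counter names found and missing are all assumptions for illustration.

```shell
# Hypothetical stand-ins for test_file.txt and file1.txt.
workdir=$(mktemp -d)
printf '2U2133   1239\n1290fsdsf   3234\n' > "$workdir/test_file.txt"
printf '1239\n9999\n' > "$workdir/file1.txt"

# With the default field separator, awk splits on any run of blanks.
awk '{ print $2 }' "$workdir/test_file.txt" > "$workdir/keys.txt"

found=0
missing=0
# Redirecting from a regular file keeps the loop in the current shell,
# so the counters are still set after "done".
while read -r key
do
    if grep -q -- "$key" "$workdir/file1.txt"
    then found=$((found + 1))
    else missing=$((missing + 1))
    fi
done < "$workdir/keys.txt"

echo "found=$found missing=$missing"
rm -rf "$workdir"
```

Here grep -q replaces the grep | wc -l count, since only the presence of a match matters to the if test.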

画▽骨i 2024-10-13 03:13:38

Other ways of doing the same thing aside, the error in your program is this: You cannot redirect from (<) the output of another program. Turn your script around and use a pipe like this:

awk -F'   ' '{ print $2 }' ${Directory path}/test_file.txt | while read readline

etc.

Besides, the use of "readline" as a variable name may or may not get you into problems.

緦唸λ蓇 2024-10-13 03:13:38

In this particular case, you can use the following line

sed 's/   /\t/g' <file_name> | cut -f 2

to get the second column.
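
The \t escape in the replacement is understood by GNU sed, but some other sed implementations may not treat it as a tab. A minimal sketch of a more portable variant, assuming the same three-blank delimiter, builds the tab with printf first. The sample file and the variable names are made up for illustration.

```shell
sample=$(mktemp)
printf '2U2133   1239\n1290fsdsf   3234\n' > "$sample"

# Produce a literal tab character without relying on sed escapes.
tab=$(printf '\t')

# Replace every three-blank delimiter with one tab; cut's default
# delimiter is the tab, so -f 2 selects the second column.
second_col=$(sed "s/   /$tab/g" "$sample" | cut -f 2)
printf '%s\n' "$second_col"

rm -f "$sample"
```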

离鸿 2024-10-13 03:13:38

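In bash you can start from something like this:

# Field 4, because three single-blank delimiters leave two empty fields between the columns.
for n in `cut -d " " -f 4 "${Directory_path}/test_file.txt"`
do
    grep -c "$n" "${Directory_path}"/file*.txt
done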

三生池水覆流年 2024-10-13 03:13:38

This should have been a comment, but since I cannot comment yet, I am adding this here.
This is from an excellent answer here: https://stackoverflow.com/a/4483833/3138875

tr -s ' ' <text.txt | cut -d ' ' -f4

tr -s '<character>' squeezes multiple repeated instances of <character> into one.
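
Note that the field number passed to cut depends on whether the blanks were squeezed first. For the three-blank data in the question, cut without tr counts two empty fields between the columns, so the value is field 4; after tr -s it becomes field 2. A small sketch, using a made-up sample file and illustrative variable names:

```shell
sample=$(mktemp)
printf '2U2133   1239\n1290fsdsf   3234\n' > "$sample"

# Raw data splits into "2U2133", "", "", "1239": the value is field 4.
raw_f4=$(cut -d ' ' -f 4 "$sample")

# Squeezed data splits into "2U2133", "1239": the value is field 2.
squeezed_f2=$(tr -s ' ' < "$sample" | cut -d ' ' -f 2)

printf 'raw -f4:\n%s\nsqueezed -f2:\n%s\n' "$raw_f4" "$squeezed_f2"
rm -f "$sample"
```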

ぽ尐不点ル 2024-10-13 03:13:38

It's not working in the script because of the typo in "Directo*t*y path" (last line of your script).

坦然微笑 2024-10-13 03:13:38

Cut isn't flexible enough. I usually use Perl for that:

cat file.txt | perl -F'   ' -lane 'print $F[1]'

Instead of a triple space after -F you can put any Perl regular expression. The -a and -n switches turn on autosplit over each input line, and -l takes care of the line endings. You access fields as $F[n], where n is the field number (counting starts at zero). This way there is no need for sed or tr.
