Unix - Need to cut a file that has multiple blanks as the delimiter - awk or cut?

Posted 2024-10-06 03:13:38, length 871, 13 views, 0 comments

I need to get the records from a text file in Unix. The delimiter is multiple blanks. For example:

2U2133   1239  
1290fsdsf   3234

From this, I need to extract

1239  
3234

The delimiter for all records will be always 3 blanks.

I need to do this in a Unix script (.scr) and either write the output to another file or use it as the input to a do-while loop. I tried the following:

while read readline  
do  
        read_int=`echo "$readline"`  
        cnt_exc=`grep "$read_int" ${Directory path}/file1.txt| wc -l`  
if [ $cnt_exc -gt 0 ]  
then  
  int_1=0  
else  
  int_2=0  
fi  
done < awk -F'  ' '{ print $2 }' ${Directoty path}/test_file.txt

test_file.txt is the input file and file1.txt is a lookup file. But the above does not work and gives me syntax errors near awk -F.

I tried writing the output to a file. The following worked in command line:

more test_file.txt | awk -F'   ' '{ print $2 }' > output.txt

This is working and writing the records to output.txt in command line. But the same command does not work in the unix script (It is a .scr file)

Please let me know where I am going wrong and how I can resolve this.

Thanks,
Visakh

初懵 2024-10-13 03:13:38

The job of replacing multiple delimiters with just one is left to tr:

cat <file_name> | tr -s ' ' | cut -d ' ' -f 2

tr translates or deletes characters, and is perfectly suited to prepare your data for cut to work properly.

The manual states:

-s, --squeeze-repeats
          replace each sequence  of  a  repeated  character  that  is
          listed  in the last specified SET, with a single occurrence
          of that character
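
As a minimal sketch, the pipeline can be tried on the two sample records from the question. The temporary file and the variable names are assumptions made for illustration, and the input is redirected into tr directly instead of going through cat:

```shell
# Build a throwaway sample file with the two records from the question
# (three blanks between the columns, trailing blanks on the first line).
sample=$(mktemp)
printf '2U2133   1239  \n1290fsdsf   3234\n' > "$sample"

# Squeeze every run of blanks down to a single blank, then let cut
# pick the second blank-delimited field.
second_col=$(tr -s ' ' < "$sample" | cut -d ' ' -f 2)
printf '%s\n' "$second_col"

rm -f "$sample"
```

Because the trailing blanks are squeezed as well, they end up in an empty third field and do not appear in the extracted values.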

够钟 2024-10-13 03:13:38

It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means 'ignore blank fields' or, equivalently, allow multiple separators between fields. If that's supported, use:

cut -i -d' ' -f 2 data.file

If not (and it is not universal — and maybe not even widespread, since neither GNU nor MacOS X have the option), then using awk is better and more portable.

You need to pipe the output of awk into your loop, though:

awk -F' ' '{print $2}' ${Directory_path}/test_file.txt |
while read readline  
do  
    read_int=`echo "$readline"`  
    cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`  
    if [ $cnt_exc -gt 0 ]  
    then int_1=0  
    else int_2=0
    fi  
done

The only residual issue is whether the while loop runs in a sub-shell and is therefore not modifying your main shell script's variables, just its own copy of those variables.

With bash, you can use process substitution:

while read readline  
do  
    read_int=`echo "$readline"`  
    cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`  
    if [ $cnt_exc -gt 0 ]  
    then int_1=0  
    else int_2=0
    fi  
done < <(awk -F' ' '{print $2}' ${Directory_path}/test_file.txt)

This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.

The blank in ${Directory path} is not normally legal — unless it is another Bash feature I've missed out on; you also had a typo (Directoty) in one place.
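
If process substitution is not available in your shell, one option that avoids the sub-shell question is the intermediate file the question already mentions: write the awk output to a file and redirect that file into the loop. The sketch below makes several assumptions: the directory, the file contents and the counter names found and missing are made up for illustration, and grep -Fxq (fixed string, whole-line match, quiet) replaces the grep | wc -l test, which presumes the lookup file holds one value per line.

```shell
# Hypothetical stand-ins for the input and lookup files from the question.
workdir=$(mktemp -d)
printf '2U2133   1239  \n1290fsdsf   3234\n' > "$workdir/test_file.txt"
printf '1239\n9999\n' > "$workdir/file1.txt"

# Step 1: write the second column to an intermediate file. awk's default
# field splitting treats any run of blanks as one separator.
awk '{ print $2 }' "$workdir/test_file.txt" > "$workdir/output.txt"

# Step 2: redirect that file into the loop. The loop runs in the current
# shell, so the counters are still set after "done".
found=0
missing=0
while read -r value
do
    # -F: fixed string, -x: the whole line must match, -q: no output
    if grep -Fxq "$value" "$workdir/file1.txt"
    then found=$((found + 1))
    else missing=$((missing + 1))
    fi
done < "$workdir/output.txt"

echo "found=$found missing=$missing"
rm -rf "$workdir"
```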

画▽骨i 2024-10-13 03:13:38

Other ways of doing the same thing aside, the error in your program is this: You cannot redirect from (<) the output of another program. Turn your script around and use a pipe like this:

awk -F'   ' '{ print $2 }' "${Directory_path}/test_file.txt" | while read readline

etc.

Besides, the use of "readline" as a variable name may or may not get you into problems.

緦唸λ蓇 2024-10-13 03:13:38

In this particular case, you can use the following line

sed 's/   /\t/g' <file_name> | cut -f 2

to get your second column.
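
Note that expanding \t in the replacement text is a GNU sed extension; other sed implementations may insert a literal t instead. A hedged, more portable sketch passes a real tab character to sed. The sample file and the variable names are assumptions for illustration:

```shell
# Throwaway sample file with the two records from the question.
sample=$(mktemp)
printf '2U2133   1239  \n1290fsdsf   3234\n' > "$sample"

# Produce a literal tab with printf instead of relying on sed to expand \t.
tab=$(printf '\t')

# Turn each run of exactly three blanks into one tab. cut's default
# delimiter is the tab, so -f 2 selects the second column.
second_col=$(sed "s/   /$tab/g" "$sample" | cut -f 2)
printf '%s\n' "$second_col"

rm -f "$sample"
```

Unlike tr -s, this leaves the two trailing blanks of the first record in place, so the first extracted value still carries them.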

离鸿 2024-10-13 03:13:38

In bash you can start from something like this:

# With a single-blank delimiter, the second column is field 4.
for n in `cut -d " " -f 4 "${Directory_path}/test_file.txt"`
do
    grep -c "$n" "${Directory_path}"/file*.txt
done
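
The field number 4 follows from how cut counts fields: every single blank is a delimiter, so the three blanks between the columns create two empty fields. A small sketch on the sample records from the question shows this (the file and variable names are assumptions for illustration):

```shell
# Throwaway sample file with the two records from the question.
sample=$(mktemp)
printf '2U2133   1239  \n1290fsdsf   3234\n' > "$sample"

# "2U2133   1239" splits on single blanks into:
# field 1 = 2U2133, fields 2 and 3 = empty, field 4 = 1239.
field2=$(cut -d ' ' -f 2 "$sample")
field4=$(cut -d ' ' -f 4 "$sample")

printf 'field 2: [%s]\n' "$field2"
printf 'field 4: [%s]\n' "$field4"

rm -f "$sample"
```

This only holds while the delimiter is exactly three blanks on every line, which is what the question states.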

三生池水覆流年 2024-10-13 03:13:38

This should have been a comment, but since I cannot comment yet, I am adding this here.
This is from an excellent answer here: https://stackoverflow.com/a/4483833/3138875

tr -s ' ' <text.txt | cut -d ' ' -f4

tr -s '<character>' squeezes multiple repeated instances of <character> into one.

ぽ尐不点ル 2024-10-13 03:13:38

It does not work in the script because of the typo in "Directo*t*y path" (on the last line of the script).

坦然微笑 2024-10-13 03:13:38

Cut isn't flexible enough. I usually use Perl for that:

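cat file.txt | perl -F'   ' -lane 'print $F[1]'

The -n and -a switches make Perl read the input line by line and split each line into @F, and -l takes care of the line endings. Instead of the triple space after -F you can put any Perl regular expression. You access fields as $F[n], where n is the field number (counting starts at zero). This way there is no need for sed or tr.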
