How to print the lines that match a specific number in bash?

Posted on 2025-02-07 03:09:25 · 564 words · 1 view · 0 comments

I have a text file called file.txt like below,

01_ABC_0000  AA
02_CDE_0000  BB
03_EFG_0000  CC
04_ABC_0001  DD
05_CDE_0001  EE
06_EFG_0001  FF

It should be separated into two different files, like file0.txt

01_ABC_0000  AA
02_CDE_0000  BB
03_EFG_0000  CC

and file1.txt

04_ABC_0001  DD
05_CDE_0001  EE
06_EFG_0001  FF

What I have been trying is

cat file00.txt | awk '{print $1}' | sed 's/.*\(....\)/\1/'

to get only the numbers from the first word, but I am not able to use this to go further and separate the lines into the two files.

Any help is much appreciated.

EDIT: Sorry, I have edited the question: the first field now looks like 01_ABC_0000, and if I use the underscore as the field separator it doesn't work as expected.

Comments (7)

后来的我们 2025-02-14 03:09:26

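1st solution: Assuming your entries are already sorted by the four-digit number (0000, 0001 and so on), try the following awk program with your shown samples. It writes each group to file0.txt, file1.txt, and so on.

awk -v count="-1" -F'_| +' '
# Start a new output file on the first line and whenever the number changes.
# With this separator, the number in 01_ABC_0000 is field 3.
NR==1 || prev!=$3{
  count++
  close(outputFile)
  outputFile=("file"count".txt")
  prev=$3
}
{
  print > (outputFile)
}
'  Input_file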


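2nd solution: A sort + awk combination, in case the entries are not sorted. The first awk prefixes each line with its number, sort orders the lines by that prefix, and the second awk strips the prefix again before splitting.

awk -F'_| +' '{print $3,$0}' Input_file |
sort -nk1                               |
awk -v count="-1" -F'_| +' '
{
  # Remove the sort key that the first awk put in front of the line.
  sub(/^[^[:space:]]+[[:space:]]+/,"")
}
NR==1 || prev!=$3{
  count++
  close(outputFile)
  outputFile=("file"count".txt")
  prev=$3
}
{
  print > (outputFile)
}
'
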
花开半夏魅人心 2025-02-14 03:09:26

If you only need to do this for 2 numbers, you can use

cat file00.txt | grep "_0000" >> file1.txt
cat file00.txt | grep "_0001" >> file2.txt
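
For more than two numbers, the same grep idea could be driven by a loop. The following is only a sketch under stated assumptions: the number is taken to be the last four characters of the first field, the sample is recreated as file00.txt in a scratch directory purely for the demo, and the output files are numbered from file1.txt in the order the keys sort. It uses > rather than >> so that running it twice does not append a second copy of every line.

```shell
# Demo setup (assumption): recreate the sample input in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' '01_ABC_0000  AA' '02_CDE_0000  BB' '03_EFG_0000  CC' \
              '04_ABC_0001  DD' '05_CDE_0001  EE' '06_EFG_0001  FF' > file00.txt

# List each distinct key (the last four characters of the first field),
# then run one grep per key, writing file1.txt, file2.txt, ...
n=1
for key in $(awk '{ print substr($1, length($1) - 3) }' file00.txt | sort -u); do
    grep "_${key} " file00.txt > "file${n}.txt"
    n=$((n + 1))
done
```

The trailing space in the grep pattern keeps a key such as 0001 from also matching a longer number that merely starts with it.
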
厌倦 2025-02-14 03:09:26
awk -F'[_ ]' '{print $0 > ("file" substr($3,4,1) ".txt")}' file.txt
怕倦 2025-02-14 03:09:26

The typical way to do this with awk is something like:

awk '{outfile = sprintf("file%d.txt", $3 + 1); print > outfile}' FS='[_ ]' input

How you parse out the relevant number will change with the input format; with the separator above, the number in 01_ABC_0000 is the third field. Also, as the input file grows larger you may need to worry about running out of resources such as open file descriptors, so you might want to close the files explicitly with something like:

awk '{outfile = sprintf("file%d.txt", $3 + 1); print >> outfile; close(outfile)}' FS='[_ ]' input

This requires you to add some additional logic to ensure that the files are empty or do not exist before you begin.
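
One way to supply that additional logic inside awk itself is sketched below. It is an illustration under assumptions, not the only approach: the `seen` array and the scratch directory are names made up for the demo, and the field is `$3` because with `FS='[_ ]'` the number in 01_ABC_0000 is the third field. Each output file is truncated the first time it is used in a run and only appended to after that, so leftovers from an earlier run do not survive.

```shell
# Demo setup (assumption): sample input plus a stale output file from an earlier run.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' '01_ABC_0000  AA' '02_CDE_0000  BB' '03_EFG_0000  CC' \
              '04_ABC_0001  DD' '05_CDE_0001  EE' '06_EFG_0001  FF' > input
echo 'stale line from an earlier run' > file1.txt

awk -F'[_ ]' '
{
    outfile = sprintf("file%d.txt", $3 + 1)
    if (!(outfile in seen)) {
        # First use of this name in the current run: truncate it.
        printf "" > outfile
        close(outfile)
        seen[outfile] = 1
    }
    # Append and close right away so only one file is open at a time.
    print >> outfile
    close(outfile)
}' input
```

Opening and closing a file for every line is slow on large inputs, so this trade-off mainly makes sense when the number of distinct output files is large.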

帅气尐潴 2025-02-14 03:09:26

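Using sed (this was written for the earlier form of the input, in which the first field looked like ABC_0000; the output below shows that form)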

num=$(sort -t'_' -k2 file00.txt | sed -n '$s/[^_]*_\([^ ]*\).*/\1/p')
for ((i=0;i<=num;i++)); do
    sed -n "/0\+$i\b/p" file00.txt >> file$i.txt
done

Which produces the following output

$ head file*
==> file0.txt <==
ABC_0000  AA
CDE_0000  BB
EFG_0000  CC

==> file1.txt <==
ABC_0001  DD
CDE_0001  EE
EFG_0001  FF
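
For input whose first field has three parts (01_ABC_0000), the same loop might be adapted as below. This is a sketch rather than the answer's own code: it sorts on the third underscore-separated field, strips leading zeros so the number is not read as octal, and uses a plain while loop with a `_0*N ` pattern that does not rely on GNU sed extensions. The scratch directory and the `max` variable are assumptions made for the demo.

```shell
# Demo setup (assumption): recreate the sample input in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' '01_ABC_0000  AA' '02_CDE_0000  BB' '03_EFG_0000  CC' \
              '04_ABC_0001  DD' '05_CDE_0001  EE' '06_EFG_0001  FF' > file00.txt

# Highest number: sort on the third _-separated field and read it off the last line.
num=$(sort -t'_' -k3 file00.txt | sed -n '$s/^[^_]*_[^_]*_\([0-9]*\).*/\1/p')
max=$(printf '%s\n' "$num" | sed 's/^0*//')   # drop leading zeros
max=${max:-0}                                 # an all-zero number becomes 0

i=0
while [ "$i" -le "$max" ]; do
    # Match "_", any leading zeros, the number itself, then the space after the field.
    sed -n "/_0*$i /p" file00.txt > "file$i.txt"
    i=$((i + 1))
done
```

A number that never occurs between 0 and the maximum simply yields an empty file with this approach.
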
烟沫凡尘 2025-02-14 03:09:26
total 4
-rw-r--r--  1 501  20  96 Jun 14 00:49 inputfile.txt

|

 # 2nd FS choice if you're VERY certain the format is fixed
 
 {m,g}awk '{ print >("file"$(NF-!_)".txt") }' FS='^[^_]+_[^_]+_|[ \t]+[^ \t]+

                                              FS=' .+$|.+_'

total 12
-rw-r--r--  1 501  20  48 Jun 14 00:52 file0000.txt
-rw-r--r--  1 501  20  48 Jun 14 00:52 file0001.txt
-rw-r--r--  1 501  20  96 Jun 14 00:49 inputfile.txt
 

==> file0000.txt <==
01_ABC_0000  AA
02_CDE_0000  BB
03_EFG_0000  CC

==> file0001.txt <==
04_ABC_0001  DD
05_CDE_0001  EE
06_EFG_0001  FF
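
To unpack the one-liner: the second separator, `' .+$|.+_'`, matches everything on the line except the number, so each line splits into an empty field, the number, and another empty field. `$(NF-!_)` is then `$(NF-1)`, because the unset variable `_` negates to 1. A spelled-out sketch follows, with the sample recreated as inputfile.txt in a scratch directory (an assumption for the demo). It relies on awk's leftmost-longest regex matching and was written with gawk and mawk in mind.

```shell
# Demo setup (assumption): recreate the sample input in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' '01_ABC_0000  AA' '02_CDE_0000  BB' '03_EFG_0000  CC' \
              '04_ABC_0001  DD' '05_CDE_0001  EE' '06_EFG_0001  FF' > inputfile.txt

# ".+_" swallows the prefix "01_ABC_" and " .+$" swallows the trailing "  AA",
# leaving the number as the second of three fields, i.e. $(NF - 1).
awk '{ print > ("file" $(NF - 1) ".txt") }' FS=' .+$|.+_' inputfile.txt
```

Because the whole number becomes part of the file name, the output files are file0000.txt and file0001.txt, as in the listing above.
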
绮筵 2025-02-14 03:09:26

This might work for you (GNU uniq and csplit):

uniq -s7 -w4 --group file | csplit --suppress-matched -f file -b '%02d.txt' - '/^$/' '{*}'

Use uniq to separate each group of lines by an empty line, where groups are determined by skipping the first 7 characters (the fixed-width 01_ABC_ prefix) and matching on the following 4.

Pass the output from the above into csplit.

Suppress the matching lines (the blank lines) and split the groups into files named file followed by a suffix of %02d.txt, where %02d is replaced by 00, 01, and so on.

Alternative using cat and sed:

cat -n file | 
sed -En 'H;1h;$!d;x
         s/^\s+(\S+).*_(\S+).*/\1 \2/mg
         s/(\S+) (\S+)\n.*(\S+) \2\n?/\1,\3w file\2.txt\n/gp' | 
sed -nf - file

Creates a sed script with the ranges and file number from the original data and then runs the script against the data.
