How do I print lines matching a specific number in bash?
I have a text file called file.txt like below,
01_ABC_0000 AA
02_CDE_0000 BB
03_EFG_0000 CC
04_ABC_0001 DD
05_CDE_0001 EE
06_EFG_0001 FF
which should be separated into two different files, like
file0.txt
01_ABC_0000 AA
02_CDE_0000 BB
03_EFG_0000 CC
and file1.txt
04_ABC_0001 DD
05_CDE_0001 EE
06_EFG_0001 FF
What I have been trying: cat file00.txt | awk '{print $1}' | sed 's/.*\(....\)/\1/'
to get only the number from the first word, but I am not able to use this to go on and separate the lines into the two files.
Any help is much appreciated.
EDIT: Sorry, I have edited the question; the first field is now something like 01_ABC_0000.
If I use underscore as the field separator, it doesn't work as expected.
Comments (7)
1st solution: Considering your entries are sorted by the values of the 2nd column (0000, 0001 and so on). With your shown samples, please try the following awk program.
2nd solution: Using a sort + awk combination in case the entries are not sorted.
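A minimal sketch of both variants, assuming the group number is the last underscore-separated part of the first field (as in 01_ABC_0000). The helper name split_groups, the output prefixes, and shuffled.txt are hypothetical; the output files are simply numbered in the order the groups appear.

```shell
# Sample input from the question.
cat > file.txt <<'EOF'
01_ABC_0000 AA
02_CDE_0000 BB
03_EFG_0000 CC
04_ABC_0001 DD
05_CDE_0001 EE
06_EFG_0001 FF
EOF

# Hypothetical helper: reads grouped lines on stdin, $1 is the output prefix.
split_groups() {
  awk -v prefix="$1" '
    {
      n = split($1, parts, "_")   # parts[n] is the trailing number, e.g. 0000
      key = parts[n]
      if (out == "" || key != prev) {        # first line, or the group changed
        if (out != "") close(out)            # done with the previous file
        out = sprintf("%s%d.txt", prefix, count++)
        prev = key
      }
      print > out
    }'
}

# Variant 1: input already sorted by the number -> file0.txt, file1.txt
split_groups file < file.txt

# Variant 2: unsorted input, sorted on the third "_"-separated field first
printf '%s\n' '05_CDE_0001 EE' '02_CDE_0000 BB' '04_ABC_0001 DD' '01_ABC_0000 AA' > shuffled.txt
LC_ALL=C sort -t_ -k3,3 shuffled.txt | split_groups sorted
```

Because only one output file is open at a time, this variant stays within file-descriptor limits however many groups there are.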
If you only need to do this for 2 numbers, you can use
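The command that followed is not shown here. One simple possibility for the two-number case (an assumption, not necessarily what this answer used) is a pair of grep calls, one per number:

```shell
# Sample input from the question.
cat > file.txt <<'EOF'
01_ABC_0000 AA
02_CDE_0000 BB
03_EFG_0000 CC
04_ABC_0001 DD
05_CDE_0001 EE
06_EFG_0001 FF
EOF

# Match lines whose first field ends in _0000 or _0001 (followed by a space).
grep -E '^[^ ]*_0000 ' file.txt > file0.txt
grep -E '^[^ ]*_0001 ' file.txt > file1.txt
```

This reads the input once per number, so it only makes sense for a small, known set of numbers.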
The typical way to do this with awk is something like:
The manner in which you parse out the relevant number to use will change with the input format. Also, as the input file grows larger you may need to worry about running out of resources, so you might want to close the files explicitly with something like:
Which requires you to add some additional logic to ensure that the files are empty or do not exist before you begin (a file that has been closed must be reopened in append mode, so leftover content would be kept).
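A minimal sketch of the closing variant described above, assuming the number is the last underscore-separated part of the first field and that the output names are file0.txt, file1.txt and so on. The rm -f line stands in for the "additional logic" and is an assumption about how one might handle it.

```shell
# Sample input from the question.
cat > file.txt <<'EOF'
01_ABC_0000 AA
02_CDE_0000 BB
03_EFG_0000 CC
04_ABC_0001 DD
05_CDE_0001 EE
06_EFG_0001 FF
EOF

# Appending (>>) keeps old content, so start from a clean slate.
rm -f file0.txt file1.txt

awk '
{
  n = split($1, parts, "_")             # parts[n] is e.g. 0000 or 0001
  out = "file" (parts[n] + 0) ".txt"    # 0000 -> file0.txt, 0001 -> file1.txt
  print >> out                          # append, because the file is reopened for every line
  close(out)                            # release the file descriptor right away
}' file.txt
```

Closing after every line is the simplest form but also the slowest; it does not require the input to be sorted.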
Using sed
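The sed command itself is not shown here. A minimal sketch of one way to do it, assuming the standard w (write to file) command with one address per number; the patterns and file names are assumptions chosen to fit the sample:

```shell
# Sample input from the question.
cat > file.txt <<'EOF'
01_ABC_0000 AA
02_CDE_0000 BB
03_EFG_0000 CC
04_ABC_0001 DD
05_CDE_0001 EE
06_EFG_0001 FF
EOF

# -n suppresses normal output; each matching line is written to its own file.
sed -n \
  -e '/^[^ ]*_0000 /w file0.txt' \
  -e '/^[^ ]*_0001 /w file1.txt' \
  file.txt
```

Unlike two grep calls, this reads the input only once.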
Which produces the following output
This might work for you (GNU uniq and csplit):
Use uniq to separate each group of lines by an empty line, where groups are determined by skipping the first 4 characters and matching on the following 4.
Pass the output from the above into csplit.
Suppress the matching lines (blank lines) and split the groups into files named with the prefix file and the suffix %02d.txt, where %02d is replaced by 00, 01, and so on.
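A sketch of the uniq and csplit pipeline described above, requiring GNU coreutils. One assumption differs from the description: with the sample as edited in the question (01_ABC_0000), the number starts at the eighth character, so this sketch skips 7 characters instead of 4. Adjust -s and -w to the actual layout.

```shell
# Sample input from the question.
cat > file.txt <<'EOF'
01_ABC_0000 AA
02_CDE_0000 BB
03_EFG_0000 CC
04_ABC_0001 DD
05_CDE_0001 EE
06_EFG_0001 FF
EOF

# uniq --group prints all lines and puts an empty line between groups;
#   -s7 skips "01_ABC_", -w4 compares the next 4 characters (the number).
# csplit splits stdin (-) at every empty line, repeating as often as possible:
#   -s quiet, -z drop empty pieces, --suppress-matched drop the empty lines,
#   -f/-b build the names file00.txt, file01.txt, ...
uniq --group -s7 -w4 file.txt |
  csplit -s -z --suppress-matched -f file -b '%02d.txt' - '/^$/' '{*}'
```

Like uniq in general, this relies on the lines of each group being adjacent in the input.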
Alternative using cat and sed:
Creates a sed script with the ranges and file number from the original data and then runs the script against the data.