Piping awk and grep to save specific fields of a file
What I want to achieve:
- grep: extract lines with the contig number and length
- awk: remove "length:" from column 2
- sort: sort by length (in descending order)
Current code
grep "length:" test_reads.fa.contigs.vcake_output | awk -F:'{print $2}' |sort -g -r > contig.txt
Example content of test_reads.fa.contigs.vcake_output:
>Contig_11 length:42
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_0 length:99995
ATTTATGCCGTTGGCCACGAATTCAGAATCATATTA
Expected output
>Contig_0 99995
>Contig_11 42
Comments (4)
With your shown samples, please try the following awk + sort solution.
Explanation: awk reads Input_file first, with the field separator set to either : or a space, and checks whether each line starts with >. For those lines it prints the contig name and the length value, and its output is sent (as standard input) to the sort command, which sorts on the 2nd field to get the required output.
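The command that went with this explanation is not preserved here. As a minimal sketch of the approach it describes (not necessarily the answerer's exact command), assuming the sample from the question sits in a scratch file held in the hypothetical variable `input`: with `:` or a space as the separator, a header line splits into the contig name, the word `length`, and the number, so the sketch prints fields 1 and 3.

```shell
# Scratch copy of the sample from the question (temporary file, not the real data).
input=$(mktemp)
cat > "$input" <<'EOF'
>Contig_11 length:42
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_0 length:99995
ATTTATGCCGTTGGCCACGAATTCAGAATCATATTA
EOF

# -F'[: ]' splits on a colon or a single space, so a header line becomes
# $1=">Contig_11", $2="length", $3="42". /^>/ skips the sequence lines.
# sort -k2,2nr orders the awk output by its 2nd field, numerically, descending.
result=$(awk -F'[: ]' '/^>/ { print $1, $3 }' "$input" | sort -k2,2nr)

printf '%s\n' "$result"
rm -f "$input"
```

To write the result to a file as in the question, redirect the pipeline to contig.txt instead of capturing it in `result`.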
Here is a gnu-awk solution that does it all in a single command without invoking sort:
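The gnu-awk command itself is missing from this copy. One way such a single-command solution could look, offered as a sketch rather than the answerer's code, is to collect the lengths in an array and let gawk's `PROCINFO["sorted_in"]` control the traversal order. The variable names `input`, `len`, and `name` are illustrative, and the sketch assumes contig names are unique, since they are used as array keys.

```shell
# Scratch copy of the sample from the question (temporary file, not the real data).
input=$(mktemp)
cat > "$input" <<'EOF'
>Contig_11 length:42
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_0 length:99995
ATTTATGCCGTTGGCCACGAATTCAGAATCATATTA
EOF

if command -v gawk >/dev/null 2>&1; then
  result=$(gawk -F'[: ]' '
    /^>/ { len[$1] = $3 }                      # contig name -> length
    END {
      # gawk-only: walk the array by value, numerically, high to low
      PROCINFO["sorted_in"] = "@val_num_desc"
      for (name in len) print name, len[name]
    }' "$input")
  printf '%s\n' "$result"
else
  echo 'GNU awk (gawk) is required for PROCINFO["sorted_in"]' >&2
fi
rm -f "$input"
```

Other awk implementations ignore `PROCINFO["sorted_in"]`, so the output order is only guaranteed under gawk.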
Perhaps this, combining grep and awk:
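The combined command is not preserved in this copy, so the following is only a sketch that keeps the shape of the pipeline from the question. Two things in the question's command stand out: `-F:'{print $2}'` has no space before the program, so the shell passes it to awk as one argument and awk never receives a program; and printing only `$2` would drop the contig name. Using the literal string `length:` as the separator is one possible fix, and the variable `input` is a hypothetical scratch file.

```shell
# Scratch copy of the sample from the question (temporary file, not the real data).
input=$(mktemp)
cat > "$input" <<'EOF'
>Contig_11 length:42
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_0 length:99995
ATTTATGCCGTTGGCCACGAATTCAGAATCATATTA
EOF

# grep keeps only the header lines. With "length:" as the separator,
# $1 is ">Contig_11 " (trailing space included) and $2 is "42", so
# printing them back to back yields ">Contig_11 42".
result=$(grep 'length:' "$input" |
  awk -F'length:' '{ print $1 $2 }' |
  sort -k2,2nr)

printf '%s\n' "$result"
rm -f "$input"
```

The `-k2,2` key matters here: without it, sort compares whole lines, which all start with `>Contig_`, rather than the length column.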
Assumptions:
Adding some additional lines to the sample input:
One sed/sort idea:
Where:
- -En - enable extended regex support and suppress normal printing of input data
- (>[^ ]+) - (1st capture group) - > followed by 1 or more non-space characters
- length: - space followed by length:
- (.*) - (2nd capture group) - 0 or more characters (following the colon)
- $ - end of line
- \1 \2/p - print 1st capture group + <space> + 2nd capture group
- -k2,2nr - sort by 2nd (space-delimited) field in reverse numeric order
- -k1,1V - sort by 1st (space-delimited) field in Version order
This generates:
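The command and its output did not survive in this copy. A sketch assembled from the option-by-option description above might look like the following. The extra `>Contig_2` record is a made-up line added only to show the tie-break on the first field, `input` is a hypothetical scratch file, and `sort -V` is an extension found in GNU sort rather than a POSIX option.

```shell
# Scratch input: the sample from the question plus one invented record
# (>Contig_2) that ties with >Contig_11 on length.
input=$(mktemp)
cat > "$input" <<'EOF'
>Contig_11 length:42
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_0 length:99995
ATTTATGCCGTTGGCCACGAATTCAGAATCATATTA
>Contig_2 length:42
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTAC
EOF

# sed -En: extended regex, print only lines where the substitution succeeds.
# The substitution rewrites ">name length:N" as ">name N".
# sort: 2nd field numeric descending, then 1st field in version order.
result=$(sed -En 's/(>[^ ]+) length:(.*)$/\1 \2/p' "$input" |
  sort -k2,2nr -k1,1V)

printf '%s\n' "$result"
rm -f "$input"
```

With version ordering on the first field, `>Contig_2` is expected to sort ahead of `>Contig_11` among records of equal length, which a plain lexical comparison would not do.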