Piping awk and grep to save specific fields of a file

Posted on 2025-01-24 13:05:12 · 540 words · 1 view · 0 comments

What I want to achieve:

grep: extract lines with the contig number and length
awk: remove "length:" from column 2
sort: sort by length (in descending order)

Current code

grep "length:" test_reads.fa.contigs.vcake_output | awk -F:'{print $2}' |sort -g -r > contig.txt

Example content of test_reads.fa.contigs.vcake_output:

>Contig_11 length:42
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_0 length:99995
ATTTATGCCGTTGGCCACGAATTCAGAATCATATTA

Expected output

>Contig_0 99995
>Contig_11 42

烟酉 2025-01-31 13:05:12

With your shown samples, please try the following awk + sort solution.

awk -F'[: ]' '/^>/{print $1,$3}' Input_file | sort -nrk2

Explanation: awk reads Input_file with the field separator set to either a colon or a space. For every line that starts with >, it prints the 1st and 3rd fields (the contig name and the length). That output is then piped to sort as standard input, where -nrk2 sorts numerically, in reverse (descending) order, on the 2nd column, which gives the required output.
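To make the field splitting concrete, here is a minimal sketch that runs the same pipeline on the sample from the question. The temporary file and the variable names (input, result) are assumptions made for this illustration only; they are not part of the answer above.

```shell
# Write the question's sample to a temporary file (the path is hypothetical).
input=$(mktemp)
cat > "$input" <<'EOF'
>Contig_11 length:42
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_0 length:99995
ATTTATGCCGTTGGCCACGAATTCAGAATCATATTA
EOF

# Show how -F'[: ]' splits each header line into three fields.
awk -F'[: ]' '/^>/{printf "1=%s 2=%s 3=%s\n", $1, $2, $3}' "$input"

# Full pipeline: keep fields 1 and 3, then sort numerically, descending, on column 2.
result=$(awk -F'[: ]' '/^>/{print $1,$3}' "$input" | sort -nrk2)
printf '%s\n' "$result"

rm -f "$input"
```

The first awk call prints lines such as 1=>Contig_11 2=length 3=42, which is why the length ends up in $3 rather than $2.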

命硬 2025-01-31 13:05:12

Here is a gnu-awk solution that does it all in a single command without invoking sort:

awk -F '[:[:blank:]]' '
$2 == "length" {arr[$1] = $3}
END {
   # gawk only: traverse the array by value, numerically, largest first
   PROCINFO["sorted_in"] = "@val_num_desc"
   for (i in arr)
      print i, arr[i]
}' file

>Contig_0 99995
>Contig_11 42

拥醉 2025-01-31 13:05:12

Perhaps this, combining grep and awk:

awk -F '[ :]' '$2 == "length" {print $1, $3}' file | sort ...
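The sort options are left open above. As one possible way to finish the pipeline (an assumption, not necessarily what the answerer had in mind), the sketch below sorts numerically, in descending order, on the 2nd column only. It also adds one header line without a plain "length" field, borrowed from the extended sample in a later answer, to show that the $2 == "length" test skips it. The temporary file and variable names are hypothetical.

```shell
# Sample input: the question's data plus one header that should be ignored.
input=$(mktemp)
cat > "$input" <<'EOF'
>Contig_11 length:42
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_837 ignore-this-length:1000000
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_0 length:99995
ATTTATGCCGTTGGCCACGAATTCAGAATCATATTA
EOF

# awk does the filtering that grep did; sort -k2,2nr orders by length, largest first.
result=$(awk -F '[ :]' '$2 == "length" {print $1, $3}' "$input" | sort -k2,2nr)
printf '%s\n' "$result"

rm -f "$input"
```

Because the condition compares the whole 2nd field, a field such as ignore-this-length does not match, whereas a plain grep "length:" would have let that line through.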

命硬 2025-01-31 13:05:12

Assumptions:

if more than one row has the same length then additionally sort the 1st column using 'version' sort

Adding some additional lines to the sample input:

$ cat test_reads.fa.contigs.vcake_output
>Contig_0 length:99995
ATTTATGCCGTTGGCCACGAATTCAGAATCATATTA
>Contig_11 length:42
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_17 length:93
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_837 ignore-this-length:1000000
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_8 length:42
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT

One sed/sort idea:

$ sed -En 's/(>[^ ]+) length:(.*)$/\1 \2/p' test_reads.fa.contigs.vcake_output | sort -k2,2nr -k1,1V

Where:

-En - enable extended regex support (-E; GNU sed also accepts -r) and suppress normal printing of input data (-n)
(>[^ ]+) - (1st capture group) - > followed by 1 or more non-space characters
" length:" - a space followed by length: (the quotes are only here to make the leading space visible)
(.*) - (2nd capture group) - 0 or more characters (following the colon)
$ - end of line
\1 \2/p - print 1st capture group + <space> + 2nd capture group
-k2,2nr - sort by 2nd (space-delimited) field in reverse numeric order
-k1,1V - sort by 1st (space-delimited) field in Version order

This generates:

>Contig_0 99995
>Contig_17 93
>Contig_8 42
>Contig_11 42
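To see what the version-sort tie-break contributes, the following sketch compares it with a plain lexical tie-break on the two contigs that share length 42. It assumes a sort that supports the V key option (GNU coreutils does); the variable names are made up for the illustration, and LC_ALL=C is set only to make the lexical comparison predictable.

```shell
# Two name/length pairs that tie on length (taken from the output above).
pairs=$(printf '>Contig_8 42\n>Contig_11 42')

# Plain lexical tie-break: compares character by character, so "1" sorts before "8".
lexical=$(printf '%s\n' "$pairs" | LC_ALL=C sort -k2,2nr -k1,1)

# Version tie-break: compares the embedded numbers, so 8 sorts before 11.
version=$(printf '%s\n' "$pairs" | LC_ALL=C sort -k2,2nr -k1,1V)

printf 'lexical:\n%s\nversion:\n%s\n' "$lexical" "$version"
```

With the lexical key, >Contig_11 comes before >Contig_8; with the V key the order matches the output shown above.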
