How to replace lines starting with > with the 15th column of the same line using awk?
I have a file which looks like this:
>gi|358482566|ref|NW_003766328.1| Gallus gallus breed Red Jungle fowl, inbred line UCD001 unplaced genomic scaffold, Gallus_gallus-4.0 ChrUn_7180000961607, whole genome shotgun sequence
TCTGTCTCTTGTCACTGTATTGTAGTGTGAACCCCTTAAAGGGAAGACCTGCTCTCCTTTGAAAATGCTT
GCTCATCTATATGCCTCATGCATACCCTCACTGGCAAAGGAGAGCTGAAGTAATTTTAGGACAGAGGAGT
ACTAGATTGTA
>gi|358482565|ref|NW_003766329.1| Gallus gallus breed Red Jungle fowl, inbred line UCD001 unplaced genomic scaffold, Gallus_gallus-4.0 ChrUn_7180000961609, whole genome shotgun sequence
TTTGACCAATGCATTTCAGCATGTTTTTTGACACTAGGTATGCCATTTGGGATGACAATATCAGTTTCCA
TTTCCATTAGAGGAAAATAAGGTT
I want to replace every line that starts with > with its 15th column. I don't know how to replace a whole line with one column, so I tried replacing all the columns of that line with the 15th column.
The output I expect is:
>ChrUn_7180000961607
TCTGTCTCTTGTCACTGTATTGTAGTGTGAACCCCTTAAAGGGAAGACCTGCTCTCCTTTGAAAATGCTT
GCTCATCTATATGCCTCATGCATACCCTCACTGGCAAAGGAGAGCTGAAGTAATTTTAGGACAGAGGAGT
ACTAGATTGTA
>ChrUn_7180000961609
TTTGACCAATGCATTTCAGCATGTTTTTTGACACTAGGTATGCCATTTGGGATGACAATATCAGTTTCCA
TTTCCATTAGAGGAAAATAAGGTT
These are my commands:
awk '{if ($1 ~ />/) for (i=1; i<=19; i++) gsub ($i, $15)}' test.fa
When I use this, I get some changes in the file, but not what I want: column 15 is removed!
awk '{if ($1 ~ />/) for (i=1; i<=19; i++) a= $15 gsub($i, a)}' gga_ref_Gallus_gallus-4.0_unplaced.fa
And when I use this one, I get this error:
awk: (FILENAME=gga_ref_Gallus_gallus-4.0_unplaced.fa FNR=1) fatal: sub_common: buf: can't allocate 521711124992 bytes of memory (Cannot allocate memory)
So what I want is to replace all the lines that start with > with their 15th column, while still keeping the > at the beginning.
Comments (2)
I think this will do what you want:
It leaves all lines that do not start with > unchanged. This is accomplished by using next to tell awk to skip to the next record for lines starting with >. The 1 is there because it is always true, so the default action of printing the line is invoked for any line that does not start with >.
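The command this explanation refers to is missing from this copy of the answer. A minimal sketch that matches the description might look like the following. It assumes whitespace-separated fields, uses the hypothetical file names sample.fa and renamed.fa, and strips the trailing comma from field 15 because the expected output has none:

```shell
# Build a small sample FASTA file; headers are copied from the question,
# sequence lines are shortened for brevity.
cat > sample.fa <<'EOF'
>gi|358482566|ref|NW_003766328.1| Gallus gallus breed Red Jungle fowl, inbred line UCD001 unplaced genomic scaffold, Gallus_gallus-4.0 ChrUn_7180000961607, whole genome shotgun sequence
TCTGTCTCTTGTCACTGTATTGTAGTGTGAACC
ACTAGATTGTA
>gi|358482565|ref|NW_003766329.1| Gallus gallus breed Red Jungle fowl, inbred line UCD001 unplaced genomic scaffold, Gallus_gallus-4.0 ChrUn_7180000961609, whole genome shotgun sequence
TTTGACCAATGCATTTCAGCATG
EOF

# Header lines: copy field 15, drop its trailing comma, print it after ">",
# then skip to the next record with "next".
# All other lines: the always-true pattern "1" triggers the default print.
awk '/^>/ { id = $15; sub(/,$/, "", id); print ">" id; next } 1' sample.fa > renamed.fa

cat renamed.fa
```

On the real data the same awk program would be pointed at test.fa instead of the sample file. As for the attempts in the question: gsub treats its first argument as a regular expression and, without a third argument, edits the whole record, so looping gsub($i, $15) over the fields does not overwrite the fields one by one.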
This might work for you:
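The command that followed is not preserved in this copy either. As one possibility only, and not the original author's code, a sed substitution could keep the word that sits just before the last comma of each header line. This sketch assumes every header ends with an ID followed by a comma and a comma-free description, as in the sample, and uses the hypothetical file names sample2.fa and renamed2.fa:

```shell
# Sample input with one header line from the question and a shortened sequence.
cat > sample2.fa <<'EOF'
>gi|358482566|ref|NW_003766328.1| Gallus gallus breed Red Jungle fowl, inbred line UCD001 unplaced genomic scaffold, Gallus_gallus-4.0 ChrUn_7180000961607, whole genome shotgun sequence
TCTGTCTCTTGT
EOF

# On lines starting with ">": capture the space-delimited word directly
# before the last comma and rewrite the whole line as ">" plus that word.
# Lines that do not start with ">" do not match and pass through unchanged.
sed 's/^>.* \([^ ,]*\),[^,]*$/>\1/' sample2.fa > renamed2.fa

cat renamed2.fa
```

Unlike the field-number approach, this does not depend on the ID being the 15th whitespace-separated field, but it does depend on the ID being the last word before the final comma.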