Bash code to split every 4 lines, then merge
My title may not fully explain what I am after.
I have a list of data like the one below:
@HWI-ST150_0129:3:8:21208:93107#0/1
TGTCTAGTTTTTATAGGAAGATATTTCCTTTTCTACCTTTGACTTCAAAGCGGCTGAAATCTCCACTTGCAAATTCCACAAAAAGAGTGTTACAAGTCT
+
Yeeeeeeeeeceed]dddddd^YdceeeedaeeddYccccc\ddceeYeYY`[`bcYc^_XY^_]d^dd`abdddee\e\ddLb]`_`cTbbbYbaM_]
@HWI-ST150_0129:3:8:21208:93107#0/2
TTTGTAAAGTCTGCACGTGGATAACTTGACCACTTAGAGGCCTTCGTTGGAAACGGGTTTTTTTCATGTAAGGCTAGACAGAAGAATTCTCAGTAACTTCAAGTTACTGAGAATTCTTCTGTCTAGCCTTACATGAAAAAAACCCGTTTCCAACGAAGGCCTCTAAGTGGTCAAGTTATCCACGTGCAGACTTTACAAA
+
ffcaefffcdeeeeeeeeeedff^f`\\eeedaec^d^d`deaffeeTecb^bbbddadYcccW[X\MZ\XaU_UTI\]TZ]K[VQX^aIb`b`^X^YSYHWI-ST150_0129:3:8:21208:93107#0
The first and fifth lines are both headers (names), ending in either #0/1 or #0/2. I want to treat every 4 lines as one group, then collect all groups whose header ends in #0/1 into one file and all groups whose header ends in #0/2 into another.
The first file should look like:
@HWI....#0/1
TTCCGC
+
cffccc
@HWI....#0/1
CCGGGG
+
abbcgg
....
and the other file should look like:
@HWI....#0/2
ATTCCG
+
fccfcc
@HWI....#0/2
CGCCGG
+
gbbcaa
I know how to do this with a simple Python script, but I am wondering whether it can be done with some fairly simple bash code.
Thanks
Comments (2)
sed -n '1,${p;n;n;n;}'
should work for getting every 4th line (lines 1, 5, 9, ...).
References: "Useful One-Line Scripts for sed"; man sed
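To see what that one-liner selects, here is a minimal sketch run on a made-up two-record input. The function name `headers` and the sample records are illustrative assumptions, not part of the original data.

```shell
# Print lines 1, 5, 9, ... of stdin.
# p prints the current line; each n then reads the next line
# without printing it, because -n suppresses automatic output.
headers() {
  sed -n '1,${p;n;n;n;}'
}

# Two fake 4-line records; only the two header lines are printed.
printf '%s\n' '@r#0/1' 'AAAA' '+' 'ffff' '@r#0/2' 'CCCC' '+' 'gggg' | headers
```

On real data this lists the record headers, which is a quick way to check that /1 and /2 entries alternate as expected.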
I'm not sure I understand you; however, getting every 4th line is trivial with GNU sed.
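One way to do that with GNU sed, not necessarily the command this answer had in mind, is the `first~step` address form. This is a GNU extension and is not available in POSIX sed; the function name `every_fourth` is a made-up helper for the sketch.

```shell
# 1~4 matches lines 1, 5, 9, ... (GNU sed only).
# Use 0~4 instead to select lines 4, 8, 12, ...
every_fourth() {
  sed -n '1~4p'
}

# Sample input: eight numbered lines; lines 1 and 5 are selected.
printf '%s\n' 1 2 3 4 5 6 7 8 | every_fourth
```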
Grouping four lines, by which I presume you mean reducing 4 lines to one, can be done with the pattern you mentioned above, i.e. a header line ending in #0/1 or #0/2.
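As a hedged sketch of that idea (swapping in `paste` and `awk` rather than reproducing the answer's own command): join each group of 4 lines into one tab-separated line, keep the joined lines whose header ends in the wanted suffix, then split them back into 4 lines. The function name `split_pairs` and the output file names are assumptions made for illustration, and the sketch assumes the input contains no tab characters.

```shell
# split_pairs INPUT PREFIX
# Writes PREFIX_1.txt (records ending in #0/1) and PREFIX_2.txt (#0/2).
split_pairs() {
  input=$1
  prefix=$2
  for mate in 1 2; do
    # paste - - - - turns every 4 input lines into one tab-separated line
    paste - - - - < "$input" |
      awk -F '\t' -v suffix="#0/$mate" '
        # keep joined records whose header (field 1) ends with the suffix
        substr($1, length($1) - length(suffix) + 1) == suffix' |
      tr '\t' '\n' > "${prefix}_${mate}.txt"   # restore the 4-line layout
  done
}

# Usage (file names are placeholders):
#   split_pairs reads.txt reads
```

The suffix is compared as a plain string instead of a regular expression, so characters in the header never need escaping. The input is read once per output file, which keeps the code short at the cost of a second pass.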