How to extract unique read IDs from a FASTQ file?
I want to extract all unique read IDs from a FASTQ file and write them to a text file. (I have done the same for BAM files using samtools, but I don't know of any tool that handles FASTQ files.)
For BAM files: samtools view input.bam | cut -f1 | sort | uniq >> unique.reads.txt
For FASTQ: (need help)
I'm looking for a one-liner or a tool that can do this.
Thank you.
Comments (1)
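Using seqkit (no need to sort):
Here, seqkit fx2tab prints each record as one tab-separated line (name, sequence, quality), and awk stores the first whitespace-delimited field, the read ID, as an array key so that each ID is printed once. The output order is not guaranteed:
seqkit fx2tab reads.fq | awk '{array[$1]=1} END {for (readID in array) print readID}' > unique.reads.txt
You can also do this:
seqkit fx2tab reads.fq | cut -f 1 | sort | uniq > unique.reads.txt
Here the sort step is required, because uniq only collapses adjacent duplicate lines. Note that cut -f 1 keeps the whole header (the ID plus any description), whereas awk's $1 keeps only the ID.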
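Or almost the same without seqkit. A plain grep "@" is not safe here, because quality lines can also contain "@", so it is better to take the first line of every record (this assumes standard four-line FASTQ records) and strip the leading "@":
awk 'NR % 4 == 1 {print substr($1, 2)}' reads.fq | sort | uniq > unique.reads.txt
awk 'NR % 4 == 1 {array[substr($1, 2)]=1} END {for (readID in array) print readID}' reads.fq > unique.reads.txt
That said, I generally prefer seqkit and always recommend it.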
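One caveat when working without seqkit is that "@" is also a valid quality character, so a quality line can start with it. The sketch below is a minimal check of header-only extraction. It assumes four-line FASTQ records (no wrapped sequences), and demo.fq is a hypothetical file created only for the demonstration:

```shell
# Build a tiny 4-line-per-record FASTQ (demo.fq is a made-up name).
# read1 occurs twice, and the quality line of read2 starts with '@'.
printf '%s\n' \
  '@read1 extra description' 'ACGT' '+' 'IIII' \
  '@read2' 'ACGT' '+' '@III' \
  '@read1 extra description' 'TTTT' '+' 'IIII' > demo.fq

# Take line 1 of every 4-line record, strip the leading '@' from the
# first whitespace-delimited field (the read ID), then deduplicate.
awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' demo.fq | sort -u > unique.reads.txt

cat unique.reads.txt
```

Because lines are selected by position rather than by matching "@", the quality line "@III" is never mistaken for a read ID.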