Bash command to search for a pattern (sequence) and print everything next to it (to the right and to the left)
I'm trying to reconstruct a gene sequence from a PoolSeq file of a population (FASTA format) and a known conserved region. I want to search the file for matches to this sequence and then extend outward into the neighboring region, starting from the conserved sequence.
So I basically need a Bash command that searches a FASTA file for a sequence segment and prints the neighboring region of the match in every read.
File: FASTA file of diverse individuals of a species
Input: a 20-30 bp sequence
Output: all reads containing that sequence, together with the neighboring region in each read
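You can try with grep:

grep -o -E '.{0,20}ATGCGT.{0,20}' test.fasta

(ATGCGT stands for your 20-30 bp sequence. GNU grep also accepts the shorter .{,20}, but that form is a GNU extension; .{0,20} is the portable spelling.)

Description:
-o: show only the part of the line that matches
-E: use extended regular expressions

Regex:
.{0,20} before the sequence: up to 20 arbitrary characters before the match
.{0,20} after the sequence: up to 20 arbitrary characters after the match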
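The grep one-liner works line by line, so on FASTA input it has two limitations: it does not say which read a hit came from, and it misses a motif that is split across two lines of a wrapped record. Below is a minimal sketch that addresses both, assuming standard FASTA (header lines start with >) and a POSIX awk. The function name flank_hits and its tab-separated output (read name, 1-based match position, match with flanks) are illustrative choices, not part of the original answer. It searches the forward strand only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flank_hits MOTIF FLANK FILE.FASTA
# For every occurrence of MOTIF in every read, print one tab-separated line:
#   read name, 1-based start of the match, match plus up to FLANK bases per side.
# Wrapped sequence lines of a record are joined first, so a motif split across
# a line break is still found. Matching is case-insensitive; only the forward
# strand is searched (reverse complements are not).
flank_hits() {
  local motif=$1 flank=$2 fasta=$3
  if [ -z "$motif" ]; then
    echo "flank_hits: empty motif" >&2
    return 1
  fi
  awk -v motif="$motif" -v flank="$flank" '
    BEGIN { motif = toupper(motif) }

    # Report all (possibly overlapping) hits in the record collected so far.
    function report(    start, rel, pos, from) {
      if (!have_rec) return
      start = 1
      while ((rel = index(substr(seq, start), motif)) > 0) {
        pos = start + rel - 1                 # absolute 1-based position
        from = pos - flank
        if (from < 1) from = 1                # clip the left flank at the read start
        # substr() stops at the end of seq, which clips the right flank
        print name "\t" pos "\t" substr(seq, from, pos - from + length(motif) + flank)
        start = pos + 1                       # step by 1 so overlapping hits are kept
      }
    }

    { sub(/\r$/, "") }                        # tolerate Windows line endings
    /^>/ { report(); name = substr($0, 2); seq = ""; have_rec = 1; next }
    { gsub(/[ \t]/, ""); seq = seq toupper($0) }
    END { report() }
  ' "$fasta"
}
```

Usage, e.g.: flank_hits ATGCGT 20 reads.fasta (reads.fasta is a placeholder file name). For a toy record >read1 whose sequence is wrapped as AAAAACCCATGC / GTGGGTTTT, flank_hits ATGCGT 5 prints read1, 9 and AACCCATGCGTGGGTT separated by tabs, while the line-based grep finds nothing in that record because ATGCGT is split across the line break. Unlike grep -o, which reports non-overlapping matches, this sketch also reports overlapping hits.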