How can I modify the Smith-Waterman algorithm using a substitution matrix to align proteins in Perl?
[citations needed]
Comments (2)
I'm actually a bioinformatics researcher, and one that is waiting for his own bioinformatics code to run, so I'll attempt to answer your question even though it's rather poorly posed.
I'm not sure why you think you need to "modify" the Smith-Waterman algorithm. The only thing it needs in order to align proteins instead of DNA is a substitution matrix for proteins. Look into BLOSUM or PAM. Both are derived from the observed substitution frequencies of amino acid pairs in alignments of related protein sequences: PAM from Dayhoff's alignments of closely related proteins in the 1970s, BLOSUM from conserved ungapped blocks in the BLOCKS database in the early 1990s.
Constructing a substitution matrix for protein sequences is much more complicated than for DNA sequences. For example, you'd expect one hydrophilic amino acid to substitute for another relatively frequently, because it can often do so without causing the protein to lose function. However, you wouldn't expect a hydrophobic amino acid to substitute for a hydrophilic one as often, because this would change the protein structure more drastically.
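To make that concrete, here is a small Perl sketch that looks up a handful of BLOSUM62 entries. Only five pairs are included, so it is an excerpt rather than a usable matrix, and the names `%blosum62_excerpt` and `pair_score` are made up for this illustration. In practice you would load the full 20x20 table from a matrix file.

```perl
use strict;
use warnings;

# A few BLOSUM62 entries, stored once per unordered pair (letters sorted).
# D, E and K are hydrophilic (charged); I and L are hydrophobic.
my %blosum62_excerpt = (
    'DE' =>  2,   # hydrophilic <-> hydrophilic
    'EK' =>  1,   # hydrophilic <-> hydrophilic
    'IL' =>  2,   # hydrophobic <-> hydrophobic
    'DL' => -4,   # hydrophilic <-> hydrophobic
    'IK' => -3,   # hydrophobic <-> hydrophilic
);

# Hypothetical helper: order-independent lookup.
# Returns undef for pairs that are not part of this excerpt.
sub pair_score {
    my ($x, $y) = @_;
    my @pair = sort { $a cmp $b } ($x, $y);
    return $blosum62_excerpt{ join '', @pair };
}

for my $pair (['D', 'E'], ['I', 'L'], ['D', 'L']) {
    printf "%s-%s: %d\n", @$pair, pair_score(@$pair);
}
```

Substitutions within the same chemical class score positive, while swapping a hydrophobic residue for a charged one is penalised.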
If you view the substitution matrix as an input instead of part of the algorithm, the Smith-Waterman algorithm, while typically applied to DNA or proteins, is technically a general string alignment algorithm.
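As a rough illustration of that point, the sketch below takes the scoring scheme as a code reference, so the same routine works for DNA, proteins, or any other strings. It assumes a simple linear gap penalty (real protein aligners usually use affine gaps), and `smith_waterman` is just a name chosen for this example, not an existing module function.

```perl
use strict;
use warnings;

# Smith-Waterman local alignment with a linear gap penalty (an assumption).
# $score is a code ref: $score->($x, $y) returns the substitution score.
# Returns (best score, aligned piece of seq1, aligned piece of seq2).
sub smith_waterman {
    my ($seq1, $seq2, $score, $gap) = @_;
    my @x = split //, $seq1;
    my @y = split //, $seq2;

    # $H[$i][$j] = best score of a local alignment ending at x[i-1], y[j-1]
    my @H = map { [ (0) x (@y + 1) ] } 0 .. @x;
    my ($best, $bi, $bj) = (0, 0, 0);

    for my $i (1 .. @x) {
        for my $j (1 .. @y) {
            my $diag = $H[$i-1][$j-1] + $score->($x[$i-1], $y[$j-1]);
            my $up   = $H[$i-1][$j] - $gap;
            my $left = $H[$i][$j-1] - $gap;
            my $h = 0;    # local alignment: scores never drop below zero
            for my $c ($diag, $up, $left) { $h = $c if $c > $h }
            $H[$i][$j] = $h;
            ($best, $bi, $bj) = ($h, $i, $j) if $h > $best;
        }
    }

    # Trace back from the best cell until a zero cell is reached.
    my ($i, $j) = ($bi, $bj);
    my ($top, $bottom) = ('', '');
    while ($i > 0 && $j > 0 && $H[$i][$j] > 0) {
        if ($H[$i][$j] == $H[$i-1][$j-1] + $score->($x[$i-1], $y[$j-1])) {
            $top = $x[$i-1] . $top; $bottom = $y[$j-1] . $bottom; $i--; $j--;
        } elsif ($H[$i][$j] == $H[$i-1][$j] - $gap) {
            $top = $x[$i-1] . $top; $bottom = '-' . $bottom; $i--;
        } else {
            $top = '-' . $top; $bottom = $y[$j-1] . $bottom; $j--;
        }
    }
    return ($best, $top, $bottom);
}

# Plain-string scoring: +2 for a match, -1 for a mismatch.
my $identity = sub { $_[0] eq $_[1] ? 2 : -1 };
my ($best, $top, $bottom) = smith_waterman('XXHELLOYY', 'ZZHELLOWW', $identity, 2);
print "$best\n$top\n$bottom\n";    # prints 10, HELLO, HELLO
```

To align proteins, only the code reference changes: pass a closure that looks each residue pair up in a BLOSUM or PAM table instead of `$identity`.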
Maybe start with Bio::Tools::pSW, try to modify it the way you want, and ask specific questions if you run into difficulty.