Assessing the significance of BLASTn scores?
I am running standalone command-line BLAST to align many query sequences against a large nucleotide sequence database. I can change various parameters of the blastn program on the command line, such as the match/mismatch scores.
I am wondering about the 'bit score' that blastn outputs: does it make sense to compare bit scores for alignments of identical query and database sequences obtained with different match/mismatch parameters? I am trying to assess how well BLAST performs with various parameter values, but I want to make sure that everything is compared on an equal footing. Thanks.
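A parameter sweep like the one described can be scripted. The sketch below assumes BLAST+ `blastn` is on the PATH and uses its `-reward`, `-penalty`, `-gapopen`, `-gapextend` and `-outfmt 6` options; the file and database names are hypothetical, and the two scoring schemes are only examples (blastn accepts only certain reward/penalty/gap-cost combinations, so check the BLAST+ documentation before sweeping). It only builds the commands and parses the tabular output, whose last column is the bit score; it says nothing about whether those bit scores are comparable across schemes.

```python
import csv
import io
from typing import Dict, Iterator, List

# Default 12 columns of BLAST+ tabular output (-outfmt 6).
# Newer BLAST+ releases name the first two columns qaccver/saccver.
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blastn_command(query: str, db: str, reward: int, penalty: int,
                   gapopen: int, gapextend: int, out: str) -> List[str]:
    """Build (but do not run) a blastn argument list for one scoring scheme."""
    return [
        "blastn", "-query", query, "-db", db,
        "-reward", str(reward), "-penalty", str(penalty),
        "-gapopen", str(gapopen), "-gapextend", str(gapextend),
        "-outfmt", "6", "-out", out,
    ]


def parse_outfmt6(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per hit line of -outfmt 6 output (values stay strings)."""
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if row:  # skip blank lines
            yield dict(zip(OUTFMT6_FIELDS, row))


if __name__ == "__main__":
    # Hypothetical inputs and example schemes: (reward, penalty, gapopen, gapextend).
    for reward, penalty, gapopen, gapextend in [(2, -3, 5, 2), (1, -2, 5, 2)]:
        out = f"hits_r{reward}_p{abs(penalty)}.tsv"
        cmd = blastn_command("queries.fa", "nt_db", reward, penalty,
                             gapopen, gapextend, out)
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True) would actually invoke BLAST+ here.
```

Keeping one output file per scheme makes it easy to line up the same query/subject pair across runs afterwards.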
It's not clear to me why you think that comparing bit scores will give you an insight as to how well BLAST is performing. The usual method for doing
Unfortunately, much of the work on BLAST and other alignment programs is based on local, ungapped alignments, and that theory has been extended empirically to gapped alignments. In particular, the bit scores are calculated like this:
$$ S' = \frac{\lambda S - \ln K}{\ln 2} $$
In the formula above, K and λ (lambda) are statistical constants that depend on your scoring system (the substitution matrix and, for gapped alignments, the gap costs), S is the raw score (the sum of substitution and gap scores), and S' is the bit score. Your bit scores will therefore change when you vary the gap open/gap extend parameters, so the comparison you propose is invalid. This is an unfortunate consequence of there being little theory about gapped alignments: the optimal gap scores for a given system have to be measured empirically.
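The formula can be evaluated directly to see how the same raw score maps to different bit scores under different scoring systems. This is a minimal sketch: the (λ, K) pairs below are made-up placeholders, not values published by NCBI (BLAST's default text report lists the Lambda and K it actually used near the end of the output).

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Karlin-Altschul normalized score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    raw = 40  # the same raw score S under two hypothetical scoring systems
    for lam, k in [(0.5, 0.1), (1.2, 0.3)]:  # placeholder (lambda, K) pairs
        print(f"lambda={lam}, K={k}: S' = {bit_score(raw, lam, k):.1f}")
```

In a real parameter sweep the raw score S itself also changes with the match/mismatch and gap settings, so both terms of the numerator move at once.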
Because bit scores aren't comparable, I suggest you base your assessment on an independent set of data that doesn't involve the alignment scores. For example, if I'm interested in the optimal gap opening/gap extension parameters for comparing protein sequences, I can look at proteins of known structure and assess each parameter set by its ability to make alignments that make structural sense. This avoids comparing alignment scores entirely, which is good because comparing bit scores on their own isn't obviously useful.
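The idea of judging each parameter set against an independent reference, rather than by its scores, can be sketched with a simple set-overlap metric. This swaps in a simplified stand-in for the structure-based assessment described above: it assumes you have a hypothetical truth set of query/subject pairs known to be related, and that each BLAST run has been reduced to the set of pairs it reports below a fixed E-value cutoff. All names and data are illustrative.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (query id, subject id)


def evaluate(hits: Set[Pair], truth: Set[Pair]) -> Dict[str, float]:
    """Score one run by how many known-true pairs it recovers."""
    tp = len(hits & truth)
    precision = tp / len(hits) if hits else 0.0
    recall = tp / len(truth) if truth else 0.0
    return {"true_positives": tp, "precision": precision, "recall": recall}


def rank_parameter_sets(results: Dict[str, Set[Pair]],
                        truth: Set[Pair]) -> List[Tuple[str, Dict[str, float]]]:
    """Order parameter sets by recall, breaking ties by precision."""
    scored = [(name, evaluate(hits, truth)) for name, hits in results.items()]
    return sorted(scored,
                  key=lambda item: (item[1]["recall"], item[1]["precision"]),
                  reverse=True)


if __name__ == "__main__":
    truth = {("q1", "s1"), ("q2", "s2")}  # hypothetical reference pairs
    results = {                           # hypothetical hits per scheme
        "r2_p3": {("q1", "s1"), ("q2", "s9")},
        "r1_p2": {("q1", "s1"), ("q2", "s2")},
    }
    for name, metrics in rank_parameter_sets(results, truth):
        print(name, metrics)
```

Because only membership in the reference set is counted, the ranking is unaffected by whatever scale each scoring scheme puts its scores on.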
I'm not sure you can do that.
Do you really need to vary the match/mismatch parameters? What is your aim?
It's not necessarily true that bit scores aren't comparable. From the BLAST documentation on NCBI's web site:
"Bit scores are normalized, which means that the bit scores from different alignments can be compared, even if different scoring matrices have been used."
http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook&part=ch16
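One way to read "normalized": once λ and K have been folded into the bit score, the expected number of chance hits depends only on the bit score and the search-space size, via the standard relation E = m · n · 2^(−S′). The sketch below shows that relation with hypothetical lengths; note that BLAST itself uses effective lengths corrected for edge effects, which this ignores.

```python
def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2**(-S'); lambda and K no longer appear explicitly."""
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical search space: a 500 nt query against a 10^9 nt database.
    for bits in (30.0, 50.0, 70.0):
        print(bits, evalue_from_bits(bits, 500, 10**9))
```

Two alignments with the same bit score in the same search space therefore get the same E-value, whichever scoring system produced them.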