Encoding BLOSUM62 in source code
I am trying to implement pairwise protein sequence alignment using the Needleman-Wunsch global alignment algorithm.
I am not clear about how to include the BLOSUM62 matrix in my source code to do the scoring, that is, to fill the two-dimensional matrix.
I have googled and found that most people suggest using a flat file that contains the standard BLOSUM62 matrix. Does that mean I need to read from this flat file and fill my own in-code BLOSUM62 matrix?
Another approach could be to use some mathematical formula in the program logic to construct the BLOSUM62 matrix, but I am not very sure about this option.
Any ideas or insights are appreciated.
Thanks.
Comments (5)
It would help to know what language you're working in so we can help you with the correct terms, but what I did was use a map of maps (or dictionaries if you're using Python).
Here's an example of my code in Groovy, but it's fairly portable to other languages:
Using this you can just call
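As an illustration of the map-of-maps idea (this is not the answerer's Groovy code), a minimal Java sketch might look like the following. The class name `Blosum62Map` and the helpers `put` and `score` are made up for this example, and only a handful of BLOSUM62 entries are filled in.

```java
import java.util.HashMap;
import java.util.Map;

class Blosum62Map {
    // Nested map: first residue -> (second residue -> score).
    static final Map<Character, Map<Character, Integer>> SCORES = new HashMap<>();

    // Store each score symmetrically so lookups work in either order.
    static void put(char a, char b, int score) {
        SCORES.computeIfAbsent(a, k -> new HashMap<>()).put(b, score);
        SCORES.computeIfAbsent(b, k -> new HashMap<>()).put(a, score);
    }

    static {
        // Only a few BLOSUM62 entries are listed here for brevity;
        // a real table would cover every pair of symbols.
        put('A', 'A', 4);
        put('A', 'R', -1);
        put('A', 'N', -2);
        put('R', 'R', 5);
        put('R', 'N', 0);
        put('N', 'N', 6);
    }

    // Hypothetical lookup helper; assumes both symbols are in the table.
    static int score(char a, char b) {
        return SCORES.get(a).get(b);
    }

    public static void main(String[] args) {
        System.out.println(score('R', 'A')); // prints -1
    }
}
```

The nested map keeps the lookup readable (`score('R', 'A')`) at the cost of some boxing overhead compared with a plain 2D array.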
You can always download the matrix from the NCBI web site:
ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62
Other matrices are also available from the parent directory.
I have never seen an implementation of Needleman-Wunsch that computes the substitution matrix itself. It's much easier just to include the matrix in the code or to ship it as a separate file.
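To show where the scores end up being used, here is a rough Java sketch of the Needleman-Wunsch table fill with the matrix included directly in the code. It covers only four symbols (A, R, N, D) to keep it short, and the class name, the `align` helper, and the linear gap penalty passed in by the caller are assumptions made for this example.

```java
class NeedlemanWunschSketch {
    // Symbols covered by this excerpt; the full BLOSUM62 table has 24 of them.
    static final String SYMBOLS = "ARND";

    // BLOSUM62 scores for the four symbols above, in the same row/column order.
    static final int[][] BLOSUM62 = {
        { 4, -1, -2, -2},
        {-1,  5,  0, -2},
        {-2,  0,  6,  1},
        {-2, -2,  1,  6},
    };

    // Substitution score for two residues (both must appear in SYMBOLS).
    static int score(char a, char b) {
        return BLOSUM62[SYMBOLS.indexOf(a)][SYMBOLS.indexOf(b)];
    }

    // Fills the Needleman-Wunsch table with a linear gap penalty and
    // returns the optimal global alignment score (bottom-right cell).
    static int align(String s, String t, int gap) {
        int[][] f = new int[s.length() + 1][t.length() + 1];
        for (int i = 1; i <= s.length(); i++) f[i][0] = i * gap;
        for (int j = 1; j <= t.length(); j++) f[0][j] = j * gap;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int diag = f[i - 1][j - 1] + score(s.charAt(i - 1), t.charAt(j - 1));
                int up = f[i - 1][j] + gap;   // gap in t
                int left = f[i][j - 1] + gap; // gap in s
                f[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        return f[s.length()][t.length()];
    }
}
```

For instance, `NeedlemanWunschSketch.align("ARND", "ARND", -4)` sums the four diagonal scores. A traceback step would still be needed to recover the alignment itself.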
You can find some details on how BLOSUM matrices are calculated here, for example: http://en.wikipedia.org/wiki/BLOSUM.
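For context, the log-odds definition behind BLOSUM scores in half-bit units (the unit stated in the BLOSUM62 file) is usually written as below. Here q_ij is the observed frequency of the residue pair (i, j) in the aligned blocks, p_i is the background frequency of residue i, and e_ij is the frequency expected by chance. Because q_ij comes from observed alignment counts, the table cannot be produced from a closed formula alone.

```latex
s_{ij} = \operatorname{round}\!\left( 2 \log_2 \frac{q_{ij}}{e_{ij}} \right),
\qquad
e_{ij} =
\begin{cases}
p_i^2 & \text{if } i = j \\
2\, p_i p_j & \text{if } i \neq j
\end{cases}
```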
You can't infer one BLOSUM matrix from another the way you can for PAM matrices: each BLOSUM matrix is calculated from a different set of data, and they are not derived from one another.
For example, a PAM250 matrix is just the PAM1 matrix multiplied by itself 250 times (more precisely, the PAM1 mutation probability matrix raised to the 250th power and then converted to log-odds scores). This is not true for BLOSUM matrices, so you can't, for example, infer BLOSUM80 from BLOSUM64.
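The PAM extrapolation is plain repeated matrix multiplication, which can be sketched in Java as follows. The 2x2 matrix in `main` is a made-up toy with invented probabilities, not real PAM1 data (the real matrix is 20x20), and the class and method names are hypothetical.

```java
class PamPowerSketch {
    // Multiplies two square matrices of the same size.
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // Raises a matrix to the n-th power by repeated multiplication (n >= 1).
    static double[][] power(double[][] m, int n) {
        double[][] result = m;
        for (int step = 1; step < n; step++) {
            result = multiply(result, m);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy 2-state "PAM1-like" mutation matrix; each row sums to 1.
        double[][] toy = {{0.99, 0.01}, {0.02, 0.98}};
        double[][] toy250 = power(toy, 250);
        System.out.println(java.util.Arrays.deepToString(toy250));
    }
}
```

Nothing comparable exists for BLOSUM: each matrix has to be rebuilt from its own clustered block data.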
Yes, you can implement a BLOSUM matrix as a hardwired piece of code, and you might gain some speed that way, but you definitely lose flexibility. I would recommend writing a reader for the NCBI format that returns, e.g., a SubstitutionMatrix data type. Then you can pass such a matrix around as an object.
A SubstitutionMatrix object may hold a 2D matrix and "something" responsible for decoding amino acid names, e.g. a hash map or a lookup array. Depending on the language you choose, you may also use enums to represent amino acid types. In that case you can use them directly to address the 2D array.
Hopefully this is clear; I can write more details if you like or need.
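A minimal Java sketch of such a data type, assuming ASCII one-letter symbols and a lookup array as the decoding "something", could look like this. The constructor signature and the `score` method are illustrative choices, not a fixed API, and the reader that would build the object from an NCBI file is left out here.

```java
import java.util.Arrays;

class SubstitutionMatrix {
    // Maps an ASCII code to its row/column index, or -1 if the symbol is unknown.
    private final int[] index = new int[128];
    private final int[][] scores;

    // symbols gives the row/column order of scores, e.g. "ARN*".
    SubstitutionMatrix(String symbols, int[][] scores) {
        Arrays.fill(index, -1);
        for (int i = 0; i < symbols.length(); i++) {
            index[symbols.charAt(i)] = i;
        }
        this.scores = scores;
    }

    // Returns the substitution score for two residue symbols.
    int score(char a, char b) {
        int i = a < index.length ? index[a] : -1;
        int j = b < index.length ? index[b] : -1;
        if (i < 0 || j < 0) {
            throw new IllegalArgumentException("Unknown symbol: " + a + " or " + b);
        }
        return scores[i][j];
    }
}
```

An alignment routine can then take a `SubstitutionMatrix` parameter and work unchanged with BLOSUM62, BLOSUM80, or a PAM matrix. With an enum of residue types, `scores[x.ordinal()][y.ordinal()]` would replace the lookup array.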
Here is an example of parsing the BLOSUM62 file from ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62 in Java.
Create a class Parsing:
Now, to calculate the cost of, for example, A and * (which returns -4), you should write a method for this:
Finally, in the main method:
This will print -4.
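A rough sketch of this kind of parser is given below; it is not the answerer's code. It assumes the NCBI layout (comment lines starting with `#`, one header line of column symbols, then one row per symbol) and parses a four-symbol excerpt embedded as a string so that it runs without the file. The names `BlosumParserSketch`, `parse`, and `cost` are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class BlosumParserSketch {
    // Parses NCBI-style matrix text into a map keyed by the two symbols, e.g. "A*".
    static Map<String, Integer> parse(String text) {
        Map<String, Integer> scores = new HashMap<>();
        String[] columns = null;
        for (String line : text.split("\n")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip comments
            String[] fields = line.split("\\s+");
            if (columns == null) { // the first data line is the column header
                columns = fields;
                continue;
            }
            // fields[0] is the row symbol; the remaining fields are the scores.
            for (int j = 0; j < columns.length; j++) {
                scores.put(fields[0] + columns[j], Integer.parseInt(fields[j + 1]));
            }
        }
        return scores;
    }

    // Looks up the cost of substituting a with b.
    static int cost(Map<String, Integer> scores, char a, char b) {
        return scores.get("" + a + b);
    }

    public static void main(String[] args) {
        // Excerpt in the layout of the NCBI file; the real file has 24 rows and columns.
        String excerpt =
            "# excerpt of BLOSUM62\n" +
            "   A  R  N  *\n" +
            "A  4 -1 -2 -4\n" +
            "R -1  5  0 -4\n" +
            "N -2  0  6 -4\n" +
            "* -4 -4 -4  1\n";
        System.out.println(cost(parse(excerpt), 'A', '*')); // prints -4
    }
}
```

To use the downloaded file instead, the text could be read with something like `Files.readString(Path.of("BLOSUM62"))` on Java 11 or later, assuming the file sits in the working directory.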