Biopython 中带间隙对齐的 PWM

发布于 2024-11-29 02:19:12 字数 380 浏览 3 评论 0原文

我正在尝试通过 Clustalw 多序列比对在 Biopython 中生成位置加权矩阵(PWM)。每次我使用间隙对齐时都会收到“字母表错误”错误。通过阅读文档,我认为我需要利用间隙字母表来处理间隙对齐中的“-”字符。但当我这样做时,它仍然无法解决错误。有谁看到这段代码的问题,或者有更好的方法从有间隙的 Clustal 对齐生成 PWM?

from Bio.Alphabet import Gapped
alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped)
m = Motif.Motif()
for a in alignment:
    m.add_instance(a.seq)
m.pwm()

I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multiple sequence alignments. I get a "Wrong Alphabet" error every time I do it with gapped alignments. From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments. But when I do this, it still doesn't resolve the error. Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments?

from Bio.Alphabet import Gapped
alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped)
m = Motif.Motif()
for a in alignment:
    m.add_instance(a.seq)
m.pwm()

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

赠我空喜 2024-12-06 02:19:13

那么您想使用 clustal 来进行这些间隙对齐吗?我用的是Perl,我看你用的是Python,但是逻辑基本是一样的。我使用对 clustal 可执行文件的系统调用,而不是使用 BioPerl/Biopython。我相信 clustalw2 可执行文件可以处理间隙对齐,而无需调用字母表。不是百分百确定,但这是我使用的对我有用的脚本。创建一个目录,其中包含所有对齐文件(我使用 .fasta,但您可以更改系统调用上的标志以接受其他文件)。这是我的 Perl 脚本,您必须修改最后一行中的可执行路径以匹配 clustal 在计算机上的位置。希望这会有所帮助。作为旁注,这对于非常快速地进行许多对齐非常有用,这就是我使用它的目的,但如果您只想对齐几个文件,可能需要跳过创建目录的整个过程并修改代码以接受文件路径而不是目录路径。

#!/usr/bin/perl 


use warnings;

print "Please type the list file name of protein fasta files to align (end the directory    path with a / or this will fail!): ";
$directory = <STDIN>;
chomp $directory;

opendir (DIR,$directory) or die $!;

my @file = readdir DIR;
closedir DIR;

my $add="_align.fasta";

foreach $file (@file) {
     my $infile = "$directory$file";
 (my $fileprefix = $infile) =~ s/\.[^.]+$//;
 my $outfile="$fileprefix$add";
 system "/Users/Wes/Desktop/eggNOG_files/clustalw-2.1-macosx/clustalw2 -INFILE=$infile -OUTFILE=$outfile -OUTPUT=FASTA -tree";
}

干杯,
韦斯

So you want to use clustal to make these gapped alignments? I use Perl, I see you are using Python, but the logic is basically the same. I use a system call to the clustal executable instead of using BioPerl/Biopython. I believe the clustalw2 executable handles gapped alignments without the need to call an alphabet. Not 100 percent sure, but this is a script I use that works for me. Create a directory with all of your aligments files in it (I use .fasta but you can change the flags on the system call to accept others). This is my Perl script, you must modify the executable path in the last line to match clustal's location on your computer. Hope this helps a bit. As a side note, this is good for making many alignments very quickly, which is what I use it for but if you are only looking to align a few files, might want to skip the whole creating a directory and modify the code to accept a filepath and not a dirpath.

#!/usr/bin/perl 


use warnings;

print "Please type the list file name of protein fasta files to align (end the directory    path with a / or this will fail!): ";
$directory = <STDIN>;
chomp $directory;

opendir (DIR,$directory) or die $!;

my @file = readdir DIR;
closedir DIR;

my $add="_align.fasta";

foreach $file (@file) {
     my $infile = "$directory$file";
 (my $fileprefix = $infile) =~ s/\.[^.]+$//;
 my $outfile="$fileprefix$add";
 system "/Users/Wes/Desktop/eggNOG_files/clustalw-2.1-macosx/clustalw2 -INFILE=$infile -OUTFILE=$outfile -OUTPUT=FASTA -tree";
}

Cheers,
Wes

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文