Matlab: seqlogo with uniform plot column heights

Posted on 2024-10-19 08:03:52

In Matlab, I want to make a seqlogo plot of an amino acid sequence profile. But instead of scaling the heights of the plot columns by entropy, I want all the columns to be the same height.

I'm in the process of modifying the code from the answers to this question, but I wonder if there is a parameter to seqlogo or some other function that I have missed that will make the column heights uniform.

Alternatively, is there a statistical transformation I can apply to the sequence profile to hack the desired output? (Column heights uniform, with the height of each letter linearly proportional to its probability in the seqprofile.)

Comments (2)

南汐寒笙箫 2024-10-26 08:03:52

Probably the easiest way around this problem is to directly modify the code for the Bioinformatics Toolbox function SEQLOGO (if possible). In R2010b, you can do:

edit seqlogo

And the code for the function will be shown in the editor. Next, find the following lines (lines 267-284) and either comment them out or remove them entirely:

S_before = log2(nSymbols);
freqM(freqM == 0) = 1; % log2(1) = 0

% The uncertainty after the input at each position
S_after = -sum(log2(freqM).*freqM, 1);

if corrError
    % The number of sequences correction factor
    e_corr = (nSymbols -1)/(2* log(2) * numSeq);
    R = S_before - (S_after + e_corr);
else
    R = S_before - S_after;
end

nPos = (endPos - startPos) + 1;
for i =1:nPos
    wtM(:, i) = wtM(:, i) * R(i);
end
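Read as math, the block above appears to do the following. This is a reading of the quoted lines only: the symbols $N$ for nSymbols, $n$ for numSeq, $f_{a,i}$ for the entries of freqM and $w_{a,i}$ for the entries of wtM are introduced here for illustration.

```latex
\begin{aligned}
H_i &= -\sum_{a} f_{a,i}\,\log_2 f_{a,i}
      && \text{(S\_after: uncertainty at position } i\text{)} \\
e   &= \frac{N-1}{2\,\ln 2 \cdot n}
      && \text{(e\_corr: applied only when corrError is set)} \\
R_i &= \log_2 N - (H_i + e)
      && \text{(information content of column } i\text{)} \\
h_{a,i} &= w_{a,i}\,R_i
      && \text{(plotted height of symbol } a \text{ at position } i\text{)}
\end{aligned}
```

Because $R_i$ changes from column to column, the total column height changes too, which is the behaviour the question wants to avoid.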

Then put this line in their place:

wtM = bsxfun(@times,wtM,log2(nSymbols)./sum(wtM));
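To see what this replacement does without touching the toolbox file, here is a minimal sketch on a made-up 4-symbol matrix. The values of freqM are invented, and it is an assumption that wtM holds the plain column frequencies at this point in seqlogo.m; only the two scaling steps are taken from the code quoted above.

```matlab
% Toy stand-ins for variables inside seqlogo.m (values are made up)
nSymbols = 4;
freqM = [0.70 0.25 1.00;
         0.10 0.25 0.00;
         0.10 0.25 0.00;
         0.10 0.25 0.00];
wtM = freqM;   % assumption: weights start out as the column frequencies

% Original behaviour: scale each column by its information content R
f = freqM;
f(f == 0) = 1;                                   % log2(1) = 0
R = log2(nSymbols) + sum(log2(f) .* f, 1);       % S_before - S_after
entropyHeights = sum(bsxfun(@times, wtM, R), 1); % differs per column

% Replacement line: rescale every column to the same total height
wtMuniform = bsxfun(@times, wtM, log2(nSymbols) ./ sum(wtM, 1));
uniformHeights = sum(wtMuniform, 1);             % log2(nSymbols) everywhere
```

In this toy case the second column (all four symbols equally likely) has an entropy-scaled height of zero, while after the replacement every column has height log2(4) = 2 and each letter keeps a share of the column proportional to its frequency.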

You will probably want to save the file under a new name, like seqlogo_norm.m, so you can still use the original unmodified SEQLOGO function. Now you can create sequence profile plots with all the columns normalized to the same height. For example:

S = {'LSGGQRQRVAIARALAL',...      %# Sample amino acid sequence
     'LSGGEKQRVAIARALMN',...
     'LSGGQIQRVLLARALAA',...
     'LSGGERRRLEIACVLAL',...
     'FSGGEKKKNELWQMLAL',...
     'LSGGERRRLEIACVLAL'};
seqlogo_norm(S,'alphabet','aa');  %# Use the modified SEQLOGO function

[Image: normalized sequence profile]

OLD ANSWER:

I'm not sure how to transform the sequence profile information to get the desired output from the Bioinformatics Toolbox function SEQLOGO, but I can show you how to modify the alternative seqlogo_new.m that I wrote for my answer to the related question you linked to. If you change the line that initializes bitValues from this:

bitValues = W{2};

to this:

bitValues = bsxfun(@rdivide,W{2},sum(W{2}));

Then you should get each column scaled to a height of 1. For example:

S = {'ATTATAGCAAACTA',...  %# Sample sequence
     'AACATGCCAAAGTA',...
     'ATCATGCAAAAGGA'};
seqlogo_new(S);            %# After applying the above modification

[Image: normalized sequence profile]
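The column normalization can be checked without the Bioinformatics Toolbox. The sketch below is only an illustration: it builds a raw count matrix from the same three DNA sequences and uses it as a hypothetical stand-in for W{2} (the real W{2} comes from SEQLOGO and may hold different values), then applies the same bsxfun(@rdivide, ...) step.

```matlab
S = {'ATTATAGCAAACTA', ...
     'AACATGCCAAAGTA', ...
     'ATCATGCAAAAGGA'};
alphabet = 'ACGT';                 % assumed row order of the profile

seqs = char(S);                    % 3-by-14 character matrix
nPos = size(seqs, 2);

% Count how often each letter occurs at each position
counts = zeros(numel(alphabet), nPos);
for k = 1:numel(alphabet)
    counts(k, :) = sum(seqs == alphabet(k), 1);
end

% Same normalization as in the modified bitValues line
bitValues = bsxfun(@rdivide, counts, sum(counts, 1));
columnHeights = sum(bitValues, 1); % every entry is 1
```

After this step each entry of bitValues is the fraction of sequences carrying that letter at that position, so the letter heights within a column are proportional to their probabilities.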

你对谁都笑 2024-10-26 08:03:52

For now, my workaround is to generate a bunch of fake sequences that match the sequence profile, then feed those sequences to http://weblogo.berkeley.edu/logo.cgi. Here is the code to make the fake sequences:

function flatFakeSeqsFromPwm(pwm, letterOrder, nSeqsToGen, outFilename)
%translates a pwm into a bunch of fake seqs with the same probabilities
%for use with http://weblogo.berkeley.edu/

%pwm should be a 4xn or a 20xn position weight matrix. Each col must sum to 1
%letterOrder = e.g. 'ARNDCQEGHILKMFPSTWYV' for my data
%nSeqsToGen should be >= the # of pixels tall you plan to make your chart

[height, windowWidth] = size(pwm);
assert(height == length(letterOrder));
assert(all(abs(1 - sum(pwm, 1)) < 1.0e-10)); %assert all cols of pwm sum to 1.0

fd = fopen(outFilename, 'w');
assert(fd ~= -1, 'Could not open %s for writing.', outFilename);

for i = 0:nSeqsToGen-1
    for seqPos = 1:windowWidth
        acc = 0; %accumulator (running cumulative probability of the column)
        idx = 0;
        %pick the first letter whose cumulative probability exceeds i/nSeqsToGen;
        %the idx < height guard protects against floating-point round-off
        while idx < height && i/nSeqsToGen >= acc
            idx = idx + 1;
            acc = acc + pwm(idx, seqPos);
        end
        fprintf(fd, '%s', letterOrder(idx));
    end
    fprintf(fd, '\n');
end

fclose(fd);
end
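The inner while loop above picks, for each fake sequence, the first letter whose cumulative probability exceeds i/nSeqsToGen. A minimal single-column sketch of that idea follows; the profile p, the letters and n = 8 are made-up illustration values, and cumsum/find are used in place of the explicit loop.

```matlab
p = [0.5; 0.25; 0.25];    % hypothetical one-column profile (sums to 1)
letters = 'ACG';          % hypothetical letter order for the rows of p
n = 8;                    % number of fake sequences to generate

q  = (0:n-1) / n;         % evenly spaced quantiles, same role as i/nSeqsToGen
cp = cumsum(p);           % cumulative probability down the column

idx = zeros(1, n);
for k = 1:n
    % first row whose cumulative probability exceeds the quantile
    idx(k) = find(cp > q(k), 1, 'first');
end

column = letters(idx);                        % this column across all fake seqs
counts = accumarray(idx(:), 1, [numel(p) 1]); % occurrences of each letter
```

Here the generated column reproduces the profile exactly because n times each probability is a whole number. In general each letter's frequency can be off by a little less than 1/n, which is presumably why nSeqsToGen should be large.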