Finding islands of zeros in a sequence

Posted on 2024-09-10 15:55:36

Imagine you have a very long sequence. What is the most efficient way of finding the intervals where the sequence is all zeros (or, more precisely, where the sequence drops to near-zero values, abs(X)<eps)?

For simplicity, let's assume the following sequence:

sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];

I'm trying to get the following information:

startIndex   endIndex    duration
3            6           4
12           12          1
14           16          3
25           26          2
30           30          1

Then, using this information, we find the intervals with a duration >= some specified value (say 3) and return the indices of the values in all these intervals combined:

indices = [3 4 5 6 14 15 16];

That last part is related to a previous question:

MATLAB: vectorized array creation from a list of start/end indices
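As a plain reference for the expected results, the sketch below computes the table and the combined indices with a single loop. It is written in standard-library Python (the thread itself is MATLAB), so it is not vectorized. It returns 1-based, inclusive indices to match the table above. The helper names `zero_runs` and `long_run_indices` are hypothetical, and the near-zero test assumes `eps` means double-precision machine epsilon, as in MATLAB.

```python
import sys

EPS = sys.float_info.epsilon  # same value as MATLAB's eps (2^-52)


def zero_runs(sig, tol=EPS):
    """Return (start, end, duration) for every run of near-zero samples.

    Indices are 1-based and inclusive, matching the MATLAB table.
    """
    runs = []
    start = None
    for i, x in enumerate(sig, start=1):
        if abs(x) < tol:
            if start is None:
                start = i  # a new run of zeros begins here
        elif start is not None:
            runs.append((start, i - 1, i - start))  # run ended at sample i-1
            start = None
    if start is not None:  # a run that reaches the end of the signal
        runs.append((start, len(sig), len(sig) - start + 1))
    return runs


def long_run_indices(sig, min_len, tol=EPS):
    """All 1-based indices lying in a zero run of length >= min_len."""
    return [i
            for start, end, duration in zero_runs(sig, tol)
            if duration >= min_len
            for i in range(start, end + 1)]


sig = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
print(zero_runs(sig))            # [(3, 6, 4), (12, 12, 1), (14, 16, 3), (25, 26, 2), (30, 30, 1)]
print(long_run_indices(sig, 3))  # [3, 4, 5, 6, 14, 15, 16]
```

The loop visits each sample once, which gives a baseline to compare the vectorized answers against.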

This is what I have so far:

sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
len = length(sig);
thresh = 3;

%# align the signal with itself successively shifted by one
%# v(k) will thus be true wherever sig(k:k+thresh-1) is all zeros
v = true(1,len-thresh+1);
for i=1:thresh
    v = v & ( sig(i:len-thresh+i) == 0 );
end

%# extend the 1's till the end of the intervals
for i=1:thresh-1
    v(find(v)+1) = true;
end

%# get the final indices
v = find(v);

I'm looking to vectorize/optimize the code, but I'm open to other solutions.
I have to stress that space and time efficiencies are very important, since I'm processing a large number of long bio-signals.
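To make the behaviour of the two loops in the question concrete, here is a loop-based transliteration in standard-library Python. `shift_and_extend` is a hypothetical name. Like the MATLAB code, it tests for exact zeros with `== 0` and reports 1-based indices.

```python
def shift_and_extend(sig, thresh):
    """Python transliteration of the question's two-loop MATLAB code."""
    n = len(sig)
    m = n - thresh + 1  # number of length-thresh windows
    # First loop: v[k] stays True only if sig[k + i] == 0 for every shift i,
    # i.e. the window sig[k:k + thresh] is all zeros.
    v = [True] * max(m, 0)
    for i in range(thresh):
        v = [v[k] and sig[k + i] == 0 for k in range(m)]
    # Second loop: grow each True one step to the right, thresh - 1 times,
    # so every window start also marks the remaining samples of its window.
    v = v + [False] * (n - len(v))
    for _ in range(thresh - 1):
        v = [v[k] or (k > 0 and v[k - 1]) for k in range(n)]
    # Equivalent of find(v), converted to 1-based indices.
    return [k + 1 for k in range(n) if v[k]]


sig = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
print(shift_and_extend(sig, 3))  # [3, 4, 5, 6, 14, 15, 16]
```

Both passes touch the signal thresh times. That cost, not the final find, is what the answers below try to avoid.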

Comments (6)

安穩 2024-09-17 15:55:36

These are the steps I would take to solve your problem in a vectorized way, starting with a given vector sig:

  • First, threshold the vector to get a vector tsig of zeros and ones (zeroes where the absolute value of the signal drops close enough to zero, ones elsewhere):

    tsig = (abs(sig) >= eps);  %# Using eps as the threshold
    
  • Next, find the starting indices, ending indices, and duration of each string of zeroes using the functions DIFF and FIND:

    dsig = diff([1 tsig 1]);
    startIndex = find(dsig < 0);
    endIndex = find(dsig > 0)-1;
    duration = endIndex-startIndex+1;
    
  • Then, find the strings of zeroes with a duration greater than or equal to some value (such as 3, from your example):

    stringIndex = (duration >= 3);
    startIndex = startIndex(stringIndex);
    endIndex = endIndex(stringIndex);
    
  • Finally, use the method from my answer to the linked question to generate your final set of indices:

    indices = zeros(1,max(endIndex)+1);
    indices(startIndex) = 1;
    indices(endIndex+1) = indices(endIndex+1)-1;
    indices = find(cumsum(indices));
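The four steps can be traced in this standard-library Python transliteration. `diff_find_cumsum` is a hypothetical name, `EPS` stands in for MATLAB's `eps`, and indices are 1-based to match MATLAB's `find`. Being loop-based, it only illustrates the index arithmetic, not the vectorized speed.

```python
import sys
from itertools import accumulate

EPS = sys.float_info.epsilon  # stands in for MATLAB's eps


def diff_find_cumsum(sig, min_len, tol=EPS):
    # Step 1: tsig is 1 where the sample is clearly nonzero, 0 where near zero.
    tsig = [1 if abs(x) >= tol else 0 for x in sig]
    # Step 2: diff([1 tsig 1]); padding with ones gives every zero run two edges.
    padded = [1] + tsig + [1]
    dsig = [b - a for a, b in zip(padded, padded[1:])]
    # find(dsig < 0) and find(dsig > 0) - 1, using 1-based positions like MATLAB.
    start_index = [j for j, d in enumerate(dsig, start=1) if d < 0]
    end_index = [j - 1 for j, d in enumerate(dsig, start=1) if d > 0]
    # Step 3: keep the runs whose duration is at least min_len.
    keep = [(s, e) for s, e in zip(start_index, end_index) if e - s + 1 >= min_len]
    if not keep:
        return []
    # Step 4: +1 at each start, -1 just past each end, then a running sum.
    # Slot 0 is unused so that list positions equal MATLAB's 1-based indices.
    marks = [0] * (max(e for _, e in keep) + 2)
    for s, e in keep:
        marks[s] += 1
        marks[e + 1] -= 1
    return [j for j, c in enumerate(accumulate(marks)) if c != 0]


sig = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
print(diff_find_cumsum(sig, 3))  # [3, 4, 5, 6, 14, 15, 16]
```

The `if not keep` guard is an addition for the case where no run is long enough. Without it, `max` would fail, just as `max(endIndex)` is empty in MATLAB.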
    
冬天的雪花 2024-09-17 15:55:36

You can solve this as a string-search task by finding runs of zeros of length thresh (the STRFIND function is very fast):

startIndex = strfind(sig, zeros(1,thresh));

Note that a longer run of zeros is matched at several overlapping locations. These matches are merged once we add the in-between locations of each interval, from startIndex to startIndex+thresh-1, and UNIQUE removes the duplicates:

indices = unique( bsxfun(@plus, startIndex', 0:thresh-1) )';

Note that you can always swap this last step with the CUMSUM/FIND solution by @gnovice from the linked question.
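The string-search idea can be mimicked in standard-library Python for checking. `strfind_indices` is a hypothetical name. Like STRFIND with the pattern `zeros(1,thresh)`, it matches exact zeros and reports overlapping matches. A set followed by `sorted` plays the role of UNIQUE.

```python
def strfind_indices(sig, thresh):
    """Mimic strfind(sig, zeros(1,thresh)) followed by unique(bsxfun(...))."""
    pattern = [0] * thresh
    # 1-based start of every window of thresh exact zeros; overlapping
    # matches inside a longer run are all reported, as with strfind.
    starts = [k + 1 for k in range(len(sig) - thresh + 1)
              if sig[k:k + thresh] == pattern]
    # bsxfun(@plus, startIndex', 0:thresh-1) expands each start to a full
    # window; the set removes the duplicates that overlapping windows create.
    return sorted({s + offset for s in starts for offset in range(thresh)})


sig = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
print(strfind_indices(sig, 3))  # [3, 4, 5, 6, 14, 15, 16]
```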

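冧九 2024-09-17 15:55:36

function indice=sigvec(sig,thresh)
    %extend sig head and tail with nonzero samples so that runs of zeros
    %at the very start or end of sig are still detected;
    %abs() keeps positive and negative samples from cancelling in the sums
    exsig=[1,abs(sig),1];
    %convolve the extended signal with a length-thresh box filter:
    %cvexsig(k)==0 marks the end of a window of thresh consecutive zeros
    cvexsig=conv(exsig,ones(1,thresh));
    tempsig=double(cvexsig==0);
    %spread each marker back over its window and shift to sig's indexing
    indice=find(conv(tempsig,ones(1,thresh)))-thresh;
end
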
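The two convolutions are easier to follow on a small example. Below is a standard-library Python transliteration with 1-based output. `conv_full` is a hypothetical helper that reproduces the full-length output of MATLAB's `conv`. It takes `abs` of the signal first, an assumption made here so that positive and negative samples cannot cancel within a window sum. Like the MATLAB code, it detects exact zeros only.

```python
def conv_full(a, b):
    """Full linear convolution; same output length as MATLAB's conv(a, b)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def sigvec(sig, thresh):
    box = [1] * thresh
    # Pad with nonzero samples so zero runs at either end are still bounded;
    # abs() (an assumption) prevents +/- samples from summing to zero.
    exsig = [1] + [abs(x) for x in sig] + [1]
    # First pass: a zero window sum marks the end of thresh consecutive zeros.
    tempsig = [1 if c == 0 else 0 for c in conv_full(exsig, box)]
    # Second pass: spread each marker back over the thresh samples it covers.
    spread = conv_full(tempsig, box)
    # MATLAB: find(...) - thresh, where find returns 1-based positions.
    return [k + 1 - thresh for k, c in enumerate(spread) if c != 0]


sig = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
print(sigvec(sig, 3))  # [3, 4, 5, 6, 14, 15, 16]
```

Without `abs`, an input such as `[1, -1, 1, -1]` with `thresh = 2` would give zero window sums and be reported as a run of zeros.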
爱你是孤单的心事 2024-09-17 15:55:36

The answer above by gnovice can be modified to find the indices of the nonzero elements in a vector as follows:

    tsig = (abs(sig) >= eps);
    dsig = diff([0 tsig 0]);
    startIndex = find(dsig > 0);
    endIndex = find(dsig < 0)-1;
    duration = endIndex-startIndex+1;
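The same edge-detection logic, flipped to runs of nonzero samples, as a standard-library Python sketch. `nonzero_runs` is a hypothetical name, `EPS` stands in for MATLAB's `eps`, and indices are 1-based.

```python
import sys

EPS = sys.float_info.epsilon  # stands in for MATLAB's eps


def nonzero_runs(sig, tol=EPS):
    tsig = [1 if abs(x) >= tol else 0 for x in sig]
    # diff([0 tsig 0]): pad with zeros so nonzero runs at either end have edges.
    padded = [0] + tsig + [0]
    dsig = [b - a for a, b in zip(padded, padded[1:])]
    # +1 marks the first sample of a run, -1 the sample just after its last.
    start_index = [j for j, d in enumerate(dsig, start=1) if d > 0]
    end_index = [j - 1 for j, d in enumerate(dsig, start=1) if d < 0]
    duration = [e - s + 1 for s, e in zip(start_index, end_index)]
    return start_index, end_index, duration


sig = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
print(nonzero_runs(sig))  # ([1, 7, 13, 17, 27], [2, 11, 13, 24, 29], [2, 5, 1, 8, 3])
```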
挽你眉间 2024-09-17 15:55:36

I think the most MATLAB-like ("vectorized") way of doing it is to convolve your thresholded signal with a difference filter such as [-1 1]. The output is nonzero exactly at the boundaries between runs of zeros and nonzeros. See the documentation of the conv function. Then use find on the output of conv to get the relevant indices.
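Here is a small illustration of the convolution idea in standard-library Python. `conv_full` is a hypothetical helper that reproduces MATLAB's full-length `conv`. The filter is written as `[1, -1]` so that, applied to a 0/1 mask of near-zero samples, it equals `diff([0 mask 0])`. A +1 then marks the first sample of a run of zeros, and a -1 marks the sample just after its last. Using `[-1 1]` as suggested only flips these signs.

```python
import sys

EPS = sys.float_info.epsilon  # stands in for MATLAB's eps


def conv_full(a, b):
    """Full linear convolution; same output length as MATLAB's conv(a, b)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def zero_runs_via_conv(sig, tol=EPS):
    mask = [1 if abs(x) < tol else 0 for x in sig]  # 1 marks a near-zero sample
    edges = conv_full(mask, [1, -1])  # equals diff([0 mask 0]) in MATLAB
    starts = [k for k, c in enumerate(edges, start=1) if c == 1]
    ends = [k - 1 for k, c in enumerate(edges, start=1) if c == -1]
    return starts, ends


sig = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
print(zero_runs_via_conv(sig))  # ([3, 12, 14, 25, 30], [6, 12, 16, 26, 30])
```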

千纸鹤带着心事 2024-09-17 15:55:36

As gnovice showed, we'll do a threshold test to make "near zero" really zero:

logcl = abs(sig(:)) >= zero_tolerance;

Then find regions where the cumulative sum isn't increasing:

cs = [0; cumsum(logcl)];   %# prepend 0 so a window can start at the first sample
islands = cs(1+thresh:end) == cs(1:end-thresh);   %# islands(k): sig(k:k+thresh-1) all near zero

Recall gnovice's great method for filling in ranges of indices:

v = zeros(1,max(endInd)+1);   %# An array of zeroes
v(startInd) = 1;              %# Place 1 at the starts of the intervals
v(endInd+1) = v(endInd+1)-1;  %# Add -1 one index after the ends of the intervals
indices = find(cumsum(v));  %# Perform a cumulative sum and find the nonzero entries

We note that our islands vector already has ones at the startInd locations. For our purposes, the matching -1 always goes thresh positions later, at endInd+1 of that window. A longer run simply produces a run of ones in islands, and the overlapping windows merge in the cumulative sum:

endcap = zeros(thresh,1);
indices = find(cumsum([islands ; endcap] - [endcap ; islands]))

Test

thresh = 3;
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
logcl = abs(sig(:)) >= .1;
cs = [0; cumsum(logcl)];
islands = cs(1+thresh:end) == cs(1:end-thresh);
endcap = zeros(thresh,1);
indices = find(cumsum([islands ; endcap] - [endcap ; islands]))
indices =

     3
     4
     5
     6
    14
    15
    16
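For cross-checking, here is the cumulative-sum island test transliterated to standard-library Python. `island_indices` is a hypothetical name, and the output is 1-based. The leading 0 in `cs` lets a window start at the first sample, so `islands[k]` (0-based) means that samples `k` to `k+thresh-1` are all near zero.

```python
from itertools import accumulate


def island_indices(sig, thresh, zero_tolerance):
    logcl = [1 if abs(x) >= zero_tolerance else 0 for x in sig]
    # cs[k] = number of clearly nonzero samples among the first k (cs[0] = 0).
    cs = [0] + list(accumulate(logcl))
    n = len(sig)
    # A window is all near zero when the cumulative sum does not increase.
    islands = [1 if cs[k + thresh] == cs[k] else 0 for k in range(n - thresh + 1)]
    endcap = [0] * thresh
    # +1 at each window start, -1 thresh places later; the running sum is
    # nonzero exactly on samples covered by at least one window.
    marks = [a - b for a, b in zip(islands + endcap, endcap + islands)]
    return [k + 1 for k, c in enumerate(accumulate(marks)) if c != 0]


sig = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
print(island_indices(sig, 3, 0.1))  # [3, 4, 5, 6, 14, 15, 16]
```

Only two passes over the data are needed (one cumulative sum, one running sum), independent of thresh.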