Imagine you have a very long sequence. What is the most efficient way of finding the intervals where the sequence is all zeros (or, more precisely, where the sequence drops to near-zero values, abs(X) < eps)?
For simplicity, let's assume the following sequence:
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
I'm trying to get the following information:
startIndex  endIndex  duration
3           6         4
12          12        1
14          16        3
25          26        2
30          30        1
Then, using this information, we find the intervals with duration >= some specified value (say 3), and return the indices of the values in all these intervals combined:
indices = [3 4 5 6 14 15 16];
That last part is related to a previous question: "MATLAB: vectorized array creation from a list of start/end indices".
This is what I have so far:
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
len = length(sig);
thresh = 3;
%# align the signal with itself successively shifted by one
%# v will thus contain 1 in the starting locations of the zero interval
v = true(1,len-thresh+1);
for i=1:thresh
    v = v & ( sig(i:len-thresh+i) == 0 );
end
%# extend the 1's till the end of the intervals
for i=1:thresh-1
    v(find(v)+1) = true;
end
%# get the final indices
v = find(v);
I'm looking to vectorize/optimize the code, but I'm open to other solutions.
I have to stress that space and time efficiencies are very important, since I'm processing a large number of long bio-signals.
These are the steps I would take to solve your problem in a vectorized way, starting with a given vector sig.

First, threshold the vector to get a vector tsig of zeros and ones (zeroes where the absolute value of the signal drops close enough to zero, ones elsewhere).

Next, find the starting indices, ending indices, and duration of each string of zeroes using the functions DIFF and FIND:
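A minimal sketch of the thresholding and DIFF/FIND steps, assuming `eps` as the "near zero" tolerance and a row-vector signal (both are my assumptions; substitute your own tolerance). Padding tsig with ones on both ends guarantees that runs touching either end of the signal still produce a start and an end transition.

```matlab
% Example signal from the question
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];

% Threshold: 0 where |sig| < eps (treated as zero), 1 elsewhere.
% eps as the tolerance is an assumption; sig(:).' forces a row vector.
tsig = abs(sig(:).') >= eps;

% Pad with ones so edge runs are closed, then take first differences:
% -1 marks a 1->0 transition (run start), +1 marks one past a run end.
dsig = diff([1 tsig 1]);
startIndex = find(dsig < 0);
endIndex = find(dsig > 0) - 1;
duration = endIndex - startIndex + 1;
```

For the example this reproduces the table in the question: starts 3, 12, 14, 25, 30 with durations 4, 1, 3, 2, 1.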
Then, find the strings of zeroes with a duration greater than or equal to some value (such as 3, from your example):
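Continuing the sketch with the run boundaries from the table in the question; `keep` is only an illustrative name for the logical mask over the runs, not something from the original answer:

```matlab
% Run boundaries for the example signal (the table in the question)
startIndex = [3 12 14 25 30];
endIndex   = [6 12 16 26 30];
duration   = endIndex - startIndex + 1;

thresh = 3;                      % minimum run length to keep
keep = duration >= thresh;       % illustrative name: one logical per run
startIndex = startIndex(keep);   % runs starting at 3 and 14
endIndex   = endIndex(keep);     % runs ending at 6 and 16
```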
Finally, use the method from my answer to the linked question to generate your final set of indices:
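A sketch of a CUMSUM/FIND fill step consistent with that description, using the two runs kept in the previous step (`marks` is an illustrative name). It puts +1 at each run start and -1 one past each run end, so the running sum is non-zero exactly on the covered indices; this assumes the runs do not overlap, which holds for runs found with DIFF/FIND.

```matlab
% Kept runs for the example (from the filtering step)
startIndex = [3 14];
endIndex   = [6 16];

% max([0 endIndex]) keeps this working when no run survives the filter
marks = zeros(1, max([0 endIndex]) + 1);      % illustrative name
marks(startIndex) = 1;                        % open each interval
marks(endIndex + 1) = marks(endIndex + 1) - 1; % close it one past the end
indices = find(cumsum(marks));                % 3 4 5 6 14 15 16
```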
You can solve this as a string search task by finding strings of zeros of length thresh (the STRFIND function is very fast).

Note that longer substrings will get marked at multiple locations, but they will eventually be joined once we add the in-between locations of the intervals that start at startIndex and end at startIndex+thresh-1.

Note that you can always swap this last step with the CUMSUM/FIND solution by @gnovice from the linked question.
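A minimal sketch of the string-search idea, assuming `eps` as the tolerance. Encoding the signal as a `'0'`/`'1'` character string is my own addition so the call stays on text input (GNU Octave's STRFIND, for instance, accepts only text). STRFIND reports overlapping matches, so a run longer than `thresh` yields several consecutive starts, and UNIQUE merges the overlapping windows.

```matlab
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
thresh = 3;

% '0' where |sig| < eps, '1' elsewhere (the char encoding is an assumption)
s = char('0' + (abs(sig(:).') >= eps));

% Start of every window of thresh consecutive zeros (overlaps included)
startIndex = strfind(s, repmat('0', 1, thresh));

% Expand each start k into k:k+thresh-1, then merge overlapping windows.
% idx(:) keeps the orientation consistent even for zero or one match.
idx = bsxfun(@plus, startIndex(:), 0:thresh-1);
indices = unique(idx(:)).';
```

For the example, the matches start at 3, 4 and 14, and the merged result is 3 4 5 6 14 15 16.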
The above answer by gnovice can be modified to find the indices of non-zero elements in a vector as follows:
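A sketch of that modification, assuming `eps` as the tolerance: keep tsig as before (1 on non-zero samples), pad with zeros instead of ones, and swap the signs tested with FIND.

```matlab
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];

% 1 where the signal is (numerically) non-zero, 0 elsewhere
tsig = abs(sig(:).') >= eps;

% Zero padding closes runs of ones at the edges:
% +1 marks a run start, -1 marks one past a run end
dsig = diff([0 tsig 0]);
startIndex = find(dsig > 0);
endIndex = find(dsig < 0) - 1;
duration = endIndex - startIndex + 1;
```

For the example the non-zero runs are 1-2, 7-11, 13-13, 17-24 and 27-29.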
I think the most MATLAB-like ("vectorized") way of doing it is to compute a convolution of your (thresholded) signal with a filter like [-1 1]. You should look at the documentation of the function CONV. Then use FIND on the output of CONV to get the relevant indexes.
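A sketch of the convolution approach, assuming the signal is first thresholded into a 0/1 vector `z` that is 1 on near-zero samples (`z` and the `eps` tolerance are my assumptions). With the default 'full' shape, `conv(z, [-1 1])` has one more element than `z` and equals `z(k-1) - z(k)` with zeros assumed beyond both ends, so it is effectively a negated, zero-padded DIFF.

```matlab
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];

% 1 inside near-zero stretches, 0 elsewhere (eps tolerance is an assumption)
z = double(abs(sig(:).') < eps);

% w(k) = z(k-1) - z(k), length numel(z)+1
w = conv(z, [-1 1]);
startIndex = find(w < 0);      % 0 -> 1 transitions: run starts
endIndex = find(w > 0) - 1;    % 1 -> 0 transitions, shifted back by one
```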
As gnovice showed, we'll do a threshold test to make "near zero" really zero:
Then find regions where the cumulative sum isn't increasing:
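A minimal sketch of these two steps, assuming `eps` as the tolerance. Prepending a 0 to the cumulative sum of the non-zero indicator lets a single vectorized comparison test every window of `thresh` samples; `c` and the exact form of the comparison are my own choices.

```matlab
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
thresh = 3;

% 1 where the signal is (numerically) non-zero, 0 where it counts as zero
tsig = double(abs(sig(:).') >= eps);

% c(k) = number of non-zero samples among sig(1:k-1)
c = [0 cumsum(tsig)];

% islands(k) is true when sig(k:k+thresh-1) has no non-zero sample,
% i.e. the cumulative sum does not increase across that window
islands = c(1+thresh:end) == c(1:end-thresh);
startInd = find(islands);      % 3 4 14 for the example
```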
Remembering gnovice's great method for filling in ranges of indexes, we note that our islands vector already has ones in the startInd locations, and for our purposes the matching end marker always comes thresh spots later (longer runs produce runs of ones in islands).

Test:
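A self-contained test of the whole approach on the example signal, written as I read the description above (names follow the text where it gives them; `c` and `v` are illustrative). Because islands already holds +1 at every startInd, only the matching -1 marks, thresh samples later, need to be added before CUMSUM.

```matlab
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
thresh = 3;

% Non-zero indicator and padded cumulative sum (eps tolerance assumed)
tsig = double(abs(sig(:).') >= eps);
c = [0 cumsum(tsig)];
islands = double(c(1+thresh:end) == c(1:end-thresh));
startInd = find(islands);

% Append room for the -1 marks, place them thresh samples after each start,
% then CUMSUM counts how many zero windows cover each index
v = [islands zeros(1, thresh)];
v(startInd + thresh) = v(startInd + thresh) - 1;
indices = find(cumsum(v));

disp(indices)   % 3 4 5 6 14 15 16
```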