MATLAB 中不等长度元胞数组的 Strcmp

发布于 2024-09-07 08:08:42 字数 647 浏览 4 评论 0原文

有没有一种简单的方法可以在较大的字符串元胞数组中找到较小的字符串元胞数组?我有两个列表,一个包含唯一元素,另一个包含重复元素。我想在较大的数组中找到较小数组的特定模式的全部出现。我知道 strcmp 会比较两个元胞数组,但前提是它们的长度相等。我的第一个想法是使用循环逐步遍历较大数组的子集,但必须有更好的解决方案。

例如,在以下内容中:

smallcellarray={'string1',...
                'string2',...
                'string3'};
largecellarray={'string1',...
                'string2',...
                'string3',...
                'string1',...
                'string2',...
                'string1',...
                'string2',...
                'string3'};

index=myfunction(largecellarray,smallcellarray)

将返回

index=[1 1 1 0 0 1 1 1]

Is there an easy way to find a smaller cell array of strings within a larger one? I've got two lists, one with unique elements, and one with repeating elements. I want to find whole occurrences of the specific pattern of the smaller array within the larger. I'm aware that strcmp will compare two cell arrays, but only if they're equal in length. My first thought was to step through subsets of the larger array using a loop, but there's got to be a better solution.

For example, in the following:

smallcellarray={'string1',...
                'string2',...
                'string3'};
largecellarray={'string1',...
                'string2',...
                'string3',...
                'string1',...
                'string2',...
                'string1',...
                'string2',...
                'string3'};

index=myfunction(largecellarray,smallcellarray)

would return

index=[1 1 1 0 0 1 1 1]

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

或十年 2024-09-14 08:08:42

您实际上可以使用函数 ISMEMBER 来获取largecellarray 中的单元格出现在较小数组 smallcellarray 中的索引向量,然后使用函数 STRFIND (适用于字符串数字数组)查找较小数组的起始索引在更大的范围内:

>> nSmall = numel(smallcellarray);
>> [~, matchIndex] = ismember(largecellarray,...  %# Find the index of the 
                                smallcellarray);    %#   smallcellarray entry
                                                    %#   that each entry of
                                                    %#   largecellarray matches
>> startIndices = strfind(matchIndex,1:nSmall)  %# Starting indices where the
                                                %#   vector [1 2 3] occurs in
startIndices =                                  %#   matchIndex

     1     6

然后就是从这些起始索引构建向量索引的问题。这是创建此向量的一种方法:

>> nLarge = numel(largecellarray);
>> endIndices = startIndices+nSmall;  %# Get the indices immediately after
                                      %#   where the vector [1 2 3] ends
>> index = zeros(1,nLarge);           %# Initialize index to zero
>> index(startIndices) = 1;           %# Mark the start index with a 1
>> index(endIndices) = -1;            %# Mark one index after the end with a -1
>> index = cumsum(index(1:nLarge))    %# Take the cumulative sum, removing any
                                      %#   extra entry in index that may occur
index =

     1     1     1     0     0     1     1     1

使用函数 BSXFUN阿姆罗。创建它的另一种方法是:

index = cumsum([startIndices; ones(nSmall-1,numel(startIndices))]);
index = ismember(1:numel(largecellarray),index);

You could actually use the function ISMEMBER to get an index vector for where the cells in largecellarray occur in the smaller array smallcellarray, then use the function STRFIND (which works for both strings and numeric arrays) to find the starting indices of the smaller array within the larger:

>> nSmall = numel(smallcellarray);
>> [~, matchIndex] = ismember(largecellarray,...  %# Find the index of the 
                                smallcellarray);    %#   smallcellarray entry
                                                    %#   that each entry of
                                                    %#   largecellarray matches
>> startIndices = strfind(matchIndex,1:nSmall)  %# Starting indices where the
                                                %#   vector [1 2 3] occurs in
startIndices =                                  %#   matchIndex

     1     6

Then it's a matter of building the vector index from these starting indices. Here's one way you could create this vector:

>> nLarge = numel(largecellarray);
>> endIndices = startIndices+nSmall;  %# Get the indices immediately after
                                      %#   where the vector [1 2 3] ends
>> index = zeros(1,nLarge);           %# Initialize index to zero
>> index(startIndices) = 1;           %# Mark the start index with a 1
>> index(endIndices) = -1;            %# Mark one index after the end with a -1
>> index = cumsum(index(1:nLarge))    %# Take the cumulative sum, removing any
                                      %#   extra entry in index that may occur
index =

     1     1     1     0     0     1     1     1

Another way to create it using the function BSXFUN is given by Amro. Yet another way to create it is:

index = cumsum([startIndices; ones(nSmall-1,numel(startIndices))]);
index = ismember(1:numel(largecellarray),index);
肥爪爪 2024-09-14 08:08:42

这是我的版本(基于 @yuk 和 @gnovice 的答案):

g = grp2idx([S L])';
idx = strfind(g(numel(S)+1:end),g(1:numel(S)));
idx = bsxfun(@plus,idx',0:numel(S)-1);

index = zeros(size(L));
index(idx(:)) = 1;

Here's my version (based on the answers of both @yuk and @gnovice):

g = grp2idx([S L])';
idx = strfind(g(numel(S)+1:end),g(1:numel(S)));
idx = bsxfun(@plus,idx',0:numel(S)-1);

index = zeros(size(L));
index(idx(:)) = 1;
ま柒月 2024-09-14 08:08:42

在 @gnovice 回答中,第一部分可以是

l = grp2idx(largecellarray)';
s = grp2idx(smallcellarray)';
startIndices = strfind(l,s);

In @gnovice answer the first part can be

l = grp2idx(largecellarray)';
s = grp2idx(smallcellarray)';
startIndices = strfind(l,s);
请远离我 2024-09-14 08:08:42

我得到了以下解决方案,但我仍然想知道是否有更好的方法来做到这一点:(

function [output]=cellstrcmpi(largecell,smallcell)
output=zeros(size(largecell));
idx=1;
while idx<=length(largecell)-length(smallcell)+1
    if sum(strcmpi(largecell(idx:idx+length(smallcell)-1),smallcell))==length(smallcell)
       output(idx:idx+length(smallcell)-1)=1;
       idx=idx+length(smallcell);       
    else
        idx=idx+1;
    end
end

我知道,我知道,没有错误检查 - 我是一个可怕的人。)

I got the following solution working, but I'm still wondering if there's a better way to do this:

function [output]=cellstrcmpi(largecell,smallcell)
output=zeros(size(largecell));
idx=1;
while idx<=length(largecell)-length(smallcell)+1
    if sum(strcmpi(largecell(idx:idx+length(smallcell)-1),smallcell))==length(smallcell)
       output(idx:idx+length(smallcell)-1)=1;
       idx=idx+length(smallcell);       
    else
        idx=idx+1;
    end
end

(I know, I know, no error checking - I'm a horrible person.)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文