当前位置：文江博客话题详情

如何在 Octave 中对（字符串）元胞数组中的元素进行排序和有效查找？

发布于 2024-12-17 08:56:02 字数 15 浏览 6 评论 0原文

有内置的功能吗？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

一个人的夜不怕黑 2024-12-24 08:56:02

GNU Octave 以线性时间 `O(n)` 搜索字符串元胞数组：

（此答案中的 15 年前的代码已在 GNU Octave 3.8.2 上进行了测试并正确无误） code>、5.2.0 和 7.1.0)

另一个答案有 cellidx ，它已贬值八度，但它仍然运行但他们说用ismember 相反，如下所示：

%linear time string index search.
a = ["hello"; "unsorted"; "world"; "moobar"]
b = cellstr(a)
%b =
%{
%  [1,1] = hello
%  [2,1] = unsorted
%  [3,1] = world
%  [4,1] = moobar
%}
find(ismember(b, 'world'))   %returns 3

ismember 在索引槽 3 中找到“world”。这是一个昂贵的线性时间O(n)操作因为它必须迭代所有元素，无论是否找到。

要实现对数时间 O(log n) 解决方案，那么您的列表需要预先排序，然后您可以使用二分搜索：

如果您的元胞数组已经排序，您可以执行 `O(log-n)` 最坏的情况：

function i = binsearch(array, val, low, high)
  %binary search algorithm for numerics, Usage:
  %myarray = [ 30, 40, 50.15 ];        %already sorted list
  %binsearch(myarray, 30,    1, 3)     %item 30 is in slot 1
  if ( high < low )
    i = 0;
  else
    mid = floor((low + high) / 2);
    if ( array(mid) > val )
      i = binsearch(array, val, low, mid-1);
    elseif ( array(mid) < val )
      i = binsearch(array, val, mid+1, high);
    else
      i = mid;
    endif
  endif
endfunction

function i = binsearch_str(array, val, low, high)
  % binary search for strings, usage:
  %myarray2 = [ "abc"; "def"; "ghi"];       #already sorted list
  %binsearch_str(myarray2, "abc", 1, 3)     #item abc is in slot 1
  if ( high < low )
    i = 0;
  else
    mid = floor((low + high) / 2);
    if ( mystrcmp(array(mid, [1:end]), val) == 1 )
      i = binsearch(array, val, low, mid-1);
    elseif ( mystrcmp(array(mid, [1:end]), val) == -1 )
      i = binsearch_str(array, val, mid+1, high);
    else
      i = mid;
    endif
  endif
endfunction

function ret = mystrcmp(a, b)
    %this function is just an octave string compare, its behavior follows the
    %strcmp(str1,str2)'s in C and java.lang.String.compareTo(...)'s in Java,
    %that is:
    %  -returns 1 if string a > b
    %  -returns 0 if string a == b
    %  -return -1 if string a < b

    % The gt() operator does not support cell array.  If the single word
    % is passed as an one-element cell array, converts it to a string.
    a_as_string = a;
    if iscellstr( a )
       a_as_string = a{1}; %a was passed as a single-element cell array.
    endif

    % The gt() operator does not support cell array.  If the single word
    % is passed as an one-element cell array, converts it to a string.
    b_as_string = b;
    if iscellstr( b )
       b_as_string = b{1}; %b was passed as a single-element cell array.
    endif

    % Space-pad the shortest word so as they can be used with gt() and lt() operators.
    if length(a_as_string) > length( b_as_string )
       b_as_string( length( b_as_string ) + 1 : length( a_as_string ) ) = " ";
    elseif length(a_as_string) < length( b_as_string )
       a_as_string( length( a_as_string ) + 1 : length( b_as_string ) ) = " ";
    endif

    letters_gt = gt(a_as_string, b_as_string);      %list of boolean a > b
    letters_lt = lt(a_as_string, b_as_string);      %list of boolean a < b
    ret = 0;
    %octave makes us roll our own string compare because
    %strings are arrays of numerics
    len = length(letters_gt);
    for i = 1:len
        if letters_gt(i) > letters_lt(i)
            ret = 1;
            return
        elseif letters_gt(i) < letters_lt(i)
            ret = -1;
            return
        endif
    end;
endfunction

%Assuming that myarray is already sorted, (it must be for binary 
%search to finish in logarithmic time `O(log-n))` worst case, then do

myarray = [ 30, 40, 50.15 ];        %already sorted list
binsearch(myarray, 30,    1, 3)     %item 30 is in slot 1
binsearch(myarray, 40,    1, 3)     %item 40 is in slot 2
binsearch(myarray, 50,    1, 3)     %50 does not exist so return 0
binsearch(myarray, 50.15, 1, 3)     %50.15 is in slot 3

%same but for strings:
myarray2 = [ "abc"; "def"; "ghi"];       %already sorted list
binsearch_str(myarray2, "abc", 1, 3)     %item abc is in slot 1
binsearch_str(myarray2, "def", 1, 3)     %item def is in slot 2
binsearch_str(myarray2, "zzz", 1, 3)     %zzz does not exist so return 0
binsearch_str(myarray2, "ghi", 1, 3)     %item ghi is in slot 3

如果尚未对数组进行排序：

排序的复杂性取决于数据的类型你有什么GNU Octave 语言编写者选择的排序算法，它介于 O(n*log(n)) 和 O(n*n) 之间。

myarray = [ 9, 40, -3, 3.14, 20 ];        %not sorted list 
myarray = sort(myarray) 

myarray2 = [ "the"; "cat"; "sat"; "on"; "the"; "mat"];       %not sorted list 
myarray2 = sortrows(myarray2)

使其向后兼容 GNU Octave 3. 5. 和 7. 的代码爱好者请参阅此处另一个答案中的@Paulo Carvalho。

GNU Octave search a cell array of strings in linear time `O(n)`:

(The 15 year old code in this answer was tested and correct on GNU Octave 3.8.2, 5.2.0 and 7.1.0)

The other answer has cellidx which was depreciated by octave, it still runs but they say to use ismember instead, like this:

%linear time string index search.
a = ["hello"; "unsorted"; "world"; "moobar"]
b = cellstr(a)
%b =
%{
%  [1,1] = hello
%  [2,1] = unsorted
%  [3,1] = world
%  [4,1] = moobar
%}
find(ismember(b, 'world'))   %returns 3

ismember finds 'world' in index slot 3. This is a expensive linear time O(n) operation because it has to iterate through all elements whether or not it is found.

To achieve a logarathmic time O(log n) solution, then your list needs to come pre-sorted and then you can use binary search:

If your cell array is already sorted, you can do `O(log-n)` worst case:

function i = binsearch(array, val, low, high)
  %binary search algorithm for numerics, Usage:
  %myarray = [ 30, 40, 50.15 ];        %already sorted list
  %binsearch(myarray, 30,    1, 3)     %item 30 is in slot 1
  if ( high < low )
    i = 0;
  else
    mid = floor((low + high) / 2);
    if ( array(mid) > val )
      i = binsearch(array, val, low, mid-1);
    elseif ( array(mid) < val )
      i = binsearch(array, val, mid+1, high);
    else
      i = mid;
    endif
  endif
endfunction

function i = binsearch_str(array, val, low, high)
  % binary search for strings, usage:
  %myarray2 = [ "abc"; "def"; "ghi"];       #already sorted list
  %binsearch_str(myarray2, "abc", 1, 3)     #item abc is in slot 1
  if ( high < low )
    i = 0;
  else
    mid = floor((low + high) / 2);
    if ( mystrcmp(array(mid, [1:end]), val) == 1 )
      i = binsearch(array, val, low, mid-1);
    elseif ( mystrcmp(array(mid, [1:end]), val) == -1 )
      i = binsearch_str(array, val, mid+1, high);
    else
      i = mid;
    endif
  endif
endfunction

function ret = mystrcmp(a, b)
    %this function is just an octave string compare, its behavior follows the
    %strcmp(str1,str2)'s in C and java.lang.String.compareTo(...)'s in Java,
    %that is:
    %  -returns 1 if string a > b
    %  -returns 0 if string a == b
    %  -return -1 if string a < b

    % The gt() operator does not support cell array.  If the single word
    % is passed as an one-element cell array, converts it to a string.
    a_as_string = a;
    if iscellstr( a )
       a_as_string = a{1}; %a was passed as a single-element cell array.
    endif

    % The gt() operator does not support cell array.  If the single word
    % is passed as an one-element cell array, converts it to a string.
    b_as_string = b;
    if iscellstr( b )
       b_as_string = b{1}; %b was passed as a single-element cell array.
    endif

    % Space-pad the shortest word so as they can be used with gt() and lt() operators.
    if length(a_as_string) > length( b_as_string )
       b_as_string( length( b_as_string ) + 1 : length( a_as_string ) ) = " ";
    elseif length(a_as_string) < length( b_as_string )
       a_as_string( length( a_as_string ) + 1 : length( b_as_string ) ) = " ";
    endif

    letters_gt = gt(a_as_string, b_as_string);      %list of boolean a > b
    letters_lt = lt(a_as_string, b_as_string);      %list of boolean a < b
    ret = 0;
    %octave makes us roll our own string compare because
    %strings are arrays of numerics
    len = length(letters_gt);
    for i = 1:len
        if letters_gt(i) > letters_lt(i)
            ret = 1;
            return
        elseif letters_gt(i) < letters_lt(i)
            ret = -1;
            return
        endif
    end;
endfunction

%Assuming that myarray is already sorted, (it must be for binary 
%search to finish in logarithmic time `O(log-n))` worst case, then do

myarray = [ 30, 40, 50.15 ];        %already sorted list
binsearch(myarray, 30,    1, 3)     %item 30 is in slot 1
binsearch(myarray, 40,    1, 3)     %item 40 is in slot 2
binsearch(myarray, 50,    1, 3)     %50 does not exist so return 0
binsearch(myarray, 50.15, 1, 3)     %50.15 is in slot 3

%same but for strings:
myarray2 = [ "abc"; "def"; "ghi"];       %already sorted list
binsearch_str(myarray2, "abc", 1, 3)     %item abc is in slot 1
binsearch_str(myarray2, "def", 1, 3)     %item def is in slot 2
binsearch_str(myarray2, "zzz", 1, 3)     %zzz does not exist so return 0
binsearch_str(myarray2, "ghi", 1, 3)     %item ghi is in slot 3

To sort your array if it isn't already:

Complexity of sorting depends on the kind of data you have and whatever sorting algorithm GNU octave language writers selected, it's somewhere between O(n*log(n)) and O(n*n).

myarray = [ 9, 40, -3, 3.14, 20 ];        %not sorted list 
myarray = sort(myarray) 

myarray2 = [ "the"; "cat"; "sat"; "on"; "the"; "mat"];       %not sorted list 
myarray2 = sortrows(myarray2)

Code buffs to make this backward compatible with GNU Octave 3. 5. and 7. goes to @Paulo Carvalho in the other answer here.

回复收藏 0 原文

陌路黄昏 2024-12-24 08:56:02

是的，检查一下：http://www.obihiro.ac.jp/~ suzukim/masuda/octave/html3/octave_36.html#SEC75

a = ["hello"; "world"];
c = cellstr (a)
     ⇒ c =
         {
           [1,1] = hello
           [2,1] = world
         }
>>> cellidx(c, 'hello')
ans =  1

>>> cellidx(c, 'world')
ans =  2

Yes check this: http://www.obihiro.ac.jp/~suzukim/masuda/octave/html3/octave_36.html#SEC75

a = ["hello"; "world"];
c = cellstr (a)
     ⇒ c =
         {
           [1,1] = hello
           [2,1] = world
         }
>>> cellidx(c, 'hello')
ans =  1

>>> cellidx(c, 'world')
ans =  2

回复收藏 0 原文

围归者 2024-12-24 08:56:02

cellidx 解决方案不满足 OP 的效率要求，因此已被弃用（如 help cellidx 中所述）。

Håvard Geithus 在评论中建议对排序字符串元胞数组使用lookup() 函数，这比cellidx 效率更高。不过，它仍然是二分搜索，而大多数现代语言（甚至许多 20 年前的语言）使我们可以轻松访问关联数组，这将是一种更好的方法。

虽然 Octave 显然没有关联的数组，但这实际上是解释器用于 ocatve 变量（包括结构）的内容，因此您可以利用它，如下所述：
http://math-blog .com/2011/05/09/associative-arrays-and-cellular-automata-in-octave/

Built-in Function: struct ("field", value, "field", value,...)
Built-in Function: isstruct (expr)
Built-in Function: rmfield (s, f)
Function File: [k1,..., v1] = setfield (s, k1, v1,...)
Function File: [t, p] = orderfields (s1, s2)
Built-in Function: fieldnames (struct)
Built-in Function: isfield (expr, name)
Function File: [v1,...] = getfield (s, key,...)
Function File: substruct (type, subs,...)

将Matlab转换为Octave是否有一个converters.Map等效项？建议使用 javaObject("java.util.Hashtable")。这会带来一些设置开销，但如果您经常使用它，将会带来性能提升。链接某些用 C 或 C++ 编写的库甚至可能可行？不过，请考虑这是否是一个可维护的选项。

警告：我对 Octave 比较陌生，在我自己研究的过程中写下了这篇文章（这就是我在这里的结局）。我还没有对这些技术的效率进行测试，虽然我对底层算法有相当的了解，但我可能对 Octave 中的实际效率做出了不合理的假设。

The cellidx solution does not meet the OP's efficiency requirement, and is deprecated (as noted by help cellidx).

Håvard Geithus in a comment suggested using the lookup() function on a sorted cell array of strings, which is significantly more efficient than cellidx. It's still a binary search though, whereas most modern languages (and even many 20 year old ones) give us easy access to associative arrays, which would be a much better approach.

While Octave doesn't obviously have associated arrays, that's effectively what the interpreter is using for ocatve's variables, including structs, so you can make us of that, as described here:
http://math-blog.com/2011/05/09/associative-arrays-and-cellular-automata-in-octave/

Built-in Function: struct ("field", value, "field", value,...)
Built-in Function: isstruct (expr)
Built-in Function: rmfield (s, f)
Function File: [k1,..., v1] = setfield (s, k1, v1,...)
Function File: [t, p] = orderfields (s1, s2)
Built-in Function: fieldnames (struct)
Built-in Function: isfield (expr, name)
Function File: [v1,...] = getfield (s, key,...)
Function File: substruct (type, subs,...)

Converting Matlab to Octave is there a containers.Map equivalent? suggests using javaObject("java.util.Hashtable"). That would come with some setup overhead, but would be a performance win if you're using it a lot. It may even be viable to link in some library written in C or C++? Do think about whether this is a maintainable option though.

Caveat: I'm relatively new to Octave, and writing this up as I research it myself (which is how I wound up here). I haven't yet run tests on the efficiency of these techniques, and while I've got a fair knowledge of the underlying algorithms, I may be making unreasonable assumptions about what's actually efficient in Octave.

回复收藏 0 原文

愿与i 2024-12-24 08:56:02

这是 mystrcmp() 的一个版本，可在最新版本 (7.1.0) 的 Octave 中运行：

function ret = mystrcmp(a, b)
    %this function is just an octave string compare, its behavior follows the
    %strcmp(str1,str2)'s in C and java.lang.String.compareTo(...)'s in Java,
    %that is:
    %  -returns 1 if string a > b
    %  -returns 0 if string a == b
    %  -return -1 if string a < b

    % The gt() operator does not support cell array.  If the single word
    % is passed as an one-element cell array, converts it to a string.
    a_as_string = a;
    if iscellstr( a )
       a_as_string = a{1}; %a was passed as a single-element cell array.
    endif

    % The gt() operator does not support cell array.  If the single word
    % is passed as an one-element cell array, converts it to a string.
    b_as_string = b;
    if iscellstr( b )
       b_as_string = b{1}; %b was passed as a single-element cell array.
    endif

    % Space-pad the shortest word so as they can be used with gt() and lt() operators.
    if length(a_as_string) > length( b_as_string )
       b_as_string( length( b_as_string ) + 1 : length( a_as_string ) ) = " ";
    elseif length(a_as_string) < length( b_as_string )
       a_as_string( length( a_as_string ) + 1 : length( b_as_string ) ) = " ";
    endif

    letters_gt = gt(a_as_string, b_as_string);      %list of boolean a > b
    letters_lt = lt(a_as_string, b_as_string);      %list of boolean a < b
    ret = 0;
    %octave makes us roll our own string compare because
    %strings are arrays of numerics
    len = length(letters_gt);
    for i = 1:len
        if letters_gt(i) > letters_lt(i)
            ret = 1;
            return
        elseif letters_gt(i) < letters_lt(i)
            ret = -1;
            return
        endif
    end;
endfunction

This is a version of mystrcmp() that works in Octave of recent version (7.1.0):

function ret = mystrcmp(a, b)
    %this function is just an octave string compare, its behavior follows the
    %strcmp(str1,str2)'s in C and java.lang.String.compareTo(...)'s in Java,
    %that is:
    %  -returns 1 if string a > b
    %  -returns 0 if string a == b
    %  -return -1 if string a < b

    % The gt() operator does not support cell array.  If the single word
    % is passed as an one-element cell array, converts it to a string.
    a_as_string = a;
    if iscellstr( a )
       a_as_string = a{1}; %a was passed as a single-element cell array.
    endif

    % The gt() operator does not support cell array.  If the single word
    % is passed as an one-element cell array, converts it to a string.
    b_as_string = b;
    if iscellstr( b )
       b_as_string = b{1}; %b was passed as a single-element cell array.
    endif

    % Space-pad the shortest word so as they can be used with gt() and lt() operators.
    if length(a_as_string) > length( b_as_string )
       b_as_string( length( b_as_string ) + 1 : length( a_as_string ) ) = " ";
    elseif length(a_as_string) < length( b_as_string )
       a_as_string( length( a_as_string ) + 1 : length( b_as_string ) ) = " ";
    endif

    letters_gt = gt(a_as_string, b_as_string);      %list of boolean a > b
    letters_lt = lt(a_as_string, b_as_string);      %list of boolean a < b
    ret = 0;
    %octave makes us roll our own string compare because
    %strings are arrays of numerics
    len = length(letters_gt);
    for i = 1:len
        if letters_gt(i) > letters_lt(i)
            ret = 1;
            return
        elseif letters_gt(i) < letters_lt(i)
            ret = -1;
            return
        endif
    end;
endfunction

回复收藏 0 原文

~没有更多了~