matlab 中的就地快速排序

发布于 2024-12-02 01:11:21 字数 430 浏览 0 评论 0 原文

我在 matlab 中编写了一个小型快速排序实现来对一些自定义数据进行排序。因为我正在对元胞数组进行排序,并且需要排序顺序的索引,并且不想重组元胞数组本身,所以我需要自己的实现(也许有一个可用的实现,但我没有找到它) 。

我当前的实现方法是划分为 leftright 数组,然后将这些数组传递给递归调用。因为我不知道 leftright 的大小,所以我只是将它们放在一个循环中,我知道这在 matlab 中非常慢。

我知道你可以进行就地快速排序,但我被警告不要修改传递给函数的变量的内容,因为引用调用并不是按照人们在 matlab 中期望的方式实现的(或者我被告知)。这是正确的吗?就地快速排序在 matlab 中是否可以按预期工作,或者有什么我需要注意的吗?对于实施此类事情,您还有什么其他提示?

I wrote a small quicksort implementation in matlab to sort some custom data. Because I am sorting a cell-array and I need the indexes of the sort-order and do not want to restructure the cell-array itself I need my own implementation (maybe there is one available that works, but I did not find it).

My current implementation works by partitioning into a left and right array and then passing these arrays to the recursive call. Because I do not know the size of left and and right I just grow them inside a loop which I know is horribly slow in matlab.

I know you can do an in place quicksort, but I was warned about never modifying the content of variables passed into a function, because call by reference is not implemented the way one would expect in matlab (or so I was told). Is this correct? Would an in-place quicksort work as expected in matlab or is there something I need to take care of? What other hints would you have for implementing this kind of thing?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

在风中等你 2024-12-09 01:11:21

与 Matlab 的内置函数相比,由于 M 级操作的开销,在用户 M 代码中实现对复杂数据的排序可能会造成性能损失。尝试根据 Matlab 现有的矢量化函数重新构建操作。

根据您的评论,听起来您正在对单元格结构内部的单值键进行排序。您可以通过将排序键提取到原始数值数组并调用内置的 sort 来获得良好的加速。

%// An example cell array of structs that I think looks like your input
c = num2cell(struct('foo',{'a','b','c','d'}, 'bar',{6 1 3 2}))
%// Let's say the "bar" field is what you want to sort on.
key = cellfun(@(s)s.bar, c) %// Extract the sort key using cellfun
[sortedKey,ix] = sort(key) %// Sort on just the key using fast numeric sort() builtin
sortedC = c(ix); %// ix is a reordering index in to c; apply the sort using a single indexing operation
reordering = cellfun(@(s)s.foo, sortedC)  %// for human readability of results

如果您要对多个字段值进行排序,请将 n 个单元格中的所有 m 个键值提取到一个 n×m 数组中,其中各列按优先级降序排列,然后对其使用 sortrows

%// Multi-key sort
keyCols = {'bar','baz'};
key = NaN(numel(c), numel(keyCols));
for i = 1:numel(keyCols)
    keyCol = keyCols{i};
    key(:,i) = cellfun(@(s)s.(keyCol), c);
end
[sortedKey,ix] = sortrows(key);
sortedC = c(ix);
reordering = cellfun(@(s)s.foo, sortedC)

Matlab 性能的关键之一是获取原始数组中的数据,并对这些原始数组使用向量化操作。 Matlab 代码看起来像带有算法和比较函数引用等的 C++ STL 代码,通常会很慢;即使您的代码在 O(n) 复杂度方面表现良好,用户级 M 代码操作的固定成本(尤其是在非基元上)也可能是一个杀手。

另外,如果您的结构体是同构的(即它们都具有相同的字段集),则可以将它们直接存储在结构体数组中,而不是结构体元胞数组中,并且它会更紧凑。如果您可以进行更广泛的重新设计,将数据结构重新排列为“平面组织” - 其中您有一个数组结构,将所有字段的第 i 个元素作为记录读取,而不是标量字段结构数组- 可能是一个很好的效率胜利。这些重组中的任何一个都会使构建排序键数组的成本降低。

Implementing a sort on complex data in user M-code is probably going to be a loss in terms of performance due to the overhead of M-level operations compared to Matlab's builtins. Try to reframe the operation in terms of Matlab's existing vectorized functions.

Based on your comment, it sounds like you're sorting on a single-value key that's inside the structs in the cells. You can probably get a good speedup by extracting the sort key to a primitive numeric array and calling the builtin sort on that.

%// An example cell array of structs that I think looks like your input
c = num2cell(struct('foo',{'a','b','c','d'}, 'bar',{6 1 3 2}))
%// Let's say the "bar" field is what you want to sort on.
key = cellfun(@(s)s.bar, c) %// Extract the sort key using cellfun
[sortedKey,ix] = sort(key) %// Sort on just the key using fast numeric sort() builtin
sortedC = c(ix); %// ix is a reordering index in to c; apply the sort using a single indexing operation
reordering = cellfun(@(s)s.foo, sortedC)  %// for human readability of results

If you're sorting on multiple field values, extract all the m key values from the n cells to an n-by-m array, with columns in descending order of precedence, and use sortrows on it.

%// Multi-key sort
keyCols = {'bar','baz'};
key = NaN(numel(c), numel(keyCols));
for i = 1:numel(keyCols)
    keyCol = keyCols{i};
    key(:,i) = cellfun(@(s)s.(keyCol), c);
end
[sortedKey,ix] = sortrows(key);
sortedC = c(ix);
reordering = cellfun(@(s)s.foo, sortedC)

One of the keys to performance in Matlab is to get your data in primitive arrays, and use vectorized operations on those primitive arrays. Matlab code that looks like C++ STL code with algorithms and references to comparison functions and the like will often be slow; even if your code is good in O(n) complexity terms, the fixed cost of user-level M-code operations, especially on non-primitives, can be a killer.

Also, if your structs are homogeneous (that is, they all have the same set of fields), you can store them directly in a struct array instead of a cell array of structs, and it will be more compact. If you can do more extensive redesign, rearranging your data structures to be "planar-organized" - where you have a struct of arrays, reading across the ith elemnt of all the fields as a record, instead of an array of structs of scalar fields - could be a good efficiency win. Either of these reorganizations would make constructing the sort key array cheaper.

无边思念无边月 2024-12-09 01:11:21

在这篇文章中,我仅解释 MATLAB 函数调用约定,而不讨论快速排序算法的实现。

调用函数时,MATLAB 传递内置数据类型 按值,并且对此类参数所做的任何更改在函数外部都不可见。

function y = myFunc(x)
    x = x .* 2;         %# pass-by-value, changes only visible inside function
    y = x;
end

对于大数据来说,这可能效率低下,尤其是如果它们没有在函数内部修改的话。因此,MATLAB 在内部实现了写时复制机制:例如,当复制向量时,仅复制一些元数据,而数据本身在向量的两个副本之间共享。只有当其中之一被修改时,数据才真正被复制。

function y = myFunc(x)
    %# x was never changed, thus passed-by-reference avoiding making a copy
    y = x .* 2;
end

请注意,对于单元格数组和结构体,只有修改的单元格/字段是按值传递的(这是因为单元格/字段在内部单独存储),这使得此类数据结构的复制更加高效。有关更多信息,请阅读此博客发布

此外,版本 R2007 及更高版本(我认为)检测 对数据进行就地操作并优化此类情况。

function x = myFunc(x)
    x = x.*2;
end

显然,调用此类函数时,左侧必须与右侧相同(x = myFunc(x);)。此外,为了利用此优化,必须从另一个函数内部调用就地函数。

在 MEX 函数中,虽然可以在不复制的情况下更改输入变量,但它不受官方支持,并且可能会产生意外结果...

对于 用户定义类型 (OOP),MATLAB 引入了 .com/help/matlab/matlab_oop/comparing-handle-and-value-classes.html" rel="nofollow noreferrer">值对象与句柄对象 支持 参考语义

In this post, I only explain MATLAB function-calling convention, and am not discussing the quick-sort algorithm implementation.

When calling functions, MATLAB passes built-in data types by-value, and any changes made to such arguments are not visible outside the function.

function y = myFunc(x)
    x = x .* 2;         %# pass-by-value, changes only visible inside function
    y = x;
end

This could be inefficient for large data especially if they are not modified inside the functions. Therefore MATLAB internally implements a copy-on-write mechanism: for example when a vector is copied, only some meta-data is copied, while the data itself is shared between the two copies of the vector. And it is only when one of them is modified, that the data is actually duplicated.

function y = myFunc(x)
    %# x was never changed, thus passed-by-reference avoiding making a copy
    y = x .* 2;
end

Note that for cell-arrays and structures, only the cells/fields modified are passed-by-value (this is because cells/fields are internally stored separately), which makes copying more efficient for such data structures. For more information, read this blog post.

In addition, versions R2007 and upward (I think) detects in-place operations on data and optimizes such cases.

function x = myFunc(x)
    x = x.*2;
end

Obviously when calling such function, the LHS must be the same as the RHS (x = myFunc(x);). Also in order to take advantage of this optimization, in-place functions must be called from inside another function.

In MEX-functions, although it is possible to change input variables without making copies, it is not officially supported and might yield unexpected results...

For user-defined types (OOP), MATLAB introduced the concept of value object vs. handle object supporting reference semantics.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文