朱莉娅:如何有效地排序2个并行的2个大阵列的子阵列?

发布于 2025-02-11 04:13:51 字数 1628 浏览 2 评论 0 原文

我有大的1D数组 A b ,以及将它们分为子阵列的Pointers i 的数组。我的 a b 几乎不适合RAM,并且具有不同的dtypes(一个包含 uint32 s,另一个 Rational> Rational {Int64} < /code> s),所以我不想加入2D数组,以避免更改dtypes。

对于i [2:end] 中的每个 i与相应的子阵列 b [i [i-1],i [i] -1] 相同。我对此的尝试是:

function sort!(a,b) 
    p=sortperm(a); 
    a[:], b[:] = a[p], b[p]
    end
Threads.@threads for i in I[2:end] 
    sort!( a[I[i-1], I[i]-1],   b[I[i-1], I[i]-1] ) 
    end

但是,在一个小例子上,我看到 sort!不会改变子阵列的视图:

a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a,b);     println(a,"\n",b)  # works like it should

a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a[1:5],b[1:5]);     println(a,"\n",b)  # does nothing!!!

关于如何创建此类函数排序的任何帮助!(尽可能高效)。


背景:我正在处理来自稀疏阵列的数据:

using SparseArrays
n=10^6;   x=sprand(n,n,1000/n); #random matrix with 1000 entries per column on average
x = SparseMatrixCSC(n,n,x.colptr,x.rowval,rand(-99:99,nnz(x)).//1); #chnging entries to rationals
U = randperm(n) #permutation of rows of matrix x
a, b, I = U[x.rowval], x.nzval, x.colptr; 

因此,这些 a,b,i 是我发布的问题的好示例。我要做的是对每列中条目的行索引(以及对应的矩阵值)进行排序。

注意:我已经在朱莉娅话语上问了这个问题在这里,但没有收到任何答复或评论。如果我可以提高问题的质量,请随时告诉我。

I have large 1D arrays a and b, and an array of pointers I that separates them into subarrays. My a and b barely fit into RAM and are of different dtypes (one contains UInt32s, the other Rational{Int64}s), so I don’t want to join them into a 2D array, to avoid changing dtypes.

For each i in I[2:end], I wish to sort the subarray a[I[i-1],I[i]-1] and apply the same permutation to the corresponding subarray b[I[i-1],I[i]-1]. My attempt at this is:

function sort!(a,b) 
    p=sortperm(a); 
    a[:], b[:] = a[p], b[p]
    end
Threads.@threads for i in I[2:end] 
    sort!( a[I[i-1], I[i]-1],   b[I[i-1], I[i]-1] ) 
    end

However, already on a small example, I see that sort! does not alter the view of a subarray:

a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a,b);     println(a,"\n",b)  # works like it should

a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a[1:5],b[1:5]);     println(a,"\n",b)  # does nothing!!!

Any help on how to create such function sort! (as efficient as possible) are welcome.


Background: I am dealing with data coming from sparse arrays:

using SparseArrays
n=10^6;   x=sprand(n,n,1000/n); #random matrix with 1000 entries per column on average
x = SparseMatrixCSC(n,n,x.colptr,x.rowval,rand(-99:99,nnz(x)).//1); #chnging entries to rationals
U = randperm(n) #permutation of rows of matrix x
a, b, I = U[x.rowval], x.nzval, x.colptr; 

Thus these a,b,I serve as good examples to my posted problem. What I am trying to do is sort the row indices (and corresponding matrix values) of entries in each column.

Note: I already asked this question on Julia discourse here, but received no replies nor comments. If I can improve on the quality of the question, don't hesitate to tell me.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

緦唸λ蓇 2025-02-18 04:13:51

问题在于 a [1:5] 不是视图,而只是副本。相反,将视图视为

function sort!(a,b) 
    p=sortperm(a); 
    a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in I[2:end] 
    sort!(view(a, I[i-1]:I[i]-1), view(b, I[i-1]:I[i]-1)) 
end

您正在寻找的

PS。

@view a [2:3] @view(a [2:3]) @views 宏可以帮助使THINS更可读。

The problem is that a[1:5] is not a view, it's just a copy. instead make the view like

function sort!(a,b) 
    p=sortperm(a); 
    a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in I[2:end] 
    sort!(view(a, I[i-1]:I[i]-1), view(b, I[i-1]:I[i]-1)) 
end

is what you are looking for

ps.

the @view a[2:3], @view(a[2:3]) or the @views macro can help making thins more readable.

一瞬间的火花 2025-02-18 04:13:51

首先,您不应该重新定义 base.sort!这样。现在, sort! Will Shadow base.sort!,如果您调用 sort!(a),您将获得错误。

另外, a [i [i-1],i [i] -1] b [i [i-1],i [i] -1] 是不是切片,它们只是单个元素,因此,如果您是否使用 views 对它们进行排序,则什么都不会发生。并以移动窗口的方式对数组进行排序是不正确的。

您想在这里做的事情,因为您的向量很大,请呼叫在循环中反复,选择 block_size 适合内存,并根据 b b 根据 p 进行修改,然后继续到<代码> a 并找到下一个 p ,然后重复直到要对 a 中的任何内容保留为止。我将把实施作为您的练习,但是如果您陷入某件事,您可以回来。

First of all, you shouldn't redefine Base.sort! like this. Now, sort! will shadow Base.sort! and you'll get errors if you call sort!(a).

Also, a[I[i-1], I[i]-1] and b[I[i-1], I[i]-1] are not slices, they are just single elements, so nothing should happen if you sort them either with views or not. And sorting arrays in a moving-window way like this is not correct.

What you want to do here, since your vectors are huge, is call p = partialsortperm(a[i:end], i:i+block_size-1) repeatedly in a loop, choosing a block_size that fits into memory, and modify both a and b according to p, then continue to the remaining part of a and find next p and repeat until nothing remains in a to be sorted. I'll leave the implementation as an exercise for you, but you can come back if you get stuck on something.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文