Pseudo-inverse of a sparse matrix in Python
I am working with neuroimaging data and, because of the large amount of data, I would like to use sparse matrices in my code (scipy.sparse.lil_matrix or csr_matrix).
In particular, I need to compute the pseudo-inverse of my matrix to solve a least-squares problem.
I have found the method sparse.lsqr, but it is not very efficient. Is there a way to compute the Moore-Penrose pseudo-inverse (the equivalent of pinv for dense matrices)?
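For reference, a minimal scipy lsqr call looks roughly like this; the sizes, tolerances and the synthetic data are only illustrative stand-ins for the real voxel-by-tract system:

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

# Small synthetic system: on average ~4 nonzeros per row, like the voxel x tract matrix.
rng = np.random.default_rng(0)
A = sparse_random(5000, 200, density=4/200, format='csr', random_state=0)
x_true = rng.random(200)
b = A @ x_true

# lsqr never forms a pseudo-inverse; it iteratively minimizes ||A x - b||.
x_hat = lsqr(A, b, atol=1e-10, btol=1e-10, iter_lim=5000)[0]   # first element is the solution
print(np.max(np.abs(x_hat - x_true)))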
The size of my matrix A is about 600,000 x 2,000, and every row of the matrix has from 0 up to 4 non-zero values. The dimensions of A are voxels x fiber bundles (white-matter fiber tracts), and we expect at most 4 tracts to cross in any one voxel. In most white-matter voxels we expect at least 1 tract, but I would say that around 20% of the rows could be all zeros.
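For what it's worth, a matrix with this structure can be filled incrementally as a lil_matrix and converted to csr_matrix afterwards; the random crossings below are only a stand-in for the real voxel/tract data:

import numpy as np
from scipy.sparse import lil_matrix

rng = np.random.default_rng(0)
n_voxels, n_tracts = 600000, 2000

A = lil_matrix((n_voxels, n_tracts))
for voxel in range(n_voxels):
    n_cross = rng.integers(0, 5)              # 0 to 4 tracts cross this voxel
    if n_cross:
        tracts = rng.choice(n_tracts, size=n_cross, replace=False)
        A[voxel, tracts] = rng.random(n_cross)
A = A.tocsr()                                 # csr is better for products and solvers

For data of this size it is usually faster to collect (row, column, value) triplets and build a coo_matrix in one call, then convert to csr.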
The vector b is not sparse: b contains the measure for each voxel, which is in general non-zero.
I need to minimize the error, but there are also some conditions on the vector x. When I tried the model on smaller matrices, I never needed to constrain the system in order to satisfy these conditions (in general 0
Is that of any help? Is there a way to avoid taking the pseudo-inverse of A?
Thanks
Update 1st June:
Thanks again for the help.
I can't really show you anything about my data, because the Python code is giving me some problems. However, in order to understand how I could choose a good k, I have tried to create a test function in Matlab.
The code is as follows:
% Build a random test matrix F (100000 x 1000) with up to 150000 nonzero entries.
F = zeros(100000, 1000);
for k = 1:150000
    p = rand(1);
    a = 0;
    b = 0;
    % draw a random (row, column) index pair, redrawing until both are positive
    while a <= 0 || b <= 0
        a = random('Binomial', 100000, p);
        b = random('Binomial', 1000, p);
    end
    F(a, b) = rand(1);
end

% Known solution (1000 x 1) and the corresponding noiseless measurements.
solution = repmat([0.5, 0.5, 0.8, 0.7, 0.9, 0.4, 0.7, 0.7, 0.9, 0.6], 1, 100);
size(solution)
solution = solution';
measure = F*solution;
%check=pinvF*measure;

% Truncated SVD with the k largest singular triplets of the sparse matrix.
k = 250;
F = sparse(F);
[U, S, V] = svds(F, k);
s = svds(F, k);
plot(s)                       % plot of the singular values
max(max(U*S*V' - F))          % largest entry of the rank-k approximation residual

% Invert the nonzero singular values to form the truncated pseudo-inverse.
for s = 1:k
    if S(s, s) ~= 0
        S(s, s) = 1/S(s, s);
    end
end
inv = V*S'*U';                % truncated pseudo-inverse (note: shadows MATLAB's built-in inv)
inv*measure
max(inv*measure - solution)   % largest deviation from the known solution
Do you have any idea of what k should be compared to the size of F? I've taken 250 (out of 1000) and the results are not satisfactory (the waiting time is acceptable, but not short).
Also, in this test I can compare the results with the known solution, but how could one choose k in general?
I also attached the plot of the 250 singular values that I get and of their normalized squares. I don't know exactly how to make a better scree plot in Matlab. I'm now proceeding with a bigger k to see if the values suddenly become much smaller.
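One possible heuristic for choosing k, sketched here in Python/scipy rather than Matlab: take the smallest k whose leading singular values capture a chosen fraction of the squared singular-value mass. The k_max and the 0.99 threshold below are arbitrary illustrative choices, and the mass is measured only relative to the k_max largest values that svds returns:

import numpy as np
from scipy.sparse.linalg import svds

def choose_k(F, k_max=400, energy=0.99):
    """Smallest k whose leading singular values hold `energy` of the squared
    mass among the k_max largest singular values of the sparse matrix F."""
    s = svds(F, k=k_max, return_singular_vectors=False)
    s = np.sort(s)[::-1]                      # svds returns the values in ascending order
    frac = np.cumsum(s**2) / np.sum(s**2)     # cumulative captured 'energy'
    return int(np.searchsorted(frac, energy)) + 1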
Thanks again,
Jennifer
2 Answers
You could look further into the alternatives offered in scipy.sparse.linalg.
Anyway, please note that the pseudo-inverse of a sparse matrix is most likely a (very) dense matrix, so it is generally not a fruitful avenue to follow when solving sparse linear systems.
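As one example of those alternatives, scipy.sparse.linalg also offers lsmr for sparse least-squares problems; a minimal call (sizes and tolerances illustrative) looks like:

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsmr

# Small synthetic system standing in for the real one.
A = sparse_random(5000, 200, density=0.02, format='csr', random_state=0)
b = A @ np.random.default_rng(0).random(200)

x_hat = lsmr(A, b, atol=1e-10, btol=1e-10)[0]   # first element of the result tuple is x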
You may like to describe your particular problem in a slightly more detailed manner (dot(A, x) = b + e). At least specify:
- the 'typical' size of A
- the 'typical' sparsity of A
- least squares implies that norm(e) is minimized, but please indicate whether your main interest is in x_hat or in b_hat, where e = b - b_hat and b_hat = dot(A, x_hat)
Update: If you have some idea of the rank of A (and it is much smaller than the number of columns), you could try the total least squares method (http://en.wikipedia.org/wiki/Total_least_squares). Here is a simple implementation, where k is the number of first singular values and vectors to use (i.e. the 'effective' rank).
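A sketch of what such a truncated total-least-squares solver could look like with scipy.sparse.linalg.svds, using the k leading singular triplets of the augmented matrix [A, b]; the function name and the default k are illustrative:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

def tls_truncated(A, b, k=6):
    """Truncated total-least-squares solution of A*x ~ b for sparse A,
    using the k leading singular triplets of [A, b] (the assumed 'effective' rank)."""
    b_col = sp.csr_matrix(np.asarray(b, dtype=float).reshape(-1, 1))
    C = sp.hstack([A, b_col]).tocsc()
    _, _, vt = svds(C, k=k)                   # vt has shape (k, n + 1)
    Vk = vt.T                                 # leading right singular vectors, (n + 1) x k
    v_top, v_last = Vk[:-1, :], Vk[-1, :]     # split off the row that belongs to b
    # Minimum-norm truncated-TLS solution written via the leading subspace:
    # x = V11 @ v21 / (1 - ||v21||^2); assumes the denominator is not (near) zero.
    return v_top @ v_last / (1.0 - v_last @ v_last)

This never forms a dense pseudo-inverse; the cost is dominated by the single svds call on the sparse augmented matrix.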
Regardless of the answer to my comment, I would think you could accomplish this fairly easily using the Moore-Penrose SVD representation. Find the SVD with scipy.sparse.linalg.svds, replace Sigma by its pseudoinverse, and then multiply V*Sigma_pi*U' to find the pseudoinverse of your original matrix.
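In scipy that recipe might look roughly like the sketch below; svds only returns the k leading singular triplets, so this is a rank-k approximation of the true pseudo-inverse, applied directly to b instead of being formed explicitly. The function name, k and rcond are illustrative:

import numpy as np
from scipy.sparse.linalg import svds

def truncated_pinv_solve(A, b, k=250, rcond=1e-10):
    """Apply a rank-k approximation of pinv(A) to b: x ~ V_k * Sigma_k^+ * U_k^T * b."""
    u, s, vt = svds(A, k=k)                               # u: (m, k), s: (k,), vt: (k, n)
    s_inv = np.where(s > rcond * s.max(), 1.0 / s, 0.0)   # zero out tiny singular values
    return vt.T @ (s_inv * (u.T @ b))

Keeping the factors separate keeps everything thin (m x k and k x n); explicitly multiplying V*Sigma_pi*U' would produce a dense n x m matrix, which is exactly what the other answer warns about.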