Applying SVD immediately raises a MemoryError?
I am trying to apply SVD to a matrix (3241 x 12596) obtained after some text processing, with the ultimate goal of performing Latent Semantic Analysis, and I cannot understand why this happens, since my 64-bit machine has 16 GB of RAM. The moment svd(self.A) is called, it throws an error. The precise error is given below:
Traceback (most recent call last):
File ".\SVD.py", line 985, in <module>
_svd.calc()
File ".\SVD.py", line 534, in calc
self.U, self.S, self.Vt = svd(self.A)
File "C:\Python26\lib\site-packages\scipy\linalg\decomp_svd.py", line 81, in svd
overwrite_a = overwrite_a)
MemoryError
So I tried using
self.U, self.S, self.Vt = svd(self.A, full_matrices=False)
and this time, it throws the following error:
Traceback (most recent call last):
File ".\SVD.py", line 985, in <module>
_svd.calc()
File ".\SVD.py", line 534, in calc
self.U, self.S, self.Vt = svd(self.A, full_matrices= False)
File "C:\Python26\lib\site-packages\scipy\linalg\decomp_svd.py", line 71, in svd
return numpy.linalg.svd(a, full_matrices=0, compute_uv=compute_uv)
File "C:\Python26\lib\site-packages\numpy\linalg\linalg.py", line 1317, in svd
work = zeros((lwork,), t)
MemoryError
Is this matrix really too large for NumPy to handle, and is there something I can do at this stage without changing the methodology itself?
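For orientation, here is a rough back-of-the-envelope sketch (not from the original post) of the arrays a dense SVD of this matrix allocates; the shapes follow numpy.linalg.svd's documented behaviour, and the figures suggest the arrays themselves fit easily in 16 GB, so sheer size alone is unlikely to be the whole story.

```python
# Rough estimate of the main allocations for a dense SVD of a 3241 x 12596
# float64 matrix. With full_matrices=True, U is (m, m) and Vt is (n, n);
# with full_matrices=False they shrink to (m, k) and (k, n), k = min(m, n).
# LAPACK workspace (the `lwork` array in the traceback) comes on top of this.

m, n = 3241, 12596
k = min(m, n)
BYTES = 8  # float64

def gib(elements):
    return elements * BYTES / 2.0**30

print("A itself:        %.2f GiB" % gib(m * n))          # ~0.30 GiB
print("full U and Vt:   %.2f GiB" % gib(m * m + n * n))  # ~1.26 GiB
print("reduced U, Vt:   %.2f GiB" % gib(m * k + k * n))  # ~0.38 GiB
```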
2 Answers
Yes, the full_matrices parameter to scipy.linalg.svd is important: your input is highly rank-deficient (rank at most 3,241), so you don't want to allocate the entire 12,596 x 12,596 matrix for V!

More importantly, matrices coming from text processing are likely very sparse. scipy.linalg.svd is dense and doesn't offer truncated SVD, which results in a) tragic performance and b) lots of wasted memory.

Have a look at the sparseSVD package on PyPI, which works over sparse input and lets you ask for only the top K factors. Or try scipy.sparse.linalg.svds, though that's not as efficient and is only available in newer versions of scipy.

Or, to avoid the gritty details completely, use a package that does efficient LSA for you transparently, such as gensim.
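As a concrete illustration of the truncated, sparse route suggested above, here is a minimal sketch using scipy.sparse.linalg.svds; the 300-factor count and the random stand-in matrix are arbitrary assumptions for the example.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import svds

# Stand-in for the real 3241 x 12596 sparse term-document matrix.
A = sparse.random(3241, 12596, density=0.001, format="csr", dtype=np.float64)

k = 300  # number of latent factors to keep (hypothetical choice)
U, S, Vt = svds(A, k=k)

# svds returns singular values in ascending order; reorder to the usual layout.
order = np.argsort(S)[::-1]
U, S, Vt = U[:, order], S[order], Vt[order, :]

print(U.shape, S.shape, Vt.shape)  # (3241, 300) (300,) (300, 12596)
```

With gensim the equivalent is roughly models.LsiModel(corpus, id2word=dictionary, num_topics=300), which streams the corpus incrementally and never builds the dense matrix at all.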
As it turns out (thanks to @Ferdinand Beyer), I had not noticed that I was using a 32-bit version of Python on my 64-bit machine.

Using a 64-bit version of Python and reinstalling all the libraries solved the problem.
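For anyone hitting the same wall, here is a quick check (an assumed diagnostic, not part of the original answer) of which interpreter is actually running.

```python
# Pointer size is 4 bytes on a 32-bit build (roughly 2 GB of usable address
# space per process on Windows) and 8 bytes on a 64-bit build.
import platform
import struct

print(struct.calcsize("P") * 8, "-bit Python", sep="")
print(platform.architecture())
```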