Applying SVD throws a MemoryError instantaneously?


I am trying to apply SVD to my matrix (3241 x 12596), obtained after some text processing (the ultimate goal is to perform Latent Semantic Analysis), and I cannot understand why this is happening, since my 64-bit machine has 16 GB of RAM. The moment svd(self.A) is called, it throws an error. The precise traceback is given below:

Traceback (most recent call last):
  File ".\SVD.py", line 985, in <module>
    _svd.calc()
  File ".\SVD.py", line 534, in calc
    self.U, self.S, self.Vt = svd(self.A)
  File "C:\Python26\lib\site-packages\scipy\linalg\decomp_svd.py", line 81, in svd
    overwrite_a = overwrite_a)
MemoryError

So I tried using

self.U, self.S, self.Vt = svd(self.A, full_matrices=False)

and this time, it throws the following error:

Traceback (most recent call last):
  File ".\SVD.py", line 985, in <module>
    _svd.calc()
  File ".\SVD.py", line 534, in calc
    self.U, self.S, self.Vt = svd(self.A, full_matrices=False)
  File "C:\Python26\lib\site-packages\scipy\linalg\decomp_svd.py", line 71, in svd
    return numpy.linalg.svd(a, full_matrices=0, compute_uv=compute_uv)
  File "C:\Python26\lib\site-packages\numpy\linalg\linalg.py", line 1317, in svd
    work = zeros((lwork,), t)
MemoryError

Is this matrix really too large for NumPy to handle, and is there anything I can do at this stage without changing the methodology itself?
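
As a sanity check, here is a back-of-the-envelope estimate of the float64 allocations involved (a rough sketch; the LAPACK workspace whose allocation fails in the traceback comes on top of these):

import numpy as np

m, n = 3241, 12596  # shape of the term-document matrix A

a_bytes = m * n * 8  # A itself              ~0.30 GiB
u_bytes = m * m * 8  # full U  (m x m)       ~0.08 GiB
v_bytes = n * n * 8  # full Vt (n x n)       ~1.18 GiB

for name, b in [("A", a_bytes), ("U", u_bytes), ("Vt", v_bytes)]:
    print("%s: %.2f GiB" % (name, b / 2.0 ** 30))

Even the full decomposition totals roughly 1.6 GiB before workspace, nowhere near 16 GB, which is what makes the error so puzzling.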


2 Answers

审判长 2024-12-08 06:02:36


Yes, the full_matrices parameter to scipy.linalg.svd is important: your input is highly rank-deficient (rank at most 3,241), so you don't want to allocate the entire 12,596 x 12,596 matrix for V!

More importantly, matrices coming from text processing are typically very sparse, while scipy.linalg.svd is dense and doesn't offer a truncated SVD, which results in a) tragic performance and b) lots of wasted memory.

Have a look at the sparsesvd package on PyPI, which works over sparse input and lets you ask for the top K factors only. Or try scipy.sparse.linalg.svds, though that's not as efficient and is only available in newer versions of SciPy.
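
For the scipy.sparse.linalg.svds route, a minimal sketch (the random sparse matrix here is only a hypothetical stand-in for the real term-document matrix, and k = 300 is an arbitrary choice):

import numpy as np
from scipy.sparse import rand as sparse_rand
from scipy.sparse.linalg import svds

# Hypothetical stand-in for the real term-document matrix; in practice
# A comes out of the text-processing step as a scipy.sparse matrix.
A = sparse_rand(3241, 12596, density=0.01, format="csr")

k = 300  # number of latent factors to keep (must be < min(A.shape))
U, S, Vt = svds(A, k=k)

# svds returns singular values in ascending order; flip them so the
# strongest factor comes first, as is conventional for LSA.
order = np.argsort(S)[::-1]
U, S, Vt = U[:, order], S[order], Vt[order, :]

print(U.shape)   # (3241, 300)
print(S.shape)   # (300,)
print(Vt.shape)  # (300, 12596)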

Or, to avoid the gritty details completely, use a package that does efficient LSA for you transparently, such as gensim.
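
To illustrate the gensim route, a minimal sketch over a hypothetical three-document toy corpus (a real pipeline would feed in its own tokenized documents):

from gensim import corpora, models

# Hypothetical toy corpus; replace with your own tokenized documents.
texts = [["human", "computer", "interaction"],
         ["graph", "minors", "trees"],
         ["graph", "trees", "computer"]]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Truncated LSA: only num_topics factors are ever computed or stored,
# and the corpus is streamed rather than held in memory as one matrix.
lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)
print(lsi.print_topics())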

薔薇婲 2024-12-08 06:02:36


As it turns out, thanks to @Ferdinand Beyer, I had not noticed that I was using a 32-bit version of Python on my 64-bit machine.

Using a 64-bit version of Python and reinstalling all the libraries solved the problem.
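
A quick way to confirm which interpreter is actually running, since a 32-bit Python is capped at roughly 2 GB of address space no matter how much RAM is installed:

import platform
import struct

# Both lines report the pointer width of the running interpreter.
print(platform.architecture()[0])          # '32bit' or '64bit'
print("%d-bit" % (struct.calcsize("P") * 8))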
