Memory error and list limits?

Posted 2024-10-29 20:30:21


I need to produce very large matrices (Markov chains) for scientific purposes. I perform a calculation and put the results in a list of 20301 elements (= one row of my matrix). I need all of that data in memory to proceed to the next Markov step, but I can store it elsewhere (e.g. a file) if needed, even if that slows down my Markov chain walk-through. My computer (scientific lab): dual Xeon, 6 cores/12 threads each, 12GB memory, OS: Win64

  Traceback (most recent call last):
  File "my_file.py", line 247, in <module>
    ListTemp.append(calculus)
MemoryError

Example of a calculation result: 9.233747520008198e-102 (yes, it's over 1/9000)

The error is raised when storing the 19766th element:

ListTemp[19766]
1.4509421012263216e-103

If I go further:

Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    ListTemp[19767]
IndexError: list index out of range

So the list hit a MemoryError at loop iteration 19767.

Questions:

  1. Is there a memory limit to a list? Is it a "per-list limit" or a "global per-script limit"?

  2. How can I bypass those limits? Any possibilities in mind?

  3. Would it help to use numpy or 64-bit Python? What are the memory limits with them? What about other languages?


ら栖息 2024-11-05 20:30:21


First off, see How Big can a Python Array Get? and Numpy, problem with long arrays

Second, the only real limit comes from the amount of memory you have and how your system stores memory references. There is no per-list limit, so Python will go until it runs out of memory. Two possibilities:

  1. If you are running on an older OS or one that forces processes to use a limited amount of memory, you may need to increase the amount of memory the Python process has access to.
  2. Break the list apart using chunking. For example, do the first 1000 elements of the list, pickle and save them to disk, and then do the next 1000. To work with them, unpickle one chunk at a time so that you don't run out of memory. This is essentially the same technique that databases use to work with more data than will fit in RAM.
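A minimal sketch of the chunking idea described above, assuming a chunk size of 1000 and illustrative file names (the real calculation is replaced by a stand-in expression):

```python
import pickle

CHUNK = 1000  # illustrative chunk size

def save_chunk(chunk, index, prefix="row_chunk"):
    """Pickle one chunk of results to its own file on disk."""
    with open(f"{prefix}_{index}.pkl", "wb") as f:
        pickle.dump(chunk, f)

def load_chunk(index, prefix="row_chunk"):
    """Unpickle a single chunk back into memory."""
    with open(f"{prefix}_{index}.pkl", "rb") as f:
        return pickle.load(f)

# Build a 20301-element row chunk by chunk instead of as one huge list.
buffer = []
n_chunks = 0
for i in range(20301):
    buffer.append(i * 1e-102)  # stand-in for the real calculation
    if len(buffer) == CHUNK:
        save_chunk(buffer, n_chunks)
        n_chunks += 1
        buffer = []
if buffer:  # flush the final partial chunk
    save_chunk(buffer, n_chunks)
    n_chunks += 1

# Later, process one chunk at a time so memory stays bounded.
first = load_chunk(0)
```

Only one chunk ever needs to be resident in RAM at a time; the rest lives on disk until it is needed.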
非要怀念 2024-11-05 20:30:21


The MemoryError exception that you are seeing is the direct result of running out of available RAM. It could be caused either by the 2GB-per-program limit imposed by Windows on 32-bit programs, or by a lack of available RAM on your computer.

You should be able to get past the 2GB limit by using a 64-bit copy of Python, provided you are using a 64-bit copy of Windows.

The IndexError arises because Python hit the MemoryError exception before finishing the entire array, so the list simply ends at the last successfully stored element. Again, this is a memory issue.

To get around this problem you could try a 64-bit copy of Python or, better still, find a way to write your results to a file. To that end, look at numpy's memory-mapped arrays.

You should be able to run your entire set of calculations into one of these arrays, as the actual data will be written to disk with only a small portion of it held in memory.
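A minimal sketch of a memory-mapped array for this use case, assuming one 20301-element row of float64 values (the file name and the fill expression are illustrative):

```python
import numpy as np

N = 20301  # row length from the question

# Create a disk-backed array; only the touched pages are held in RAM.
row = np.memmap("markov_row.dat", dtype=np.float64, mode="w+", shape=(N,))

# Fill it exactly like a normal array (stand-in for the real calculation).
for i in range(N):
    row[i] = (i + 1) * 1e-103

row.flush()  # push dirty pages out to the file

# Reopen later, read-only, without loading the whole file into memory.
row_again = np.memmap("markov_row.dat", dtype=np.float64, mode="r", shape=(N,))
```

Indexing and slicing work as with an ordinary ndarray, so the rest of the Markov-step code does not need to change.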

玩套路吗 2024-11-05 20:30:21


There is no memory limit imposed by Python. However, you will get a MemoryError if you run out of RAM. You say you have 20301 elements in the list. This seems too small to cause a memory error for simple data types (e.g. int), but if each element itself is an object that takes up a lot of memory, you may well be running out of memory.

The IndexError, however, is probably caused because your ListTemp has only 19767 elements (indexed 0 to 19766), and you are trying to access past the last element.

It is hard to say what you can do to avoid hitting the limit without knowing exactly what it is that you are trying to do. Using numpy might help. It looks like you are storing a huge amount of data. It may be that you don't need to store all of it at every stage. But it is impossible to say without knowing.

小瓶盖 2024-11-05 20:30:21


If you want to circumvent this problem you could also use the shelve module. You would create files sized to what your machine can handle and only pull them into RAM when necessary, basically writing to the HD and pulling the information back in pieces so you can process it.

Create a binary file and check whether the information is already in it; if so, make a local variable to hold it, otherwise write whatever data you deem necessary.

import shelve

Data = shelve.open('File01')
for i in range(0, 100):
    Matrix_Shelve = 'Matrix' + str(i)
    if Matrix_Shelve in Data:
        Matrix_local = Data[Matrix_Shelve]   # pull the stored entry back into RAM
    else:
        Data[Matrix_Shelve] = 'something for later'   # write the entry to disk
Data.close()

Hope it doesn't sound too archaic.
