Python memory allocation error using subprocess.Popen
I am doing some bioinformatics work. I have a Python script that at one point calls a program to do an expensive process (sequence alignment, which uses a lot of computational power and memory). I call it using subprocess.Popen. When I run it on a test case, it completes and finishes fine. However, when I run it on the full file, where it has to do this multiple times for different sets of inputs, it dies. Subprocess throws:
OSError: [Errno 12] Cannot allocate memory
I found a few links (here, here, and here) to similar problems, but I'm not sure they apply in my case.
By default, the sequence aligner will try to request 51000M of memory. It doesn't always use that much, but it might, and with the full input loaded and processed, that much is not available. However, capping the amount it requests (or will attempt to use) at a lower figure that should be available at run time still gives me the same error. I've also tried running with shell=True, with the same result.
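Roughly, the call pattern looks like this (the aligner command, flags, and file names below are stand-ins for illustration, not my actual invocation):

    import subprocess

    input_sets = ["set1.fa", "set2.fa"]  # stand-in input files

    for inputs in input_sets:
        # Popen() itself is what raises OSError: [Errno 12]; the fork()
        # happens inside the constructor (see the traceback below).
        proc = subprocess.Popen(["aligner", "-i", inputs, "-o", "out.sam"])
        proc.wait()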
This has been bugging me for a few days now. Thanks for any help.
Edit: Expanding the traceback:
File "..../python2.6/subprocess.py", line 1037, in _execute_child
self.pid=os.fork()
OSError: [Errno 12] Cannot allocate memory
is what throws the error.
Edit 2: Running Python 2.6.4 on 64-bit Ubuntu 10.04.
4 Answers
I feel really sorry for the OP. Six years later and no one has mentioned that this is a very common problem on Unix, and actually has nothing to do with Python or bioinformatics. A call to os.fork() temporarily doubles the memory of the parent process (the parent's memory must be made available to the child) before throwing it all away to do an exec(). While this memory isn't always actually copied, the system must have enough memory to allow for it to be copied, so if your parent process is using more than half of the system memory and you subprocess out even "wc -l", you're going to run into a memory error.
The solution is to use posix_spawn, or to create all your subprocesses at the beginning of the script, while memory consumption is low, and then use them later, after the parent process has done its memory-intensive thing.
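On Python 3.8+ (not an option on the 2.6 the OP was running), os.posix_spawn does exactly this: it launches the child directly, so the kernel never has to reserve a copy-on-write image of the parent. A minimal sketch, with "wc -l" standing in for the real child:

    import os

    # posix_spawn avoids fork()'s virtual copy of the parent's address
    # space, so it works even when the parent uses most of system memory.
    pid = os.posix_spawn("/usr/bin/wc", ["wc", "-l", "input.txt"], os.environ)
    _, status = os.waitpid(pid, 0)  # reap the child when it exits

(Recent CPython versions can also use posix_spawn internally inside the subprocess module when the arguments allow it.)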
A Google search using the keywords "os.fork" and "memory" will turn up several Stack Overflow posts on the topic that explain further what's going on :)
This doesn't have anything to do with Python or the subprocess module. subprocess.Popen is merely reporting to you the error that it is receiving from the operating system. (What operating system are you using, by the way?) From man 2 fork on Linux:

ENOMEM: fork() failed to allocate the necessary kernel structures because memory is tight.

Are you calling subprocess.Popen multiple times? If so, then I think the best you can do is make sure that the previous invocation of your process is terminated and reaped before the next invocation.
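A minimal sketch of that pattern, with a placeholder aligner command:

    import subprocess

    for infile in ["set1.fa", "set2.fa"]:  # hypothetical input files
        proc = subprocess.Popen(["aligner", infile])  # placeholder command
        proc.wait()  # block until the child exits and reap it, so only
                     # one child (and one fork) is in flight at a time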
Do you use subprocess.PIPE? I had problems myself, and read about others' problems, when it was used. Redirecting to temporary files usually solved the problem.
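If that is the issue, one workaround is to point the child's output at a temporary file instead of a pipe; a rough sketch, with a placeholder command:

    import subprocess
    import tempfile

    # Write the child's stdout to a temp file rather than subprocess.PIPE,
    # which buffers everything in memory and can deadlock on large output.
    with tempfile.TemporaryFile() as out:
        subprocess.check_call(["aligner", "input.fa"], stdout=out)
        out.seek(0)           # rewind before reading the results back
        results = out.read()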
I'd run a 64-bit Python on a 64-bit OS.
With 32-bit, you can only really get about 3 GB of RAM before the OS starts telling you no more.
Another alternative might be to use memory mapped files to open the file:
http://docs.python.org/library/mmap.html
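For instance, a minimal sketch with a hypothetical input file:

    import mmap

    # Memory-map the input instead of read()-ing it all at once; pages are
    # faulted in on demand and can be dropped by the kernel under pressure.
    with open("big_input.fa", "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # map whole file
        first_line = mm.readline()  # mmap objects support file-like reads
        mm.close()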
Edit: Ah, you're on 64-bit. Possibly the cause is that you're running out of RAM + swap; the fix might be to increase the amount of swap.