How can I avoid huge additional memory consumption when using numpy vectorize?

Posted 2024-11-29 20:06:41

The code below best illustrates my problem:

The console output (NB: even the first test takes about 8 minutes to run) shows the 512x512x512 16-bit array allocations consuming no more than expected (256 MB each), and according to "top" the process generally stays under 600 MB, as expected.

However, while the vectorized version of the function is running, the process grows to an enormous size (over 7 GB!). The most obvious explanation I can think of, that vectorize converts the inputs and outputs to float64 internally, could only account for a couple of gigabytes, and in any case the vectorized function returns an int16 and the returned array is certainly int16. Is there some way to avoid this? Am I using or understanding vectorize's otypes argument incorrectly?

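import numpy as np
import subprocess

def logmem():
    # Print the system-wide free memory (Linux only: reads /proc/meminfo).
    subprocess.call('cat /proc/meminfo | grep MemFree', shell=True)

def fn(x):
    # Scalar function under test: square the input and return it as int16.
    return np.int16(x * x)

def test_plain(v):
    # Baseline: fill an int16 result array with explicit Python loops.
    print("Explicit looping:")
    logmem()
    r = np.zeros(v.shape, dtype=np.int16)
    for z in range(v.shape[0]):
        for y in range(v.shape[1]):
            for x in range(v.shape[2]):
                # Apply fn to the array element so both tests compute the same values.
                r[z, y, x] = fn(v[z, y, x])
    print(type(r[0, 0, 0]))
    logmem()
    return r

# otypes requests an int16 output array from the vectorized function.
vecfn = np.vectorize(fn, otypes=[np.int16])

def test_vectorize(v):
    # Same computation through np.vectorize; memory use balloons during this call.
    print("Vectorize:")
    logmem()
    r = vecfn(v)
    print(type(r[0, 0, 0]))
    logmem()
    return r

logmem()
s = (512, 512, 512)
v = np.ones(s, dtype=np.int16)
logmem()
test_plain(v)
test_vectorize(v)
v = None
logmem()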

I'm using whichever versions of Python/numpy are current on an amd64 Debian Squeeze system (Python 2.6.6, numpy 1.4.1).
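
The sizes quoted above can be checked with a little arithmetic. The sketch below is only illustrative: `size_mib` is a hypothetical helper, not part of the original script. It computes how much memory one 512x512x512 array needs as int16 and as float64, which is where the "256 MB each" and "a couple of gigabytes" estimates come from.

```python
import numpy as np

shape = (512, 512, 512)
n_elements = shape[0] * shape[1] * shape[2]

def size_mib(dtype):
    # Hypothetical helper: size in MiB of one array of the given dtype.
    return n_elements * np.dtype(dtype).itemsize / float(2 ** 20)

int16_mib = size_mib(np.int16)      # 2 bytes per element
float64_mib = size_mib(np.float64)  # 8 bytes per element

print(int16_mib)    # 256.0
print(float64_mib)  # 1024.0
```

So one float64 copy of the input plus one of the output would come to about 2 GiB, well short of the 7 GB observed.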

寄居者 2024-12-06 20:06:41

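A basic problem with vectorization is that all intermediate values are also vectors. While this is a convenient way to get a decent speedup, it can be very inefficient in memory usage and will constantly thrash your CPU cache. To overcome this, you need an approach whose explicit loops run at compiled speed rather than Python speed. The best ways to do this are Cython, Fortran wrapped with f2py, or numexpr. You can find a comparison of these approaches here, although it focuses more on speed than on memory usage.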
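
As a rough illustration of keeping intermediate values small, here is a minimal sketch that uses plain NumPy rather than any of the tools named above, so it is a substitute technique and not the answer's own method. `apply_in_slabs` is a hypothetical helper that feeds the array to the vectorized function a few planes at a time, so any temporaries should scale with the slab rather than with the whole array. For this particular `fn`, a ufunc with an `out` argument avoids `np.vectorize` altogether. The small array shape is an assumption chosen so the example runs quickly.

```python
import numpy as np

def fn(x):
    # Same scalar function as in the question.
    return np.int16(x * x)

vecfn = np.vectorize(fn, otypes=[np.int16])

def apply_in_slabs(func, v, slab=8):
    # Hypothetical helper: apply func to `slab` planes of v at a time,
    # writing each result into a preallocated int16 array.
    r = np.empty(v.shape, dtype=np.int16)
    for start in range(0, v.shape[0], slab):
        stop = start + slab
        r[start:stop] = func(v[start:stop])
    return r

# Small array (assumed shape) so the example finishes quickly.
v = np.ones((16, 32, 32), dtype=np.int16)
r_slabs = apply_in_slabs(vecfn, v, slab=4)

# For a simple square, the multiply ufunc runs its loop in compiled code
# and writes directly into the int16 output array.
r_direct = np.empty_like(v)
np.multiply(v, v, out=r_direct)
```

The slab size trades peak memory against per-call overhead; the ufunc route is only available when the scalar function can be expressed with array operations.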