Large PyPlot - avoiding memory allocation

Posted 2024-11-01 11:11:56


I'm building a rather large PyPlot (Python matplotlib) plot: 600,000 values, each 32-bit. In principle I guess I could simply do something like this:

import matplotlib.pyplot as plt
plt.plot([1,2,3,4], [1,4,9,16], 'ro')
plt.axis([0, 6, 0, 20])

Two arrays, both allocated in memory. Sooner or later, however, I'll have to plot files that contain several gigabytes of such data.

How do I avoid passing two full arrays into plt.plot()?

I still need a complete plot, however, so I suppose it can't be done with just an iterator that passes the values in line by line.


花间憩 2024-11-08 11:11:56


If you're talking about gigabytes of data, you might consider loading and plotting the data points in batches, then layering the image data of each rendered plot over the previous one. Here is a quick example, with comments inline:

from PIL import Image  # the answer originally used the old pre-Pillow "import Image"
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy

N = 20
size = 4
x_data = y_data = range(N)

fig = plt.figure()

prev = None
for n in range(0, N, size):
    # clear figure
    plt.clf()

    # set axes background transparent for plots n > 0
    if n:
        fig.patch.set_alpha(0.0)
        axes = plt.axes()
        axes.patch.set_alpha(0.0)

    plt.axis([0, N, 0, N])

    # here you'd read the next x/y values from disk into memory and plot
    # them.  simulated by grabbing batches from the arrays.
    x = x_data[n:n+size]
    y = y_data[n:n+size]
    plt.plot(x, y, 'ro')
    del x, y

    # render the points
    fig.canvas.draw()

    # grab the rendered RGBA pixel buffer; its shape is (height, width, 4)
    buf = numpy.asarray(fig.canvas.buffer_rgba())
    img = Image.fromarray(buf.copy())

    if prev is not None:
        # alpha-composite the current plot over the previous ones
        img = Image.alpha_composite(prev, img)
    prev = img

# save the final image
prev.save('plot.png')

Output:

(image: the composited scatter plot)
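For the "read the next x/y values from disk" step in the loop above, numpy.memmap lets each batch be sliced out of a file on disk without loading the whole file; a minimal sketch, where the file name, layout (interleaved float32 x/y pairs), and batch size are all made-up for illustration:

```python
import numpy as np

# Write a small sample file of interleaved float32 (x, y) pairs,
# standing in for a multi-gigabyte data file.
data = np.arange(40, dtype=np.float32).reshape(20, 2)
data.tofile("points.dat")

# Map the file read-only; no bulk allocation happens here.
mm = np.memmap("points.dat", dtype=np.float32, mode="r").reshape(-1, 2)

size = 4
for n in range(0, len(mm), size):
    batch = np.asarray(mm[n:n + size])   # only this slice is read into memory
    x, y = batch[:, 0], batch[:, 1]
    # ...plot x, y as in the batched loop above...
```

Each iteration touches only one slice of the mapped file, so the working set stays at `size` rows regardless of the file's total length.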

初懵 2024-11-08 11:11:56


Do you actually need to plot the individual points? With so many data points available, a density plot would likely work just as well. You might look into pylab's hexbin or numpy.histogram2d. For such large files, though, you'd probably have to use numpy.memmap, or work in batches, as @samplebias says.
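Combining the two suggestions, a fixed-size 2-D histogram can be accumulated batch by batch and rendered with a single image call at the end; a sketch, assuming the data range, bin count, and chunking are placeholders (random chunks stand in for reads from disk):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Accumulate a fixed-size 2-D histogram in batches, so only one
# chunk of the data is ever in memory at a time.
bins = 256
edges = np.linspace(0.0, 1.0, bins + 1)
counts = np.zeros((bins, bins))

rng = np.random.default_rng(0)
for _ in range(10):                      # stands in for reading chunks from disk
    x = rng.random(60_000)
    y = rng.random(60_000)
    h, _, _ = np.histogram2d(x, y, bins=[edges, edges])
    counts += h                          # memory use is fixed by `bins`, not by N

# One imshow call renders the density, regardless of the total point count.
plt.imshow(counts.T, origin="lower", extent=[0, 1, 0, 1], aspect="auto")
plt.colorbar(label="points per bin")
plt.savefig("density.png")
```

The histogram array is the only thing that grows with resolution; the 600,000 points themselves never need to be in memory at once.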
