Large PyPlot - avoiding memory allocation
I'm making a rather large PyPlot (Python matplotlib) plot: 600,000 values, 32 bits each. In practice I could simply do something like this:
import matplotlib.pyplot as plt
plt.plot([1,2,3,4], [1,4,9,16], 'ro')
plt.axis([0, 6, 0, 20])
That is two arrays, both allocated in memory. Sooner or later, however, I'll have to plot files that contain several gigabytes of this data.
How do I avoid passing two full arrays into plt.plot()?
I still need a complete plot, though, so I suppose I can't just use an iterator and pass the values in line by line.
Comments (2)
If you're talking about gigabytes of data, you might consider loading and plotting the data points in batches, then layering the image data of each rendered plot over the previous one.
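One way the batch-and-layer idea could look is sketched below. It is only an illustration under some assumptions: the `batches` generator is a hypothetical stand-in for reading chunks from a real file, the axis limits are known up front, and the figure has the default white background so that a per-pixel minimum works as a simple way to stack frames.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no window is needed
import matplotlib.pyplot as plt
import numpy as np


def batches(n_batches, batch_size, seed=0):
    """Hypothetical data source: yields one (x, y) float32 batch at a time."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        x = rng.uniform(0, 6, batch_size).astype(np.float32)
        y = (x ** 2 / 2 + rng.normal(0, 1, batch_size)).astype(np.float32)
        yield x, y


def render_layered(batch_iter, xlim, ylim, figsize=(4, 3), dpi=100):
    """Draw each batch on the same axes and merge the rendered frames."""
    fig, ax = plt.subplots(figsize=figsize, dpi=dpi)
    ax.set_xlim(*xlim)  # fixed limits so every frame lines up pixel for pixel
    ax.set_ylim(*ylim)
    composite = None
    for x, y in batch_iter:
        (line,) = ax.plot(x, y, "r,")  # "," draws one pixel per point
        fig.canvas.draw()
        frame = np.asarray(fig.canvas.buffer_rgba()).copy()
        line.remove()  # drop the artist so the batch can be garbage collected
        # On a white background the darker pixel wins, so drawn points persist.
        composite = frame if composite is None else np.minimum(composite, frame)
    plt.close(fig)
    return composite


image = render_layered(batches(5, 10_000), xlim=(0, 6), ylim=(0, 20))
```

Only one batch and one frame-sized RGBA buffer are held in memory at a time. The resulting array can be written out with something like plt.imsave("layered.png", image).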
Do you actually need to plot individual points? With so many data points available, a density plot would probably work just as well. You might look into pylab's hexbin or numpy.histogram2d. For files that large you'd probably have to use numpy.memmap, though, or work in batches, as @samplebias says.
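A rough sketch of how numpy.memmap and numpy.histogram2d might be combined follows. The file layout (interleaved float32 x, y pairs), the demo file, the bin count, and the plotting range are all assumptions made for illustration; a real file would need its own dtype and shape.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

# Write a demo file so the sketch is self-contained (assumed layout: x, y pairs).
path = os.path.join(tempfile.mkdtemp(), "points.f32")
rng = np.random.default_rng(0)
n_points = 600_000
demo = np.empty((n_points, 2), dtype=np.float32)
demo[:, 0] = rng.uniform(0, 6, n_points)
demo[:, 1] = demo[:, 0] ** 2 / 2 + rng.normal(0, 1, n_points)
demo.tofile(path)


def density_from_file(path, bins, extent, chunk=100_000):
    """Accumulate a 2D histogram over the file, one chunk of rows at a time."""
    n_rows = os.path.getsize(path) // (2 * np.dtype(np.float32).itemsize)
    # memmap maps the file lazily; pages are read only when a slice is touched.
    data = np.memmap(path, dtype=np.float32, mode="r", shape=(n_rows, 2))
    counts = np.zeros((bins, bins))
    for start in range(0, n_rows, chunk):
        block = data[start:start + chunk]
        h, _, _ = np.histogram2d(block[:, 0], block[:, 1],
                                 bins=bins, range=extent)
        counts += h
    return counts


extent = [[0, 6], [0, 20]]
counts = density_from_file(path, bins=200, extent=extent)

fig, ax = plt.subplots()
# histogram2d puts x on the first axis, so transpose for imshow.
ax.imshow(counts.T, origin="lower", aspect="auto",
          extent=[extent[0][0], extent[0][1], extent[1][0], extent[1][1]])
plt.close(fig)
```

Memory use here is bounded by the chunk size plus the bins x bins count array, regardless of how large the file is. Points that fall outside the chosen range are simply not counted.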