Speed up Matplotlib?
I've read here that matplotlib is good at handling large data sets. I'm writing a data processing application with matplotlib plots embedded in wx, and I have found matplotlib to be terrible at handling large amounts of data, both in speed and in memory use. Does anyone know a way to speed up matplotlib (or reduce its memory footprint) other than downsampling the inputs?
To illustrate how much memory matplotlib uses, consider this code:
import pylab
import numpy
a = numpy.arange(int(1e7), dtype=numpy.int32)  # only 10,000,000 32-bit integers (~40 MB in memory)
# watch your system memory now...
pylab.plot(a)  # this uses over 230 ADDITIONAL MB of memory
Answers (3)
Downsampling is a good solution here: plotting 10M points consumes a lot of memory and time in matplotlib. If you know how much memory is acceptable, you can downsample based on that amount. For example, say 1M points take 23 MB of additional memory and you find that acceptable in terms of space and time; then you should downsample so that the plotted data always stays below 1M points:
Or something like the above snippet (it may downsample too aggressively for your taste).
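The snippet this answer refers to is missing from this copy of the thread. As a minimal sketch of the idea (not the answerer's code), assuming a 1-D NumPy array and a point budget you have already found acceptable, a stride can be chosen so that the plotted array never exceeds that budget. The name downsample_to_limit and the max_points parameter are hypothetical, introduced only for illustration.

```python
import numpy as np


def downsample_to_limit(a, max_points=1_000_000):
    """Return every step-th element of `a`, with at most `max_points` elements.

    Hypothetical helper: the step is the smallest integer stride that
    brings the length down to `max_points` or fewer.
    """
    a = np.asarray(a)
    # Ceiling division, so the result never exceeds max_points.
    step = max(-(-len(a) // max_points), 1)
    # Basic slicing returns a view, so no data is copied here.
    return a[::step]
```

The downsampling step itself makes no copy, since a strided slice is a view; the result can then be passed to the plotting call, for example pylab.plot(downsample_to_limit(a)). Plain striding can skip over narrow peaks, which is one reason it may feel too aggressive.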
I'm often interested in the extreme values too, so before plotting large chunks of data I proceed in this way:
Of course, np.max is just one example of a function for computing extremes.
P.S. With numpy "stride tricks" it should be possible to avoid copying data around during the reshape.
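The code for the extreme-value answer above is missing from this copy of the thread. A minimal sketch of that kind of decimation, assuming a 1-D array that is split into fixed-size blocks with one extreme kept per block, might look like the following. The function name decimate_extremes and its parameters are hypothetical, not taken from the answer.

```python
import numpy as np


def decimate_extremes(s, factor=10, reducer=np.max):
    """Reduce `s` by `factor`, keeping one value per block of `factor` samples.

    Hypothetical helper: `reducer` is any NumPy reduction that accepts an
    `axis` argument, such as np.max or np.min.
    """
    s = np.asarray(s)
    # Drop the ragged tail so the length is a multiple of `factor`.
    usable = (len(s) // factor) * factor
    # For a contiguous 1-D array, this slice and reshape are both views.
    blocks = s[:usable].reshape(-1, factor)
    # One value per row: the extreme of each block.
    return reducer(blocks, axis=1)
```

For example, decimate_extremes(s, 10) keeps the maximum of every 10 consecutive samples, and passing reducer=np.min keeps the minima instead. Any samples left over after the last full block are discarded in this sketch.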
I was interested in preserving one side of a log-sampled plot, so I came up with this:
(downsample being my first attempt)
which allowed me to better preserve one side of the plot:
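The code and figure from this answer are missing from this copy of the thread. One way to get a similar effect, offered only as a sketch under the assumption that "log sampled" means picking indices with geometric spacing, is to keep samples densely at one end of the data and sparsely at the other. The names log_spaced_indices and log_downsample and the dense_end parameter are hypothetical.

```python
import numpy as np


def log_spaced_indices(n, num=1000):
    """Sorted unique indices into range(n), dense near 0 and sparse near n - 1."""
    if n <= 0:
        return np.empty(0, dtype=int)
    # Geometrically spaced positions, shifted so that they start at index 0.
    raw = np.geomspace(1, n, num=num) - 1
    idx = np.clip(raw.astype(int), 0, n - 1)
    # Always keep both endpoints; np.unique also sorts and removes duplicates.
    return np.unique(np.concatenate(([0], idx, [n - 1])))


def log_downsample(x, y, num=1000, dense_end="left"):
    """Downsample x and y together, keeping more points at one end."""
    x = np.asarray(x)
    y = np.asarray(y)
    idx = log_spaced_indices(len(x), num)
    if dense_end == "right":
        # Mirror the indices so the dense region sits at the end of the data.
        idx = (len(x) - 1 - idx)[::-1]
    return x[idx], y[idx]
```

Because duplicate indices are removed, the result can contain fewer than num points, especially for small num or short arrays. The downsampled pair can then be plotted in the usual way, for example on a logarithmic x axis.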