Matplotlib,保存到 CString 对象时可以替代 savefig() 来提高性能吗?

发布于 2024-10-24 08:28:39 字数 438 浏览 1 评论 0原文

我正在尝试加快将图表保存为图像的过程。现在我正在创建一个 cString 对象,我使用 savefig 将图表保存到其中;但我真的非常非常感谢任何帮助改进这种保存图像的方法。我必须执行数十次此操作,并且 savefig 命令非常非常慢;一定有更好的方法来做到这一点。我读过一些关于将其保存为未压缩的原始图像的内容,但我不知道如何做到这一点。如果我也可以切换到另一个更快的后端,我真的不在乎 agg。

即:

RAM = cStringIO.StringIO()

CHART = plt.figure(.... 
**code for creating my chart**

CHART.savefig(RAM, format='png')

我一直在使用 matplotlib 和FigureCanvasAgg 后端。

谢谢!

I am trying to speed up the process of saving my charts to images. Right now I am creating a cString Object where I save the chart to by using savefig; but I would really, really appreciate any help to improve this method of saving the image. I have to do this operation dozens of times, and the savefig command is very very slow; there must be a better way of doing it. I read something about saving it as uncompressed raw image, but I have no clue of how to do it. I don't really care about agg if I can switch to another faster backend too.

ie:

RAM = cStringIO.StringIO()

CHART = plt.figure(.... 
**code for creating my chart**

CHART.savefig(RAM, format='png')

I have been using matplotlib with FigureCanvasAgg backend.

Thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

绝不服输 2024-10-31 08:28:39

如果您只想要一个原始缓冲区,请尝试 fig.canvas.print_rgbfig.canvas.print_raw 等(两者之间的区别在于 raw code> 是 rgba,而 rgb 是 rgb。还有 print_pngprint_ps 等)

这将使用 fig.dpi 而不是 savefig 的默认 dpi 值 (100 dpi)。尽管如此,即使比较 fig.canvas.print_raw(f)fig.savefig(f, format='raw', dpi=fig.dpi) print_canvas 版本稍微快一点 快得多,因为它不需要重置轴补丁的颜色等,而 savefig 默认情况下会这样做。

但无论如何,以原始格式保存图形所花费的大部分时间只是绘制图形,这是无法避免的。

无论如何,作为一个毫无意义但有趣的示例,请考虑以下内容:

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

plt.ion()
fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    fig.canvas.draw()

Brownian walk Animation

如果我们看一下原始图像绘制时间:

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    fig.canvas.draw()

在我的机器上大约需要 25 秒。

如果我们将原始 RGBA 缓冲区转储到 cStringIO 缓冲区,它实际上会稍微快一些,大约 22 秒(这只是因为我使用的是交互式后端!否则它是等效的。):

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    ram = cStringIO.StringIO()
    fig.canvas.print_raw(ram)
    ram.close()

如果我们将此与使用 < code>savefig,具有相对设置的 dpi:

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    ram = cStringIO.StringIO()
    fig.savefig(ram, format='raw', dpi=fig.dpi)
    ram.close()

这大约需要 23.5 秒。基本上,在本例中,savefig 只是设置一些默认参数并调用 print_raw,因此几乎没有什么区别。

现在,如果我们将原始图像格式与压缩图像格式 (png) 进行比较,我们会发现更显着的差异:

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    ram = cStringIO.StringIO()
    fig.canvas.print_png(ram)
    ram.close()

这大约需要 52 秒!显然,压缩图像会产生大量开销。

无论如何,这可能是一个不必要的复杂示例......我想我只是想避免实际工作......

If you just want a raw buffer, try fig.canvas.print_rgb, fig.canvas.print_raw, etc (the difference between the two is that raw is rgba, whereas rgb is rgb. There's also print_png, print_ps, etc)

This will use fig.dpi instead of the default dpi value for savefig (100 dpi). Still, even comparing fig.canvas.print_raw(f) and fig.savefig(f, format='raw', dpi=fig.dpi) the print_canvas version is marginally faster insignificantly faster, since it doesn't bother resetting the color of the axis patch, etc, that savefig does by default.

Regardless, though, most of the time spent saving a figure in a raw format is just drawing the figure, which there's no way to get around.

At any rate, as a pointless-but-fun example, consider the following:

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

plt.ion()
fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    fig.canvas.draw()

Brownian walk animation

If we look at the raw draw time:

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    fig.canvas.draw()

This takes ~25 seconds on my machine.

If we instead dump a raw RGBA buffer to a cStringIO buffer, it's actually marginally faster at ~22 seconds (This is only true because I'm using an interactive backend! Otherwise it would be equivalent.):

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    ram = cStringIO.StringIO()
    fig.canvas.print_raw(ram)
    ram.close()

If we compare this to using savefig, with a comparably set dpi:

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    ram = cStringIO.StringIO()
    fig.savefig(ram, format='raw', dpi=fig.dpi)
    ram.close()

This takes ~23.5 seconds. Basically, savefig just sets some default parameters and calls print_raw, in this case, so there's very little difference.

Now, if we compare a raw image format with a compressed image format (png), we see a much more significant difference:

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    ram = cStringIO.StringIO()
    fig.canvas.print_png(ram)
    ram.close()

This takes ~52 seconds! Obviously, there's a lot of overhead in compressing an image.

At any rate, this is probably a needlessly complex example... I think I just wanted to avoid actual work...

一个人的夜不怕黑 2024-10-31 08:28:39

我还需要快速生成大量绘图。我发现多重处理可以提高可用核心数量的绘图速度。例如,如果一个进程中 100 个绘图花费了 10 秒,那么当任务分散到 4 个核心上时,大约需要 3 秒。

I needed to quickly generate lots of plots as well. I found that multiprocessing improved the plotting speed with the number of cores available. For example, if 100 plots took 10 seconds in one process, it took ~3 seconds when the task was split across 4 cores.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文