Need help identifying a memory leak involving matplotlib and Flask

Posted 2024-12-09 08:04:20

I have written a small web app using the Flask framework that involves plotting with matplotlib. The problem is that every time I create a plot, the process consumes more memory.

I have deployed the app using mod_wsgi with a .wsgi file looking simply like this:

from yourapplication import app as application

The problems start when I access the URL that creates the plot. The function creates a plotter object which, when initialized, takes the relevant data from a sqlite3 database (the data consist of about 30 integers and equally many datetime objects), creates a plot using matplotlib, and returns a StringIO object whose contents are then displayed on screen.

This is the end of the function (the whole class was linked from the original post):

    canvas = FigureCanvas(fig)
    png_output = StringIO.StringIO()
    canvas.print_png(png_output)
    return png_output.getvalue()

When I visit the site, a process is created with about 25MB of reserved memory. The first time I create a plot it grows to 30MB, and then by about 1MB each time I use the plotter class. The default settings used 5 processes, which consumed far too much memory (up to 150MB within minutes, and I'm only allowed 80MB).

I'm very new to everything involved here (web frameworks, Apache, databases), so I don't have any feeling for where things might be going wrong. Any ideas are highly appreciated. Thanks!

Comments (3)

饭团 2024-12-16 08:04:20

Doing this after each call to the plot_month function solved the leak.

import gc
gc.collect()
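
One way to make sure the collection runs after every call is to wrap the plotting function in a small decorator. This is only a sketch: `collect_after` and `Node` are hypothetical names, and `plot_month` here is a stand-in that builds a reference cycle instead of a real plot.

```python
import functools
import gc
import weakref


def collect_after(func):
    """Run a full garbage collection once func has returned."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        finally:
            # func's frame is gone by now, so its locals no longer
            # keep anything alive and cycles can be reclaimed.
            gc.collect()
    return wrapper


class Node:
    """Tiny object used to build a reference cycle for the demo."""


refs = []


@collect_after
def plot_month():
    # Hypothetical stand-in for the real plotting function.
    a, b = Node(), Node()
    a.other, b.other = b, a          # reference cycle, like figure <-> canvas
    refs.append(weakref.ref(a))      # lets us observe whether `a` was freed
    return b"png-bytes"


result = plot_month()
cycle_freed = refs[0]() is None
```

The `finally` clause means the collection also runs when the plotting function raises, which matters in a long-lived web worker.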

樱花坊 2024-12-16 08:04:20

Posting this in case it will help someone in the future.

I had the same issue, and the answer provided by axel22 didn't solve it for me.

After a bit of tinkering I realized that there were two problems:

  1. I didn't clear the Matplotlib figure, leaving it in memory forever
  2. I was calling the garbage collector in the wrong part of my code

First problem

I was doing something like this (INCORRECT):

fig = util.create_figure(....)
output = io.BytesIO()
canvas = FigureCanvas(fig)
canvas.print_png(output)

but I needed to do this (CORRECT):

fig = util.create_figure(....)
output = io.BytesIO()
canvas = FigureCanvas(fig)
canvas.print_png(output)
# Clear the figure's contents so the plotted data can be freed
fig.clf()
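
A fuller sketch of the same pattern is shown here, with the cleanup in a `finally` block so it also runs if rendering fails. Assumptions: `util.create_figure` is the poster's own helper and is not shown, so the figure is built inline; `FigureCanvas` is taken to be the Agg canvas; `render_png` is a hypothetical name.

```python
import io

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def render_png(values):
    """Render a simple line plot and return it as PNG bytes."""
    fig = Figure(figsize=(4, 3))
    ax = fig.add_subplot(111)
    ax.plot(values)

    output = io.BytesIO()
    try:
        FigureCanvasAgg(fig).print_png(output)
    finally:
        # Clear the figure even if rendering raised an exception.
        fig.clf()
    return output.getvalue()


png_bytes = render_png([3, 1, 4, 1, 5])
```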

Second problem

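I was calling the garbage collector in the wrong part of my code. You need to call it outside the scope where FigureCanvas is created: inside that function, the local variables still reference the canvas, so a collection at that point cannot free it.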

This DID NOT work (INCORRECT):

import gc
import io

def do_something():
    canvas = FigureCanvas(fig)
    png_output = io.BytesIO()  # PNG data is binary, so use BytesIO
    canvas.print_png(png_output)
    gc.collect()  # too early: canvas is still referenced by this frame
    return png_output.getvalue()

do_something()

But this worked (CORRECT):

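import gc
import io

def do_something():
    canvas = FigureCanvas(fig)
    png_output = io.BytesIO()  # PNG data is binary, so use BytesIO
    canvas.print_png(png_output)
    return png_output.getvalue()

do_something()
gc.collect()  # the function has returned, so its locals can be collected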
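
The effect of where `gc.collect()` is called can be reproduced without matplotlib. The sketch below uses hypothetical stand-in classes (`FakeFigure`, `FakeCanvas`) that reference each other the way a figure and its canvas do, and weak references to check whether the objects were actually freed.

```python
import gc
import weakref


class FakeFigure:
    """Stand-in for a figure; not a matplotlib class."""


class FakeCanvas:
    """Stand-in for a canvas that forms a cycle with its figure."""
    def __init__(self, figure):
        self.figure = figure
        figure.canvas = self  # figure <-> canvas reference cycle


def render_and_collect_inside():
    fig = FakeFigure()
    canvas = FakeCanvas(fig)
    ref = weakref.ref(fig)
    gc.collect()  # fig and canvas are live locals, so nothing is freed
    return ref


def render():
    fig = FakeFigure()
    canvas = FakeCanvas(fig)
    return weakref.ref(fig)


gc.disable()  # no automatic collections, so the demo is deterministic
try:
    inside_ref = render_and_collect_inside()
    alive_after_inner_collect = inside_ref() is not None

    outside_ref = render()
    gc.collect()  # the function's locals are gone, so the cycle is freed
    alive_after_outer_collect = outside_ref() is not None
finally:
    gc.enable()
```

In a real worker the automatic collector would eventually reclaim such cycles as well; an explicit call after the function returns simply does it right away.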

蓝天白云 2024-12-16 08:04:20

I ran into the same memory leak as you did when my website needed to generate a series of graphs in a loop using Flask. The matplotlib documentation, in the section "How to use Matplotlib in a web application server", recommends avoiding matplotlib.pyplot and using matplotlib.figure.Figure instead to avoid memory leaks. Please note that you need Matplotlib 3.1 or above.

Depending on how you constructed the graph (pyplot-style calls vs. the object-oriented interface), swapping pyplot for the Figure class is quite straightforward.
From:

import matplotlib.pyplot as plt
fig = plt.figure()

To:

from matplotlib.figure import Figure
fig = Figure()

Then replace any pyplot-style calls that no longer work with their object-oriented equivalents.
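
As a rough illustration of that conversion, here is a minimal sketch of a pyplot-free plotting function. The function name, its parameters and the axis label are made up for the example; the comments note which pyplot call each line would typically replace.

```python
import io

from matplotlib.figure import Figure


def plot_to_png(x, y, title):
    """Build a plot with the object-oriented API and return PNG bytes."""
    fig = Figure()                  # instead of plt.figure()
    ax = fig.subplots()             # instead of plt.subplots() / plt.gca()
    ax.plot(x, y)                   # instead of plt.plot(x, y)
    ax.set_title(title)             # instead of plt.title(title)
    ax.set_xlabel("day")            # instead of plt.xlabel("day")

    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # instead of plt.savefig(...)
    return buf.getvalue()


png_bytes = plot_to_png([1, 2, 3], [10, 20, 15], "Example")
```

Because the figure is never registered with pyplot, nothing keeps it alive once the function returns, so it can be reclaimed like any other Python object.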
