Reducing Django memory usage. Low-hanging fruit?
My memory usage increases over time and restarting Django is not kind to users.
I am unsure how to go about profiling the memory usage but some tips on how to start measuring would be useful.
I have a feeling that there are some simple steps that could produce big gains. Ensuring 'debug' is set to 'False' is an obvious biggie.
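For context, the reason this setting is such a biggie: with `DEBUG = True`, Django appends every executed SQL query to an in-memory list, so a long-running process grows without bound. A minimal sketch of the relevant settings.py lines (era-appropriate; `TEMPLATE_DEBUG` existed in old Django versions):

```python
# settings.py: with DEBUG = True, Django stores every SQL query it runs
# in django.db.connection.queries, which grows forever in a long-lived
# process. In production this must be off.
DEBUG = False
TEMPLATE_DEBUG = DEBUG  # companion setting in Django of this era
```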
Can anyone suggest others? How much improvement would caching give on low-traffic sites?
In this case I'm running under Apache 2.x with mod_python. I've heard mod_wsgi is a bit leaner but it would be tricky to switch at this stage unless I know the gains would be significant.
Edit: Thanks for the tips so far. Any suggestions how to discover what's using up the memory? Are there any guides to Python memory profiling?
Also, as mentioned, there are a few things that will make it tricky to switch to mod_wsgi, so I'd like to have some idea of the gains I could expect before ploughing forwards in that direction.
Edit: Carl posted a slightly more detailed reply here that is worth reading: Django Deployment: Cutting Apache's Overhead
Edit: Graham Dumpleton's article is the best I've found on the MPM and mod_wsgi related stuff. I am rather disappointed that no-one could provide any info on debugging the memory usage in the app itself though.
Final Edit: Well I have been discussing this with Webfaction to see if they could assist with recompiling Apache and this is their word on the matter:
"I really don't think that you will get much of a benefit by switching to an MPM Worker + mod_wsgi setup. I estimate that you might be able to save around 20MB, but probably not much more than that."
So! This brings me back to my original question (which I am still none the wiser about). How does one go about identifying where the problem lies? It's a well-known maxim that you don't optimize without testing to see where you need to optimize, but there is very little in the way of tutorials on measuring Python memory usage and none at all specific to Django.
Thanks for everyone's assistance but I think this question is still open!
Another final edit ;-)
I asked this on the django-users list and got some very helpful replies
Honestly the last update ever!
This was just released. Could be the best solution yet: Profiling Django object size and memory usage with Pympler
Make sure you are not keeping global references to data. That prevents the python garbage collector from releasing the memory.
Don't use mod_python. It loads an interpreter inside apache. If you need to use apache, use mod_wsgi instead. It is not tricky to switch; it is very easy. mod_wsgi is way easier to configure for django than brain-dead mod_python.

If you can remove apache from your requirements, that would be even better for your memory. spawning seems to be the new fast, scalable way to run python web applications.

EDIT: I don't see how switching to mod_wsgi could be "tricky". It should be a very easy task. Please elaborate on the problem you are having with the switch.
If you are running under mod_wsgi (and presumably also under spawning, since it is WSGI compliant), you can use Dozer to look at your memory usage.
Under mod_wsgi just add this at the bottom of your WSGI script:
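The snippet itself did not survive the page scrape; since Dozer is plain WSGI middleware, the addition would look roughly like this (a sketch assuming Dozer's middleware API and an existing `application` object in the script):

```python
# Hedged sketch: wrap the existing WSGI application object with the
# Dozer middleware so /_dozer/* URLs serve the memory reports.
from dozer import Dozer

application = Dozer(application)
```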
Then point your browser at http://domain/_dozer/index to see a list of all your memory allocations.
I'll also just add my voice of support for mod_wsgi. It makes a world of difference in terms of performance and memory usage over mod_python. Graham Dumpleton's support for mod_wsgi is outstanding, both in terms of active development and in helping people on the mailing list to optimize their installations. David Cramer at curse.com has posted some charts (which I can't seem to find now unfortunately) showing the drastic reduction in cpu and memory usage after they switched to mod_wsgi on that high traffic site. Several of the django devs have switched. Seriously, it's a no-brainer :)
These are the Python memory profiler solutions I'm aware of (not Django related):
Python Memory Validator (commercial). Disclaimer: I have a stake in the latter.
The individual project's documentation should give you an idea of how to use these tools to analyze memory behavior of Python applications.
The following is a nice "war story" that also gives some helpful pointers:
Additionally, check that you are not using any known leakers. MySQLdb is known to leak enormous amounts of memory with Django due to a bug in its unicode handling. Other than that, the Django Debug Toolbar might help you track down the hogs.
In addition to not keeping around global references to large data objects, try to avoid loading large datasets into memory at all wherever possible.
Switch to mod_wsgi in daemon mode, and use Apache's worker mpm instead of prefork. This latter step can allow you to serve many more concurrent users with much less memory overhead.
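As a rough sketch, the worker-MPM half of that advice looks like this in the Apache config (Apache 2.2-era directive names; all tuning values here are illustrative assumptions, not recommendations):

```apache
# Hedged sketch: a few threaded worker children replace dozens of
# memory-hungry prefork processes.
<IfModule mpm_worker_module>
    StartServers          2
    ThreadsPerChild      25
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>
```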
Webfaction actually has some tips for keeping django memory usage down.
The major points:
Another plus for mod_wsgi: set a maximum-requests parameter in your WSGIDaemonProcess directive and mod_wsgi will restart the daemon process every so often. There should be no visible effect for the user, other than a slow page load the first time a fresh process is hit, as it'll be loading Django and your application code into memory.

But even if you do have memory leaks, that should keep the process size from getting too large, without having to interrupt service to your users.
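For illustration, the directive might look like this (the process group name and tuning values are assumptions; `maximum-requests` is a real WSGIDaemonProcess option):

```apache
# Recycle each daemon process after 10000 requests to cap leak growth.
WSGIDaemonProcess example.com processes=2 threads=15 maximum-requests=10000
WSGIProcessGroup example.com
```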
Here is the script I use for mod_wsgi (called wsgi.py, and put in the root of my django project):
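The script itself did not survive the page scrape; a typical mod_wsgi entry point from this era of Django, matching the description below, would look roughly like this (the project path and settings module name are assumptions):

```python
import os
import sys

# Assumed project location and settings module -- adjust as needed.
sys.path.append('/path/to/myproject')
os.environ['DJANGO_SETTINGS_MODULE'] = 'myproject.settings'

# mod_wsgi forbids writing to stdout by default, so redirect stray
# print output to /dev/null; use the logging module instead.
sys.stdout = open('/dev/null', 'a')

import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
```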
Adjust myproject.settings and the path as needed. I redirect all output to /dev/null since mod_wsgi by default prevents printing. Use logging instead.
For apache:
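The Apache directives were also lost in the scrape; the matching configuration would be along these lines (paths and the process group name are assumptions):

```apache
# Run Django in a separate mod_wsgi daemon process group.
WSGIDaemonProcess myproject processes=2 threads=15
WSGIProcessGroup myproject
WSGIScriptAlias / /path/to/myproject/wsgi.py
```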
Hopefully this should at least help you set up mod_wsgi so you can see if it makes a difference.
Caches: make sure they're being flushed. It's easy for something to land in a cache but never be GC'd because of the cache reference.

Swig'd code: make sure any memory management is being done correctly; it's really easy to miss these in python, especially with third-party libraries.

Monitoring: if you can, get data about memory usage and hits. Usually you'll see a correlation between a certain type of request and memory usage.
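The cache point above can be demonstrated in a few lines: a plain module-level dict pins every entry in memory forever, while a `weakref.WeakValueDictionary` lets the collector reclaim entries once nothing else references them (a minimal sketch, names invented for illustration):

```python
import gc
import weakref


class BigObject:
    def __init__(self):
        self.payload = [0] * 1_000_000  # a deliberately large allocation


# A plain dict used as a cache keeps a strong reference:
# the object can never be garbage-collected while the cache lives.
strong_cache = {}
strong_cache['key'] = BigObject()

# A weak-value cache drops its entry as soon as the last
# outside reference disappears.
weak_cache = weakref.WeakValueDictionary()
obj = BigObject()
weak_cache['key'] = obj

del obj
gc.collect()

print('key' in strong_cache)  # True: still pinned in memory
print('key' in weak_cache)    # False: entry was collected
```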
We stumbled over a bug in Django with big sitemaps (10,000 items). It seems Django tries to load them all into memory when generating the sitemap: http://code.djangoproject.com/ticket/11572 - this effectively kills the apache process when Google pays a visit to the site.