Tracking down Django/FastCGI process errors
I run a Django-based site on Nginx with a FastCGI server. The site generally works great, but every 2-3 days it runs into an unknown problem and stops responding to any requests.
Munin graphs show that IO block reads and writes per second increase by 500% during the problem.
I also wrote a Python script to record the following stats every minute:
Load Averages
CPU Usage (user, nice, system, idle, iowait)
RAM Usage
Swap Usage
Number of FastCGI Processes
RAM Used by FastCGI Processes
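Such a logger can be sketched against the Linux /proc filesystem. The parsing helpers below are a minimal illustration, and the `fcgi` command-line substring used to count FastCGI workers is an assumption about this particular setup:

```python
import os
import re


def read_load_averages(text):
    """Parse the 1/5/15-minute load averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def read_meminfo(text):
    """Parse the kB-valued fields (MemTotal, MemFree, SwapTotal, ...) of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)\s*kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def count_matching_processes(cmdline_substring="fcgi"):
    """Count running processes whose command line contains the given substring."""
    count = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/cmdline" % pid, "rb") as f:
                if cmdline_substring.encode() in f.read():
                    count += 1
        except OSError:
            continue  # process exited between listdir() and open()
    return count
```

Run from cron (or a loop with `time.sleep(60)`), these readings can be appended to a log file once a minute.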
The records show that during the problem, the number of FastCGI processes doubled (from a normal value of 10-15 to 25-30), and the RAM used by FastCGI processes also doubled (from 17% to 35% of the server's total RAM). The increased memory usage forced more swap to be used, which slowed down disk IO and made the server unresponsive.
FastCGI parameters I used:
maxspare=10 minspare=5 maxchildren=25 maxrequests=1000
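For reference, these parameters would typically be passed to Django's old `runfcgi` management command (which hands them through to flup's prefork server); the socket path here is purely illustrative:

```shell
# prefork mode: flup maintains a pool of worker processes
python manage.py runfcgi method=prefork socket=/tmp/django.sock \
    maxspare=10 minspare=5 maxchildren=25 maxrequests=1000
```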
I suspect the problem is caused by poorly written Python code in some part of my site. But I just don't know how to find out which part of the code is freezing the existing FastCGI processes and forcing new instances to be forked.
You've limited the number of children to 25, so when 25 processes are already running and handling requests, any further requests will block and the site will appear unresponsive.
It sounds to me like you have an infinite (or very long) loop that is causing the processes to block. I suggest you add an idle timeout to the FastCGI script. This should allow the site to keep running by killing long-running requests, and it will let you debug the problem by emitting tracebacks from where the processes were killed.