Tracking down Django/FastCGI process errors
I run a Django-based site on Nginx with a FastCGI server. The site generally works great, but every 2-3 days it runs into an unknown problem and stops responding to any requests.
Munin graphs show that IO block reads and writes per second increase by 500% during the problem.
I also wrote a Python script to record the following stats every minute:
Load Averages
CPU Usage (user, nice, system, idle, iowait)
RAM Usage
Swap Usage
Number of FastCGI Processes
RAM Used by FastCGI Processes
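Such a logger can be sketched against the Linux /proc filesystem. The parsing helpers below are a minimal illustration, and the `fcgi` command-line substring used to count FastCGI workers is an assumption about this particular setup:

```python
import os
import re


def read_load_averages(text):
    """Parse the 1/5/15-minute load averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def read_meminfo(text):
    """Parse the kB-valued fields (MemTotal, MemFree, SwapTotal, ...) of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)\s*kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def count_matching_processes(cmdline_substring="fcgi"):
    """Count running processes whose command line contains the given substring."""
    count = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/cmdline" % pid, "rb") as f:
                if cmdline_substring.encode() in f.read():
                    count += 1
        except OSError:
            continue  # process exited between listdir() and open()
    return count
```

Run from cron (or a loop with `time.sleep(60)`), these readings can be appended to a log file once a minute.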
The records show that during the problem, the number of FastCGI processes doubled (from a normal value of 10-15 to 25-30), and the RAM used by FastCGI processes also doubled (from 17% to 35% of the server's total RAM). The increased memory usage forced more swap to be used, which slowed down disk IO and made the server unresponsive.
FastCGI parameters I used:
maxspare=10 minspare=5 maxchildren=25 maxrequests=1000
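For reference, these parameters would typically be passed to Django's old `runfcgi` management command (which hands them through to flup's prefork server); the socket path here is purely illustrative:

```shell
# prefork mode: flup maintains a pool of worker processes
python manage.py runfcgi method=prefork socket=/tmp/django.sock \
    maxspare=10 minspare=5 maxchildren=25 maxrequests=1000
```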
I suspect the problem is caused by poorly written Python code in some part of my site. But I just don't know how to find out which part of the code is freezing the existing FastCGI processes and forcing new instances to be forked.
You've limited the number of children to 25, so when 25 processes are already running and handling requests, any further requests will block and the site will appear unresponsive.
It sounds to me like you have an infinite (or very long) loop that is causing the processes to block. I suggest you add an idle timeout to the FastCGI script. This should allow the site to keep running by killing long-running requests, and it will let you debug the problem by emitting tracebacks from where the processes were killed.