Tracking down a Django/FastCGI process problem

Posted 2024-10-17 22:58:22


I run a Django-based site on Nginx with a FastCGI server. The site generally works great, but every 2-3 days it runs into an unknown problem and stops responding to any requests.

Munin graphs show IO block reads and writes per second increasing by 500% during the problem.

I also wrote a Python script to record the following stats every minute.

Load Averages
CPU Usage (user, nice, system, idle, iowait)
RAM Usage
Swap Usage
Number of FastCGI Processes
RAM Used by FastCGI Processes
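A minimal sketch of such a recording script, using only the standard library and Linux's /proc filesystem. The "fcgi" process-name filter is an assumption (adjust it to match your actual FastCGI command line), and the per-field CPU usage could be collected the same way by parsing /proc/stat:

```python
import os
import time

def sample_stats(proc_name="fcgi"):
    """Take one snapshot of the stats to log each minute (Linux only)."""
    stats = {"time": time.strftime("%Y-%m-%d %H:%M:%S")}

    # 1-, 5- and 15-minute load averages
    stats["load"] = os.getloadavg()

    # RAM and swap usage from /proc/meminfo (values are in kB)
    meminfo = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            meminfo[key] = int(value.split()[0])
    stats["ram_used_kb"] = meminfo["MemTotal"] - meminfo["MemFree"]
    stats["swap_used_kb"] = meminfo["SwapTotal"] - meminfo["SwapFree"]

    # Count FastCGI processes and sum their resident memory
    count, rss_kb = 0, 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                cmdline = f.read().decode(errors="replace")
            if proc_name in cmdline:
                count += 1
                with open(f"/proc/{pid}/status") as f:
                    for line in f:
                        if line.startswith("VmRSS:"):
                            rss_kb += int(line.split()[1])
        except (FileNotFoundError, ProcessLookupError):
            continue  # process exited while we were scanning
    stats["fcgi_count"] = count
    stats["fcgi_rss_kb"] = rss_kb
    return stats
```

Calling `sample_stats()` once a minute from cron (or a loop with `time.sleep(60)`) and appending the dict to a log file reproduces the recording described above.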

The records show that during the problem, the number of FastCGI processes doubled (from a normal value of 10-15 to 25-30). The RAM used by FastCGI processes also doubled (from 17% to 35% of the server's total RAM). The increased memory usage forced more swap to be used, which slowed down disk IO and made the server unresponsive.

FastCGI parameters I used:

maxspare=10 minspare=5 maxchildren=25 maxrequests=1000 
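For context, parameters like these are typically passed straight to Django's old `runfcgi` management command (removed in Django 1.9, where it handed them to flup). The socket and pidfile paths below are placeholders, not taken from the question:

```shell
# Prefork FastCGI server for Nginx to proxy to; match the
# socket path to your Nginx fastcgi_pass directive.
python manage.py runfcgi method=prefork \
    socket=/tmp/django.sock pidfile=/tmp/django.pid \
    maxspare=10 minspare=5 maxchildren=25 maxrequests=1000
```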

I guess the problem is due to poorly written Python code in some part of my site. But I just don't know how to find out which part of the code is freezing the existing FastCGI processes and causing new instances to be forked.


Answer (所谓喜欢, 2024-10-24 22:58:22):


You've limited the number of children to 25, so when there are 25 processes running and processing requests, any further ones will block and the site will appear unresponsive.

It sounds to me like you have an infinite (or very long) loop that is causing the processes to block. I suggest you add an idle timeout to the FastCGI script. This will hopefully allow the site to keep running by killing long-running requests, and will let you debug the problem by logging tracebacks from where the processes were killed.
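One way to get those tracebacks is a SIGALRM-based watchdog from the standard library, sketched below. The function names and the timeout value are illustrative, and SIGALRM only fires in the main thread on Unix, which fits a prefork setup where each child handles one request at a time:

```python
import signal
import sys
import traceback

class RequestTimeout(Exception):
    """Raised when a request exceeds the allowed wall-clock time."""

def _alarm_handler(signum, frame):
    # Dump the stack of the stuck request before aborting it, so
    # the offending view or loop shows up in the error log.
    stack = "".join(traceback.format_stack(frame))
    sys.stderr.write("Request exceeded time limit; stack:\n" + stack)
    raise RequestTimeout

def run_with_timeout(func, seconds, *args, **kwargs):
    """Run func(*args, **kwargs), killing it with a traceback if it
    runs longer than `seconds` (whole seconds, Unix main thread only)."""
    old_handler = signal.signal(signal.SIGALRM, _alarm_handler)
    signal.alarm(seconds)
    try:
        return func(*args, **kwargs)
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

Wrapping each request handler in `run_with_timeout(handler, 30)` would both keep a stuck child from being lost forever and log exactly which code path it was blocked in.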
