Lighttpd + fastcgi + django: response sent to client truncated due to unexpected EOF

Published 2024-09-24 03:06:20


I'm trying to get my Django-based web app into a working deployment configuration, and after spending a lot of time trying to get it working under lighttpd / fastcgi, I can't get past this problem. When a client logs in for the first time, it receives a large data dump from the server, which is broken into several chunks of about 1MB each and sent back as JSON.
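The post does not show how the data dump is split, so the following is only a hypothetical sketch (Python 3, standard library only) of one way to cut a list of records into JSON arrays that each stay under a byte budget such as ~1MB. `chunk_records` and `max_bytes` are illustrative names, not taken from the original code.

```python
import json


def chunk_records(records, max_bytes=1024 * 1024):
    """Hypothetical helper: split `records` into JSON array strings whose encoded
    size stays at or under max_bytes. A single record that is larger than
    max_bytes on its own still gets its own (oversized) chunk."""
    chunks = []
    current = []      # encoded records belonging to the chunk being built
    current_size = 2  # the enclosing "[" and "]"
    for record in records:
        encoded = json.dumps(record, separators=(",", ":"))  # ASCII, so len == bytes
        extra = len(encoded) + (1 if current else 0)  # +1 for the separating comma
        if current and current_size + extra > max_bytes:
            chunks.append("[" + ",".join(current) + "]")
            current = []
            current_size = 2
            extra = len(encoded)  # first item in a new chunk needs no comma
        current.append(encoded)
        current_size += extra
    if current:
        chunks.append("[" + ",".join(current) + "]")
    return chunks
```

Because each chunk is a complete JSON array, any truncated response is missing its closing bracket and fails to parse on the client, which makes truncation easy to detect.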

Every so often, the client receives a truncated response for one of the chunks, and I see this message in the lighttpd logs:

2010-09-14 23:25:01: (mod_fastcgi.c.2582) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:8000 
2010-09-14 23:25:01: (mod_fastcgi.c.3382) response already sent out, but backend returned error on socket: tcp:127.0.0.1:8000 for /myapp.fcgi?, terminating connection

I'm really pulling my hair out trying to figure out why this happens (it doesn't happen when running Django under ./manage.py runserver). These are the things I've tried, none of which had any effect:

Reducing the chunk size from 1MB to 256K. Even though the truncation usually happens at around the 600K to 900K mark, I still got truncations with 256K chunks.
Setting the minspare and maxchildren values on Django's runfcgi very high so that there are lots of spare threads hanging around.
Setting maxchildren to 1 so that there is only one thread.
Switching between UNIX socket mode and TCP/IP mode for the fastcgi connection between lighttpd and Django.

I've searched Google extensively but couldn't find anything that looked like a fix for Django (most of the advice I found was about tweaking PHP settings).

My setup is:

OS X 10.6.4
Python 2.6.1 (system)
lighttpd installed from MacPorts (1.4.26_1+ssl)
flup installed from the latest Python egg on the flup website (tried both 1.0.2 stable and the latest 1.0.3 devel)
Django 1.2.1 installed from the tarball on the Django website

The FastCGI block in my lighttpd config is:

fastcgi.server             = ("/myapp.fcgi" =>
                               ("django" =>
                                 (
                                  #"socket" => lighttpd_base + "fcgi.sock",
                                  "host" => "127.0.0.1",
                                  "port" => 8000,
                                  "check-local" => "disable",
                                  "max-procs" => 1,
                                  "debug" => 1
                                 )
                               )
                             )

The runfcgi command I'm using to start Django is currently:

./manage.py runfcgi daemonize=false debug=true host=127.0.0.1 port=8000 method=threaded maxchildren=1

Any insight into how to stop this from happening would be much appreciated. If I can't solve this fairly quickly, I will have to abandon lighttpd + fastcgi and look at Apache + mod_wsgi or perhaps nginx + fastcgi, and I'm not looking forward to wrestling with yet another web server config ...

Thanks in advance for any help.

Edit: Additional Info

I found this page on the lighty forums indicating that it could be Django's fault (in that case the backend was PHP, and it was crashing). I checked the Django side and found that even after a truncation, the Python thread that sent the truncated response keeps running and serves subsequent requests, so the stream does not appear to be broken by the thread hitting an exception and dying.
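To double-check from the application side that the thread survives, and to record how many body bytes Django produced for each response (for comparison with what the client actually received), a small WSGI wrapper can log both per request. This is a hypothetical sketch, not part of Django or flup: `ResponseSizeLogger` and the `fcgi-diagnostics` logger name are made up, and it assumes you start the app from your own launcher script that passes the wrapped Django handler to flup's `WSGIServer` instead of calling `runfcgi` directly. It uses only the standard library and avoids Python 3-only syntax.

```python
import logging
import threading

# Hypothetical logger name; attach a handler and level in your own logging setup.
log = logging.getLogger("fcgi-diagnostics")


class ResponseSizeLogger(object):
    """Hypothetical WSGI middleware: for each request, log the thread that served
    it, the request path, and the number of body bytes the wrapped app yielded."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # This method is a generator, so the wrapped app (which calls
        # start_response itself) only runs once the server starts iterating.
        thread_id = threading.current_thread().ident
        path = environ.get("PATH_INFO", "")
        total = 0
        result = self.app(environ, start_response)
        try:
            for chunk in result:
                total += len(chunk)
                yield chunk
        finally:
            # Runs when iteration finishes or when the server calls close() early,
            # so a cut-short response is logged with the bytes yielded so far.
            if hasattr(result, "close"):
                result.close()
            log.info("thread=%s path=%s body_bytes=%d", thread_id, path, total)
```

If the logged byte count for a truncated chunk matches the full JSON size, that would suggest Django produced the whole response and the bytes were lost somewhere between flup and the client.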

I wanted to figure out whether Django's FastCGI implementation or lighttpd was at fault here (because that determines whether moving to nginx + fastcgi would actually solve anything), so I looked at the packet trace in Wireshark. A simplified log of what happens just before a truncation is below:

No.     Time        Info
30082   233.411743  django > lighttpd [PSH, ACK] Seq=860241 Ack=869 Win=524280 Len=8184 TSV=417114153 TSER=417114153
30083   233.411749  lighttpd > django [ACK] Seq=869 Ack=868425 Win=524280 Len=0 TSV=417114153 TSER=417114153
30084   233.412235  django > lighttpd [PSH, ACK] Seq=868425 Ack=869 Win=524280 Len=8 TSV=417114153 TSER=417114153
30085   233.412250  lighttpd > django [ACK] Seq=869 Ack=868433 Win=524280 Len=0 TSV=417114153 TSER=417114153
30086   233.412615  django > lighttpd [PSH, ACK] Seq=868433 Ack=869 Win=524280 Len=8184 TSV=417114153 TSER=417114153
30087   233.412628  lighttpd > django [ACK] Seq=869 Ack=876617 Win=524280 Len=0 TSV=417114153 TSER=417114153
30088   233.412723  lighttpd > django [FIN, ACK] Seq=869 Ack=876617 Win=524280 Len=0 TSV=417114153 TSER=417114153
30089   233.412734  django > lighttpd [ACK] Seq=876617 Ack=870 Win=524280 Len=0 TSV=417114153 TSER=417114153
30090   233.412740  [TCP Dup ACK 30088#1] lighttpd > django [ACK] Seq=870 Ack=876617 Win=524280 Len=0 TSV=417114153 TSER=417114153
30091   233.413051  django > lighttpd [PSH, ACK] Seq=876617 Ack=870 Win=524280 Len=8 TSV=417114153 TSER=417114153
30092   233.413070  lighttpd > django [RST] Seq=870 Win=0 Len=0

Django starts by sending good packets (8184 bytes in 30082, then another 8184 bytes in 30086). Then, at entry 30088, lighttpd for some reason sends a TCP FIN to Django, which is presumably what terminates the connection and produces the truncation.
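The Seq and Len values in the trace can be checked with a little arithmetic. The sketch below copies the numbers from the trace (`next_seq` is just an illustrative helper) and confirms that Django's segments are contiguous (860241 + 8184 = 868425, + 8 = 868433, + 8184 = 876617) and that the Ack=876617 on lighttpd's FIN in frame 30088 covers every byte Django had sent up to that point. In other words, lighttpd had received everything, alternating 8-byte and 8184-byte segments, before it closed its side.

```python
# Django -> lighttpd data segments from the trace: (frame number, Seq, Len)
SEGMENTS = [
    (30082, 860241, 8184),
    (30084, 868425, 8),
    (30086, 868433, 8184),
    (30091, 876617, 8),  # sent after lighttpd's FIN, answered with RST (30092)
]
FIN_ACK = 876617  # Ack field of lighttpd's [FIN, ACK] in frame 30088


def next_seq(segments):
    """Return the sequence number just past the last segment, checking that
    each segment starts exactly where the previous one ended."""
    expected = segments[0][1]
    for frame, seq, length in segments:
        if seq != expected:
            raise ValueError("gap or overlap before frame %d" % frame)
        expected = seq + length
    return expected


# Everything Django sent before the FIN had already been acknowledged.
assert next_seq(SEGMENTS[:3]) == FIN_ACK
print(next_seq(SEGMENTS))  # 876625: includes the 8-byte segment sent after the FIN
```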

On the face of it, this looks like lighttpd's fault, since it seems to be shutting the connection down before it's supposed to ... although I can't rule out that it is doing so because it received bad data from Django and reacted by shutting down.
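The remaining possibility, that lighttpd closes the connection because Django sent something malformed, can be tested by checking that the Django-to-lighttpd byte stream consists of well-formed FastCGI records. A minimal sketch, assuming the backend side of the TCP stream has been saved from Wireshark as raw bytes: it walks the records using the 8-byte header layout from the FastCGI 1.0 specification (version, type, request ID, content length, padding length, reserved). The 8-byte segments in the trace match that header size, which is consistent with each record header being written separately from an 8184-byte body (8 + 8184 = 8192); that reading of the trace is an inference, not something the post confirms. `parse_fcgi_stream` is a hypothetical helper.

```python
import struct

# version, type, requestId, contentLength, paddingLength, reserved (FastCGI 1.0)
FCGI_HEADER = struct.Struct("!BBHHBx")
FCGI_VERSION_1 = 1
# Record types an application may send back to the web server
APP_TO_SERVER_TYPES = {
    3: "END_REQUEST",
    6: "STDOUT",
    7: "STDERR",
    10: "GET_VALUES_RESULT",
    11: "UNKNOWN_TYPE",
}


def parse_fcgi_stream(data):
    """Split raw backend-to-server bytes into (type_name, request_id, content) tuples.

    Raises ValueError on a wrong version, an unexpected record type, or a record
    whose header promises more bytes than the capture contains.
    """
    records = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < FCGI_HEADER.size:
            raise ValueError("truncated header at offset %d" % offset)
        version, rtype, request_id, content_len, padding_len = FCGI_HEADER.unpack_from(data, offset)
        if version != FCGI_VERSION_1:
            raise ValueError("bad version %d at offset %d" % (version, offset))
        if rtype not in APP_TO_SERVER_TYPES:
            raise ValueError("unexpected record type %d at offset %d" % (rtype, offset))
        body_start = offset + FCGI_HEADER.size
        body_end = body_start + content_len
        if body_end + padding_len > len(data):
            raise ValueError("truncated record body at offset %d" % offset)
        records.append((APP_TO_SERVER_TYPES[rtype], request_id, data[body_start:body_end]))
        offset = body_end + padding_len
    return records
```

If the capture parses cleanly right up to the FIN, with the last record a complete STDOUT record, that would argue against malformed output from Django; a ValueError would point at the offset where the stream first goes wrong.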
