I've recently started experimenting with using Python for web development. So far I've had some success using Apache with mod_wsgi and the Django web framework for Python 2.7. However I have run into some issues with having processes constantly running, updating information and such.
I have written a script I call "daemonManager.py" that can start and stop all or individual python update loops (Should I call them Daemons?). It does that by forking, then loading the module for the specific functions it should run and starting an infinite loop. It saves a PID file in /var/run to keep track of the process. So far so good. The problems I've encountered are:
- Now and then one of the processes will just quit. I check ps in the morning and the process is just gone. No errors were logged (I'm using the logging module), and I'm covering every exception I can think of and logging them. Also, I don't think these quitting processes have anything to do with my code, because all my processes run completely different code and exit at pretty similar intervals. I could be wrong of course. Is it normal for Python processes to just die after they've run for days/weeks? How should I tackle this problem? Should I write another daemon that periodically checks if the other daemons are still running? What if that daemon stops? I'm at a loss on how to handle this.
- How can I programmatically know if a process is still running or not? I'm saving the PID files in /var/run and checking if the PID file is there to determine whether or not the process is running. But if the process just dies of unexpected causes, the PID file will remain. I therefore have to delete these files every time a process crashes (a couple of times per week), which sort of defeats the purpose. I guess I could check if a process is running at the PID in the file (a sketch of such a check follows below), but what if another process has started and was assigned the PID of the dead process? My daemon would think that the process is running fine even if it's long dead. Again I'm at a loss just how to deal with this.
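A minimal sketch of such a liveness check, assuming a POSIX system; the PID-file path and function name here are illustrative, not taken from daemonManager.py:

    import errno
    import os

    def pid_is_running(pid_file="/var/run/mydaemon.pid"):
        """Best-effort check: is there a live process at the PID recorded in pid_file?

        Note: this cannot detect PID reuse by an unrelated process, which is
        exactly the caveat raised in the question.
        """
        try:
            with open(pid_file) as f:
                pid = int(f.read().strip())
        except (IOError, ValueError):
            return False  # no PID file, or it does not contain a number
        try:
            os.kill(pid, 0)  # signal 0 checks existence without affecting the process
        except OSError as e:
            if e.errno == errno.ESRCH:   # no such process
                return False
            if e.errno == errno.EPERM:   # process exists but belongs to another user
                return True
            raise
        return True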
I will accept any useful answer on how best to run infinite Python processes, hopefully one that also sheds some light on the problems above.
I'm using Apache 2.2.14 on an Ubuntu machine.
My Python version is 2.7.2
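For context, the fork-and-PID-file pattern described above looks roughly like the following sketch. It is a simplified illustration, not the actual daemonManager.py; start_daemon and run_loop are invented names:

    import os
    import sys

    def start_daemon(name, run_loop, pid_dir="/var/run"):
        """Fork once, record the child's PID, and run the update loop in the child."""
        pid = os.fork()
        if pid > 0:
            # Parent: write the child's PID so it can be found and stopped later.
            with open(os.path.join(pid_dir, name + ".pid"), "w") as f:
                f.write(str(pid))
            return pid
        # Child: detach from the controlling terminal, then loop forever.
        os.setsid()
        try:
            run_loop()   # the module-specific infinite update loop
        finally:
            sys.exit(0)

A real daemonizer typically forks twice and redirects stdin/stdout/stderr as well; the sketch only shows the part relevant to the PID-file problem described above.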
I'll open by stating that this is one way to manage a long running process (LRP) -- not de facto by any stretch.
In my experience, the best possible product comes from concentrating on the specific problem you're dealing with, while delegating supporting tech to other libraries. In this case, I'm referring to the act of backgrounding processes (the art of the double fork), monitoring, and log redirection.
My favorite solution is http://supervisord.org/
Using a system like supervisord, you basically write a conventional python script that performs a task while stuck in an "infinite" loop.
Writing your script this way makes it simple and convenient to develop and debug (you can easily start/stop it in a terminal, watching the log output as events unfold). When it comes time to throw into production, you simply define a supervisor config that calls your script (here's the full example for defining a "program", much of which is optional: http://supervisord.org/configuration.html#program-x-section-example).
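As an illustration (not part of the original answer), a worker script of the kind supervisord runs could be as simple as the sketch below; it loops forever, does one unit of work per pass, and logs to stdout so supervisor can capture the output. The task body and the 60-second interval are placeholders:

    import logging
    import time

    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")

    def do_update():
        # Placeholder for the real work (polling a database, refreshing caches, ...).
        logging.info("running one update pass")

    def main():
        while True:
            try:
                do_update()
            except Exception:
                # Log the full traceback and keep looping; if the process dies anyway,
                # supervisord's autorestart option brings it back.
                logging.exception("update pass failed")
            time.sleep(60)

    if __name__ == "__main__":
        main()

A [program:x] section whose command points at this script, with autorestart enabled, then replaces the hand-rolled PID-file bookkeeping.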
Supervisor has a bunch of configuration options so I won't enumerate them, but I will say that it specifically solves the problems you describe.
You should consider Python processes as able to run "forever" assuming you don't have any memory leaks in your program, the Python interpreter, or any of the Python libraries / modules that you are using. (Even in the face of memory leaks, you might be able to run forever if you have sufficient swap space on a 64-bit machine. Decades, if not centuries, should be doable. I've had Python processes survive just fine for nearly two years on limited hardware -- before the hardware needed to be moved.)
Ensuring programs restart when they die used to be very simple back when Linux distributions used SysV-style init -- you just add a new line to the /etc/inittab and init(8) would spawn your program at boot and re-spawn it if it dies. (I know of no mechanism to replicate this functionality with the new upstart init-replacement that many distributions are using these days. I'm not saying it is impossible, I just don't know how to do it.)

But even the init(8) mechanism of years gone by wasn't as flexible as some would have liked. The daemontools package by DJB is one example of process control-and-monitoring tools intended to keep daemons living forever. The Linux-HA suite provides another similar tool, though it might provide too much "extra" functionality to be justified for this task. monit is another option.
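Purely as an illustration (the entry identifier and script path here are invented), a SysV respawn entry in /etc/inittab would look something like:

    ud1:2345:respawn:/usr/bin/python /usr/local/bin/update_loop.py

init(8) starts the listed command at those runlevels and starts it again whenever it exits.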
I assume you are running Unix/Linux, but you don't really say. I have no direct advice on your issue, so I don't expect this to be the "right" answer to this question. But there is something to explore here.
First, if your daemons are crashing, you should fix that. Only programs with bugs should crash. Perhaps you should launch them under a debugger and see what happens when they crash (if that's possible). Do you have any trace logging in these processes? If not, add it. That might help diagnose your crashes.
Second, are your daemons providing services (opening pipes and waiting for requests) or are they performing periodic cleanup? If they are periodic cleanup processes you should use cron to launch them periodically rather than have them run in an infinite loop. Cron processes should be preferred over daemon processes. Similarly, if they are services that open ports and service requests, have you considered making them work with inetd? Again, a single daemon (inetd) should be preferred to a bunch of daemon processes.
Third, saving a PID in a file is not very effective, as you've discovered. Perhaps a shared IPC, like a semaphore, would work better. I don't have any details here though.
Fourth, sometimes I need stuff to run in the context of the website. I use a cron process that calls wget with a maintenance URL. You set a special cookie and include the cookie info in the wget command line. If the special cookie doesn't exist, return 403 rather than performing the maintenance process. The other benefit here is that database logins and other environmental concerns are avoided, since the code that serves normal web pages is also serving the maintenance process.
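A rough sketch of what that could look like as a Django view, purely for illustration (the cookie name, secret value, and run_periodic_cleanup are invented, not taken from the answer):

    # views.py -- hypothetical maintenance endpoint
    from django.http import HttpResponse, HttpResponseForbidden

    MAINTENANCE_COOKIE = "maint_token"   # invented cookie name
    MAINTENANCE_SECRET = "change-me"     # invented value; keep real secrets out of source

    def maintenance(request):
        if request.COOKIES.get(MAINTENANCE_COOKIE) != MAINTENANCE_SECRET:
            return HttpResponseForbidden()   # 403 for anyone without the cookie
        run_periodic_cleanup()               # placeholder for the actual maintenance work
        return HttpResponse("done")

The cron side would then be a wget invocation along the lines of: wget -q -O /dev/null --header='Cookie: maint_token=change-me' http://localhost/maintenance/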
Hope that gives you ideas. I think avoiding daemons if you can is the best place to start. If you can run your python within mod_wsgi that saves you having to support multiple "environments". Debugging a process that fails after running for days at a time is just brutal.