Capistrano not restarting Mongrel cluster properly
I have a cluster of three mongrels running under nginx, and I deploy the app using Capistrano 2.4.3. When I run "cap deploy" while the system is running, the behavior is:
- The app is deployed. The code is successfully updated.
In the cap deploy output, there is this:
- executing "sudo -p 'sudo password: '
mongrel_rails cluster::restart -C
/var/www/rails/myapp/current/config/mongrel_cluster.yml" - servers: ["myip"]
- [myip] executing command
- ** [out :: myip] stopping port 9096
- ** [out :: myip] stopping port 9097
- ** [out :: myip] stopping port 9098
- ** [out :: myip] already started port 9096
- ** [out :: myip] already started port 9097
- ** [out :: myip] already started port 9098
- executing "sudo -p 'sudo password: '
- I check the server immediately and find that Mongrel is still running, and the PID files for the previous three instances are still present.
- A short time later (less than one minute), I find that Mongrel is no longer running, the PID files are gone, and it has failed to restart.
- If I start mongrel on the server by hand, the app starts up just fine.
It seems like 'mongrel_rails cluster::restart' isn't properly waiting for a full stop
before attempting a restart of the cluster. How do I diagnose and fix this issue?
EDIT: Here's the answer:
mongrel_cluster, in the "restart" task, simply does this:
def run
  stop
  start
end
It doesn't do any waiting or checking to see that the process exited before invoking "start". This is a known bug with an outstanding patch submitted. I applied the patch to Mongrel Cluster and the problem disappeared.
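The patch itself isn't reproduced here; as a rough illustration of the idea (not the actual patch), a restart could poll until the old processes have exited before starting again. In this sketch, pid_files is a hypothetical helper returning the cluster's pid file paths:

def run
  stop
  # Poll until the old mongrels have really exited (their pid files are gone)
  # before starting new ones, instead of calling start immediately.
  30.times do
    break unless pid_files.any? { |f| File.exist?(f) }
    sleep 1
  end
  start
end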
Comments (5)
You can explicitly tell the mongrel_cluster recipes to remove the pid files before a start by adding the following in your capistrano recipes:
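(Presumably the :mongrel_clean setting, e.g.:)

set :mongrel_clean, true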
This causes it to pass the --clean option to mongrel_cluster_ctl.
I went back and looked at one of my deployment recipes and noticed that I had also changed the way my restart task worked. Take a look at the following message in the mongrel users group:
mongrel users discussion of restart
The following is my deploy:restart task. I admit it's a bit of a hack.
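(Roughly along these lines, judging from the sleep 2.5 mentioned in a later comment, and assuming the standard mongrel_cluster recipe tasks are loaded:)

namespace :deploy do
  desc "Restart the Mongrel processes on the app server."
  task :restart, :roles => :app do
    mongrel.cluster.stop
    sleep 2.5            # crude pause to let the old mongrels shut down
    mongrel.cluster.start
  end
end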
First, narrow the scope of what you're testing by only calling

cap deploy:restart

You might want to pass the --debug option to prompt before remote execution, or the --dry-run option just to see what's going on as you tweak your settings.

At first glance this sounds like a permissions issue on the pid files or mongrel processes, but it's difficult to know for sure. A couple of things that catch my eye:

- The :runner variable is explicitly set to nil -- was there a specific reason for this?
- The new behavior of the :admin_runner variable. Without seeing the entire recipe, is this possibly related to your problem?

    :runner vs. :admin_runner (from the capistrano 2.4 release)
    Some cappers have noted that running deploy:setup and deploy:cleanup as the :runner user messed up their carefully crafted permissions. I agreed that this was a problem. With this release, deploy:start, deploy:stop, and deploy:restart all continue to use the :runner user when sudoing, but deploy:setup and deploy:cleanup will use the :admin_runner user. The :admin_runner variable is unset by default, meaning those tasks will run as root, but if you want them to run as :runner, just do "set :admin_runner, runner".

My recommendation for what to do next: manually stop the mongrels and clean up the PIDs, then start the mongrels by hand. After that, keep running

cap deploy:restart

while debugging the problem. Repeat as necessary.
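For reference, with Capistrano 2 the two suggested invocations look like:

cap --debug deploy:restart     # prompt before each remote command is run
cap --dry-run deploy:restart   # show the commands without executing them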
Either way, my mongrels are starting before the previous stop command has finished shutting 'em all down.
sleep 2.5 is not a good solution if it takes longer than 2.5 seconds to halt all running mongrels.
There seems to be a need for:

stop && start

vs.

stop ; start

(this is how bash works: && waits for the first command to finish without error, while ";" simply runs the next command).
I wonder if there is a:
I hate to be so basic, but it sounds like the pid files are still hanging around when it is trying to start. Make sure that mongrel is stopped by hand. Clean up the pid files by hand. Then do a cap deploy.
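A sketch of that manual cleanup, assuming the pid files live under the app's shared/pids directory (adjust the paths to match your mongrel_cluster.yml):

mongrel_rails cluster::stop -C /var/www/rails/myapp/current/config/mongrel_cluster.yml
rm -f /var/www/rails/myapp/shared/pids/mongrel.*.pid   # assumed pid file location
cap deploy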
Good discussion: http://www.ruby-forum.com/topic/139734#745030