How to stop god from leaving stale Resque worker processes?
I'm trying to understand how to monitor the resque worker for travis-ci with god in such a way that stopping the resque watch via god won't leave a stale worker process.
In the following I'm talking about the worker process, not forked job child processes (i.e. the queue is empty all the time).
When I manually start the resque worker like this:
$ QUEUE=builds rake resque:work
I'll get a single process:
$ ps x | grep resque
7041 s001 S+ 0:05.04 resque-1.13.0: Waiting for builds
And this process will go away as soon as I stop the worker task.
But when I start the same thing with god (exact configuration is here, basically the same thing as the resque/god example) like this ...
$ RAILS_ENV=development god -c config/resque.god -D
I [2011-03-27 22:49:15] INFO: Loading config/resque.god
I [2011-03-27 22:49:15] INFO: Syslog enabled.
I [2011-03-27 22:49:15] INFO: Using pid file directory: /Volumes/Users/sven/.god/pids
I [2011-03-27 22:49:15] INFO: Started on drbunix:///tmp/god.17165.sock
I [2011-03-27 22:49:15] INFO: resque-0 move 'unmonitored' to 'init'
I [2011-03-27 22:49:15] INFO: resque-0 moved 'unmonitored' to 'init'
I [2011-03-27 22:49:15] INFO: resque-0 [trigger] process is not running (ProcessRunning)
I [2011-03-27 22:49:15] INFO: resque-0 move 'init' to 'start'
I [2011-03-27 22:49:15] INFO: resque-0 start: cd /Volumes/Users/sven/Development/projects/travis && rake resque:work
I [2011-03-27 22:49:15] INFO: resque-0 moved 'init' to 'start'
I [2011-03-27 22:49:15] INFO: resque-0 [trigger] process is running (ProcessRunning)
I [2011-03-27 22:49:15] INFO: resque-0 move 'start' to 'up'
I [2011-03-27 22:49:15] INFO: resque-0 moved 'start' to 'up'
I [2011-03-27 22:49:15] INFO: resque-0 [ok] memory within bounds [784kb] (MemoryUsage)
I [2011-03-27 22:49:15] INFO: resque-0 [ok] process is running (ProcessRunning)
I [2011-03-27 22:49:45] INFO: resque-0 [ok] memory within bounds [784kb, 784kb] (MemoryUsage)
I [2011-03-27 22:49:45] INFO: resque-0 [ok] process is running (ProcessRunning)
Then I'll get an extra process:
$ ps x | grep resque
7187 ?? Ss 0:00.02 sh -c cd /Volumes/Users/sven/Development/projects/travis && rake resque:work
7188 ?? S 0:05.11 resque-1.13.0: Waiting for builds
7183 s001 S+ 0:01.18 /Volumes/Users/sven/.rvm/rubies/ruby-1.8.7-p302/bin/ruby /Volumes/Users/sven/.rvm/gems/ruby-1.8.7-p302/bin/god -c config/resque.god -D
God only seems to log the pid of the first one:
$ cat ~/.god/pids/resque-0.pid
7187
When I then stop the resque watch via god:
$ god stop resque
Sending 'stop' command
The following watches were affected:
resque-0
God gives this log output:
I [2011-03-27 22:51:22] INFO: resque-0 stop: default lambda killer
I [2011-03-27 22:51:22] INFO: resque-0 sent SIGTERM
I [2011-03-27 22:51:23] INFO: resque-0 process stopped
I [2011-03-27 22:51:23] INFO: resque-0 move 'up' to 'unmonitored'
I [2011-03-27 22:51:23] INFO: resque-0 moved 'up' to 'unmonitored'
But it does not actually terminate both of the processes, leaving the actual worker process alive:
$ ps x | grep resque
6864 ?? S 0:05.15 resque-1.13.0: Waiting for builds
6858 s001 S+ 0:01.36 /Volumes/Users/sven/.rvm/rubies/ruby-1.8.7-p302/bin/ruby /Volumes/Users/sven/.rvm/gems/ruby-1.8.7-p302/bin/god -c config/resque.god -D
Answers (2)
You need to tell god to use the pid file generated by resque, and tell resque where to write that pid file:
env tells resque to write the pid file, and pid_file tells god to use it.
Also, as svenfuchs noted, it should be enough to set only the proper env:
where /home/travis/.god/pids is the default pids directory.
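The env-only variant described above could look roughly like the following sketch. It assumes resque's resque:work task writes the worker's pid to the path given in the PIDFILE environment variable, and that god looks for the watch's pid in a file named after the watch inside its pids directory. The helper name resque_worker_env is hypothetical, and the watch name and queue are simply copied from the question; this is not the actual travis-ci configuration.

```ruby
# Hypothetical helper: builds the environment for the worker so that resque
# writes its own pid to the file god is expected to read for this watch.
def resque_worker_env(watch_name, pid_dir, queue = 'builds')
  {
    'QUEUE'   => queue,
    'PIDFILE' => File.join(pid_dir, "#{watch_name}.pid")
  }
end

# The watch itself is only defined when this file is loaded by god,
# e.g. via `god -c config/resque.god`.
if defined?(God)
  God.watch do |w|
    w.name  = 'resque-0'
    w.group = 'resque'
    w.start = 'rake resque:work'
    # Assumption: /home/travis/.god/pids is god's pids directory, as in the answer.
    w.env   = resque_worker_env(w.name, '/home/travis/.god/pids')
  end
end
```

With this in place the pid file should hold the pid of the resque worker rather than the pid of the wrapping shell, so the SIGTERM that god sends on stop reaches the worker. Whether w.pid_file also has to be set explicitly may depend on the god version in use.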
I might be a little late to the party here, but we had the same issue on our side. We were using
which caused the multiple processes. According to our sysops guy, this is due to the use of rvm do, which we ended up replacing with
This allowed god to work as expected, without the need to specify the pid file.
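The wrapper effect discussed in this thread can be reproduced without god or resque. The following is only an illustrative sketch: `sleep 30` stands in for the worker, a plain `sh -c` stands in for whatever wrapper (a shell, rvm do, ...) sits between the supervisor and the worker, and the helper names are made up for this example. It shows that the pid returned to the parent is the wrapper's, and that sending SIGTERM to that pid leaves the wrapped process running.

```ruby
# Starts a wrapper shell that launches a long-running child in the background,
# prints the child's pid, and then waits for it.
# Returns [wrapper_pid, child_pid].
def spawn_wrapped_child
  reader, writer = IO.pipe
  wrapper_pid = Process.spawn('sh', '-c', 'sleep 30 & echo $!; wait', out: writer)
  writer.close
  child_pid = reader.gets.to_i # first line printed by the shell is the child's pid
  reader.close
  [wrapper_pid, child_pid]
end

# True if a process with this pid currently exists.
def alive?(pid)
  Process.kill(0, pid) # signal 0 only checks for existence
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # exists, but belongs to another user
end

if __FILE__ == $0
  wrapper_pid, child_pid = spawn_wrapped_child
  puts "wrapper pid: #{wrapper_pid}, child pid: #{child_pid}"

  # This mirrors a supervisor that only knows the wrapper's pid.
  Process.kill('TERM', wrapper_pid)
  Process.wait(wrapper_pid)
  puts "child still alive after stopping the wrapper: #{alive?(child_pid)}"

  # Clean up the stand-in worker.
  Process.kill('TERM', child_pid) if alive?(child_pid)
end
```

This matches the symptom in the question: the pid file held the pid of the `sh -c` process, so stopping the watch ended the shell but not the worker behind it.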