Why does Solaris SMF report "Method or service exit timed out" even though the method exited with status 0?
I am really new to Solaris SMF and am writing an SMF manifest for WebLogic Node Manager. I followed the steps from:
http://www.camelrichard.org/controlling-weblogic-node-manager-solaris-smf-non-root
To test whether SMF restarts the service when it gets killed, I sent it a kill signal from another terminal, but it did not restart. This is what the log file says:
[ Nov 19 10:17:39 Stopping because process received fatal signal from outside the service. ]
Killed
+ set +x
[ Nov 19 10:17:39 Executing stop method ("/usr/local/Oracle/Middleware/wlserver_10.3/server/bin/killNodeManager.sh") ]
Trying to find the PID of the nodeManager process
Cannot find the PID, NodeManager is not running - cannot kill
[ Nov 19 10:17:39 Method "stop" exited with status 0 ]
[ Nov 19 10:18:40 Method or service exit timed out. Killing contract 100 ]
What I do not get is the last two lines: the first says the "stop" method exited, while the second says the method or service exit timed out. I find that weird. Does anyone know what is going on here? The relevant parts of the SMF manifest are below:
<service_bundle type='manifest' name='nodemanager'>
  <service name='application/management/nodemanager' type='service' version='1'>
    <single_instance />
    <dependency
        name='multi-user-server'
        grouping='require_all'
        restart_on='error'
        type='service'>
      <service_fmri value='svc:/milestone/multi-user-server' />
    </dependency>
    <exec_method
        type='method'
        name='start'
        exec='/usr/local/Oracle/Middleware/wlserver_10.3/server/bin/startNodeManager2.sh'
        timeout_seconds='120' >
      <!-- Trying as root for now :
      <method_context>
        <method_credential user='weblogic' group='weblogic' />
      </method_context>
      -->
    </exec_method>
    <exec_method
        type='method'
        name='stop'
        exec='/usr/local/Oracle/Middleware/wlserver_10.3/server/bin/killNodeManager.sh'
        timeout_seconds='60' />
Comments (2)
The reason for the first message:
It comes from what your method script executes, in this case /usr/local/Oracle/Middleware/wlserver_10.3/server/bin/startNodeManager2.sh.
Normally, a method that succeeds returns SMF_EXIT_OK (0).
I am not sure why the second message shows up. It must be something related to killNodeManager.sh.
It looks like killNodeManager.sh had an internal error: it could not find the PID of the process it is supposed to stop, so it exited almost immediately, within one second of starting. From SMF's perspective, however, contract #100 associated with this service was still active, which means at least one process in that contract was still running. Once the 60 seconds allotted to the "stop" method had elapsed, SMF saw that the contract was still up and had no option but to kill the whole contract. It legitimately assumed that the "stop" method had not done its job. Hence the last message in the log, and the service goes into maintenance mode after the contract is killed. Hope this helps!
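One way to avoid depending on a PID lookup in the stop script is SMF's built-in ":kill" method token, which signals every process in the service's contract (SIGTERM by default). The fragment below is only a sketch of how the stop method from the question might look with that token. It assumes the Node Manager and anything it spawned exit cleanly on SIGTERM, which this thread does not confirm.

```xml
<!-- Sketch only: stands in for the killNodeManager.sh stop method shown in the question. -->
<!-- ':kill' is SMF's built-in token; it sends SIGTERM to all processes in the contract. -->
<!-- Assumption: every process in the contract terminates on SIGTERM within 60 seconds. -->
<exec_method
    type='method'
    name='stop'
    exec=':kill'
    timeout_seconds='60' />
```

To see which processes are still holding the contract open after a failed stop, svcs -p application/management/nodemanager lists the processes SMF associates with the service.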