Help! Strange instability problem with RHCS
Last edited by liuyongsd on 2011-06-13 17:21
A few days ago I set up a two-node RHCS cluster on RHEL 5.4. After configuration it ran fine, but roughly every five days the cluster goes down. The logs show the failure is caused by heartbeat loss, yet the heartbeat interfaces were deliberately bonded precisely to avoid losing the heartbeat.
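For comparison, a typical RHEL 5 active-backup bonding setup for the heartbeat link looks like the sketch below. The interface names (bond0, eth2, eth3) and the mode are assumptions for illustration, not taken from the actual system; the heartbeat address matches the 10.0.0.203 seen later in the node 1 log.

```ini
; /etc/sysconfig/network-scripts/ifcfg-bond0  (hypothetical names)
DEVICE=bond0
IPADDR=10.0.0.203
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
; active-backup with 100 ms MII link monitoring
BONDING_OPTS="mode=1 miimon=100"

; /etc/sysconfig/network-scripts/ifcfg-eth2  (repeat for eth3)
DEVICE=eth2
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```

On RHEL 5 the bonding module also needs an alias in /etc/modprobe.conf (`alias bond0 bonding`); without it, or without link monitoring (`miimon`), a bond can silently stop failing over.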
The cluster.conf file is as follows:
<?xml version="1.0"?>
<cluster alias="new_cluster" config_version="20" name="new_cluster">
<totem token="80000"/>
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="clusternode01" nodeid="1" votes="1">
<fence>
<method name="1">
<device lanplus="1" name="rlerpdb"/>
</method>
</fence>
</clusternode>
<clusternode name="clusternode02" nodeid="2" votes="1">
<fence>
<method name="1">
<device lanplus="1" name="rlerpci"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="2" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" auth="" ipaddr="172.16.63.112" login="Administrator" name="node1" passwd="password"/>
<fencedevice agent="fence_ipmilan" auth="" ipaddr="172.16.63.113" login="Administrator" name="node2" passwd="password"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="ipdomain" ordered="1" restricted="0">
<failoverdomainnode name="clusternode01" priority="2"/>
<failoverdomainnode name="clusternode02" priority="1"/>
</failoverdomain>
<failoverdomain name="dbdomain" ordered="1" restricted="0">
<failoverdomainnode name="clusternode01" priority="1"/>
<failoverdomainnode name="clusternode02" priority="2"/>
</failoverdomain>
</failoverdomains>
<resources>
<ip address="172.16.45.204" monitor_link="1"/>
<ip address="172.16.45.203" monitor_link="1"/>
</resources>
<service autostart="1" domain="ipdomain" name="ascs" recovery="relocate">
<ip ref="172.16.45.204">
<script file="/usr/scripts/ascs" name="ascs"/>
</ip>
</service>
<service autostart="1" domain="dbdomain" name="db" recovery="relocate">
<fs device="/dev/mapper/oracle_vg-oraclevol" force_fsck="0" force_unmount="1" fsid="64480" fstype="ext3" mountpoint="/oracle" name="oracle" options="" self_fence="0">
<fs device="/dev/mapper/data_vg-data01vol" force_fsck="0" force_unmount="1" fsid="11430" fstype="ext3" mountpoint="/oracle/RLP/sapdata1" name="data1" options="" self_fence="0"/>
<fs device="/dev/mapper/data_vg-data02vol" force_fsck="0" force_unmount="1" fsid="32028" fstype="ext3" mountpoint="/oracle/RLP/sapdata2" name="data2" options="" self_fence="0"/>
<fs device="/dev/mapper/data_vg-data03vol" force_fsck="0" force_unmount="1" fsid="19850" fstype="ext3" mountpoint="/oracle/RLP/sapdata3" name="data3" options="" self_fence="0">
<ip ref="172.16.45.203">
<script file="/usr/scripts/db" name="db"/>
</ip>
</fs>
</fs>
</service>
</rm>
</cluster>
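One thing worth checking against this config: the node 1 log below shows qdiskd starting and then immediately failing ("Initialization failed"), yet cluster.conf contains no quorum-disk section at all. If a quorum disk is actually intended for this two-node cluster, it would have to be declared; a hypothetical sketch (label, timings, and heuristic target are placeholders, not values from this system):

```xml
<!-- hypothetical: would sit inside <cluster>, alongside <cman> -->
<quorumd interval="2" tko="10" votes="1" label="rhcs_qdisk">
    <heuristic program="ping -c1 172.16.45.1" score="1" interval="2"/>
</quorumd>
```

If no quorum disk is intended, disabling the qdiskd service would at least remove that error from the picture; either way, the current half-configured state is suspicious.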
The reboots happened around Jun 13 14:30, and both hosts restarted. But the heartbeat is running over a bonded interface, so this really should not happen.
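Before trusting the bond, it may be worth inspecting its runtime state on both nodes. A minimal sketch, assuming the heartbeat bond is named bond0 (adjust to the real device name):

```shell
# Print the fields of /proc/net/bonding/<dev> that matter for heartbeat loss:
# bonding mode, overall/per-slave MII link state, and link failure counters.
check_bond() {
  local f="/proc/net/bonding/${1:-bond0}"
  if [ -r "$f" ]; then
    grep -E 'Bonding Mode|MII Status|Slave Interface|Link Failure Count' "$f"
  else
    # Either the bond does not exist or bonding is not loaded.
    echo "no bonding info for ${1:-bond0}"
  fi
}
check_bond bond0
```

A nonzero Link Failure Count on both slaves at roughly the same time would point to a shared switch or cabling problem rather than a single NIC, which would explain the bond not helping.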
node1 log:
Jun 13 14:32:59 clusternode1 syslogd 1.4.1: restart (remote reception).
Jun 13 14:32:59 clusternode1 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun 13 14:32:59 clusternode1 kernel: Linux version 2.6.18-164.el5 (mockbuild@x86-003.build.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Tue Aug 18 15:51:48 EDT 2009
Jun 13 14:32:59 clusternode1 kernel: Command line: ro root=LABEL=/ rhgb quiet
Jun 13 14:32:59 clusternode1 kernel: BIOS-provided physical RAM map:
Jun 13 14:32:59 clusternode1 kernel: BIOS-e820: 0000000000010000 - 000000000009dc00 (usable)
Jun 13 14:32:59 clusternode1 kernel: BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
Jun 13 14:32:59 clusternode1 kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Jun 13 14:32:59 clusternode1 kernel: BIOS-e820: 0000000000100000 - 00000000bf620000 (usable)
Jun 13 14:32:59 clusternode1 kernel: BIOS-e820: 00000000bf620000 - 00000000bf63c000 (ACPI data)
Jun 13 14:32:59 clusternode1 kernel: BIOS-e820: 00000000bf63c000 - 00000000bf63d000 (usable)
................ (boot messages omitted here)
Jun 13 14:44:07 clusternode1 ccsd[8251]: Starting ccsd 2.0.115:
Jun 13 14:44:07 clusternode1 ccsd[8251]: Built: Aug 5 2009 08:24:53
Jun 13 14:44:07 clusternode1 ccsd[8251]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Jun 13 14:44:07 clusternode1 ccsd[8251]: cluster.conf (cluster name = new_cluster, version = 213) found.
Jun 13 14:44:09 clusternode1 openais[8260]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Jun 13 14:44:09 clusternode1 openais[8260]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jun 13 14:44:09 clusternode1 openais[8260]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Jun 13 14:44:09 clusternode1 openais[8260]: [MAIN ] AIS Executive Service: started and ready to provide service.
Jun 13 14:44:09 clusternode1 openais[8260]: [MAIN ] Using default multicast address of 239.192.220.17
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Token Timeout (80000 ms) retransmit timeout (3960 ms)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] token hold (3158 ms) retransmits before loss (20 retrans)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] send threads (0 threads)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] RRP token expired timeout (3960 ms)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] RRP token problem counter (2000 ms)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] RRP threshold (10 problem count)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] RRP mode set to none.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] heartbeat_failures_allowed (0)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] max_network_delay (50 ms)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] The network interface [10.0.0.203] is now up.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Created or loaded sequence id 32.10.0.0.203 for this ring.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] entering GATHER state from 15.
Jun 13 14:44:09 clusternode1 openais[8260]: [CMAN ] CMAN 2.0.115 (built Aug 5 2009 08:24:57) started
Jun 13 14:44:09 clusternode1 openais[8260]: [MAIN ] Service initialized 'openais CMAN membership service 2.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais extended virtual synchrony service'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais cluster membership service B.01.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais availability management framework B.01.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais checkpoint service B.01.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais event service B.01.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais distributed locking service B.01.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais message service B.01.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais configuration service'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais cluster closed process group service v1.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais cluster config database access v1.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SYNC ] Not using a virtual synchrony filter.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Creating commit token because I am the rep.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Saving state aru 0 high seq received 0
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Storing new sequence id for ring 24
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] entering COMMIT state.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] entering RECOVERY state.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] position [0] member 10.0.0.203:
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] previous ring seq 32 rep 10.0.0.203
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] aru 0 high delivered 0 received flag 1
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Did not need to originate any messages in recovery.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Sending initial ORF token
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM ] CLM CONFIGURATION CHANGE
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM ] New Configuration:
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM ] Members Left:
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM ] Members Joined:
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM ] CLM CONFIGURATION CHANGE
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM ] New Configuration:
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM ] r(0) ip(10.0.0.203)
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM ] Members Left:
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM ] Members Joined:
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM ] r(0) ip(10.0.0.203)
Jun 13 14:44:09 clusternode1 openais[8260]: [SYNC ] This node is within the primary component and will provide service.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] entering OPERATIONAL state.
Jun 13 14:44:09 clusternode1 openais[8260]: [CMAN ] quorum regained, resuming activity
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM ] got nodejoin message 10.0.0.203
Jun 13 14:44:09 clusternode1 ccsd[8251]: Cluster is not quorate. Refusing connection.
Jun 13 14:44:09 clusternode1 ccsd[8251]: Error while processing connect: Connection refused
Jun 13 14:44:10 clusternode1 ccsd[8251]: Initial status:: Quorate
Jun 13 14:44:12 clusternode1 qdiskd[8020]: <info> Quorum Daemon Initializing
Jun 13 14:44:12 clusternode1 qdiskd[8020]: <crit> Initialization failed
Jun 13 14:45:00 clusternode1 fenced[8281]: rlerpprdcihb01 not a cluster member after 3 sec post_join_delay
Jun 13 14:45:00 clusternode1 fenced[8281]: fencing node "rlerpprdcihb01"
Jun 13 14:45:09 clusternode1 fenced[8281]: fence "rlerpprdcihb01" success
Jun 13 14:45:18 clusternode1 kernel: dlm: Using TCP for communications
Jun 13 14:45:19 clusternode1 clvmd: Cluster LVM daemon started - connected to CMAN
Jun 13 14:45:19 clusternode1 multipathd: dm-7: add map (uevent)
Jun 13 14:45:19 clusternode1 multipathd: dm-8: add map (uevent)
Jun 13 14:45:19 clusternode1 multipathd: dm-9: add map (uevent)
Jun 13 14:45:19 clusternode1 multipathd: dm-10: add map (uevent)
Jun 13 14:45:20 clusternode1 multipathd: dm-11: add map (uevent)
Jun 13 14:45:20 clusternode1 multipathd: dm-12: add map (uevent)
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "new_cluster:sapmnt"
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: Joined cluster. Now mounting FS...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=0, already locked for use
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=0: Looking at journal...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=0: Done
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Trying to acquire journal lock...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Looking at journal...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Acquiring the transaction lock...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Replaying journal...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Replayed 2 of 2 blocks
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Found 0 revoke tags
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Journal replayed in 1s
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Done
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=2: Trying to acquire journal lock...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=2: Looking at journal...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=2: Done
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=3: Trying to acquire journal lock...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=3: Looking at journal...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=3: Done
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=4: Trying to acquire journal lock...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=4: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=4: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=5: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=5: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=5: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=6: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=6: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=6: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=7: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=7: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=7: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=8: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=8: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=8: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=9: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=9: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=9: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "new_cluster:ascs00"
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: Joined cluster. Now mounting FS...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=0, already locked for use
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=0: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=0: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Acquiring the transaction lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Replaying journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Replayed 5 of 5 blocks
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Found 0 revoke tags
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Journal replayed in 1s
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=2: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=2: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=2: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=3: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=3: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=3: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=4: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=4: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=4: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=5: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=5: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=5: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=6: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=6: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=6: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=7: Trying to acquire journal lock...
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=7: Looking at journal...
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=7: Done
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=8: Trying to acquire journal lock...
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=8: Looking at journal...
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=8: Done
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=9: Trying to acquire journal lock...
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=9: Looking at journal...
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=9: Done
Jun 13 14:45:37 clusternode1 clurgmgrd[8581]: <notice> Resource Group Manager Starting
Jun 13 14:45:50 clusternode1 clurgmgrd[8581]: <notice> Starting stopped service service:ascs
Jun 13 14:45:50 clusternode1 clurgmgrd[8581]: <notice> Starting stopped service service:db
Jun 13 14:45:50 clusternode1 avahi-daemon[7413]: Registering new address record for 172.16.45.204 on eth1.
Jun 13 14:45:50 clusternode1 kernel: kjournald starting. Commit interval 5 seconds
Jun 13 14:45:50 clusternode1 kernel: EXT3-fs warning: mounting unchecked fs, running e2fsck is recommended
Jun 13 14:45:50 clusternode1 kernel: EXT3 FS on dm-7, internal journal
Jun 13 14:45:50 clusternode1 kernel: EXT3-fs: dm-7: 1 orphan inode deleted
Jun 13 14:45:50 clusternode1 kernel: EXT3-fs: recovery complete.
Jun 13 14:45:50 clusternode1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:45:51 clusternode1 kernel: kjournald starting. Commit interval 5 seconds
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jun 13 14:45:51 clusternode1 kernel: EXT3 FS on dm-9, internal journal
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs: recovery complete.
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:45:51 clusternode1 kernel: kjournald starting. Commit interval 5 seconds
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jun 13 14:45:51 clusternode1 kernel: EXT3 FS on dm-8, internal journal
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs: recovery complete.
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:45:51 clusternode1 kernel: kjournald starting. Commit interval 5 seconds
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jun 13 14:45:51 clusternode1 kernel: EXT3 FS on dm-10, internal journal
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs: recovery complete.
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:45:51 clusternode1 avahi-daemon[7413]: Registering new address record for 172.16.45.203 on eth1.
Jun 13 14:45:53 clusternode1 SAPRLP_00[11393]: SAP Service SAPRLP_00 successfully started.
Jun 13 14:46:14 clusternode1 SAPRLP_01[12265]: SAP Service SAPRLP_01 successfully started.
Jun 13 14:46:34 clusternode1 kernel: process `sysctl' is using deprecated sysctl (syscall) net.ipv6.neigh.eth1.base_reachable_time; Use net.ipv6.neigh.eth1.base_reachable_time_ms instead.
Jun 13 14:46:42 clusternode1 SAPRLP_01[12902]: Unable to open trace file sapstartsrv.log. (Error 11 Resource temporarily unavailable) [ntservsserver.cpp 2218]
Jun 13 14:46:54 clusternode1 clurgmgrd[8581]: <notice> Service service:db started
Jun 13 14:47:22 clusternode1 clurgmgrd[8581]: <notice> Service service:ascs started
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] entering GATHER state from 11.
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] Creating commit token because I am the rep.
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] Saving state aru 39 high seq received 39
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] Storing new sequence id for ring 28
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] entering COMMIT state.
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] entering RECOVERY state.
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] position [0] member 10.0.0.203:
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] previous ring seq 36 rep 10.0.0.203
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] aru 39 high delivered 39 received flag 1
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] position [1] member 10.0.0.204:
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] previous ring seq 36 rep 10.0.0.204
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] aru 0 high delivered 0 received flag 1
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] Did not need to originate any messages in recovery.
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] Sending initial ORF token
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] CLM CONFIGURATION CHANGE
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] New Configuration:
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] r(0) ip(10.0.0.203)
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] Members Left:
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] Members Joined:
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] CLM CONFIGURATION CHANGE
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] New Configuration:
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] r(0) ip(10.0.0.203)
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] r(0) ip(10.0.0.204)
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] Members Left:
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] Members Joined:
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] r(0) ip(10.0.0.204)
Jun 13 14:54:42 clusternode1 openais[8260]: [SYNC ] This node is within the primary component and will provide service.
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] entering OPERATIONAL state.
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] got nodejoin message 10.0.0.203
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM ] got nodejoin message 10.0.0.204
Jun 13 14:54:42 clusternode1 openais[8260]: [CPG ] got joinlist message from node 1
Jun 13 14:54:57 clusternode1 kernel: dlm: connecting to 2
Jun 13 14:56:38 clusternode1 clurgmgrd[8581]: <notice> Relocating service:ascs to better node rlerpprdcihb01
Jun 13 14:56:38 clusternode1 clurgmgrd[8581]: <notice> Stopping service service:ascs
Jun 13 14:57:17 clusternode1 avahi-daemon[7413]: Withdrawing address record for 172.16.45.204 on eth1.
Jun 13 14:57:23 clusternode1 SAPRLP_01[20972]: Unable to open trace file sapstartsrv.log. (Error 11 Resource temporarily unavailable) [ntservsserver.cpp 2218]
Jun 13 14:57:27 clusternode1 clurgmgrd[8581]: <notice> Service service:ascs is stopped
Jun 13 14:59:27 clusternode1 gdm[22061]: Unable to authenticate user
node2 log:
Jun 12 04:03:02 clusternode2 syslogd 1.4.1: restart (remote reception).
Jun 13 14:28:49 clusternode2 openais[19156]: [TOTEM] The token was lost in the OPERATIONAL state.
Jun 13 14:28:49 clusternode2 openais[19156]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Jun 13 14:28:49 clusternode2 openais[19156]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
Jun 13 14:28:49 clusternode2 openais[19156]: [TOTEM] entering GATHER state from 2.
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] entering GATHER state from 0.
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] Creating commit token because I am the rep.
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] Saving state aru 9b high seq received 9b
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] Storing new sequence id for ring 24
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] entering COMMIT state.
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] entering RECOVERY state.
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] position [0] member 10.0.0.204:
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] previous ring seq 32 rep 10.0.0.203
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] aru 9b high delivered 9b received flag 1
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] Did not need to originate any messages in recovery.
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] Sending initial ORF token
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM ] CLM CONFIGURATION CHANGE
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM ] New Configuration:
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM ] r(0) ip(10.0.0.204)
Jun 13 14:28:54 clusternode2 kernel: dlm: closing connection to node 1
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM ] Members Left:
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM ] r(0) ip(10.0.0.203)
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM ] Members Joined:
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM ] CLM CONFIGURATION CHANGE
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM ] New Configuration:
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM ] r(0) ip(10.0.0.204)
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM ] Members Left:
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM ] Members Joined:
Jun 13 14:28:54 clusternode2 fenced[19176]: clusternode1 not a cluster member after 0 sec post_fail_delay
Jun 13 14:28:54 clusternode2 openais[19156]: [SYNC ] This node is within the primary component and will provide service.
Jun 13 14:28:54 clusternode2 fenced[19176]: fencing node "clusternode1"
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] entering OPERATIONAL state.
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM ] got nodejoin message 10.0.0.204
Jun 13 14:28:54 clusternode2 openais[19156]: [CPG ] got joinlist message from node 2
Jun 13 14:29:03 clusternode2 fenced[19176]: fence "clusternode1" success
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Trying to acquire journal lock...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Trying to acquire journal lock...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Looking at journal...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Looking at journal...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Acquiring the transaction lock...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Replaying journal...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Replayed 2 of 2 blocks
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Found 0 revoke tags
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Journal replayed in 1s
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Done
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Acquiring the transaction lock...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Replaying journal...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Replayed 5 of 8 blocks
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Found 2 revoke tags
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Journal replayed in 0s
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Done
Jun 13 14:29:03 clusternode2 clurgmgrd[19449]: <notice> Taking over service service:db from down member clusternode1
Jun 13 14:29:04 clusternode2 kernel: kjournald starting. Commit interval 5 seconds
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs warning: mounting unchecked fs, running e2fsck is recommended
Jun 13 14:29:04 clusternode2 kernel: EXT3 FS on dm-7, internal journal
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs: dm-7: 1 orphan inode deleted
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs: recovery complete.
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:29:04 clusternode2 kernel: kjournald starting. Commit interval 5 seconds
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jun 13 14:29:04 clusternode2 kernel: EXT3 FS on dm-9, internal journal
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs: recovery complete.
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:29:05 clusternode2 kernel: kjournald starting. Commit interval 5 seconds
Jun 13 14:29:05 clusternode2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jun 13 14:29:05 clusternode2 kernel: EXT3 FS on dm-8, internal journal
Jun 13 14:29:05 clusternode2 kernel: EXT3-fs: recovery complete.
Jun 13 14:29:05 clusternode2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:29:06 clusternode2 kernel: kjournald starting. Commit interval 5 seconds
Jun 13 14:29:06 clusternode2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jun 13 14:29:06 clusternode2 kernel: EXT3 FS on dm-10, internal journal
Jun 13 14:29:06 clusternode2 kernel: EXT3-fs: recovery complete.
Jun 13 14:29:06 clusternode2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:29:06 clusternode2 avahi-daemon[9054]: Registering new address record for 172.16.45.203 on eth1.
Jun 13 14:29:28 clusternode2 SAPRLP_01[11001]: Unable to open trace file sapstartsrv.log. (Error 11 Resource temporarily unavailable) [ntservsserver.cpp 2218]
Jun 13 14:30:08 clusternode2 clurgmgrd[19449]: <notice> Service service:db started
Jun 13 14:49:05 clusternode2 syslogd 1.4.1: restart (remote reception).
Jun 13 14:49:05 clusternode2 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun 13 14:49:05 clusternode2 kernel: Linux version 2.6.18-164.el5 (mockbuild@x86-003.build.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Tue Aug 18 15:51:48 EDT 2009
Jun 13 14:49:05 clusternode2 kernel: Command line: ro root=LABEL=/1 rhgb quiet
Jun 13 14:49:05 clusternode2 kernel: BIOS-provided physical RAM map:
Jun 13 14:49:05 clusternode2 kernel: BIOS-e820: 0000000000010000 - 000000000009dc00 (usable)
Jun 13 14:49:05 clusternode2 kernel: BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
Jun 13 14:49:05 clusternode2 kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
........... (hardware boot messages omitted)
Jun 13 14:54:42 clusternode2 openais[8417]: [MAIN ] Service initialized 'openais CMAN membership service 2.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais extended virtual synchrony service'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais cluster membership service B.01.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais availability management framework B.01.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais checkpoint service B.01.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais event service B.01.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais distributed locking service B.01.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais message service B.01.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais configuration service'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais cluster closed process group service v1.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais cluster config database access v1.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SYNC ] Not using a virtual synchrony filter.
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] entering GATHER state from 10.
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] Saving state aru 0 high seq received 0
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] Storing new sequence id for ring 28
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] entering COMMIT state.
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] entering RECOVERY state.
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] position [0] member 10.0.0.203:
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] previous ring seq 36 rep 10.0.0.203
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] aru 39 high delivered 39 received flag 1
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] position [1] member 10.0.0.204:
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] previous ring seq 36 rep 10.0.0.204
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] aru 0 high delivered 0 received flag 1
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] Did not need to originate any messages in recovery.
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] CLM CONFIGURATION CHANGE
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] New Configuration:
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] Members Left:
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] Members Joined:
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] CLM CONFIGURATION CHANGE
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] New Configuration:
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] r(0) ip(10.0.0.203)
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] r(0) ip(10.0.0.204)
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] Members Left:
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] Members Joined:
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] r(0) ip(10.0.0.203)
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] r(0) ip(10.0.0.204)
Jun 13 14:54:42 clusternode2 openais[8417]: [SYNC ] This node is within the primary component and will provide service.
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] entering OPERATIONAL state.
Jun 13 14:54:42 clusternode2 openais[8417]: [CMAN ] quorum regained, resuming activity
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] got nodejoin message 10.0.0.203
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM ] got nodejoin message 10.0.0.204
Jun 13 14:54:42 clusternode2 openais[8417]: [CPG ] got joinlist message from node 1
Jun 13 14:54:43 clusternode2 ccsd[8408]: Remote copy of cluster.conf is from quorate node.
Jun 13 14:54:43 clusternode2 ccsd[8408]: Local version # : 213
Jun 13 14:54:43 clusternode2 ccsd[8408]: Remote version #: 213
Jun 13 14:54:43 clusternode2 qdiskd[8177]: <info> Quorum Daemon Initializing
Jun 13 14:54:43 clusternode2 qdiskd[8177]: <crit> Initialization failed
Jun 13 14:54:43 clusternode2 ccsd[8408]: Initial status:: Quorate
Jun 13 14:54:57 clusternode2 kernel: dlm: Using TCP for communications
Jun 13 14:54:57 clusternode2 kernel: dlm: got connection from 1
Jun 13 14:54:58 clusternode2 clvmd: Cluster LVM daemon started - connected to CMAN
Jun 13 14:54:59 clusternode2 multipathd: dm-7: add map (uevent)
Jun 13 14:54:59 clusternode2 multipathd: dm-8: add map (uevent)
Jun 13 14:54:59 clusternode2 multipathd: dm-9: add map (uevent)
Jun 13 14:54:59 clusternode2 multipathd: dm-10: add map (uevent)
Jun 13 14:54:59 clusternode2 multipathd: dm-11: add map (uevent)
Jun 13 14:54:59 clusternode2 multipathd: dm-12: add map (uevent)
Jun 13 14:55:04 clusternode2 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "new_cluster:sapmnt"
Jun 13 14:55:04 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: Joined cluster. Now mounting FS...
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=1, already locked for use
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=1: Looking at journal...
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=1: Done
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "new_cluster:ascs00"
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: Joined cluster. Now mounting FS...
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=1, already locked for use
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=1: Looking at journal...
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=1: Done
Jun 13 14:56:26 clusternode2 clurgmgrd[8709]: <notice> Resource Group Manager Starting
Jun 13 14:57:27 clusternode2 clurgmgrd[8709]: <notice> Starting stopped service service:ascs
Jun 13 14:57:27 clusternode2 avahi-daemon[7782]: Registering new address record for 172.16.45.204 on eth1.
Jun 13 14:57:31 clusternode2 SAPRLP_00[11351]: SAP Service SAPRLP_00 successfully started.
Jun 13 14:58:11 clusternode2 kernel: process `sysctl' is using deprecated sysctl (syscall) net.ipv6.neigh.eth1.base_reachable_time; Use net.ipv6.neigh.eth1.base_reachable_time_ms instead.
Jun 13 14:58:19 clusternode2 SAPRLP_01[11931]: SAP Service SAPRLP_01 successfully started.
Jun 13 14:59:00 clusternode2 clurgmgrd[8709]: <notice> Service service:ascs started
Could some expert take a look at this? I still haven't figured out why the heartbeat gets lost. The NICs carrying the heartbeat IPs 10.0.0.203/204 are already bonded.
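One quick way to rule the bond itself in or out is to check the slave state and link-failure counters that the bonding driver exposes under /proc/net/bonding/. This is only a sketch: the bond name `bond0` and the sample contents below are assumptions for illustration, not taken from the thread.

```shell
# On a live node the input would be /proc/net/bonding/bond0;
# a hypothetical sample is inlined here so the filter can be shown.
cat <<'EOF' > /tmp/bond0.sample
Bonding Mode: fault-tolerance (active-backup)
MII Status: up
Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Slave Interface: eth3
MII Status: up
Link Failure Count: 3
EOF
# Flag any slave that has accumulated link failures: a nonzero count
# here means the heartbeat path has flapped at least once.
awk '/Link Failure Count/ && $NF != 0 {print "WARN: link failures:", $NF}' /tmp/bond0.sample
```

If both counters stay at 0 across the 5-day window, the physical links are probably not the cause and the totem token loss is more likely a software-side stall.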
Reply to 1# liuyongsd
Could you post the logs from before the restart? Everything you pasted is from after the reboot, so it doesn't tell us much. Also, have you solved the problem yet?
I've run into the same situation: every 6 days, at around 19:25, the token gets lost and the node is fenced. Has the OP found a solution?
Why not use Heartbeat instead?
The strangest part is that the cluster runs normally for a fixed interval (6 days) and then restarts, and each restart happens at almost the same time of day (around 19:25).
First the primary node reboots; once it has come back up, the primary fences the standby, and then the cluster returns to normal. The OS version is also 5.4; could this be a new bug?
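Since the token loss apparently happens just before each reboot, the lines worth pulling are the cluster-stack messages (openais/TOTEM, fenced, qdiskd) leading up to 19:25, not the boot messages afterwards. Below is a minimal sketch of such a filter, run here against an inlined sample rather than the real /var/log/messages:

```shell
# Write a small sample of syslog lines; on a real node you would point
# the grep at /var/log/messages* (and at the failure time window) instead.
cat <<'EOF' > /tmp/messages.sample
Jun 13 14:29:05 clusternode2 kernel: EXT3-fs: recovery complete.
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] entering GATHER state from 10.
Jun 13 14:54:43 clusternode2 qdiskd[8177]: <crit> Initialization failed
EOF
# Keep only the cluster-stack lines and drop the kernel/filesystem noise.
grep -E 'TOTEM|CLM|fenced|qdiskd' /tmp/messages.sample
```

The exact fixed interval (6 days) and time of day (19:25) also make a scheduled job (cron, backup, log rotation) worth checking on both nodes.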