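Volume group problem
This is on a Tru64 platform. The Tru64 board has very few readers and this is urgent, so I am posting here. Even if you no longer use Tru64, please share any suggestions from your own experience. Thanks in advance.
Customer environment: two Alpha machines (ES40 and 4100) and one RA7000 storage array, configured as a cluster.
Here is a description of the fault, with logs:
When the customer reported that the business application was not running properly, I logged in to the host and checked /var/adm/messages, which contained a continuous stream of volume errors, for example: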
Mar 31 09:15:28 aqwg1 vmunix: Defering I/O (errno 5) for block(0x100c0, 0x100c0) on device 55,7
Mar 31 09:15:58 aqwg1 vmunix: io/vol.c(volerror): Uncorrectable write error on volume vol41, plex vol41-01, block 33926784
Mar 31 09:15:58 aqwg1 vmunix: io/vol.c(volerror): Uncorrectable write error on volume vol41, plex vol41-01, block 33926656
Mar 31 09:15:58 aqwg1 vmunix: io/vol.c(volerror): Uncorrectable write error on volume vol41, plex vol41-01, block 23642752
Mar 31 09:15:58 aqwg1 vmunix: io/vol.c(volerror): Uncorrectable write error on volume vol41, plex vol41-01, block 65728
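To see at a glance which volumes and plexes are affected, the volerror lines can be grouped and counted. The following is only a rough sketch: the function name summarize_volerrors is made up for illustration, and it assumes every message has the same layout as the lines quoted above.

```shell
#!/bin/sh
# Hypothetical helper: summarize "Uncorrectable ... error" lines per
# volume/plex/operation. Reads syslog text on stdin.
# Assumes the message layout shown in the log excerpt above.
summarize_volerrors() {
    awk '
    /volerror/ && /Uncorrectable/ {
        op = ""; vol = ""; plex = ""; blk = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "Uncorrectable") op = $(i + 1)
            if ($i == "volume") { vol = $(i + 1); sub(/,$/, "", vol) }
            if ($i == "plex")   { plex = $(i + 1); sub(/,$/, "", plex) }
            if ($i == "block")  blk = $(i + 1) + 0
        }
        key = vol " " plex " " op
        count[key]++
        # Track the lowest and highest failing block for each key
        if (!(key in lo) || blk < lo[key]) lo[key] = blk
        if (!(key in hi) || blk > hi[key]) hi[key] = blk
    }
    END {
        for (k in count)
            printf "%s count=%d min_block=%d max_block=%d\n", k, count[k], lo[k], hi[k]
    }'
}
```

Typical use would be `summarize_volerrors < /var/adm/messages`. If only one volume and plex ever appears, the problem is more likely tied to that region of the disk than to the storage path as a whole, though that is an inference rather than a diagnosis.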
I checked vol41 and its state is ACTIVE:
# volprint -ht -g dg1|grep 41
sd dg101-29 vol29-01 0 54972416 2097152 dg101 rz16
sd dg101-40 vol40-01 0 78041088 2097152 dg101 rz16
v vol41 fsgen ENABLED ACTIVE 41943040 SELECT -
pl vol41-01 vol41 ENABLED ACTIVE 41943040 CONCAT - RW
sd dg101-41 vol41-01 0 80138240 41943040 dg101 rz16
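The failing blocks can be located on the underlying disk with a little arithmetic. This sketch assumes that the sd columns in the output above are name, plex, plex offset, disk offset, length, disk name and device. That column order is an assumption, although it is consistent with dg101-40 ending at 78041088 + 2097152 = 80138240, exactly where dg101-41 starts. It also assumes the block numbers in the volerror messages are relative to the start of the volume. The function name vol_block_to_disk is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a volume-relative block to an offset on the
# underlying disk, given one subdisk record.
#   $1 = volume block        $2 = subdisk plex offset
#   $3 = subdisk disk offset $4 = subdisk length
vol_block_to_disk() {
    awk -v b="$1" -v po="$2" -v d="$3" -v len="$4" 'BEGIN {
        # Reject blocks that fall outside this subdisk
        if (b < po || b >= po + len) { print "out of range"; exit 1 }
        printf "%d\n", d + (b - po)
    }'
}

# Blocks taken from the volerror messages, subdisk values from dg101-41
for blk in 33926784 33926656 23642752 65728; do
    printf '%s -> ' "$blk"
    vol_block_to_disk "$blk" 0 80138240 41943040
done
```

All four reported blocks are below the volume length of 41943040, so under these assumptions they all land inside dg101-41 on rz16; block 33926784, for example, maps to disk offset 114065024.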
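The local volume groups also all show a normal state. When the fault occurs, rebooting the host restores service for a while, and then the same fault moves to the other host.
The database is Sybase, and its log also contains I/O errors: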
00:00000:00000:2011/03/31 03:01:56.46 kernel sddone: write error on virtual disk 19 block 118755:
00:00000:00000:2011/03/31 03:01:56.54 kernel sddone: I/O error
00:00000:00057:2011/03/31 03:01:56.54 server bufwritedes: write error detected - spid=57, ppage=1757155, bvirtpg=318885859, dbid=7
00:00000:00000:2011/03/31 03:01:56.59 kernel sddone: write error on virtual disk 19 block 118755:
00:00000:00000:2011/03/31 03:01:56.59 kernel sddone: I/O error
00:00000:00000:2011/03/31 03:01:56.59 kernel sddone: read error on virtual disk 0 block 325:
00:00000:00000:2011/03/31 03:01:56.59 kernel sddone: I/O error
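The Sybase side can be summarized the same way, counting sddone errors per virtual disk and operation. This is a sketch only: summarize_sddone is a made-up name, and it assumes all sddone messages follow the layout of the lines quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count Sybase "sddone" read/write errors per
# virtual disk. Reads the Sybase error log on stdin.
# Output format: <virtual disk> <operation> <count>, sorted numerically.
summarize_sddone() {
    awk '
    /sddone:/ && /error on virtual disk/ {
        op = ""; vd = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "sddone:") op = $(i + 1)   # "read" or "write"
            if ($i == "disk")    vd = $(i + 1)   # virtual disk number
        }
        count[vd " " op]++
    }
    END { for (k in count) print k, count[k] }' | sort -n
}
```

Running it over the excerpt above would show write errors on virtual disk 19 and a read error on virtual disk 0, so more than one Sybase device is involved.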
There are also entries like these:
00:00000:00001:2011/03/31 09:42:29.34 server Activating disk 'data03_200'.
00:00000:00001:2011/03/31 09:42:29.34 kernel Initializing virtual device 2, '/dev/rvol/dg1/vol03' with dsync 'off'.
00:00000:00001:2011/03/31 09:42:29.34 kernel Virtual device 2 started using asynchronous i/o.
00:00000:00001:2011/03/31 09:42:29.34 server Activating disk 'data04_50'.
00:00000:00001:2011/03/31 09:42:29.34 kernel Initializing virtual device 3, '/dev/rvol/dg1/vol04' with dsync 'off'.
00:00000:00001:2011/03/31 09:42:29.34 kernel Virtual device 3 started using asynchronous i/o.
00:00000:00001:2011/03/31 09:42:29.34 server Activating disk 'data05_1000'.
00:00000:00001:2011/03/31 09:42:29.34 kernel Initializing virtual device 4, '/dev/rvol/dg1/vol05' with dsync 'off'.
00:00000:00001:2011/03/31 09:42:29.34 kernel Virtual device 4 started using asynchronous i/o.
00:00000:00001:2011/03/31 09:42:29.34 server Activating disk 'data06_200'.
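These entries look like device activation messages rather than errors, and they show which raw LSM volume backs each Sybase virtual device. A short sketch for pulling that mapping out of the log follows; list_vdevs is a hypothetical name. Whether the "virtual disk" numbers in the sddone errors use the same numbering as "virtual device" here is an assumption that should be checked against the Sybase configuration.

```shell
#!/bin/sh
# Hypothetical helper: list "virtual device number -> device path" pairs
# from Sybase "Initializing virtual device" log lines on stdin.
list_vdevs() {
    awk -v q="'" '
    /Initializing virtual device/ {
        for (i = 1; i < NF; i++) {
            if ($i == "device" && $(i - 1) == "virtual") {
                n = $(i + 1); sub(/,$/, "", n)   # strip trailing comma
                p = $(i + 2); gsub(q, "", p)     # strip surrounding quotes
                print n, p
            }
        }
    }'
}
```

If the numbering does match, looking up device 19 in this list would show whether the Sybase write errors land on /dev/rvol/dg1/vol41, the volume named in the kernel messages. The excerpt above does not include device 19, so that remains to be confirmed.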
I also found this error in /var/adm/messages:
Mar 30 19:26:13 aqwg1 vmunix: system is full
Mar 30 19:26:13 aqwg1 vmunix: /usr: write failed, file system is full
Mar 30 19:57:34 aqwg1 vmunix: /usr: write failed, file system is full
Mar 30 20:31:02 aqwg1 vmunix: /usr: write failed, file system is full
Mar 30 21:00:22 aqwg1 vmunix: /usr: write failed, file system is full
Mar 30 21:25:06 aqwg1 vmunix: /usr: write failed, file system is full
File system information:
# df -i
Filesystem 512-blocks Used Available Capacity Iused Ifree %Iused Mounted on
root_domain#root 524288 340102 169840 67% 2207 294311 1% /
/proc 0 0 0 100% 84 4014 2% /proc
usr_domain#usr 9057232 5348742 3708490 60% 134968 7112694 2% /usr
/dev/ase_007 40693708 23227184 13397152 64% 62 4915136 0% /data
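The "file system is full" messages are all from the evening of Mar 30, while this df output shows /usr at 60%, so the condition may have been temporary. The sketch below lists when each "full" event was logged and extracts the current capacity figure for a mount point, so the two can be compared over time. Both function names are hypothetical, and the field positions are assumed from the log and df layouts shown above.

```shell
#!/bin/sh
# Hypothetical helper: print "month day time filesystem" for every
# "file system is full" write failure in syslog text on stdin.
full_events() {
    awk '/write failed, file system is full/ {
        fs = $6; sub(/:$/, "", fs)      # "/usr:" -> "/usr"
        print $1, $2, $3, fs
    }'
}

# Hypothetical helper: print the Capacity percentage (without "%") for the
# mount point given as $1, reading "df -i" style output on stdin.
# Assumes Capacity is the 5th column and the mount point is the last one.
fs_capacity() {
    awk -v m="$1" '$NF == m { c = $5; sub(/%$/, "", c); print c }'
}
```

For example, `full_events < /var/adm/messages` lists the events, and `df -i | fs_capacity /usr` prints the current figure. Running the second command periodically (from cron, say) and logging the result would show whether /usr fills up briefly around the times the errors appear.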
/usr is not actually full. The problem is still ongoing. Any suggestions would be appreciated, thanks!
Comments (1)
http://www.ixpub.net/thread-1265207-1-1.html
See whether this article helps.