A strange problem on an HP L3000, please take a look
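Yesterday afternoon an engineer at a customer site called to say that an HP L3000 had hung and stopped responding. The remote network management software only showed the client agent as down, yet the machine still answered ping. Telnet connected at first but was very slow, and later telnet attempts got no response at all. The customer used to have an HP terminal attached to the machine, but a serial cable in between had gone missing, and I had lost mine as well, so I first went to the electronics market to buy one. I also took along the memory I had prepared earlier (the memory added in the previous expansion had kept reporting single-bit errors without ever causing a problem; the customer could not easily take downtime, so it was never replaced, and I wanted to use this chance to swap it).

On site, telnet indeed got no response, and the terminal showed the same state. I checked the logs on the GSP and found nothing useful, apart from some errors reported at the last memory change, whose timestamps match the previous memory expansion. No newer log entries had appeared. The machine is connected to a Sun 3510 Fibre Channel disk array.

With no other option, I agreed with the customer to power the machine off and on again. The boot messages looked normal all the way until the system came up. In the system logs, judging by the time of the incident, the entries below are the most likely candidates: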
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Sun Oct 18 06:10:04 2009
ntpdm sent Event Monitor notification information:
/adapters/events/TL_adapter/0_10_0_0 is >= 1.
Its current value is INFORMATION(1).
Event data from monitor:
Event Time..........: Sun Oct 18 06:10:04 2009
Severity............: INFORMATION
Monitor.............: dm_TL_adapter
Event #.............: 17
System..............: ntpdm.mnt.lge.com
Summary:
Adapter at hardware path 0/10/0/0 : Received an ERQ Frozen interrupt
Description of Error:
lbolt value: 547337675
The Fibre Channel Driver received an ERQ Frozen interrupt
from the adapter
Probable Cause / Recommended Action:
The Tachyon TL adapter generated an interrupt indicating that
the ERQ has been frozen. This is caused by the driver
requesting the Tachyon TL chip to freeze the ERQ.
No action is required. Informative message. The driver will
recover from this condition
Additional Event Data:
System IP Address...: 150.150.214.100
Event Id............: 0x4ada40bc00000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_TL_adapter.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
0x4ada40bc00000000
Additional System Data:
System Model Number.............: 9000/800/L3000-8x
OS Version......................: B.11.11
EMS Version.....................: A.04.00
STM Version.....................: A.41.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_TL_adapter.htm
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Sun Oct 18 18:36:04 2009
ntpdm sent Event Monitor notification information:
/adapters/events/TL_adapter/0_10_0_0 is >= 1.
Its current value is INFORMATION(1).
Event data from monitor:
Event Time..........: Sun Oct 18 18:36:04 2009
Severity............: INFORMATION
Monitor.............: dm_TL_adapter
Event #.............: 19
System..............: ntpdm.mnt.lge.com
Summary:
Adapter at hardware path 0/10/0/0 : Received an interrupt indicating that
a primitive was received
Description of Error:
lbolt value: 4953
The Fibre Channel Driver received an interrupt indicating
that a primitive was received
Frame Manager Status Register = 0xa002c480
Probable Cause / Recommended Action:
The Tachyon TL adapter received a primitive sequence.
No action needed. Informative message.
Additional Event Data:
System IP Address...: 150.150.214.100
Event Id............: 0x4adaef9400000002
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_TL_adapter.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
0x4adaeec000000001
Additional System Data:
System Model Number.............: 9000/800/L3000-8x
OS Version......................: B.11.11
EMS Version.....................: A.04.00
STM Version.....................: A.41.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_TL_adapter.htm
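When a log contains many of these notifications, it can help to reduce each one to its time, event number, and summary before reading the details. The following is a minimal sketch that relies only on the field labels visible in the notifications above; the function name `parse_ems_events` and the embedded `SAMPLE` excerpt are illustrative, not part of any HP tool.

```python
import re

# Field labels exactly as they appear in the notifications above.
FIELD_RE = re.compile(r"^(Event Time|Severity|Monitor|Event #)\.*: *(.*)$")
MARKER = "Event Monitoring Service Event Notification"

# Shortened excerpt of the first notification in the post.
SAMPLE = """\
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Sun Oct 18 06:10:04 2009
Event Time..........: Sun Oct 18 06:10:04 2009
Severity............: INFORMATION
Monitor.............: dm_TL_adapter
Event #.............: 17
Summary:
Adapter at hardware path 0/10/0/0 : Received an ERQ Frozen interrupt
Description of Error:
lbolt value: 547337675
"""


def parse_ems_events(text):
    """Return one dict per notification with its key fields and summary."""
    events = []
    current = None
    in_summary = False
    for raw in text.splitlines():
        line = raw.strip()
        if MARKER in line:
            # A start marker opens a new event; an "End" marker closes it.
            if "End Event" in line:
                current = None
            else:
                current = {"Summary": ""}
                events.append(current)
            in_summary = False
            continue
        if current is None:
            continue
        if line == "Summary:":
            in_summary = True
            continue
        if line.startswith("Description of Error:"):
            in_summary = False
            continue
        if in_summary:
            # Summaries may wrap over several lines; join them with spaces.
            current["Summary"] = (current["Summary"] + " " + line).strip()
            continue
        match = FIELD_RE.match(line)
        if match:
            current[match.group(1)] = match.group(2)
    return events


if __name__ == "__main__":
    for event in parse_ems_events(SAMPLE):
        print(event["Event Time"], "#" + event["Event #"], event["Summary"])
```

Feeding the whole log file to `parse_ems_events` instead of `SAMPLE` gives one line per event, which makes it easier to see which event numbers cluster around the time of the hang.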
Comments (9)
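I went through some more records and found that these messages had appeared before, in both 2006 and 2008, so I think the hang is not necessarily related to them.
The memory has been replaced, the PDT cleared, and the Fibre Channel card replaced. The messages still appear afterwards, though fewer than before; only the following two: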
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Sun Oct 18 18:36:04 2009
ntpdm sent Event Monitor notification information:
/adapters/events/TL_adapter/0_10_0_0 is >= 1.
Its current value is INFORMATION(1).
Event data from monitor:
Event Time..........: Sun Oct 18 18:36:04 2009
Severity............: INFORMATION
Monitor.............: dm_TL_adapter
Event #.............: 19
System..............: ntpdm.mnt.lge.com
Summary:
Adapter at hardware path 0/10/0/0 : Received an interrupt indicating that
a primitive was received
Description of Error:
lbolt value: 4953
The Fibre Channel Driver received an interrupt indicating
that a primitive was received
Frame Manager Status Register = 0xa002c480
Probable Cause / Recommended Action:
The Tachyon TL adapter received a primitive sequence.
No action needed. Informative message.
Additional Event Data:
System IP Address...: 150.150.214.100
Event Id............: 0x4adaef9400000002
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_TL_adapter.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
0x4adaeec000000001
Additional System Data:
System Model Number.............: 9000/800/L3000-8x
OS Version......................: B.11.11
EMS Version.....................: A.04.00
STM Version.....................: A.41.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_TL_adapter.htm
v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
Component Data:
Physical Device Path....: 0/10/0/0
Vendor Id...............: 0x0000103C
Serial Number(WWN)......: 50060B0000309196
I/O Log Event Data:
Driver Status Code..................: 0x00000013
Length of Logged Hardware Status....: 0 bytes.
Offset to Logged Manager Information: 0 bytes.
Length of Logged Manager Information: 61 bytes.
Manager-Specific Information:
Raw data from FCMS Adapter driver:
00000006 00001359 00000001 00000001 A002C480 2F75782F 6B65726E 2F6B6973
752F544C 2F737263 2F636F6D 6D6F6E2F 7773696F 2F74645F 6973722E 63
>---------- End Event Monitoring Service Event Notification ----------<
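The "Raw data from FCMS Adapter driver" dump in the DETAILS section above can be decoded by hand. The sketch below assumes a layout that is not documented in this thread: five big-endian 32-bit words followed by an ASCII string. Under that assumption, the second word equals the logged lbolt value (4953), the fifth equals the Frame Manager Status Register (0xa002c480), and the tail reads as a driver source file path. What the remaining words mean is not known here.

```python
import struct

# Raw data copied from the DETAILS section of Event #19 above (61 bytes).
RAW = (
    "00000006 00001359 00000001 00000001 A002C480 2F75782F 6B65726E 2F6B6973 "
    "752F544C 2F737263 2F636F6D 6D6F6E2F 7773696F 2F74645F 6973722E 63"
)


def decode_fcms_raw(raw):
    """Split the dump into five 32-bit words and a trailing ASCII string.

    The word/string layout is an assumption inferred from the data itself.
    """
    data = bytes.fromhex(raw.replace(" ", ""))
    words = struct.unpack(">5I", data[:20])  # big-endian unsigned 32-bit
    tail = data[20:].decode("ascii")
    return words, tail


if __name__ == "__main__":
    words, tail = decode_fcms_raw(RAW)
    print([hex(w) for w in words])
    print(tail)
```

The decoded string points at the driver's interrupt service routine source file, which fits the event text: the message is logged from the interrupt handler when the chip reports a primitive.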
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Sun Oct 25 18:13:34 2009
ntpdm sent Event Monitor notification information:
/adapters/events/TL_adapter/0_10_0_0 is >= 1.
Its current value is INFORMATION(1).
Event data from monitor:
Event Time..........: Sun Oct 25 18:13:34 2009
Severity............: INFORMATION
Monitor.............: dm_TL_adapter
Event #.............: 18
System..............: ntpdm.mnt.lge.com
Summary:
Adapter at hardware path 0/10/0/0 : Received an interrupt indicating that
a primitive was transmitted
Description of Error:
lbolt value: 4953
The Fibre Channel Driver received an interrupt indicating
that a primitive was transmitted
Frame Manager Status Register = 0xa002c480
Probable Cause / Recommended Action:
The Tachyon TL adapter transmitted a primitive sequence.
This is caused by the driver issuing a primitive sequence
to the chip to be transmitted.
No action is required. Informative message
Additional Event Data:
System IP Address...: 150.150.214.100
Event Id............: 0x4ae424ce00000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_TL_adapter.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
0x4ae423fe00000000
Additional System Data:
System Model Number.............: 9000/800/L3000-8x
OS Version......................: B.11.11
EMS Version.....................: A.04.00
STM Version.....................: A.41.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_TL_adapter.htm
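Even with the latest patches installed, the messages still appear. I am not sure whether they show up on every reboot. I even replaced the fibre cable.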
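One way to judge whether a message is tied to a reboot is to convert its lbolt value into time since boot. This is a rough sketch that assumes lbolt counts clock ticks since boot at 100 ticks per second; the tick rate is an assumption and should be verified on the system in question.

```python
from datetime import timedelta

# Assumption: 100 clock ticks per second (HZ). Verify on the actual system.
TICKS_PER_SECOND = 100


def lbolt_to_uptime(lbolt, hz=TICKS_PER_SECOND):
    """Convert an lbolt tick count into elapsed time since boot."""
    return timedelta(seconds=lbolt / hz)


if __name__ == "__main__":
    # lbolt values taken from the notifications in this thread.
    for label, lbolt in [("Event #17", 547337675), ("Event #18/#19", 4953)]:
        print(label, lbolt_to_uptime(lbolt))
```

Under that assumption, the two remaining messages (lbolt 4953) were logged about 50 seconds after boot, which would suggest they are produced during startup, while Event #17 (lbolt 547337675) came after roughly 63 days of uptime.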
TOC = Transfer of Control
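A TOC is different from an ordinary restart. When the system receives this signal, it writes the current memory state (data and so on) to the dump area and then reboots normally. During the reboot, the data in the dump area is copied to the /var/adm/crash directory, so the resulting files are fairly large. Analyzing these files with an HP tool can reveal why the system hung.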
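Getting ready to replace the card.
Bumping my own thread.
A network attack is unlikely. As for the single-bit errors, your method is worth a try.
Friend, it is quite normal for a Fibre Channel card to report this kind of message. As for the single bit error, try entering ser under BCH and then running pdt clear.
I have fixed several machines reporting single bit errors this way; replacing the memory does not necessarily help with single bit errors.
When it hangs, it is best not to reboot straight away, otherwise the cause may be very hard to find.
You could investigate the network side. For example, an ARP attack or wrong DNS settings can both make logins extremely slow.
What is TOC?
I think it will be hard to trace now. You powered it off and on, so the cause of the hang has already been wiped out.
You should have done a TOC while it was hung, and then analyzed the crash dump files.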