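Help needed: locating a fault on an RP3440.
An RP3440 server has recently been rebooting over and over. There is no related EMS information in /var/adm/syslog/syslog.log. The SYSTEM LED on the front panel flashes red and the other four status LEDs are off. The RP3440 hardware manual lists no hardware error for this LED state. I captured the machine's TS99 file, shown below. My guess is that the PCI backplane is bad, but I don't know whether the PCI BUS also has a problem in addition to that fault. I would appreciate advice from engineers with relevant field experience.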
HP-UX yydl01 B.11.11 U 9000/800 1003590773
Return Info:
Return Errors:
----------------- Processor 0 HPMC Information - PDC Version: 45.11 ------
Timestamp = Wed Sep 1 16:26:52 GMT 2010 (20:10:09:01:16:26:52)
HPMC Chassis Codes
Chassis Code Extension
------------ ---------
0xe800035c00e00000 0x0000000000434d6c
0x57000f7300e00000 0x8040004000000000
0xf600105e00e00000 0x000000003f900000
0x140003b200e00000 0x000000000000000b
0x5600109b00e00000 0x000000000002a024
General Registers 0 - 31
00-03 0000000000000000 0000000000d0d3f0 fffffffffed26000 000000000189c3f0
04-07 0000000000086100 0000000000000000 0000000000000020 0000000000000001
08-11 000000000000ffff 0000000000000080 0000000000000800 0000000000000000
12-15 00ffffffffffffff 0000000008000084 0000000000000001 0000000000c44658
16-19 0000000000000001 0000000060300000 000000004001447c 4000000000000000
20-23 0000000000000000 0000000080610800 0000000001b29000 fffffffffed26000
24-27 fffffffffed26000 0000000000004000 0000000000000040 0000000000d053f0
28-31 0000806108618061 0000000100000000 400003ffffff1f38 0000000000d0d3f0
Control Registers 0 - 31
00-03 000000007b3ecf43 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 000000000000da15 0000000000000c32 00000000000000c0 0000000000000020
12-15 0000000000000000 0000000000000000 000000000002a000 c400000000000000
16-19 000000a87ef4dfc6 0000000000000000 0000000000434d6c 000000004ae50090
20-23 00000000a627fffb c000000049937048 000000ff0804fe1f 0000000800000000
24-27 0000000001b29000 000000000465d1d6 0000000000001000 000000000189c078
28-31 0000000077fcd028 000000a87ee40195 0000000000000000 400003ffffff1be8
Space Registers 0 - 7
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 00000000077e8c00 0000000009e19400 0000000000000000
IIA Space (back entry) = 0x0000000000000000
IIA Offset (back entry) = 0x0000000000434d70
Check Type = 0x20000000
Cpu State = 0x9e000000
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x00000000
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0x0000000000000000
Floating Point Registers 0 - 31
00-03 0800002000000000 0000000000000000 0000000000000000 0000000000000000
04-07 000000f10000000a 3f70fef010fef016 40f6000000000000 406e200000000000
08-11 3ff0000000000005 0000000000016000 40775e8a175e8a1e 0000000000000175
12-15 40c0000000000000 4100000000000000 4100000000000000 4080000000000000
16-19 4080000000000000 4080000000000000 4100000000000000 4100000000000000
20-23 5555555555555555 5555555555555555 0000000000015f25 4058c00000000000
24-27 0000000000000000 0000000000000002 00000000000002c9 0000000000000000
28-31 4084a03333333333 0000000000000294 0000000000000000 0000000000000000
PIM Revision = 0x0000000000000001
CPU ID = 0x0000000000000014
CPU Revision = 0x0000000000000032
Cpu Serial Number = 0x4460898c343f0409
Check Summary = 0x8040004000000000
SAL Timestamp = 0x000000004c7e7ecc
System Firmware Rev. = 0x00000b4b0000119f
PDC Relocation Address = 0x000000003f900000
Available Memory = 0x00000000ffe00000
CPU Diagnose Register 2 = 0x3212026000002228
MIB_STAT = 0x0040000000200000
MIB_LOG1 = 0x0000000000555500
MIB_LOG2 = 0x0000800000000000
MIB_ECC_DATA = 0x1010a6c41010aac0
ICache Info = 0x0000000000000000
DCache Info = 0x0000000000000000
Sharedcache Info1 = 0x0000000000000000
Sharedcache Info2 = 0x0000000000000040
MIB_RSLOG1 = 0x0000000000000004
MIB_RSLOG2 = 0x0000010000000000
MIB_RQLOG = 0x02d0c96fffff3530
MIB_REQLOGa = 0x8000000000000200
MIB_REQLOGb = 0x01000aa400000000
Reserved = 0x0000000000000000
Cache Repair Detail = 0x0000000000000000
PIM Detail Text:
-------------- Memory Error Log Information --------------
No errors logged for this bus
------------ I/O Module Error Log Information ------------
IO Subsystem Log Entries
Found 2 PCI Comp errors
Found 1 PCI Bus error
------------------------------------------------
Detail display of IO subsystem log entries
------------------------------------------
PCI Component Error information
PCI Component Error 1
--- Section Header ---
GUID
data1 0xe429faf6
data2 0x3cb7
data3 0x11d4
datat4 0xbc a7 0 80 c7 3c 88 81
REVISION 0x0200
ERROR_RECOVERY_INFO 0x80
SECTION_LENGTH 0x00000188
VALIDATION_BITS 0x0000000000000023
PCI_COMP_ERROR_STATUS 0x0000000000302000
PCI_COMP_INFO 0x0000000000000000 0x01a7101404000300
Vendor Id/Device Id: 0x1a7/1014
Base Class/Sub Class/Program Interface: 0x03/0/4
Segment/Bus/Device/Function: 0x0/60/1/0
PCI_COMP_MEM_NUM 0
PCI_COMP_IO_NUM 0
PCI_COMP_REGS_DATA_PAIR
Address Data
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
PCI_COMP_OEM_DATA_STRUCT
--- Section Header ---
GUID
data1 0x4f7d86a
data2 0x598b
data3 0x4a0a
data4 0xaa 62 ff 70 73 46 67 4d
LENGTH 232
PHYSICAL_LOCATION 0x000000ffff02ff85
REGISTRATION_NUMBER 0x000000000000000a
CONFIG_REGISTERS_DATA
Offset Size Data
0 8 0x0a30014601a71014
8 8 0x0001402006040003
16 8 0x0000000000000000
24 8 0x3220616140616160
32 8 0x0001fff1b000b000
40 8 0x0000000000000000
48 8 0x0000008000000000
56 8 0x000300fb00000000
128 8 0x0023600800c39007
136 8 0x0020002000200020
0 0 0x0000000000000000
0 0 0x0000000000000000
End of PCI Component Error Information for Error 1
PCI Component Error 2
--- Section Header ---
GUID
data1 0xe429faf6
data2 0x3cb7
data3 0x11d4
datat4 0xbc a7 0 80 c7 3c 88 81
REVISION 0x0200
ERROR_RECOVERY_INFO 0x81
SECTION_LENGTH 0x00000188
VALIDATION_BITS 0x0000000000000023
PCI_COMP_ERROR_STATUS 0x0000000000491000
PCI_COMP_INFO 0x0000000000000000 0x01a7101404000300
Vendor Id/Device Id: 0x1a7/1014
Base Class/Sub Class/Program Interface: 0x03/0/4
Segment/Bus/Device/Function: 0x0/80/1/0
PCI_COMP_MEM_NUM 0
PCI_COMP_IO_NUM 0
PCI_COMP_REGS_DATA_PAIR
Address Data
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
PCI_COMP_OEM_DATA_STRUCT
--- Section Header ---
GUID
data1 0x4f7d86a
data2 0x598b
data3 0x4a0a
data4 0xaa 62 ff 70 73 46 67 4d
LENGTH 232
PHYSICAL_LOCATION 0x000000ffff01ff85
REGISTRATION_NUMBER 0x000000000000000a
CONFIG_REGISTERS_DATA
Offset Size Data
0 8 0x0230014601a71014
8 8 0x0001402006040003
16 8 0x0000000000000000
24 8 0x2220818140818180
32 8 0x0001fff1c000c000
40 8 0x0000000000000000
48 8 0x0000008000000000
56 8 0x000300ff00000000
128 8 0x0033800800c39007
136 8 0x0020002000200020
0 0 0x0000000000000000
0 0 0x0000000000000000
End of PCI Component Error Information for Error 2
End of PCI Component Error Information
PCI Bus Error information
PCI Bus Error 1
--- Section Header ---
GUID
data1 0xe429faf4
data2 0x3cb7
data3 0x11d4
data4 0xbc a7 0 80 c7 3c 88 81
REVISION 0x0200
ERROR_RECOVERY_INFO 0x84
SECTION_LENGTH 0x00000108
VALIDATION_BITS 0x000000000000074f
PCI_BUS_ERROR_STATUS 0x0000000000592000
PCI_BUS_ERROR_TYPE 0x0000000000000000
PCI_BUS_ID 0x0000000000000060
PCI_BUS_ADDRESS 0x0000000000610801
PCI_BUS_DATA 0x0000000000000000
PCI_BUS_CMD 0x0000000000000000
PCI_BUS_REQUESTOR_ID 0x00000000fed26000
PCI_BUS_COMPLETER_ID 0x0000000000000000
PCI_BUS_TARGET_ID 0x0000000000610801
PCI_BUS_OEM_ID 0x0000000000b809d8
Bus OEM Data
CEC Header:
--- OEM Data Header ---
GUID
data1 0x9fe64482
data2 0xa02d
data3 0x4ef7
data4 0xad e6 c6 63 59 62 53 99
--- OEM Data Body ---
CELL_NUMBER 0
SBA_NUMBER 0
ROPE_NUMBER 3
--- Mercury Info ---
ERROR_STATUS 0x000000010000021d
ERROR_MASTER_ID_LOG 0x0000000000000000
INBOUND_ERR_ADDRESS 0x0000000000000000
INBOUND_ERR_ATTRIBUTE 0x0000000000000000
COMPLETION_MESSAGE_LOG 0x0000000000000000
OUTBOUND_ERR_ADDRESS 0x4000000000610801
ERROR_CONFIG 0x0000000000000030
STATUS_INFO_CONTROL 0x0000000000000000
FUNC_ID 0x12b00146122e103c
CAPABILITIES_LIST 0x0f00023700200002
AGP_COMMAND 0x0000000000000000
PCIX_CAPABILITIES 0x2013ff0000010007
OLR_CONTROL 0x00023f9d00032403
CLOCK_CONTROL 0x0000000000000008
BUS_MODE 0x9db964ef2b257ce0
End of PCI Bus Error Information for Error 1
End of PCI Bus Error Information
Return Warnings:
Return Revisions:
FRU INFORMATION
Module Revision
------ --------
PA 8800 CPU Module 3.2 PA 8800 CPU Module 3.2
Board Info!
Format Version : 0x1 Language Code : 0x0
Mfg Date : Mfg Name : JABIL
Product Name : rp3440 SYSTEM BOARD
Serial Number : 52JAPE4521005119
Part Number : A7136-60001
Fru File Tp/Len : 0x1 Fru File :
Revision : A1 Eng Date Code : 4433
Artwork Rev : D Fru Info :
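As a reading aid for the two PCI Component Error sections in the dump, here is a minimal Python sketch that splits the 8-byte value logged at config offset 0 into the standard PCI configuration header fields and lists any error bits set in the Status register. The function name `decode_config_qword0` is hypothetical. The sketch assumes the low 32 bits hold config dword 0 (device ID and vendor ID) and the high 32 bits hold config dword 1 (status and command); the dump itself does not document that packing, so treat the result as a hint rather than a diagnosis.

```python
# Error-related bits of the standard PCI Status register (config offset 0x06).
STATUS_ERROR_BITS = {
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}


def decode_config_qword0(value):
    """Split the 8-byte value logged at config offset 0 into its fields.

    Assumption (not stated in the dump): the low 32 bits are config dword 0
    (device:vendor) and the high 32 bits are config dword 1 (status:command).
    """
    status = (value >> 48) & 0xFFFF
    return {
        "vendor": value & 0xFFFF,
        "device": (value >> 16) & 0xFFFF,
        "command": (value >> 32) & 0xFFFF,
        "status": status,
        "errors": [name for bit, name in sorted(STATUS_ERROR_BITS.items())
                   if status & (1 << bit)],
    }


# Offset-0 values copied from the dump, keyed by the bus number as printed
# in the "Segment/Bus/Device/Function" lines.
BRIDGES = {"60": 0x0A30014601A71014, "80": 0x0230014601A71014}

for bus, raw in BRIDGES.items():
    fields = decode_config_qword0(raw)
    print("bus %s: vendor=0x%04x device=0x%04x status=0x%04x errors=%s" % (
        bus, fields["vendor"], fields["device"], fields["status"],
        fields["errors"] or "none"))
```

If the packing assumption holds, only the component on bus 60 has an error bit set in its Status register (Signaled Target Abort), which would be consistent with the single PCI Bus error being logged against PCI_BUS_ID 0x60.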
Comments (8)
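Post the MP log first so we can take a look; it should reveal something.
OK, here is how I handled it. I first replaced the PCI board, but the fault remained. I put the original PCI board back, rebooted the machine once, shut it down, and replaced the system board. After that the machine was basically normal, but it still occasionally reported a network card MDA write error and then rebooted. I replaced the PCI board again and the fault disappeared. Why it went this way, I still haven't figured out.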
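I agree with the approach in #6. You can isolate the fault as follows:
First pull the card in slot 2 and see whether the HPMC reboots still occur. If they do, analyze the ts99 again; if it still points to slot 2, replace the PCI cage. If the machine no longer reboots, move the original slot 2 card to another slot. If it reboots again, analyze the new ts99; if the new slot turns out to be causing the HPMC, the card itself is at fault and replacing that card is enough.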
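The isolation procedure in the comment above can be written down as a small decision function. This is only a sketch of that comment's logic in Python; the function name `next_step` and its parameters are hypothetical, and the branches the comment does not cover are marked as such rather than filled in.

```python
def next_step(reboots_with_slot2_empty, ts99_blames_slot2=False,
              reboots_after_move=False, ts99_blames_new_slot=False):
    """Return the action suggested by the slot 2 isolation procedure.

    reboots_with_slot2_empty: HPMC reboots continue with the slot 2 card pulled.
    ts99_blames_slot2:        the fresh ts99 still points at slot 2.
    reboots_after_move:       reboots return after moving the card to another slot.
    ts99_blames_new_slot:     the fresh ts99 points at the card's new slot.
    """
    if reboots_with_slot2_empty:
        if ts99_blames_slot2:
            # The fault stays with the slot even without the card.
            return "replace the PCI cage"
        return "not covered by the procedure: analyze the new ts99 further"
    if reboots_after_move and ts99_blames_new_slot:
        # The fault follows the card to its new slot.
        return "replace the card"
    return "not covered by the procedure: keep observing"


print(next_step(True, ts99_blames_slot2=True))
print(next_step(False, reboots_after_move=True, ts99_blames_new_slot=True))
```

The point of the procedure is to see whether the fault stays with the slot or follows the card, which is why each branch needs a fresh ts99 before any part is swapped.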
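I suggest replacing the PCI card on Rope 3 (slot 2); if that doesn't help, replace the whole PCI cage.
There is a red light, so check the syslog in the MP.
Use MP-SL to look at the logs; they should contain more specific error information.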
Reply to #2 rfancy
OK, thanks. I will try step 3 first and then report the result.
Possible fix:
1. If the CPU is a FRU, replace CPU module 0.
2. If the CPU is not a FRU, replace the system board.
3. Replace the I/O card cage.