Kafka cluster suddenly went down. Can anyone help me figure out where the problem is?
The cluster runs Kafka 0.8.2.0 on three hosts: hetserver1, hetserver2, and hetserver3.
hetserver1 started throwing errors at 21:39 on March 17:
2020-03-17 21:39:00,040 ERROR kafka.network.Processor: Closing socket for /172.19.4.12 because of error
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:123)
at kafka.network.MultiSend.writeTo(Transmission.scala:101)
at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:231)
at kafka.network.Processor.write(SocketServer.scala:473)
at kafka.network.Processor.run(SocketServer.scala:343)
at java.lang.Thread.run(Thread.java:745)
2020-03-17 21:39:00,071 ERROR kafka.network.Processor: Closing socket for /172.19.4.12 because of error
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:123)
at kafka.network.MultiSend.writeTo(Transmission.scala:101)
at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:231)
at kafka.network.Processor.write(SocketServer.scala:473)
at kafka.network.Processor.run(SocketServer.scala:343)
at java.lang.Thread.run(Thread.java:745)
2020-03-17 21:39:00,073 ERROR kafka.network.Processor: Closing socket for /172.19.4.12 because of error
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:123)
at kafka.network.MultiSend.writeTo(Transmission.scala:101)
at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:231)
at kafka.network.Processor.write(SocketServer.scala:473)
at kafka.network.Processor.run(SocketServer.scala:343)
at java.lang.Thread.run(Thread.java:745)
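For what it's worth, "Broken pipe" here only means the peer at 172.19.4.12 dropped the connection while the broker was writing out a fetch response, so it is a symptom rather than the root cause. Since the other two brokers report ZooKeeper session expiry at the same second, it is worth checking whether a long GC pause froze the brokers around 21:39. A rough sketch, assuming Kafka's default GC log file name (kafkaServer-gc.log); adjust the path to this installation:

# Look for stop-the-world pauses around the time of the incident.
# kafka-server-start.sh writes GC activity to kafkaServer-gc.log by default.
grep "2020-03-17T21:3" /path/to/kafka/logs/kafkaServer-gc.log | grep -i "Full GC"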
hetserver2 also reported errors:
2020-03-17 21:39:00,193 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hetserver1/172.19.4.12:2181. Will not attempt to authenticate using SASL (unknown error)
2020-03-17 21:39:00,194 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hetserver1/172.19.4.12:2181, initiating session
2020-03-17 21:39:00,194 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (Expired)
2020-03-17 21:39:00,195 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0xa70e4797252000e has expired, closing socket connection
2020-03-17 21:39:00,195 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hetserver1:2181,hetserver2:2181,hetserver3:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@5e39570d
2020-03-17 21:39:00,196 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2020-03-17 21:39:00,196 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hetserver3/172.19.4.14:2181. Will not attempt to authenticate using SASL (unknown error)
2020-03-17 21:39:00,197 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hetserver3/172.19.4.14:2181, initiating session
2020-03-17 21:39:00,198 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hetserver3/172.19.4.14:2181, sessionid = 0x870e479725d0517, negotiated timeout = 6000
2020-03-17 21:39:00,198 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (SyncConnected)
2020-03-17 21:39:00,297 INFO kafka.controller.ReplicaStateMachine$BrokerChangeListener: [BrokerChangeListener on Controller 41]: Broker change listener fired for path /brokers/ids with children 42
2020-03-17 21:39:00,301 INFO kafka.controller.ReplicaStateMachine$BrokerChangeListener: [BrokerChangeListener on Controller 41]: Newly added brokers: , deleted brokers: 41,40, all live brokers: 42
2020-03-17 21:39:00,301 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-41-send-thread], Shutting down
2020-03-17 21:39:00,301 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-41-send-thread], Stopped
2020-03-17 21:39:00,301 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-41-send-thread], Shutdown completed
2020-03-17 21:39:00,302 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-40-send-thread], Shutting down
2020-03-17 21:39:00,302 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-40-send-thread], Stopped
2020-03-17 21:39:00,302 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-40-send-thread], Shutdown completed
2020-03-17 21:39:00,304 INFO kafka.controller.KafkaController: [Controller 41]: Broker failure callback for 41,40
2020-03-17 21:39:00,306 INFO kafka.controller.KafkaController: [Controller 41]: Removed ArrayBuffer() from list of shutting down brokers.
2020-03-17 21:39:00,308 INFO kafka.controller.PartitionStateMachine: [Partition state machine on Controller 41]: Invoking state change to OfflinePartition for partitions [__consumer_offsets,19],[__consumer_offsets,47],[__consumer_offsets,41],[__consumer_offsets,29],[session-location,0],[__consumer_offsets,17],[__consumer_offsets,10],[hetASUPfldTopic,0],[__consumer_offsets,14],[__consumer_offsets,40],[hetACDMTopic,0],[__consumer_offsets,26],[__consumer_offsets,20],[__consumer_offsets,22],[__consumer_offsets,5],[push-result-error,0],[__consumer_offsets,8],[__consumer_offsets,23],[__consumer_offsets,11],[hetAsupMsgTopic,0],[__consumer_offsets,13],[__consumer_offsets,49],[__consumer_offsets,28],[__consumer_offsets,4],[__consumer_offsets,37],[__consumer_offsets,44],[__consumer_offsets,31],[__consumer_offsets,34],[__consumer_offsets,46],[btTaskTopic,0],[__consumer_offsets,25],[__consumer_offsets,43],[__consumer_offsets,32],[__consumer_offsets,35],[__consumer_offsets,7],[__consumer_offsets,38],[__consumer_offsets,1],[HetPetaTopic,0],[__consumer_offsets,2],[__consumer_offsets,16]
2020-03-17 21:39:00,314 ERROR state.change.logger: Controller 41 epoch 57126 aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 41 went through a soft failure and another controller was elected with epoch 57127.
2020-03-17 21:39:00,314 ERROR state.change.logger: Controller 41 epoch 57126 encountered error while electing leader for partition [__consumer_offsets,19] due to: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 41 went through a soft failure and another controller was elected with epoch 57127..
2020-03-17 21:39:00,314 ERROR state.change.logger: Controller 41 epoch 57126 initiated state change for partition [__consumer_offsets,19] from OfflinePartition to OnlinePartition failed
kafka.common.StateChangeFailedException: encountered error while electing leader for partition [__consumer_offsets,19] due to: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 41 went through a soft failure and another controller was elected with epoch 57127..
at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:380)
at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:453)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
at kafka.utils.Utils$.inLock(Utils.scala:561)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
Caused by: kafka.common.StateChangeFailedException: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 41 went through a soft failure and another controller was elected with epoch 57127.
at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:354)
... 23 more
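The key event on hetserver2 is the "zookeeper state changed (Expired)" line: its ZooKeeper session (negotiated timeout 6000 ms) expired, so its ephemeral registration under /brokers/ids disappeared and the controller logic below started treating brokers 40 and 41 as failed. If the expiry came from GC pauses or a brief network hiccup rather than a real outage, a common mitigation is to raise the session timeout on each broker. A minimal server.properties sketch, with illustrative values that still have to fit within the ZooKeeper ensemble's maxSessionTimeout (20 x tickTime by default):

# server.properties (per broker), illustrative values only.
# The 0.8.2 default of 6000 ms matches the "negotiated timeout = 6000" seen above.
zookeeper.session.timeout.ms=30000
zookeeper.connection.timeout.ms=30000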
The errors on hetserver3 were as follows:
2020-03-17 21:39:00,660 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hetserver3/172.19.4.14:2181. Will not attempt to authenticate using SASL (unknown error)
2020-03-17 21:39:00,661 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hetserver3/172.19.4.14:2181, initiating session
2020-03-17 21:39:00,661 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (Expired)
2020-03-17 21:39:00,661 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x870e47715360000 has expired, closing socket connection
2020-03-17 21:39:00,661 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hetserver1:2181,hetserver2:2181,hetserver3:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@68246cf
2020-03-17 21:39:00,663 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hetserver2/172.19.4.13:2181. Will not attempt to authenticate using SASL (unknown error)
2020-03-17 21:39:00,664 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hetserver2/172.19.4.13:2181, initiating session
2020-03-17 21:39:00,666 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hetserver2/172.19.4.13:2181, sessionid = 0xa70e47972520504, negotiated timeout = 6000
2020-03-17 21:39:00,666 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (SyncConnected)
2020-03-17 21:39:00,666 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2020-03-17 21:39:00,763 INFO kafka.controller.ReplicaStateMachine$BrokerChangeListener: [BrokerChangeListener on Controller 40]: Broker change listener fired for path /brokers/ids with children 42
2020-03-17 21:39:00,764 INFO kafka.controller.ReplicaStateMachine$BrokerChangeListener: [BrokerChangeListener on Controller 40]: Newly added brokers: , deleted brokers: 41,40, all live brokers: 42
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-41-send-thread], Shutting down
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-41-send-thread], Stopped
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-41-send-thread], Shutdown completed
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-40-send-thread], Shutting down
2020-03-17 21:39:00,764 INFO kafka.network.Processor: Closing socket connection to /172.19.4.14.
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-40-send-thread], Stopped
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-40-send-thread], Shutdown completed
2020-03-17 21:39:00,764 INFO kafka.controller.KafkaController: [Controller 40]: Broker failure callback for 41,40
2020-03-17 21:39:00,765 INFO kafka.controller.KafkaController: [Controller 40]: Removed ArrayBuffer() from list of shutting down brokers.
2020-03-17 21:39:00,765 INFO kafka.controller.PartitionStateMachine: [Partition state machine on Controller 40]: Invoking state change to OfflinePartition for partitions [__consumer_offsets,19],[__consumer_offsets,30],[__consumer_offsets,47],[__consumer_offsets,29],[__consumer_offsets,41],[session-location,0],[HetPetaAddTopic,0],[__consumer_offsets,39],[hetASUPfldTopic,0],[__consumer_offsets,10],[__consumer_offsets,17],[hetFltMsgTopic,0],[__consumer_offsets,14],[__consumer_offsets,40],[hetACDMTopic,0],[__consumer_offsets,18],[__consumer_offsets,0],[__consumer_offsets,26],[__consumer_offsets,24],[__consumer_offsets,33],[__consumer_offsets,20],[__consumer_offsets,21],[__consumer_offsets,3],[__consumer_offsets,5],[__consumer_offsets,22],[hetVideoTopic,0],[__consumer_offsets,12],[push-result-error,0],[__consumer_offsets,8],[__consumer_offsets,23],[__consumer_offsets,15],[__consumer_offsets,11],[hetAsupMsgTopic,0],[__consumer_offsets,48],[__consumer_offsets,13],[__consumer_offsets,49],[__consumer_offsets,6],[__consumer_offsets,28],[__consumer_offsets,4],[__consumer_offsets,37],[__consumer_offsets,31],[push-result,0],[__consumer_offsets,44],[hetTaskTopic,0],[__consumer_offsets,42],[__consumer_offsets,34],[__consumer_offsets,46],[btTaskTopic,0],[__consumer_offsets,25],[__consumer_offsets,27],[__consumer_offsets,45],[__consumer_offsets,32],[__consumer_offsets,43],[__consumer_offsets,36],[__consumer_offsets,35],[__consumer_offsets,7],[__consumer_offsets,38],[__consumer_offsets,9],[__consumer_offsets,1],[HetPetaTopic,0],[__consumer_offsets,2],[__consumer_offsets,16]
2020-03-17 21:39:00,767 ERROR state.change.logger: Controller 40 epoch 57125 aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 40 went through a soft failure and another controller was elected with epoch 57127.
2020-03-17 21:39:00,767 ERROR state.change.logger: Controller 40 epoch 57125 encountered error while electing leader for partition [__consumer_offsets,19] due to: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 40 went through a soft failure and another controller was elected with epoch 57127..
2020-03-17 21:39:00,767 ERROR state.change.logger: Controller 40 epoch 57125 initiated state change for partition [__consumer_offsets,19] from OfflinePartition to OnlinePartition failed
kafka.common.StateChangeFailedException: encountered error while electing leader for partition [__consumer_offsets,19] due to: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 40 went through a soft failure and another controller was elected with epoch 57127..
at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:380)
at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:453)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
at kafka.utils.Utils$.inLock(Utils.scala:561)
at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
Caused by: kafka.common.StateChangeFailedException: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 40 went through a soft failure and another controller was elected with epoch 57127.
at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:354)
... 23 more
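Both stale controllers (40 on hetserver3 and 41 on hetserver2) abort leader election because the LeaderAndIsr path was already written by a newer controller with epoch 57127, meaning the controller role had already moved on while they were still reacting to the broker-change event. After the dust settles it is worth confirming which broker actually holds the controller role and what state the partitions were left in. A sketch of the checks, run from the Kafka installation directory with connect strings taken from the hostnames in the logs above:

# Which broker holds the controller role, and which brokers are still registered in ZooKeeper
bin/zookeeper-shell.sh hetserver1:2181 get /controller
bin/zookeeper-shell.sh hetserver1:2181 ls /brokers/ids

# Partitions left under-replicated or without a leader after the incident
bin/kafka-topics.sh --zookeeper hetserver1:2181,hetserver2:2181,hetserver3:2181 --describe --under-replicated-partitions
bin/kafka-topics.sh --zookeeper hetserver1:2181,hetserver2:2181,hetserver3:2181 --describe --unavailable-partitions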
Comments (1)
Could the disk be full?
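Worth ruling out. A quick check on each broker, where the data directory is whatever log.dirs points to in server.properties (/data/kafka-logs below is only a placeholder):

df -h                          # overall disk usage per mount
du -sh /data/kafka-logs        # size of the Kafka data directory (placeholder path)
# If a disk had filled, the broker log would normally contain write failures such as:
grep "No space left on device" /path/to/kafka/logs/server.log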