Kafka cluster suddenly went down, can any expert help me figure out what the problem is??

Posted 2022-01-03 01:36:23, 21,507 characters, 745 views, 1 comment

The Kafka cluster is version 0.8.2.0 and runs on three hosts: hetserver1, hetserver2, and hetserver3.

hetserver1 started throwing errors at 21:39 on the 17th:

2020-03-17 21:39:00,040 ERROR kafka.network.Processor: Closing socket for /172.19.4.12 because of error
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:123)
    at kafka.network.MultiSend.writeTo(Transmission.scala:101)
    at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:231)
    at kafka.network.Processor.write(SocketServer.scala:473)
    at kafka.network.Processor.run(SocketServer.scala:343)
    at java.lang.Thread.run(Thread.java:745)
2020-03-17 21:39:00,071 ERROR kafka.network.Processor: Closing socket for /172.19.4.12 because of error
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:123)
    at kafka.network.MultiSend.writeTo(Transmission.scala:101)
    at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:231)
    at kafka.network.Processor.write(SocketServer.scala:473)
    at kafka.network.Processor.run(SocketServer.scala:343)
    at java.lang.Thread.run(Thread.java:745)
2020-03-17 21:39:00,073 ERROR kafka.network.Processor: Closing socket for /172.19.4.12 because of error
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:123)
    at kafka.network.MultiSend.writeTo(Transmission.scala:101)
    at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:231)
    at kafka.network.Processor.write(SocketServer.scala:473)
    at kafka.network.Processor.run(SocketServer.scala:343)
    at java.lang.Thread.run(Thread.java:745)
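
The repeated java.io.IOException: Broken pipe here just means the broker was still writing a fetch response when the remote end of the connection (the address in the message, 172.19.4.12, is the peer) had already gone away; by itself it usually points at the client or replica side dropping the connection rather than at a local log or disk problem. Below is a tiny standalone Java sketch, not related to the Kafka code paths in the trace, that reproduces the same class of failure; the exact message ("Broken pipe" vs. "Connection reset") depends on the OS and timing.

import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Standalone sketch: writing into a TCP connection whose peer has already
// closed it eventually fails with an IOException (typically "Broken pipe"
// on Linux). Nothing here is Kafka-specific; ports and sizes are arbitrary.
public class PeerClosedWriteDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = server.accept()) {

            accepted.close();      // the "consumer" side goes away mid-conversation
            Thread.sleep(200);     // give the FIN/RST time to reach the writer

            OutputStream out = client.getOutputStream();
            try {
                for (int i = 0; i < 1000; i++) {   // keep writing until the kernel notices
                    out.write(new byte[8192]);
                }
            } catch (IOException e) {
                System.out.println("write failed: " + e.getMessage());
            }
        }
    }
}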

hetserver2 also reported errors:

2020-03-17 21:39:00,193 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hetserver1/172.19.4.12:2181. Will not attempt to authenticate using SASL (unknown error)
2020-03-17 21:39:00,194 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hetserver1/172.19.4.12:2181, initiating session
2020-03-17 21:39:00,194 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (Expired)
2020-03-17 21:39:00,195 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0xa70e4797252000e has expired, closing socket connection
2020-03-17 21:39:00,195 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hetserver1:2181,hetserver2:2181,hetserver3:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@5e39570d
2020-03-17 21:39:00,196 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2020-03-17 21:39:00,196 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hetserver3/172.19.4.14:2181. Will not attempt to authenticate using SASL (unknown error)
2020-03-17 21:39:00,197 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hetserver3/172.19.4.14:2181, initiating session
2020-03-17 21:39:00,198 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hetserver3/172.19.4.14:2181, sessionid = 0x870e479725d0517, negotiated timeout = 6000
2020-03-17 21:39:00,198 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (SyncConnected)
2020-03-17 21:39:00,297 INFO kafka.controller.ReplicaStateMachine$BrokerChangeListener: [BrokerChangeListener on Controller 41]: Broker change listener fired for path /brokers/ids with children 42
2020-03-17 21:39:00,301 INFO kafka.controller.ReplicaStateMachine$BrokerChangeListener: [BrokerChangeListener on Controller 41]: Newly added brokers: , deleted brokers: 41,40, all live brokers: 42
2020-03-17 21:39:00,301 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-41-send-thread], Shutting down
2020-03-17 21:39:00,301 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-41-send-thread], Stopped
2020-03-17 21:39:00,301 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-41-send-thread], Shutdown completed
2020-03-17 21:39:00,302 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-40-send-thread], Shutting down
2020-03-17 21:39:00,302 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-40-send-thread], Stopped
2020-03-17 21:39:00,302 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-40-send-thread], Shutdown completed
2020-03-17 21:39:00,304 INFO kafka.controller.KafkaController: [Controller 41]: Broker failure callback for 41,40
2020-03-17 21:39:00,306 INFO kafka.controller.KafkaController: [Controller 41]: Removed ArrayBuffer() from list of shutting down brokers.
2020-03-17 21:39:00,308 INFO kafka.controller.PartitionStateMachine: [Partition state machine on Controller 41]: Invoking state change to OfflinePartition for partitions [__consumer_offsets,19],[__consumer_offsets,47],[__consumer_offsets,41],[__consumer_offsets,29],[session-location,0],[__consumer_offsets,17],[__consumer_offsets,10],[hetASUPfldTopic,0],[__consumer_offsets,14],[__consumer_offsets,40],[hetACDMTopic,0],[__consumer_offsets,26],[__consumer_offsets,20],[__consumer_offsets,22],[__consumer_offsets,5],[push-result-error,0],[__consumer_offsets,8],[__consumer_offsets,23],[__consumer_offsets,11],[hetAsupMsgTopic,0],[__consumer_offsets,13],[__consumer_offsets,49],[__consumer_offsets,28],[__consumer_offsets,4],[__consumer_offsets,37],[__consumer_offsets,44],[__consumer_offsets,31],[__consumer_offsets,34],[__consumer_offsets,46],[btTaskTopic,0],[__consumer_offsets,25],[__consumer_offsets,43],[__consumer_offsets,32],[__consumer_offsets,35],[__consumer_offsets,7],[__consumer_offsets,38],[__consumer_offsets,1],[HetPetaTopic,0],[__consumer_offsets,2],[__consumer_offsets,16]
2020-03-17 21:39:00,314 ERROR state.change.logger: Controller 41 epoch 57126 aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 41 went through a soft failure and another controller was elected with epoch 57127.
2020-03-17 21:39:00,314 ERROR state.change.logger: Controller 41 epoch 57126 encountered error while electing leader for partition [__consumer_offsets,19] due to: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 41 went through a soft failure and another controller was elected with epoch 57127..
2020-03-17 21:39:00,314 ERROR state.change.logger: Controller 41 epoch 57126 initiated state change for partition [__consumer_offsets,19] from OfflinePartition to OnlinePartition failed
kafka.common.StateChangeFailedException: encountered error while electing leader for partition [__consumer_offsets,19] due to: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 41 went through a soft failure and another controller was elected with epoch 57127..
    at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:380)
    at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
    at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:453)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
    at kafka.utils.Utils$.inLock(Utils.scala:561)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
    at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
    at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
Caused by: kafka.common.StateChangeFailedException: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 41 went through a soft failure and another controller was elected with epoch 57127.
    at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:354)
    ... 23 more

The errors on hetserver3 were as follows:

2020-03-17 21:39:00,660 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hetserver3/172.19.4.14:2181. Will not attempt to authenticate using SASL (unknown error)
2020-03-17 21:39:00,661 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hetserver3/172.19.4.14:2181, initiating session
2020-03-17 21:39:00,661 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (Expired)
2020-03-17 21:39:00,661 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x870e47715360000 has expired, closing socket connection
2020-03-17 21:39:00,661 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hetserver1:2181,hetserver2:2181,hetserver3:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@68246cf
2020-03-17 21:39:00,663 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hetserver2/172.19.4.13:2181. Will not attempt to authenticate using SASL (unknown error)
2020-03-17 21:39:00,664 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hetserver2/172.19.4.13:2181, initiating session
2020-03-17 21:39:00,666 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hetserver2/172.19.4.13:2181, sessionid = 0xa70e47972520504, negotiated timeout = 6000
2020-03-17 21:39:00,666 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (SyncConnected)
2020-03-17 21:39:00,666 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2020-03-17 21:39:00,763 INFO kafka.controller.ReplicaStateMachine$BrokerChangeListener: [BrokerChangeListener on Controller 40]: Broker change listener fired for path /brokers/ids with children 42
2020-03-17 21:39:00,764 INFO kafka.controller.ReplicaStateMachine$BrokerChangeListener: [BrokerChangeListener on Controller 40]: Newly added brokers: , deleted brokers: 41,40, all live brokers: 42
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-41-send-thread], Shutting down
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-41-send-thread], Stopped
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-41-send-thread], Shutdown completed
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-40-send-thread], Shutting down
2020-03-17 21:39:00,764 INFO kafka.network.Processor: Closing socket connection to /172.19.4.14.
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-40-send-thread], Stopped
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-40-send-thread], Shutdown completed
2020-03-17 21:39:00,764 INFO kafka.controller.KafkaController: [Controller 40]: Broker failure callback for 41,40
2020-03-17 21:39:00,765 INFO kafka.controller.KafkaController: [Controller 40]: Removed ArrayBuffer() from list of shutting down brokers.
2020-03-17 21:39:00,765 INFO kafka.controller.PartitionStateMachine: [Partition state machine on Controller 40]: Invoking state change to OfflinePartition for partitions [__consumer_offsets,19],[__consumer_offsets,30],[__consumer_offsets,47],[__consumer_offsets,29],[__consumer_offsets,41],[session-location,0],[HetPetaAddTopic,0],[__consumer_offsets,39],[hetASUPfldTopic,0],[__consumer_offsets,10],[__consumer_offsets,17],[hetFltMsgTopic,0],[__consumer_offsets,14],[__consumer_offsets,40],[hetACDMTopic,0],[__consumer_offsets,18],[__consumer_offsets,0],[__consumer_offsets,26],[__consumer_offsets,24],[__consumer_offsets,33],[__consumer_offsets,20],[__consumer_offsets,21],[__consumer_offsets,3],[__consumer_offsets,5],[__consumer_offsets,22],[hetVideoTopic,0],[__consumer_offsets,12],[push-result-error,0],[__consumer_offsets,8],[__consumer_offsets,23],[__consumer_offsets,15],[__consumer_offsets,11],[hetAsupMsgTopic,0],[__consumer_offsets,48],[__consumer_offsets,13],[__consumer_offsets,49],[__consumer_offsets,6],[__consumer_offsets,28],[__consumer_offsets,4],[__consumer_offsets,37],[__consumer_offsets,31],[push-result,0],[__consumer_offsets,44],[hetTaskTopic,0],[__consumer_offsets,42],[__consumer_offsets,34],[__consumer_offsets,46],[btTaskTopic,0],[__consumer_offsets,25],[__consumer_offsets,27],[__consumer_offsets,45],[__consumer_offsets,32],[__consumer_offsets,43],[__consumer_offsets,36],[__consumer_offsets,35],[__consumer_offsets,7],[__consumer_offsets,38],[__consumer_offsets,9],[__consumer_offsets,1],[HetPetaTopic,0],[__consumer_offsets,2],[__consumer_offsets,16]
2020-03-17 21:39:00,767 ERROR state.change.logger: Controller 40 epoch 57125 aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 40 went through a soft failure and another controller was elected with epoch 57127.
2020-03-17 21:39:00,767 ERROR state.change.logger: Controller 40 epoch 57125 encountered error while electing leader for partition [__consumer_offsets,19] due to: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 40 went through a soft failure and another controller was elected with epoch 57127..
2020-03-17 21:39:00,767 ERROR state.change.logger: Controller 40 epoch 57125 initiated state change for partition [__consumer_offsets,19] from OfflinePartition to OnlinePartition failed
kafka.common.StateChangeFailedException: encountered error while electing leader for partition [__consumer_offsets,19] due to: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 40 went through a soft failure and another controller was elected with epoch 57127..
    at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:380)
    at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
    at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:453)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
    at kafka.utils.Utils$.inLock(Utils.scala:561)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
    at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
    at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
Caused by: kafka.common.StateChangeFailedException: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 40 went through a soft failure and another controller was elected with epoch 57127.
    at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:354)
    ... 23 more
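
Both hetserver2 (controller 41, epoch 57126) and hetserver3 (controller 40, epoch 57125) lost their ZooKeeper sessions at 21:39:00 (the negotiated session timeout is only 6000 ms), saw brokers 40 and 41 disappear from /brokers/ids, and then had their leader elections rejected because a newer controller with epoch 57127 had already written the LeaderAndIsr paths. In other words, more than one broker believed it was the controller at the same time, which the log messages themselves attribute to a controller soft failure; session expiry like this is usually caused by long GC pauses, network hiccups, or an overloaded ZooKeeper. Here is a minimal diagnostic sketch, not taken from the post, that reads the controller znodes and the live broker registrations straight out of ZooKeeper; the connect string and timeout come from the logs above, and /controller, /controller_epoch, and /brokers/ids are the standard paths Kafka 0.8.x uses.

import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Diagnostic sketch: ask ZooKeeper which broker currently holds the controller
// role, at which epoch, and which brokers still have a live (ephemeral)
// registration. The connect string and timeout are taken from the logs above.
public class ControllerCheck {
    public static void main(String[] args) throws Exception {
        String connect = "hetserver1:2181,hetserver2:2181,hetserver3:2181";
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper(connect, 6000, (WatchedEvent e) -> {
            if (e.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        System.out.println("/controller       = " + new String(zk.getData("/controller", false, null)));
        System.out.println("/controller_epoch = " + new String(zk.getData("/controller_epoch", false, null)));

        List<String> liveBrokers = zk.getChildren("/brokers/ids", false);
        System.out.println("live broker ids   = " + liveBrokers);

        zk.close();
    }
}

If brokers 40 and 41 are missing from /brokers/ids while their Kafka processes are still running, the processes are alive but their ZooKeeper sessions are gone; finding out what stalled them around 21:39 (GC logs, system load, network) is probably where to look next.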

Comments (1)

葬花如无物 2022-01-06 21:02:01

Disk full, maybe?
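
Worth ruling out, although a full disk on 0.8.x usually shows up as I/O errors from the log segments (and often a broker shutdown) rather than as the socket and controller errors above. A quick sketch for checking free space under the broker's data directory; /data/kafka-logs is only a placeholder, substitute whatever log.dirs points to in server.properties.

import java.io.File;

// Report free vs. total space for the Kafka data directory. The default path
// below is a placeholder; pass the real log.dirs value as the first argument.
public class DiskSpaceCheck {
    public static void main(String[] args) {
        File logDir = new File(args.length > 0 ? args[0] : "/data/kafka-logs");
        long totalGb  = logDir.getTotalSpace()  / (1024L * 1024 * 1024);
        long usableGb = logDir.getUsableSpace() / (1024L * 1024 * 1024);
        System.out.println(logDir + ": " + usableGb + " GB free of " + totalGb + " GB");
    }
}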
