Notes on a Kafka OOM error:
Here's the situation: I installed ZooKeeper and Kafka on my Windows 10 machine for local debugging.
The first startup went fine; both the producer and the consumer ran normally.
Then I wrote some code to produce messages in a loop, and Kafka crashed.
When I restarted Kafka, it refused to come back up at all. The startup failure log showed an OOM. The error looked like this:
[2021-11-07 20:16:13,683] ERROR Error while creating log for __consumer_offsets-41 in dir D:\software\kafka_2.11-1.1.0\logs (kafka.server.LogDirFailureChannel)
java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
    at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:67)
    at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:53)
    at kafka.log.LogSegment$.open(LogSegment.scala:560)
    at kafka.log.Log.loadSegments(Log.scala:412)
    at kafka.log.Log.<init>(Log.scala:216)
    at kafka.log.Log$.apply(Log.scala:1747)
    at kafka.log.LogManager$$anonfun$getOrCreateLog$1.apply(LogManager.scala:673)
    at kafka.log.LogManager$$anonfun$getOrCreateLog$1.apply(LogManager.scala:641)
    at scala.Option.getOrElse(Option.scala:121)
    at kafka.log.LogManager.getOrCreateLog(LogManager.scala:641)
    at kafka.cluster.Partition$$anonfun$getOrCreateReplica$1.apply(Partition.scala:177)
    at kafka.cluster.Partition$$anonfun$getOrCreateReplica$1.apply(Partition.scala:173)
    at kafka.utils.Pool.getAndMaybePut(Pool.scala:65)
    at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:172)
    at kafka.cluster.Partition$$anonfun$6$$anonfun$8.apply(Partition.scala:259)
    at kafka.cluster.Partition$$anonfun$6$$anonfun$8.apply(Partition.scala:259)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at kafka.cluster.Partition$$anonfun$6.apply(Partition.scala:259)
    at kafka.cluster.Partition$$anonfun$6.apply(Partition.scala:253)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
    at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:258)
    at kafka.cluster.Partition.makeLeader(Partition.scala:253)
    at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:1165)
    at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:1163)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
    at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:1163)
    at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:1083)
    at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:183)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:108)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method)
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
    ... 42 more
[2021-11-07 20:16:13,689] INFO [ReplicaManager broker=0] Stopping serving replicas in dir D:\software\kafka_2.11-1.1.0\logs (kafka.server.ReplicaManager)
[2021-11-07 20:16:13,693] ERROR [ReplicaManager broker=0] Error while making broker the leader for partition Topic: __consumer_offsets; Partition: 41; Leader: None; AllReplicas: ; InSyncReplicas: in dir None (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.KafkaStorageException: Error while creating log for __consumer_offsets-41 in dir D:\software\kafka_2.11-1.1.0\logs
Caused by: java.io.IOException: Map failed
    [... stack trace identical to the one above omitted ...]
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method)
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
    ... 42 more
[2021-11-07 20:16:13,836] ERROR Error while creating log for __consumer_offsets-32 in dir D:\software\kafka_2.11-1.1.0\logs (kafka.server.LogDirFailureChannel)
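The key line is `java.lang.OutOfMemoryError: Map failed`. Kafka memory-maps its log index files, and the mapping itself failed, which is usually not a heap problem but the process running out of mappable address space. That is especially easy to hit with a 32-bit JVM on Windows. Here is a back-of-the-envelope sketch; the numbers are Kafka 1.1.0 defaults and my own assumptions, not measurements from my machine:

```python
# Rough estimate of the address space consumed by Kafka's memory-mapped
# index files on startup. All figures below are Kafka 1.1.0 defaults;
# adjust them to your own broker configuration.

MB = 1024 * 1024

# segment.index.bytes: each offset index (and each time index) is
# pre-allocated and mmap'ed at this size; the default is 10 MB.
index_size = 10 * MB

# offsets.topic.num.partitions: the internal __consumer_offsets topic
# has 50 partitions by default, all recreated on startup.
consumer_offsets_partitions = 50

# Two mmap'ed indexes (offset index + time index) per active segment.
indexes_per_partition = 2

mapped = consumer_offsets_partitions * indexes_per_partition * index_size
print(f"mmap'ed index space: {mapped // MB} MB")

# Add the default 1 GB heap from kafka-server-start.bat and you are
# already close to the ~2 GB user address space of a 32-bit process.
heap = 1024 * MB
print(f"heap + indexes: {(mapped + heap) // MB} MB")
```

On this estimate the indexes alone account for roughly 1000 MB of mappings before any actual message data is touched, which would explain why lowering the heap only bought a few seconds before the next `Map failed`.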
The troubleshooting process:
First, I tried restarting Kafka, deleting Kafka's log files, restarting ZooKeeper, and even rebooting Windows entirely. None of it helped.
Kafka's log path, from %KAFKA_HOME%\config\server.properties: log.dirs=logs
Next, I searched online for solutions and tried changing the JVM settings in kafka-server-start.bat, lowering the two 1G values to 512M. With that, Kafka would start for a moment but then crash again almost immediately.
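For reference, the two 1G values live in the KAFKA_HEAP_OPTS line of %KAFKA_HOME%\bin\windows\kafka-server-start.bat. The change looked roughly like this (the exact line may differ slightly between Kafka versions):

rem before:
rem   set KAFKA_HEAP_OPTS=-Xmx1G -Xms1G
rem after:
set KAFKA_HEAP_OPTS=-Xmx512M -Xms512M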
Finally, by watching closely, I noticed that every time I restarted Kafka, a pile of files reappeared under logs, even though I had definitely deleted them by hand before each start. Very strange: where was this data being cached? In the end I deleted all the files under ZooKeeper's data directory, and everything was fine.
ZooKeeper's data path, from %ZOOKEEPER%\conf\zoo.cfg: dataDir=D:/logs/zookeeper
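Concretely, what I deleted was everything under that dataDir. That directory holds ZooKeeper's snapshots and transaction logs, which is where Kafka's broker and topic metadata is persisted. On my setup it looked roughly like this (the paths are from my machine; yours will differ):

# %ZOOKEEPER%\conf\zoo.cfg
dataDir=D:/logs/zookeeper

# After stopping both Kafka and ZooKeeper, remove the snapshot and
# transaction log files under that directory, e.g. the version-2 folder:
#   D:/logs/zookeeper/version-2/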
Conclusion: the OOM on Kafka startup was happening because the topic and partition metadata lives in the ZooKeeper that Kafka depends on. Even after I wiped Kafka's own log directory, the metadata still in ZooKeeper made the broker recreate all those partition logs (and memory-map their index files) on every startup. Surprising? Unexpected? It certainly was for me.
Note: this problem tormented me for two days before I solved it. I don't have the screenshots yet; I'll add them later. It's Friday, so let me get some proper rest first.