Spark spark-sql-kafka - java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer

Posted 2025-01-19 16:08:25


I am experimenting with Spark reading from a Kafka topic, following the "Structured Streaming + Kafka Integration Guide".

Spark version: 3.2.1
Scala version: 2.12.15

Following the guide's instructions for including the dependencies with spark-shell, I start my shell:

spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1

However, once I run something like the following in my shell:

val df = spark.readStream.format("kafka").option("kafka.bootstrap.servers","http://HOST:PORT").option("subscribe", "my-topic").load()

I get the following exception:

java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer

Any ideas how to overcome this issue?

My assumption was that with --packages, all transitive dependencies would be loaded as well. But this does not seem to be the case. From the logs I assume the package gets loaded successfully, including the kafka-clients dependency:

org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
resolving dependencies :: org.apache.spark#spark-submit-parent-3b04f646-471c-4cc8-88fb-7e32bc3226ed;1.0
confs: [default]
found org.apache.spark#spark-sql-kafka-0-10_2.12;3.2.1 in central
found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.2.1 in central
found org.apache.kafka#kafka-clients;2.8.0 in central
found org.lz4#lz4-java;1.7.1 in central
found org.xerial.snappy#snappy-java;1.1.8.4 in central
found org.slf4j#slf4j-api;1.7.30 in central
found org.apache.hadoop#hadoop-client-runtime;3.3.1 in central
found org.spark-project.spark#unused;1.0.0 in central
found org.apache.hadoop#hadoop-client-api;3.3.1 in central
found org.apache.htrace#htrace-core4;4.1.0-incubating in central
found commons-logging#commons-logging;1.1.3 in central
found com.google.code.findbugs#jsr305;3.0.0 in central
found org.apache.commons#commons-pool2;2.6.2 in central
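
The resolver output shows kafka-clients being fetched, so the jar itself is downloaded; whether the class is actually visible to the shell's classloader is a separate question. One quick diagnostic (a sketch, run inside the same spark-shell session) is:

// Throws ClassNotFoundException if kafka-clients is not on the driver classpath
Class.forName("org.apache.kafka.common.serialization.ByteArraySerializer")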


Comments (1)

梦明 2025-01-26 16:08:25


The logs seem fine, but you can try including the kafka-clients dependency in the --packages argument as well.
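
For example (kafka-clients 2.8.0 matches what the resolver reported above; adjust the version to your setup):

spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1,org.apache.kafka:kafka-clients:2.8.0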

Otherwise, I'd suggest creating an uber jar instead of downloading the libraries every time you submit the app.
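
A minimal sketch of the uber-jar approach with sbt-assembly (project name and plugin version are illustrative; Spark itself is marked Provided so that only the Kafka connector and its kafka-clients dependency get bundled):

// build.sbt -- assumes addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0") in project/plugins.sbt
name := "spark-kafka-app"
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.2.1" % Provided,  // supplied by the cluster at runtime
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.2.1"   // bundled; pulls in kafka-clients
)

assembly / assemblyMergeStrategy := {
  case PathList("META-INF", "services", _*) => MergeStrategy.concat  // keep DataSourceRegister entries
  case PathList("META-INF", _*)             => MergeStrategy.discard
  case _                                    => MergeStrategy.first
}

Running sbt assembly then produces a single jar you can pass to spark-submit without --packages.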
