df.show() in PySpark returns "UnauthorizedException: User my_user has no SELECT permission on <table system.size_estimates> or any of its parents"

Posted 2025-02-06 17:12:10

I'm trying to read records from a Cassandra table.

This code works fine:

df = spark.read \
    .format("org.apache.spark.sql.cassandra") \
    .option("spark.cassandra.connection.host", "my_host") \
    .option("spark.cassandra.connection.port", "9042") \
    .option("spark.cassandra.auth.username", "my_user") \
    .option("spark.cassandra.auth.password", "my_pass") \
    .option("keyspace", "my_keyspace") \
    .option("table", "my_table") \
    .load()

But when I try to show the records:

df.show(3)

I get this exception:

com.datastax.oss.driver.api.core.servererrors.UnauthorizedException: User my_user has no SELECT permission on <table system.size_estimates> or any of its parents

The point is that I only have permissions on my_keyspace.

However, I can connect with cqlsh to the same Cassandra host:port with the same user/password and do whatever I need in my_keyspace.

Please advise: what is wrong with the Spark code, and how should I handle this situation?

Comments (2)

不喜欢何必死缠烂打 2025-02-13 17:12:10

The Spark Cassandra connector estimates the size of the Cassandra tables using the values stored in system.size_estimates. The connector needs an estimate of the table size in order to calculate the number of Spark partitions. See my answer in this post for details.
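As a rough illustration of why the connector wants that estimate, the sketch below divides an estimated table size by a target split size. It assumes rows shaped like those in system.size_estimates (the mean_partition_size and partitions_count columns); the function name, the sample rows, and the 512 MB split size are made up for the example, and the real connector logic is more involved than this.

```python
import math


def estimate_spark_partitions(rows, split_size_mb=512):
    """Rough estimate: total estimated bytes divided by a target split size.

    rows: iterable of dicts with 'mean_partition_size' (bytes) and
    'partitions_count', mirroring columns of system.size_estimates.
    split_size_mb: assumed target split size, for illustration only.
    """
    total_bytes = sum(
        r["mean_partition_size"] * r["partitions_count"] for r in rows
    )
    split_bytes = split_size_mb * 1024 * 1024
    # Always return at least one partition, even for an empty estimate.
    return max(1, math.ceil(total_bytes / split_bytes))


# Made-up sample rows, one per token range.
sample_rows = [
    {"mean_partition_size": 2048, "partitions_count": 500_000},
    {"mean_partition_size": 4096, "partitions_count": 250_000},
]

print(estimate_spark_partitions(sample_rows))
```

Without SELECT permission on system.size_estimates there are no such rows to read, which is why the failure only shows up once an action like df.show() triggers the scan.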

If you've enabled the authorizer in Cassandra, authenticated users/roles are automatically given read access to some system tables:

system_schema.keyspaces
system_schema.columns
system_schema.tables
system.local
system.peers

But you will need to explicitly authorize your Spark user so it can access the size_estimates table with:

GRANT SELECT ON system.size_estimates TO spark_role

Note that the role only needs read access (SELECT permission) to the table. Cheers!
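If you would rather script the grant than type it into cqlsh, a small helper along these lines could build the statement. The function name is hypothetical, the identifier check is a deliberately conservative assumption (plain unquoted names only), and the execution step is left as a comment because it needs a session connected as a role that is allowed to grant permissions.

```python
import re

# Conservative check: accept only plain unquoted identifiers.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def build_size_estimates_grant(role):
    """Return the CQL statement granting read access on system.size_estimates."""
    if not _IDENTIFIER.match(role):
        raise ValueError(f"unexpected role name: {role!r}")
    return f"GRANT SELECT ON system.size_estimates TO {role};"


statement = build_size_estimates_grant("spark_role")
print(statement)

# Assumption: 'session' would be an open connection (for example from the
# DataStax Python driver) authenticated as a sufficiently privileged role.
# session.execute(statement)
```

For the question above, the role to pass would be my_user.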

美羊羊 2025-02-13 17:12:10

You need to grant read access to system.size_estimates for that user.
