Access Azure ADLS Gen 2 from Hadoop On-Prem using CLI

Published 2025-02-01 02:10:09

I want to list the files under an ADLS Gen 2 container using hadoop fs -ls from a standalone on-prem Cloudera cluster. However, I am getting this error:

Command run from bash:

hadoop fs -Dfs.azure.account.key.accountName.dfs.core.windows.net="accessKey" -Dfs.azure.createRemoteFileSystemDuringInitialization=true -ls abfss://[email protected]/

Error:

WARN fs.FileSystem: Failed to initialize fileystem abfss://[email protected]/:Invalid configuration value detected for fs.azure.account.key ls: Invalid configuration value detected for fs.azure.account.key

Then I ran the same fs -ls command from within a Spark program, after configuring:

sc._jsc.hadoopConfiguration().set('fs.azure.account.auth.type.accountName.dfs.core.windows.net','SharedKey')
sc._jsc.hadoopConfiguration().set('fs.azure.account.key.accountName.core.windows.net','accessKey')
sc._jsc.hadoopConfiguration().set('fs.abfss.impl','org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem')

The error from Spark Shell:

WARN fs.FileSystem: Failed to initialize fileystem abfss://[email protected]/: Configuration property accountName.dfs.core.windows.net not found

Note:

  • PySpark reads and writes to this ADLS Gen 2 container work as expected after setting up the Spark conf. The issue only occurs with hadoop fs commands, which matters because I eventually want to use distcp as well, alongside PySpark.
  • I haven't configured anything in core-site.xml. Instead, I want to pass all keys, parameters and other settings independently within the context of the program or script, even in bash. I am looking for a solution that meets these criteria.
  • I am not using OAuth for this, since I am just running a POC. For now, I am only interested in testing this with SharedKey.

Can someone help me identify the issue here?

Comments (1)

黄昏下泛黄的笔记 2025-02-08 02:10:09

As per the article, please note the following limitation:

ADLS is not supported as the default filesystem. Do not set the default file system property (fs.defaultFS) to an abfss:// URI. You can use ADLS as a secondary filesystem while HDFS remains the primary filesystem.

Please follow the references below, which have detailed information about:

  • Accessing Azure ADLS Gen 2 from Hadoop on-prem using the CLI
  • Configuration in core-site.xml
  • Passing keys and parameters

Reference:

https://www.youtube.com/watch?v=h3jYrhl4Y4M

https://docs.cloudera.com/runtime/7.2.10/cloud-data-access/topics/cr-cda-hadoop-file-system-commands.html
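
As a rough illustration of passing the key and parameters without touching core-site.xml, here is a minimal Python sketch that only assembles the hadoop fs command line. It assumes SharedKey authentication and the two per-account ABFS properties that already appear in the question. The helper names (`abfs_shared_key_options`, `build_ls_command`) and the `containerName` placeholder are hypothetical, and the sketch has not been run against a real cluster. One thing it makes explicit is that both properties use the same `accountName.dfs.core.windows.net` host. The second property in the question's Spark snippet omits the `dfs` segment, which may be related to the "Configuration property ... not found" error.

```python
import shlex

# ADLS Gen2 endpoint suffix, as used in the question's property names.
ABFS_SUFFIX = "dfs.core.windows.net"


def abfs_shared_key_options(account: str, access_key: str) -> dict:
    """Return the per-account ABFS properties for SharedKey authentication.

    Both property names are built from the same host string so that they
    cannot drift apart (for example, by dropping the 'dfs' segment).
    """
    host = f"{account}.{ABFS_SUFFIX}"
    return {
        f"fs.azure.account.auth.type.{host}": "SharedKey",
        f"fs.azure.account.key.{host}": access_key,
    }


def build_ls_command(account: str, container: str, access_key: str,
                     path: str = "/") -> list:
    """Build the argument list for `hadoop fs -D ... -ls abfss://...`.

    Generic -D options are placed right after `fs` and before `-ls`.
    """
    argv = ["hadoop", "fs"]
    for name, value in abfs_shared_key_options(account, access_key).items():
        argv += ["-D", f"{name}={value}"]
    uri = f"abfss://{container}@{account}.{ABFS_SUFFIX}{path}"
    argv += ["-ls", uri]
    return argv


if __name__ == "__main__":
    # Placeholder values only; 'containerName' is a hypothetical name.
    cmd = build_ls_command("accountName", "containerName", "accessKey")
    print(shlex.join(cmd))  # inspect the command before running it
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a node where the hadoop CLI is installed. In PySpark, the same dictionary could be applied by calling `sc._jsc.hadoopConfiguration().set(name, value)` for each pair, so that the CLI and Spark paths share identical property names.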
