Access Azure ADLS Gen 2 from Hadoop using the CLI
I basically want to list files under an ADLS Gen 2 container using hadoop fs -ls from a standalone on-prem Cloudera cluster. However, I am getting this error:
Command run from bash:
hadoop fs -Dfs.azure.account.key.accountName.dfs.core.windows.net="accessKey" -Dfs.azure.createRemoteFileSystemDuringInitialization=true -ls abfss://containerName@accountName.dfs.core.windows.net/
Error:
WARN fs.FileSystem: Failed to initialize fileystem abfss://containerName@accountName.dfs.core.windows.net/: Invalid configuration value detected for fs.azure.account.key ls: Invalid configuration value detected for fs.azure.account.key
Then, I ran this same fs -ls command from within a Spark program by configuring:
sc._jsc.hadoopConfiguration().set('fs.azure.account.auth.type.accountName.dfs.core.windows.net','SharedKey')
sc._jsc.hadoopConfiguration().set('fs.azure.account.key.accountName.core.windows.net','accessKey')
sc._jsc.hadoopConfiguration().set('fs.abfss.impl','org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem')
The error from the Spark shell:
WARN fs.FileSystem: Failed to initialize fileystem abfss://containerName@accountName.dfs.core.windows.net/: Configuration property accountName.dfs.core.windows.net not found
Note:
- PySpark reads and writes to this ADLS Gen 2 container work as expected after setting up the Spark conf. The issue only occurs when I try this with hadoop fs commands; I eventually want to use distcp as well, along with PySpark.
- I haven't configured anything in core-site.xml. Rather, I want to pass all keys, parameters, and settings independently within the program's or script's context, even on bash. Looking for a solution that meets these criteria.
- Also, I am not using OAuth for this, since I am just running a POC. For now, I am only interested in testing this with SharedKey.
Can someone help me identify the issue here?
1 Answer
As per the article, please note the following limitation:
ADLS is not supported as the default filesystem. Do not set the default filesystem property (fs.defaultFS) to an abfss:// URI. You can use ADLS as a secondary filesystem while HDFS remains the primary filesystem.
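For the SharedKey CLI test itself, note that the ABFS property names must carry the full .dfs.core.windows.net suffix; the Spark error above ("Configuration property accountName.dfs.core.windows.net not found") is what the driver reports when the account key property is set without the .dfs segment. Below is a minimal sketch of a listing that passes everything on the command line, assuming accountName, containerName, and accessKey are placeholders for your real storage account, container, and account key:

# SharedKey auth passed entirely as generic -D options; nothing in core-site.xml.
# Both property names need the full ".dfs.core.windows.net" suffix.
# fs.defaultFS stays pointing at HDFS; ADLS is addressed only via the explicit abfss:// URI.
hadoop fs \
  -Dfs.azure.account.auth.type.accountName.dfs.core.windows.net=SharedKey \
  -Dfs.azure.account.key.accountName.dfs.core.windows.net=accessKey \
  -ls abfss://containerName@accountName.dfs.core.windows.net/

If the "Invalid configuration value detected for fs.azure.account.key" warning still appears with a real key, it usually means the key value itself is being rejected (for example, an empty or truncated value) rather than a misnamed property.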
Please follow the references below; they have detailed information:
Reference:
https://www.youtube.com/watch?v=h3jYrhl4Y4M
https://docs.cloudera.com/runtime/7.2.10/cloud-data-access/topics/cr-cda-hadoop-file-system-commands.html
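Since the question also mentions distcp: the same pattern of passing the account key as a generic -D option applies there as well. A minimal sketch, with hdfs:///tmp/src, accountName, containerName, and accessKey again as assumed placeholders:

# Copy from the primary HDFS filesystem into ADLS Gen 2 as a secondary filesystem.
hadoop distcp \
  -Dfs.azure.account.key.accountName.dfs.core.windows.net=accessKey \
  hdfs:///tmp/src \
  abfss://containerName@accountName.dfs.core.windows.net/dst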