How do I upload .parquet files from my local machine to Azure Storage Data Lake Gen2?
I have a set of .parquet files in my local machine that I am trying to upload to a container in Data Lake Gen2.
I cannot do the following:
def upload_file_to_directory():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")
        directory_client = file_system_client.get_directory_client("my-directory")
        file_client = directory_client.create_file("uploaded-file.parquet")
        local_file = open("C:\\file-to-upload.parquet", 'r')
        file_contents = local_file.read()
        file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
        file_client.flush_data(len(file_contents))
    except Exception as e:
        print(e)
because the .parquet file cannot be read by the .read() function.
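To illustrate the failure, here is a minimal stand-alone reproduction (the tiny stand-in file below is hypothetical; a real parquet file also starts with the magic bytes b"PAR1" but contains much more structure):

```python
# Parquet is a binary format, so reading it in text mode fails as soon
# as the decoder hits bytes that are not valid text.
with open("sample.parquet", "wb") as f:
    f.write(b"PAR1\x00\xff\x10\xff")  # bytes that are not valid UTF-8

try:
    # Text mode tries to decode the bytes; \xff is never valid in UTF-8.
    with open("sample.parquet", "r", encoding="utf-8") as f:
        f.read()
except UnicodeDecodeError as e:
    print("text-mode read failed:", e)

# Binary mode ("rb") returns the raw bytes unchanged.
with open("sample.parquet", "rb") as f:
    data = f.read()
print(len(data))  # 8
```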
When I try to do this:
def upload_file_to_directory():
    file_system_client = service_client.get_file_system_client(file_system="my-file-system")
    directory_client = file_system_client.get_directory_client("my-directory")
    file_client = directory_client.create_file("uploaded-file.parquet")
    file_client.upload_file("C:\\file-to-upload.txt", 'r')
I get the following error:
AttributeError: 'DataLakeFileClient' object has no attribute 'upload_file'
Any suggestions?
1 Answer
You are receiving this because you have imported the DataLakeFileClient module. Try installing DataLakeServiceClient, since it has an upload_file method. However, to read the .parquet file, one of the workarounds is to use pandas. Below is the code that worked for me, and you may be required to import the DataLakeFileClient library to make this work:

RESULTS:
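A minimal sketch of that workaround, assuming the azure-storage-file-datalake package is installed and that the account URL, credential, file-system name, and paths are placeholders to replace. The essential fix is to open the parquet file in binary mode ("rb") and push the raw bytes with the same append_data/flush_data calls the question already uses:

```python
def read_local_bytes(path):
    # Parquet is binary: open with "rb" so .read() returns raw bytes
    # instead of trying (and failing) to decode them as text.
    with open(path, "rb") as f:
        return f.read()

def upload_file_to_directory(service_client, local_path, remote_name):
    # Same client chain as in the question (azure-storage-file-datalake);
    # "my-file-system" and "my-directory" are placeholders.
    file_system_client = service_client.get_file_system_client(file_system="my-file-system")
    directory_client = file_system_client.get_directory_client("my-directory")
    file_client = directory_client.create_file(remote_name)
    data = read_local_bytes(local_path)
    file_client.append_data(data=data, offset=0, length=len(data))
    file_client.flush_data(len(data))

def main():
    # Hypothetical wiring -- substitute real values before running.
    # pip install azure-storage-file-datalake
    from azure.storage.filedatalake import DataLakeServiceClient
    service_client = DataLakeServiceClient(
        account_url="https://<storage-account>.dfs.core.windows.net",
        credential="<account-key>",
    )
    upload_file_to_directory(service_client,
                             "C:\\file-to-upload.parquet",
                             "uploaded-file.parquet")
```

On recent versions of the SDK, file_client.upload_data(data, overwrite=True) can replace the append_data/flush_data pair with a single call; note that DataLakeFileClient has no upload_file method, which is exactly what the AttributeError in the question reports.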