Encode .parquet to io.Bytes
Goal: upload a Parquet file to MinIO, which requires converting the file to bytes.
I've been able to do this for .csv, .json and .txt files:
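bytes = data.to_csv().encode('utf-8')  # .csv: DataFrame -> CSV str -> bytes
bytes = json.dumps(self.data, indent=4, separators=(',', ': ')).encode('utf-8')  # .json: object -> JSON str -> bytes
bytes = data.encode('utf-8')  # .txt: data is already a str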
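The three snippets above follow one pattern: produce a str, encode it to UTF-8, then give put_object a readable stream plus the length in bytes. A minimal, stdlib-only sketch of that pattern follows; the helper name as_upload_stream is hypothetical. It also shows why the length has to be taken from the encoded bytes rather than from the str. Parquet breaks this pattern because it is a binary format with no str stage, so there is nothing to call .encode() on.

```python
import io
import json
from typing import Tuple


def as_upload_stream(payload: bytes) -> Tuple[io.BytesIO, int]:
    """Hypothetical helper: wrap already-encoded bytes as a readable
    stream plus their size in bytes (what put_object's data/length expect)."""
    return io.BytesIO(payload), len(payload)


# Text formats: build a str first, then encode it to UTF-8 bytes.
record = {"Name": "tom", "Age": 10}
json_bytes = json.dumps(record, indent=4, separators=(",", ": ")).encode("utf-8")
txt_bytes = "Jürgen".encode("utf-8")

json_stream, json_length = as_upload_stream(json_bytes)
txt_stream, txt_length = as_upload_stream(txt_bytes)

# The length passed to the upload must count bytes, not characters:
# "ü" is one character but two bytes in UTF-8.
print(len("Jürgen"), txt_length)  # 6 7
```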
MinioConn:
from minio import Minio

class MinioConn:
    def __init__(self,
                 host='foo.com:9000',
                 access_key='CENSORED', secret_key='CENSORED',
                 secure=False):
        self.host = host
        self.access_key = access_key
        self.secret_key = secret_key
        self.secure = secure

    def client(self):
        return Minio(self.host, self.access_key, self.secret_key,
                     secure=self.secure)
My upload code:
import pandas as pd
import io
from fastparquet import write
from MinioConn import MinioConn  # assumes the class above is saved as MinioConn.py

filename = 'myfile.parquet'

# ---
df = pd.DataFrame(data=[['tom', 10], ['nick', 15], ['juli', 14]],
                  columns=['Name', 'Age'])
df.to_parquet(filename)
# ---

data = pd.read_parquet(filename)  # returns a DataFrame, not str or bytes
bytes = data.encode('utf-8')  # raises the AttributeError shown in the traceback
buffer = io.BytesIO(bytes)

bucket = 'synthetic-data-gen'
client = MinioConn().client()
client.put_object(bucket,
                  f'foo/bar/{filename}',
                  data=buffer,
                  length=len(bytes),
                  content_type='application/{}'.format(filename.split('.', 1)[1]))
Traceback:
Traceback (most recent call last):
  File "test.py", line 16, in <module>
    bytes = data.encode('utf-8')
  File "/home/me/miniconda3/envs/sdg/lib/python3.8/site-packages/pandas/core/generic.py", line 5139, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'encode'
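The traceback comes from calling .encode() on the DataFrame that pd.read_parquet returns; encode is a str method, and a Parquet file is already binary. Since myfile.parquet is on disk, one option is to read its raw bytes in binary mode, with no decoding or encoding step at all. A minimal sketch, where file_to_upload_stream is a hypothetical helper and the demo writes a few arbitrary bytes to a temporary file as a stand-in for the real Parquet file:

```python
import io
import os
import tempfile
from typing import Tuple


def file_to_upload_stream(path: str) -> Tuple[io.BytesIO, int]:
    """Hypothetical helper: read a file's raw bytes ("rb" mode, no text
    decoding) and return them as a stream plus their size in bytes."""
    with open(path, "rb") as f:
        payload = f.read()
    return io.BytesIO(payload), len(payload)


# Demo with a stand-in file; in the question this would be 'myfile.parquet'.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    with open(path, "wb") as f:
        f.write(b"PAR1\x00\x01\x02PAR1")  # arbitrary bytes, not a valid Parquet file
    stream, length = file_to_upload_stream(path)
    print(length)  # 11
```

With the question's variables this would become buffer, length = file_to_upload_stream(filename), followed by the existing put_object call with length=length. minio-py also provides fput_object(bucket_name, object_name, file_path), which uploads straight from a path and skips the manual buffer.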
If you don't specify a filename (i.e. you leave path as None), pandas.DataFrame.to_parquet returns the Parquet data as bytes (pandas 1.2 and later). For older versions of pandas: