Encode .parquet to io.Bytes

Posted on 2025-01-13 00:05:04

Goal: upload a Parquet file to MinIO, which requires converting the file to bytes.

I've been able to do this for .csv, .json and .txt:

bytes = data.to_csv().encode('utf-8')
bytes = json.dumps(self.data, indent=4, separators=(',', ': ')).encode('utf-8')
bytes = data.encode('utf-8')

MinioConn:

from minio import Minio


class MinioConn:
    def __init__(self,
                 host='foo.com:9000',
                 access_key='CENSORED', secret_key='CENSORED',
                 secure=False):
        self.host = host
        self.access_key = access_key
        self.secret_key = secret_key
        self.secure = secure

    def client(self):
        return Minio(self.host, self.access_key, self.secret_key,
                     secure=self.secure)

My Upload Code:

import pandas as pd
import io
from fastparquet import write

import MinioConn

filename = 'myfile.parquet'
# ---
df = pd.DataFrame(data=[['tom', 10], ['nick', 15], ['juli', 14]],
                  columns=['Name', 'Age'])
df.to_parquet(filename)
# ---

data = pd.read_parquet(filename)

bytes = data.encode('utf-8')
buffer = io.BytesIO(bytes)

bucket = 'synthetic-data-gen'

client = MinioConn().client()
client.put_object(bucket,
                f'foo/bar/{filename}',
                data=buffer,
                length=len(bytes),
                content_type='application/{}'.format(filename.split('.', 1)[1]))

Traceback:

Traceback (most recent call last):
  File "test.py", line 16, in <module>
    bytes = data.encode('utf-8')
  File "/home/me/miniconda3/envs/sdg/lib/python3.8/site-packages/pandas/core/generic.py", line 5139, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'encode'



Comments (1)

む无字情书 2025-01-20 00:05:04

If you call DataFrame.to_parquet without a path, it returns the Parquet file contents as bytes:

bytes_data = df.to_parquet()
buffer = io.BytesIO(bytes_data)

For older versions of pandas, write into an in-memory buffer instead:

buffer = io.BytesIO()
df.to_parquet(buffer)  # writes into buffer and returns None, so there is nothing to assign
buffer.seek(0)
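
To tie this back to the upload in the question, here is a minimal sketch of the whole step. The helper name upload_parquet and the default content type are assumptions for illustration, not part of the original post. It assumes a pandas version where to_parquet() returns bytes when called without a path, and a client object exposing put_object as in the question's code.

```python
import io


def upload_parquet(client, bucket, object_name, df,
                   content_type="application/octet-stream"):
    """Serialize df to Parquet in memory and upload it via put_object.

    upload_parquet is a hypothetical helper; client is expected to be
    a Minio client such as the one returned by MinioConn().client().
    """
    # Called without a path, to_parquet returns the file contents as bytes.
    payload = df.to_parquet()
    # put_object reads from a file-like object and needs the exact length.
    buffer = io.BytesIO(payload)
    client.put_object(bucket, object_name,
                      data=buffer,
                      length=len(payload),
                      content_type=content_type)
    return len(payload)
```

With the names from the question, the call would look like upload_parquet(MinioConn().client(), 'synthetic-data-gen', f'foo/bar/{filename}', df). Keeping the payload in its own variable avoids shadowing the built-in bytes and makes the length available without rewinding the buffer.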