pandas to_parquet output with compression='gzip' is not recognized by gzip

Posted on 2025-01-29 06:41:01


I have a question about the pandas DataFrame.to_parquet method with the compression='gzip' option. Files created with this option are not recognized by the gzip utility.
I'm running an AWS EC2 instance with Deep Learning Base AMI (Ubuntu 18.04) Version 53,
Python 3.6.9, pandas 1.1.5.

A file saved with

df.to_parquet(path,  engine='pyarrow', compression='gzip')

has the size 6159 bytes.

Running gzip -dv on it fails with the error "not in gzip format".

If I instead run gzip on a file saved without compression, i.e. first run

df.to_parquet(path,  engine='pyarrow', compression=None)

and then gzip the .parquet file, the resulting .parquet.gz file is 1511 bytes, and, needless to say, gzip -dv works fine and restores the file.
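
One way to see what each of the two files actually is would be to look at its leading bytes. A gzip stream starts with the bytes 0x1f 0x8b, while a Parquet file starts with the four bytes PAR1, and Parquet applies its compression codec to the data pages inside the file rather than to the file as a whole. The sketch below is only an illustration under those assumptions: the helper name sniff_format, the file names, and the small sample DataFrame are hypothetical and not part of the original setup, and the to_parquet step is skipped if pandas or pyarrow is not installed.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream
PARQUET_MAGIC = b"PAR1"   # first four bytes of a Parquet file


def sniff_format(path):
    """Hypothetical helper: classify a file by its leading magic bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head[:2] == GZIP_MAGIC:
        return "gzip"
    if head == PARQUET_MAGIC:
        return "parquet"
    return "unknown"


def demo():
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        # A whole-file gzip stream, like the one the gzip utility writes.
        gz_path = os.path.join(tmp, "sample.parquet.gz")
        with gzip.open(gz_path, "wb") as f:
            f.write(b"placeholder payload")
        results["gzip_file"] = sniff_format(gz_path)

        # Optional step: requires pandas and pyarrow.
        try:
            import pandas as pd

            df = pd.DataFrame({"a": [1, 2, 3]})  # made-up sample data
            pq_path = os.path.join(tmp, "sample.parquet")
            df.to_parquet(pq_path, engine="pyarrow", compression="gzip")
            results["to_parquet_gzip"] = sniff_format(pq_path)
        except ImportError:
            results["to_parquet_gzip"] = None  # step skipped
    return results


results = demo()
print(results)
```

If the to_parquet file is reported as "parquet" rather than "gzip", that would be consistent with the gzip -dv error above: the file is still a Parquet container, and it would be read back with pd.read_parquet rather than decompressed with gzip.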

I tried searching for this but found nothing.
Any help is appreciated.
