Unable to read a Parquet file with pd.read_parquet

Posted on 2025-01-17 13:35:44

I've just updated all my conda environments (pandas 1.4.1), and I'm now having a problem with pandas' read_parquet function.

import pandas as pd

parquet_file = r'F:\Python Scripts\my_file.parquet'
file = pd.read_parquet(path=parquet_file)

It generates the following error:

file= pd.read_parquet(parquet_file)
Traceback (most recent call last):

  File C:\ProgramData\Anaconda3\lib\site-packages\fastparquet\api.py:135 in _parse_header
    fmd = read_thrift(f, parquet_thrift.FileMetaData)

  File C:\ProgramData\Anaconda3\lib\site-packages\fastparquet\thrift_structures.py:25 in read_thrift
    obj.read(pin)

  File C:\ProgramData\Anaconda3\lib\site-packages\fastparquet\parquet_thrift\parquet\ttypes.py:1929 in read
    iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])

TypeError: got wrong ttype while reading field


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  Input In [14] in <cell line: 1>
    whole_df42= pd.read_parquet(parquet_file)

  File C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parquet.py:493 in read_parquet
    return impl.read(

  File C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parquet.py:345 in read
    parquet_file = self.api.ParquetFile(path, **parquet_kwargs)

  File C:\ProgramData\Anaconda3\lib\site-packages\fastparquet\api.py:100 in __init__
    self._parse_header(fn, verify)

  File C:\ProgramData\Anaconda3\lib\site-packages\fastparquet\api.py:138 in _parse_header
    self.fn)

AttributeError: 'ParquetFile' object has no attribute 'fn'

I've tried several different Parquet files, and none of them could be read.
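Since several files fail, one way to rule out damaged files is to check the Parquet magic bytes. An unencrypted Parquet file starts and ends with the 4-byte marker PAR1. The sketch below uses only the standard library. The helper name looks_like_parquet is hypothetical and is not part of pandas or fastparquet.

```python
from pathlib import Path

# Unencrypted Parquet files begin and end with this 4-byte marker.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes.

    This is only a sanity check on the container, not a full validation
    of the footer metadata. (Hypothetical helper for illustration.)
    """
    data_path = Path(path)
    # A file shorter than the two markers cannot hold both of them.
    if data_path.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with data_path.open("rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), 2)  # 2 = seek relative to end of file
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


# Example call (path taken from the question):
# looks_like_parquet(r'F:\Python Scripts\my_file.parquet')
```

If the files pass this check, they are probably not the problem. The traceback shows pandas dispatching to fastparquet, so calling pd.read_parquet(parquet_file, engine='pyarrow') (assuming pyarrow is installed) would show whether the failure is specific to that engine.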

I also tried to recreate the Parquet file from a DataFrame, simply with:

df.to_parquet(parquet_file)

but that raised a different error:

      Traceback (most recent call last):
      File F:\Python Scripts\My_modules\io\df.py:86 in dataframe_to_parquet
        df.to_parquet(full_path, **kwargs)
    
      File C:\ProgramData\Anaconda3\lib\site-packages\pandas\util\_decorators.py:207 in wrapper
        return func(*args, **kwargs)
    
      File C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py:2835 in to_parquet
        return to_parquet(
    
      File C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parquet.py:420 in to_parquet
        impl.write(
    
      File C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parquet.py:301 in write
        self.api.write(
    
      File C:\ProgramData\Anaconda3\lib\site-packages\fastparquet\writer.py:938 in write
        write_simple(filename, data, fmd, row_group_offsets,
    
      File C:\ProgramData\Anaconda3\lib\site-packages\fastparquet\writer.py:802 in write_simple
        rg = make_row_group(f, data[start:end], fmd.schema,
    
      File C:\ProgramData\Anaconda3\lib\site-packages\fastparquet\writer.py:671 in make_row_group
        chunk = write_column(f, data[column.name], column,
    
      File C:\ProgramData\Anaconda3\lib\site-packages\fastparquet\writer.py:589 in write_column
        bdata = compress_data(bdata, compression)
    
      File C:\ProgramData\Anaconda3\lib\site-packages\fastparquet\compression.py:113 in compress_data
        raise RuntimeError("Compression '%s' not available.  Options: %s" %
    
    RuntimeError: Compression 'snappy' not available.  Options: ['BROTLI', 'GZIP', 'UNCOMPRESSED']
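
Because the problem appeared right after an environment update, it may help to record which Parquet-related packages are installed. The sketch below uses only the standard library. The package list is an assumption about what might be relevant here (in particular, which package supplies snappy support for fastparquet is not confirmed by the traceback), and installed_versions is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names as published on PyPI. Which of these the Parquet engine
# actually relies on for snappy compression is an assumption to verify.
PACKAGES = ("pandas", "fastparquet", "pyarrow", "python-snappy", "cramjam")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

The error message itself lists BROTLI, GZIP, and UNCOMPRESSED as available codecs, so passing compression='gzip' to df.to_parquet may sidestep the missing snappy codec while the environment is being sorted out.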

I've used these lines of code dozens of times, so I think the issue is connected with the new release. Does anyone have a workaround, or can someone help me understand what the problem is?

Thank you.
