Reading parquet into pandas gives FileNotFoundError
I have the code below and it runs fine. It reads the data as a Spark DataFrame:
April_data = sc.read.parquet('somepath/data.parquet')
type(April_data)
pyspark.sql.dataframe.DataFrame
But when I try to read it as a pandas DataFrame, I get an error:
df_pp = pd.read_parquet('somepath/data.parquet')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
/tmp/ipykernel_4244/1910461502.py in <module>
----> 1 df_pp = pd.read_parquet('somepath/data.parquet')
/usr/local/anaconda//parquet.py in read_parquet(path, engine, columns, storage_options, use_nullable_dtypes, **kwargs)
498 storage_options=storage_options,
499 use_nullable_dtypes=use_nullable_dtypes,
--> 500 **kwargs,
501 )
/usr/local/anaconda//io/parquet.py in read(self, path, columns, use_nullable_dtypes, storage_options, **kwargs)
234 kwargs.pop("filesystem", None),
235 storage_options=storage_options,
--> 236 mode="rb",
237 )
238 try:
/usr/local/anaconda/parquet.py in _get_path_or_handle(path, fs, storage_options, mode, is_dir)
100 # this branch is used for example when reading from non-fsspec URLs
101 handles = get_handle(
--> 102 path_or_handle, mode, is_text=False, storage_options=storage_options
103 )
104 fs = None
/usr/local/anaconda/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
709 else:
710 # Binary mode
--> 711 handle = open(handle, ioargs.mode)
712 handles.append(handle)
713
FileNotFoundError: [Errno 2] No such file or directory: 'somepath/data.parquet'
I have installed the fastparquet package as below:
!pip install fastparquet
Successfully installed cramjam-2.5.0 fastparquet-0.8.1
# Update 1
The file is located in HDFS, and I can see it when I run:
hdfs_location = 'somepath/'
!hdfs dfs -ls $hdfs_location
I am running all this code in the same file
Per the docs, pandas.read_parquet, like its sibling IO modules, does not support reading from HDFS locations. While there is read_hdf, it does not read parquet or other known formats.

For string values passed to read_parquet, only local file paths, online schemes (http, ftp), and two specific storage schemes (Amazon S3 buckets, i.e. s3, and Google Cloud Storage, i.e. gs) are currently supported. However, you can pass file-like objects, so consider opening the needed parquet file and passing its contents. Below are examples using HDFS packages:
Also, fastparquet supports conversion to a pandas DataFrame: