How do I read a file from a filesystem with pyarrow.csv.read_csv?
I want to read a single CSV file in a Google Cloud Storage bucket with pyarrow. How do I do this?
I can create a FileSystem object with gcsfs, but I don't see a way to provide it to pyarrow.csv.read_csv.
Do I need to create some sort of file stream from the filesystem? What's the best way to do this?
```python
import gcsfs
import pyarrow.csv as csv

fs = gcsfs.GCSFileSystem(project='foo')
csv.read_csv("bucket/foo/bar.csv", filesystem=fs)
```

```
TypeError: read_csv() got an unexpected keyword argument 'filesystem'
```
Using pyarrow version 6.0.1
Answer:
I'm guessing you are working with this doc. You're correct that the approach listed there does not work with read_csv, because read_csv has no filesystem parameter. We can still do this in general, but the process is a bit different.
PyArrow has its own filesystem abstraction. If you have a pyarrow filesystem, you can first open a file and then pass that file to read_csv:
Unfortunately, a gcsfs.GCSFileSystem is not a "pyarrow filesystem", but you have a few options. The method gcsfs.GCSFileSystem.open can give you a "python file object" which you can use as input to pyarrow.csv.read_csv.