How do I use pyarrow.csv.read_csv to read a file from a filesystem?

Posted 2025-01-15 05:03:21 · 439 characters · 3 views · 0 comments

I want to read a single CSV file in a google bucket with pyarrow. How do I do this?

I can create a FileSystem object with gcsfs, but I don't see a way to provide this to pyarrow.csv.read_csv.

Do I need to create some sort of file stream from the file system? What's the best way to do this?

import gcsfs
import pyarrow.csv as csv

fs = gcsfs.GCSFileSystem(project='foo')

csv.read_csv("bucket/foo/bar.csv", filesystem=fs)
TypeError: read_csv() got an unexpected keyword argument 'filesystem'

Using pyarrow version 6.0.1

Comments (1)

恋竹姑娘 2025-01-22 05:03:21

I'm guessing you are working with this doc. You're correct that the approach listed there does not work with read_csv, because read_csv has no filesystem parameter. You can still do this, but the process is a bit different.

Pyarrow has its own filesystem abstraction. If you have a pyarrow filesystem then you can first open a file and then use that file to read the CSV:

import pyarrow.csv as csv
import pyarrow.fs as fs

# Open the file through the pyarrow filesystem, then pass the handle to read_csv.
local_fs = fs.LocalFileSystem()
with local_fs.open_input_file('foo/bar.csv') as csv_file:
    table = csv.read_csv(csv_file)

Unfortunately, a gcsfs.GCSFileSystem is not a "pyarrow filesystem" but you have a few options.

The method gcsfs.GCSFileSystem.open can give you a "python file object" which you can use as input to pyarrow.csv.read_csv.

import gcsfs
import pyarrow.csv as csv

fs = gcsfs.GCSFileSystem(project='foo')
# fs.open returns a Python file object; open it in binary mode for read_csv.
with fs.open("bucket/foo/bar.csv", 'rb') as csv_file:
    table = csv.read_csv(csv_file)