AWS boto iterator returns bytes instead of strings: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

Posted on 2025-01-19 22:58:44

We have a large .csv file in an S3 bucket. We want to read it with csv.DictReader so that we can process it row by row. botocore.response.StreamingBody provides an iterator that you can get with iter_lines(). However, that iterator yields bytes, whereas csv.DictReader expects strings. This raises the following error:

Traceback (most recent call last):
  File "s3_iter_alternate.py", line 30, in process_file
    for row in csv_reader:
  File "C:\Users\Stan\AppData\Local\Programs\Python\Python38-32\lib\csv.py", line 111, in __next__
    row = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

What is the correct way to do this? I am new to Python. My code is below:

    s3_resource = boto3.resource('s3', aws_access_key_id = ACCESS_KEY,
                                 aws_secret_access_key = SECRET_KEY)
    s3_object = s3_resource.Object(bucket_name=BUCKET_NAME, key=OBJECT_KEY)

    resp = s3_object.get(Range=f'bytes={offset}-')
    body: botocore.response.StreamingBody = resp['Body']

    csv_reader = csv.DictReader(body.iter_lines(chunk_size=1024), fieldnames=FIELDNAMES)
    for row in csv_reader:
        print('Processing: ' + str(row)) #process here
    return

Comments (1)

最单纯的乌龟 2025-01-26 22:58:44

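Wrapping iter_lines in str() does not fix this in Python 3: str() of a generator returns its repr (something like "<generator object ...>"), so csv.DictReader would iterate over the characters of that text rather than over the lines of the file. Decode each line instead. Try this:

csv_reader = csv.DictReader((line.decode('utf-8') for line in body.iter_lines(chunk_size=1024)), fieldnames=FIELDNAMES)

iter_lines returns bytes. Decoding each line to a text string (assuming the file is UTF-8 encoded) does the trick.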
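
For anyone who wants to try this without touching S3, here is a minimal sketch of the decoding step. It rests on several assumptions that are not in the question: the helper names `decode_lines` and `read_rows` are hypothetical, the column names in `FIELDNAMES` are made up, the file is taken to be UTF-8, and a plain list of bytes stands in for `body.iter_lines(chunk_size=1024)`.

```python
import csv
from typing import Dict, Iterable, Iterator, List

# Hypothetical column names; the question does not show the real FIELDNAMES.
FIELDNAMES = ["id", "name"]


def decode_lines(byte_lines: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Lazily turn an iterable of bytes lines into str lines."""
    for raw in byte_lines:
        yield raw.decode(encoding)


def read_rows(byte_lines: Iterable[bytes], fieldnames: List[str]) -> List[Dict[str, str]]:
    """Parse the decoded lines with csv.DictReader and collect the rows."""
    reader = csv.DictReader(decode_lines(byte_lines), fieldnames=fieldnames)
    return list(reader)


# Stand-in for body.iter_lines(chunk_size=1024), which yields bytes objects.
sample_lines = [b"1,alice", b"2,bob"]

rows = read_rows(sample_lines, FIELDNAMES)
for row in rows:
    print("Processing:", row)
```

With a real object you would pass `body.iter_lines(chunk_size=1024)` in place of `sample_lines` and loop over the reader directly instead of building a list, so the file is still streamed. Two caveats: splitting on line breaks first will break quoted fields that contain embedded newlines, and if the `Range` offset does not fall on a line boundary, the first line returned is a partial row.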
