AWS boto iterator returns bytes instead of strings: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
We have a large .csv file in an S3 bucket. We want to read it row by row into dictionaries for processing. botocore.response.StreamingBody provides an iterator that you can get with iter_lines(). However, it yields bytes, not the strings that csv.DictReader expects. This raises the following error:
Traceback (most recent call last):
File "s3_iter_alternate.py", line 30, in process_file
for row in csv_reader:
File "C:\Users\Stan\AppData\Local\Programs\Python\Python38-32\lib\csv.py", line 111, in __next__
row = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
What is the correct way to do this? I am new to Python. My code is below:
s3_resource = boto3.resource('s3', aws_access_key_id=ACCESS_KEY,
                             aws_secret_access_key=SECRET_KEY)
s3_object = s3_resource.Object(bucket_name=BUCKET_NAME, key=OBJECT_KEY)
resp = s3_object.get(Range=f'bytes={offset}-')
body: botocore.response.StreamingBody = resp['Body']
csv_reader = csv.DictReader(body.iter_lines(chunk_size=1024), fieldnames=FIELDNAMES)
for row in csv_reader:
    print('Processing: ' + str(row))  # process here
return
Comments (1)
Worked for me in Python 3 by wrapping iter_lines with str. iter_lines returns bytes. Converting it to a text string seems to do the trick.
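The snippet that accompanied this answer is not preserved here. What follows is only a minimal sketch of the same idea, assuming the object is UTF-8 encoded. Calling str() directly on a bytes object returns its repr (for example "b'1,alice'"), so the sketch decodes each line with bytes.decode instead. The names decode_lines and sample_lines and the column names are hypothetical. sample_lines stands in for body.iter_lines(chunk_size=1024) so the example runs without S3 access.

```python
import csv
from typing import Iterable, Iterator


def decode_lines(byte_lines: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Yield each bytes line decoded to str.

    Hypothetical helper, not part of boto3 or botocore.
    """
    for raw in byte_lines:
        yield raw.decode(encoding)


# Stand-in for resp['Body'].iter_lines(chunk_size=1024) from the question.
sample_lines = [b"1,alice", b"2,bob"]

# Assumed column names, for illustration only.
FIELDNAMES = ["id", "name"]

# csv.DictReader accepts any iterator that yields str, so the decoded
# generator can be passed in place of a text-mode file object.
rows = list(csv.DictReader(decode_lines(sample_lines), fieldnames=FIELDNAMES))
for row in rows:
    print("Processing:", row)
```

With the real S3 object, passing body.iter_lines(chunk_size=1024) in place of sample_lines should stream the file lazily rather than loading it all at once. One caveat: iter_lines strips line endings, so quoted CSV fields that contain embedded newlines may not be reassembled exactly.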