How do I read a file inside a zip archive as text rather than bytes?

Posted 2024-10-31 13:02:08 · 1109 characters · 2 views · 0 comments

A simple program for reading a CSV file inside a ZIP archive:

import csv, sys, zipfile

zip_file    = zipfile.ZipFile(sys.argv[1])
items_file  = zip_file.open('items.csv', 'rU')
    
for row in csv.DictReader(items_file):
    pass

works in Python 2.7:

$ python2.7 test_zip_file_py3k.py ~/data.zip
$

but not in Python 3.2:

$ python3.2 test_zip_file_py3k.py ~/data.zip
Traceback (most recent call last):
    File "test_zip_file_py3k.py", line 8, in <module>
    for row in csv.DictReader(items_file):
    File "/somedir/python3.2/csv.py", line 109, in __next__
    self.fieldnames
    File "/somedir/python3.2/csv.py", line 96, in fieldnames
    self._fieldnames = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

The csv module in Python 3 wants to see a text file, but zipfile.ZipFile.open returns a zipfile.ZipExtFile that is always treated as binary data.

How does one make this work in Python 3?

Comments (6)

何其悲哀 2024-11-07 13:02:08

I just noticed that Lennart's answer didn't work with Python 3.1, but it does work with Python 3.2. zipfile.ZipExtFile was enhanced in Python 3.2 (see the release notes), and these changes appear to make zipfile.ZipExtFile work nicely with io.TextIOWrapper.

Incidentally, it works in Python 3.1, if you uncomment the hacky lines below to monkey-patch zipfile.ZipExtFile, not that I would recommend this sort of hackery. I include it only to illustrate the essence of what was done in Python 3.2 to make things work nicely.

$ cat test_zip_file_py3k.py 
import csv, io, sys, zipfile

zip_file    = zipfile.ZipFile(sys.argv[1])
items_file  = zip_file.open('items.csv', 'rU')
# items_file.readable = lambda: True
# items_file.writable = lambda: False
# items_file.seekable = lambda: False
# items_file.read1 = items_file.read
items_file  = io.TextIOWrapper(items_file)
    
for idx, row in enumerate(csv.DictReader(items_file)):
    print('Processing row {0} -- row = {1}'.format(idx, row))

If I had to support py3k < 3.2, then I would go with the solution in my other answer.

Update for 3.6+

Starting with 3.6, support for mode='U' was removed. From the zipfile documentation:

Changed in version 3.6: Removed support of mode='U'. Use io.TextIOWrapper for reading compressed text files in universal newlines mode.

Starting with 3.8, a Path object was added, which gives us an open() method that we can call like the built-in open() function (passing newline='' in the case of our CSV), and we get back an io.TextIOWrapper object that the csv readers accept. Note that text mode only became the default for Path.open() in 3.9; in 3.8 it behaves like ZipFile.open(). See Yuri's answer.

橙幽之幻 2024-11-07 13:02:08

You can wrap it in an io.TextIOWrapper.

items_file  = io.TextIOWrapper(items_file, encoding='your-encoding', newline='')

Should work.
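To make the wrapping step concrete, here is a minimal, self-contained sketch for Python 3.6 or later (so without the removed 'rU' mode). It builds a tiny archive in memory purely for demonstration; the helper name iter_csv_rows, the member name and the sample rows are illustrative assumptions, not part of the original answer.

```python
import csv
import io
import zipfile


def iter_csv_rows(archive, member, encoding="utf-8"):
    """Yield each CSV row of `member` as a dict, decoding lazily."""
    with zipfile.ZipFile(archive) as zf:
        # zf.open() returns a binary file object; TextIOWrapper decodes it
        # on the fly. newline="" is what the csv module asks for.
        with io.TextIOWrapper(zf.open(member), encoding=encoding, newline="") as text_file:
            yield from csv.DictReader(text_file)


# Demo data (hypothetical): a small archive held in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("items.csv", "id,name\r\n1,apple\r\n2,pear\r\n")

rows = list(iter_csv_rows(buffer, "items.csv"))
print(rows)
```

Because the wrapper decodes as the reader consumes it, the member does not have to be loaded into memory in one piece. Pass a file path instead of the BytesIO object to read a real archive.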

南笙 2024-11-07 13:02:08

And if you just want to read a file into a string:

from zipfile import ZipFile

with ZipFile('spam.zip') as myzip:
    with myzip.open('eggs.txt') as myfile:
        # read() returns bytes; decode() turns them into a str
        eggs = myfile.read().decode('UTF-8')
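A slightly shorter variant of the same idea uses ZipFile.read(), which returns the member's bytes in one call. The sketch below is only an illustration: the helper name read_member_text and the in-memory demo archive are assumptions made so the example runs on its own.

```python
import io
import zipfile


def read_member_text(archive, member, encoding="utf-8"):
    """Return the full contents of one archive member as a str."""
    with zipfile.ZipFile(archive) as zf:
        # ZipFile.read() gives bytes; decode them with the known encoding.
        return zf.read(member).decode(encoding)


# Demo data (hypothetical): a small archive held in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("eggs.txt", "spam and eggs\n")

text = read_member_text(buffer, "eggs.txt")
print(text)
```

Like the snippet above, this loads the whole member into memory, so it suits small files best.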

柠檬色的秋千 2024-11-07 13:02:08

Lennart's answer is on the right track (Thanks, Lennart, I voted up your answer) and it almost works:

$ cat test_zip_file_py3k.py 
import csv, io, sys, zipfile

zip_file    = zipfile.ZipFile(sys.argv[1])
items_file  = zip_file.open('items.csv', 'rU')
items_file  = io.TextIOWrapper(items_file, encoding='iso-8859-1', newline='')

for idx, row in enumerate(csv.DictReader(items_file)):
    print('Processing row {0}'.format(idx))

$ python3.1 test_zip_file_py3k.py ~/data.zip
Traceback (most recent call last):
  File "test_zip_file_py3k.py", line 7, in <module>
    items_file  = io.TextIOWrapper(items_file, 
                                   encoding='iso-8859-1', 
                                   newline='')
AttributeError: readable

The problem appears to be that io.TextIOWrapper's first required parameter is a buffer (an object that provides methods such as readable()), not an arbitrary file object.

This appears to work:

items_file  = io.TextIOWrapper(io.BytesIO(items_file.read()))

This seems a little complex, and it also seems annoying to have to read a whole (perhaps huge) archive member into memory. Is there a better way?

Here it is in action:

$ cat test_zip_file_py3k.py 
import csv, io, sys, zipfile

zip_file    = zipfile.ZipFile(sys.argv[1])
items_file  = zip_file.open('items.csv', 'rU')
items_file  = io.TextIOWrapper(io.BytesIO(items_file.read()))

for idx, row in enumerate(csv.DictReader(items_file)):
    print('Processing row {0}'.format(idx))

$ python3.1 test_zip_file_py3k.py ~/data.zip
Processing row 0
Processing row 1
Processing row 2
...
Processing row 250

海拔太高太耀眼 2024-11-07 13:02:08

Starting with Python 3.8, the zipfile module has the Path object. Its open() method defaults to text mode from Python 3.9 onward and returns an io.TextIOWrapper object, which can be passed to the csv readers:

import csv, sys, zipfile

# Give a string path to the ZIP archive, and
# the archived file to read from 
items_zipf = zipfile.Path(sys.argv[1], at='items.csv')

# Then use the open method, like you'd usually
# use the built-in open()
items_f = items_zipf.open(newline='')

# Pass the TextIO-like file to your reader as normal
for row in csv.DictReader(items_f):
    print(row)

终弃我 2024-11-07 13:02:08

Here's a minimal recipe to open a zip file and read a text file inside that zip. I found the trick to be the TextIOWrapper read() method, not mentioned in any answers above (BytesIO.read() was mentioned above, but Python docs recommend TextIOWrapper).

import zipfile
import io

# Create the ZipFile object
zf = zipfile.ZipFile('my_zip_file.zip')

# Read a file that is inside the zip...reads it as a binary file-like object
my_file_binary = zf.open('my_text_file_inside_zip.txt')

# Convert the binary file-like object directly to text using TextIOWrapper and its read() method
my_file_text  = io.TextIOWrapper(my_file_binary, encoding='utf-8', newline='').read()

I wish they had kept the mode='U' parameter in the ZipFile open() method to do this same thing, since that was so succinct, but, alas, it is obsolete.
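For anyone who wants the universal-newlines behaviour that mode='U' used to provide, the newline argument of io.TextIOWrapper is where it is controlled. The following sketch (the member name notes.txt, its contents and the helper read_text are made up for the demonstration) contrasts newline=None, which translates every line ending to '\n', with newline='', which leaves the line endings as stored:

```python
import io
import zipfile

# Demo data (hypothetical): one member with mixed line endings.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("notes.txt", b"first\r\nsecond\rthird\n")


def read_text(newline):
    """Read notes.txt as text with the given TextIOWrapper newline setting."""
    with zipfile.ZipFile(buffer) as zf:
        with io.TextIOWrapper(zf.open("notes.txt"), encoding="utf-8", newline=newline) as f:
            return f.read()


translated = read_text(None)  # '\r\n' and '\r' are translated to '\n'
untouched = read_text("")     # line endings are returned unchanged

print(repr(translated))
print(repr(untouched))
```

For plain text, newline=None is usually the closer match to the old 'U' mode; for CSV data, the csv module documentation asks for newline='' so that the reader can handle embedded line breaks itself.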
