Download and unzip a file with Python
I am trying to download and open a zipped file and seem to be having trouble using a file type handle with zipfile. I'm getting the error "AttributeError: addinfourl instance has no attribute 'seek'" when running this:
import zipfile
import urllib2

def download(url,directory,name):
    webfile = urllib2.urlopen('http://www.sec.gov'+url)
    webfile2 = zipfile.ZipFile(webfile)
    content = zipfile.ZipFile.open(webfile2).read()
    localfile = open(directory+name, 'w')
    localfile.write(content)
    localfile.close()
    return()

download(link.get("href"),'./fails_data', link.text)
Putting things together, the following retrieves the content of the first file within a zipped file from a website:
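A minimal Python 3 sketch of that idea, not the answer's original snippet: the helper name read_first_file is hypothetical, and the use of urllib.request.urlretrieve is an assumption based on the later replies in this thread. Because urlretrieve saves the download to a local file, zipfile gets the seekable file it needs.

```python
import urllib.request
import zipfile


def read_first_file(url):
    """Download the zip at `url` and return the bytes of its first member.

    `read_first_file` is a hypothetical helper name used for illustration.
    """
    # urlretrieve stores the download in a local file and returns its path,
    # so ZipFile can seek in it (unlike the object returned by urlopen).
    zip_path, _headers = urllib.request.urlretrieve(url)
    with zipfile.ZipFile(zip_path, "r") as archive:
        first_name = archive.namelist()[0]  # members are listed in archive order
        return archive.read(first_name)
```

The function returns raw bytes, so a caller writing them to disk should open the target file in binary mode ('wb').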
As of 2020, you can use dload to download and unzip a file, i.e.:
By default it extracts to a directory on the script path named after the zip file, but you can specify the extract location:
Install it using:
    pip install dload
You can't seek on a urllib2.urlopen-ed file. The methods it supports are listed here: http://docs.python.org/library/urllib.html#urllib.urlopen.
You'll have to retrieve the file (possibly with urllib.urlretrieve, http://docs.python.org/library/urllib.html#urllib.urlretrieve), then use zipfile on it.
Alternatively, you could read() the urlopen-ed file, put it into a StringIO, and use zipfile on that, if you want the zipped data in memory. Also check out the extract and extractall methods of zipfile if you just want to extract the file instead of using read.
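The in-memory route might look like the sketch below. It assumes Python 3, where io.BytesIO takes the place of StringIO for binary data and urllib.request.urlopen replaces urllib2.urlopen. The helper name open_zip_in_memory is hypothetical.

```python
import io
import urllib.request
import zipfile


def open_zip_in_memory(url):
    """Return a ZipFile backed by an in-memory copy of the archive at `url`.

    `open_zip_in_memory` is a hypothetical helper name used for illustration.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()  # the response object itself cannot seek()
    # BytesIO wraps the downloaded bytes in a seekable file-like object.
    return zipfile.ZipFile(io.BytesIO(data))
```

This holds the whole archive in memory, so it suits small downloads. For large archives, saving to disk first is likely the safer choice.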
I do not have enough rep to comment, but regarding Marius's answer above: for Python 3, the import and the urlretrieve call need a slight modification, since urllib has been split into several modules.
Becomes:
And
Becomes
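The before/after snippets of this reply are not preserved here. As a hedged illustration of the change it describes, the import below works on either version: in Python 3 urlretrieve lives in urllib.request, while in Python 2 it is a function of the top-level urllib module.

```python
try:
    # Python 3: urlretrieve lives in the urllib.request submodule.
    from urllib.request import urlretrieve
except ImportError:
    # Python 2: urlretrieve is a function of the top-level urllib module.
    from urllib import urlretrieve
```

Code written against this import can call urlretrieve(url) the same way on both versions.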
Iterating on @Marius's answer (which reads a single file directly from the zip), if you want to extract all files to a directory, do this:
This stores the zip file in a temporary dir. If you want to keep it around, you can pass a filename to urlretrieve, e.g. urllib.request.urlretrieve(url, "my_zip_file.zip").
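A sketch of the extract-all approach described in this answer, assuming Python 3. The function name download_and_extract and its zip_filename parameter are hypothetical. The parameter is only there to show both the temporary-file and the keep-the-zip variants.

```python
import urllib.request
import zipfile


def download_and_extract(url, extract_dir, zip_filename=None):
    """Download the zip at `url` and extract every member into `extract_dir`.

    Hypothetical helper: with zip_filename=None, urlretrieve picks the
    download location itself; pass a path to keep the archive around.
    """
    zip_path, _headers = urllib.request.urlretrieve(url, zip_filename)
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(extract_dir)  # creates extract_dir if it is missing
        return archive.namelist()
```

For example, download_and_extract(url, "example", "my_zip_file.zip") would extract into ./example and leave my_zip_file.zip next to the script. Note that extractall should only be used on archives from sources you trust.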