Download and unzip a file with Python

Posted on 2024-11-26 17:04:01

I am trying to download and open a zipped file and seem to be having trouble using a file type handle with zipfile. I'm getting the error "AttributeError: addinfourl instance has no attribute 'seek'" when running this:

import zipfile
import urllib2

def download(url,directory,name):
 webfile = urllib2.urlopen('http://www.sec.gov'+url)
 webfile2 = zipfile.ZipFile(webfile)
 content = zipfile.ZipFile.open(webfile2).read()
 localfile = open(directory+name, 'w')
 localfile.write(content)
 localfile.close()
 return()

download(link.get("href"),'./fails_data', link.text)

栖迟 2024-12-03 17:04:02

Putting things together, the following retrieves the content of the first file within a zipped file from a website:

import urllib.request
import zipfile

url = 'http://www.gutenberg.lib.md.us/4/8/8/2/48824/48824-8.zip'
# Despite the variable name, urlretrieve returns the path of a temporary file, not a file object.
filehandle, _ = urllib.request.urlretrieve(url)
zip_file_object = zipfile.ZipFile(filehandle, 'r')
first_file = zip_file_object.namelist()[0]
file = zip_file_object.open(first_file)
content = file.read()  # bytes of the first member in the archive

海之角 2024-12-03 17:04:02

As of 2020, you can use dload to download and unzip a file, i.e.:

import dload
dload.save_unzip("https://file-examples.com/wp-content/uploads/2017/02/zip_2MB.zip")

By default it extracts to a dir on the script path with the zip file name, but you can specify the extract location:

dload.save_unzip("https://file-examples.com/wp-content/uploads/2017/02/zip_2MB.zip", "/extract/here")

Install it with pip install dload.

暖伴 2024-12-03 17:04:02

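You can't seek on a file returned by urllib2.urlopen. The methods it supports are listed here: http://docs.python.org/library/urllib.html#urllib.urlopen.

You'll have to retrieve the file (possibly with urllib.urlretrieve, http://docs.python.org/library/urllib.html#urllib.urlretrieve) and then use zipfile on it.

Alternatively, if you want the zipped data in memory, you can read() the urlopened file, put the result into a StringIO, and use zipfile on that. If you just want to extract files rather than read them, also look at zipfile's extract and extractall methods.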
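The in-memory route described above could look roughly like this in Python 3. This is only a sketch: it assumes io.BytesIO as the Python 3 counterpart of StringIO for binary data, and the helper names fetch_bytes and read_first_member are made up for illustration.

```python
import io
import urllib.request
import zipfile


def fetch_bytes(url):
    """Download the whole response body into memory (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def read_first_member(zip_bytes):
    """Return (name, content) of the first entry in an in-memory zip archive."""
    # BytesIO supports seek(), which the raw urlopen response does not;
    # ZipFile needs it to locate the archive's central directory.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        first_name = archive.namelist()[0]
        return first_name, archive.read(first_name)
```

With these two helpers, something like name, content = read_first_member(fetch_bytes(url)) should avoid the 'seek' error, at the cost of holding the whole archive in memory.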

歌枕肩 2024-12-03 17:04:02

I do not have enough rep to comment, but regarding Marius's answer above: note that Python 3 needs a slight change to the import and to the urlretrieve call, since urllib has been split into several modules.

import urllib

Becomes:

import urllib.request

And

filehandle, _ = urllib.urlretrieve(url)

Becomes

filehandle, _ = urllib.request.urlretrieve(url)

笑红尘 2024-12-03 17:04:02

Iterating on @Marius answer (which reads a single file directly from the zip), if you want to extract all files to a directory, do this:

import urllib.request
import zipfile

url = "http://www.gutenberg.lib.md.us/4/8/8/2/48824/48824-8.zip"
extract_dir = "example"

zip_path, _ = urllib.request.urlretrieve(url)
with zipfile.ZipFile(zip_path, "r") as f:
    f.extractall(extract_dir)

This stores the zip file in a temporary dir. If you want to keep it around, you can pass a filename to urlretrieve, e.g. urllib.request.urlretrieve(url, "my_zip_file.zip").
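As a rough sketch of the "keep it around" variant, the two steps can be wrapped in one function. The name download_and_extract and its parameters are hypothetical, not part of any library.

```python
import urllib.request
import zipfile


def download_and_extract(url, zip_path, extract_dir):
    """Save the archive at url to zip_path, extract it, and return the member names."""
    # Passing a filename makes urlretrieve write there instead of to a temporary file.
    saved_path, _headers = urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(saved_path, "r") as archive:
        # extractall creates extract_dir if it does not exist yet.
        archive.extractall(extract_dir)
        return archive.namelist()
```

For instance, download_and_extract(url, "my_zip_file.zip", "example") would leave both the archive and its extracted contents on disk.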
