Python: seeking in a remote file over HTTP

Posted on 2024-08-16 02:18:57

How do I seek to a particular position in a remote (HTTP) file so that I download only that part?

Let's say the bytes of the remote file are: 1234567890

I want to seek to 4 and download 3 bytes from there, so that I get: 456

Also, how do I check whether a remote file exists? I tried os.path.isfile(), but it returns False when I pass it a remote file URL.

Comments (4)

陌路终见情 2024-08-23 02:18:57

If you are downloading the remote file over HTTP, you need to set the Range header.

See this example for how it can be done. It looks like this:

myUrlclass.addheader("Range","bytes=%s-" % (existSize))

EDIT: I just found a better implementation. This class is very simple to use, as the docstring shows.

# Python 2 code: urllib2 became urllib.request in Python 3.
import urllib
import urllib2


class RangeError(IOError):
    """Raised when the server cannot satisfy the requested range."""


class HTTPRangeHandler(urllib2.BaseHandler):
    """Handler that enables HTTP Range headers.

    This was extremely simple. The Range header is an HTTP feature to
    begin with, so all this class does is tell urllib2 that the
    "206 Partial Content" response from the HTTP server is what we
    expected.

    Example:
        import urllib2
        import byterange

        range_handler = byterange.HTTPRangeHandler()
        opener = urllib2.build_opener(range_handler)

        # install it
        urllib2.install_opener(opener)

        # create Request and set Range header
        req = urllib2.Request('http://www.python.org/')
        req.add_header('Range', 'bytes=30-50')
        f = urllib2.urlopen(req)
    """

    def http_error_206(self, req, fp, code, msg, hdrs):
        # 206 Partial Content: wrap the response instead of treating it as an error
        r = urllib.addinfourl(fp, hdrs, req.get_full_url())
        r.code = code
        r.msg = msg
        return r

    def http_error_416(self, req, fp, code, msg, hdrs):
        # 416 Range Not Satisfiable
        raise RangeError('Requested Range Not Satisfiable')

Update: The "better implementation" has moved to GitHub: excid3/urlgrabber, in the byterange.py file.
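
In Python 3, where urllib2 has been folded into urllib.request, a 206 reply is not treated as an error, so the same idea can be sketched without a custom handler. This is a minimal sketch and not the code from urlgrabber: the helper names range_header and fetch_range are made up for illustration, and it assumes the server honours Range requests. Byte offsets in a Range header are zero-based and inclusive at both ends, so the "456" from the question sits at offsets 3 to 5.

```python
import urllib.request


def range_header(start, length):
    """Build a Range header for `length` bytes starting at zero-based `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # Both ends of an HTTP byte range are inclusive.
    return {"Range": "bytes=%d-%d" % (start, start + length - 1)}


def fetch_range(url, start, length):
    """Return `length` bytes of `url` starting at `start` (hypothetical helper)."""
    req = urllib.request.Request(url, headers=range_header(start, length))
    with urllib.request.urlopen(req) as resp:
        if resp.status == 206:
            # Partial Content: the body is just the requested slice.
            return resp.read()
        # A 200 reply means the server ignored Range and is sending the
        # whole file, so read only as far as needed and slice locally.
        return resp.read(start + length)[start:]
```

Calling fetch_range(url, 3, 3) against a file containing 1234567890 should give the bytes 456, provided the server behaves as assumed. A range that lies entirely past the end of the file normally produces a 416 reply, which urlopen raises as urllib.error.HTTPError.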

唠甜嗑 2024-08-23 02:18:57

I highly recommend using the requests library. It is easily the best HTTP library I have ever used. In particular, to accomplish what you have described, you would do something like:

import requests

url = "http://www.sffaudio.com/podcasts/ShellGameByPhilipK.Dick.pdf"

# Retrieve bytes between offsets 3 and 5 (inclusive).
r = requests.get(url, headers={"range": "bytes=3-5"})

# If a 4XX client error or a 5XX server error is encountered, we raise it.
r.raise_for_status()
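
One thing to watch for: a server that does not support ranges may ignore the header and reply 200 with the full body, and raise_for_status() will not complain about that. The sketch below is one hedged way to cope with both cases. body_slice is a hypothetical helper name, it relies only on the status_code and content attributes of a requests response, and it assumes that a 206 reply covers exactly the range that was asked for.

```python
def body_slice(response, start, length):
    """Return the requested bytes from a response to a ranged GET.

    `response` is any object with `status_code` and `content`
    attributes, such as the result of requests.get().
    """
    if response.status_code == 206:
        # Partial Content: the server sent only the requested range.
        return response.content
    if response.status_code == 200:
        # Range was ignored: the whole file arrived, so slice it locally.
        return response.content[start:start + length]
    raise ValueError("unexpected HTTP status %d" % response.status_code)
```

With the request above, body_slice(r, 3, 3) would return the three bytes at offsets 3 to 5 in either case. Note that the 200 path still downloads the entire file.
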
時窥 2024-08-23 02:18:57

AFAIK, this is not possible using fseek() or similar. You need to use the HTTP Range header to achieve this. This header may or may not be supported by the server, so your mileage may vary.

import urllib2

myHeaders = {'Range':'bytes=0-9'}

req = urllib2.Request('http://www.promotionalpromos.com/mirrors/gnu/gnu/bash/bash-1.14.3-1.14.4.diff.gz',headers=myHeaders)

partialFile = urllib2.urlopen(req)

s2 = (partialFile.read())
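
Because support for the header varies, it can help to ask the server first. A HEAD request also covers the second part of the question: os.path.isfile() only looks at the local filesystem, so it returns False for any URL. The following Python 3 sketch is an illustration under assumptions, not a definitive check. probe and advertises_ranges are hypothetical names, only a 404 is treated as "does not exist", and a server may support ranges without sending Accept-Ranges, so a False in the second position is a hint rather than proof.

```python
import urllib.error
import urllib.request


def advertises_ranges(headers):
    """True if the response headers contain 'Accept-Ranges: bytes'."""
    return (headers.get("Accept-Ranges") or "").strip().lower() == "bytes"


def probe(url):
    """Return (exists, advertises_ranges) for `url` using a HEAD request."""
    # HEAD asks for the headers only, so no file content is downloaded.
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return True, advertises_ranges(resp.headers)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False, False
        # Other statuses (403, 500, ...) do not say whether the file exists.
        raise
```

A call such as exists, ranged = probe(url) could then decide whether to send a ranged request or fall back to a full download.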

EDIT: This is of course assuming that by remote file you mean a file stored on a HTTP server...

If the file you want is on an FTP server, FTP only lets you specify a start offset, not a range. If that is what you want, then the following code should do it (not tested!)

import ftplib

fileToRetrieve = 'somefile.zip'
fromByte = 15
ftp = ftplib.FTP('ftp.someplace.net')
ftp.login()  # anonymous login; pass a user name and password if required
# Write everything from fromByte to the end of the remote file.
with open('partialFile', 'wb') as outFile:
    ftp.retrbinary('RETR ' + fileToRetrieve, outFile.write, rest=fromByte)
ftp.quit()
陌若浮生 2024-08-23 02:18:57

You can use httpio to access remote HTTP files as if they were local:

pip install httpio

import zipfile
import httpio

url = "http://some/large/file.zip"
with httpio.open(url) as fp:
    zf = zipfile.ZipFile(fp)
    print(zf.namelist())
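
For the seek-and-read case from the question, the same file-style calls should apply. The sketch below assumes, as the zipfile example suggests, that the object returned by httpio.open() supports seek() and read() like an ordinary binary file. It uses io.BytesIO as a local stand-in so that it runs without a network connection, and read_at is a hypothetical helper name.

```python
import io


def read_at(fp, offset, length):
    """Seek to zero-based `offset` in a binary file object and read `length` bytes."""
    fp.seek(offset)
    return fp.read(length)


# Local stand-in for the remote file from the question.
local = io.BytesIO(b"1234567890")
print(read_at(local, 3, 3))  # b'456'
```

Under that assumption, passing the file object from httpio.open(url) to read_at in place of the BytesIO object would fetch the same three bytes from the remote file.
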