How can I download a file fast using Python?
I tried different modules like wget and they all take about the same time to execute.
In this example I will get a file from Reddit:
https://v.redd.it/rfxd2e2zhet81/DASH_1080.mp4?source=fallback
import datetime
import requests

video_url = "https://v.redd.it/rfxd2e2zhet81/DASH_1080.mp4?source=fallback"
start = datetime.datetime.now()
print(start)
response = requests.get(video_url)
stop = datetime.datetime.now()
print(stop)
print("status: " + str(response.status_code))
output:
2022-04-14 15:59:52.258759
2022-04-14 16:02:03.791324
status: 200
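For reference, a streaming variant of the same download that writes the body to disk in chunks instead of buffering it in memory; this is only a sketch for comparison, and the chunk size and output filename are arbitrary choices of mine:

import datetime
import requests

video_url = "https://v.redd.it/rfxd2e2zhet81/DASH_1080.mp4?source=fallback"
start = datetime.datetime.now()
# stream=True defers the body download so it can be written chunk by chunk
with requests.get(video_url, stream=True) as response:
    print("status: " + str(response.status_code))
    with open("DASH_1080.mp4", "wb") as f:
        for chunk in response.iter_content(chunk_size=1 << 20):
            f.write(chunk)
print(datetime.datetime.now() - start)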
Using Firefox, the same request completes in seemingly less than a second.

A right click and "save video as" is indistinguishable from instant.
My understanding from researching similar questions on Stack Overflow is that the following minimal example should result in OK download times that depend only on my internet connection. https://www.speedtest.net/ configured for a single connection gives me the following result:
(speedtest screenshot: https://i.sstatic.net/s1rwy.png)
The file is about 20 MB in size and really should not take long to download.
As a control, this call finishes fast.
video_url="https://stackoverflow.com/questions/71872663/speed-up-python-requests-download-speed"
start = datetime.datetime.now()
print(start)
response = requests.get(video_url)
stop = datetime.datetime.now()
print(stop)
print("status: " + str(response.status_code))
output:
2022-04-14 15:58:47.022299
2022-04-14 15:58:47.418743
status: 200
I ran the same request against a 40 MB file hosted on my own blob storage:
2022-04-14 16:07:59.304382
2022-04-14 16:08:00.729495
status: 200
Based on the speed differences between Firefox and Python, and between Python on different targets, it looks like Python is being throttled.
How can I make my Python script behave so as to avoid being throttled?
I tried using the headers that Firefox sent in its first request, to no avail - the outcome was the same.
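For completeness, this is roughly what that attempt looked like; the header values below are illustrative placeholders rather than the exact ones copied from Firefox:

import requests

video_url = "https://v.redd.it/rfxd2e2zhet81/DASH_1080.mp4?source=fallback"
# Illustrative browser-like headers; the real attempt copied Firefox's exact values
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) Gecko/20100101 Firefox/99.0",
    "Accept": "*/*",
}
response = requests.get(video_url, headers=headers)
print("status: " + str(response.status_code))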
Comments (2)
Observe that you got responses with code 206. 206 Partial means that requests were sent, each presumably for a different part of the file. After the download finishes, the parts are stitched together to recreate the file. This can allow a shorter download time, if every part is served at a speed similar to a single download. Such behavior might be emulated using requests by sending requests with appropriate headers (see the linked description of 206 Partial) and using, for example, multiprocessing, but before that I must warn you that not all servers support the partial-request gimmick, and you should carefully calculate whether the additional burden of writing code to do so is worth the gain you can achieve.
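A rough sketch of the ranged-download idea described above, using requests with Range headers; a thread pool is used here instead of multiprocessing since the work is I/O-bound, the part count is an arbitrary choice, and it assumes the server honors Range requests and reports Content-Length on a HEAD request:

import requests
from concurrent.futures import ThreadPoolExecutor

url = "https://v.redd.it/rfxd2e2zhet81/DASH_1080.mp4?source=fallback"
parts = 8

# Assumes the server reports the total size via Content-Length
total = int(requests.head(url, allow_redirects=True).headers["Content-Length"])
ranges = [(i * total // parts, (i + 1) * total // parts - 1) for i in range(parts)]

def fetch(byte_range):
    start, end = byte_range
    # A server that supports partial content answers this with status 206
    return requests.get(url, headers={"Range": f"bytes={start}-{end}"}).content

with ThreadPoolExecutor(max_workers=parts) as pool:
    chunks = list(pool.map(fetch, ranges))

with open("DASH_1080.mp4", "wb") as f:
    for chunk in chunks:
        f.write(chunk)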
It looks like the solution is to step outside the Python ecosystem.
I tested the solution that user @Daweo suggested in the comments.
It requires an aria2 installation.
The output is:
So that took something like 400 ms.
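For reference, a minimal sketch of how such a call can be driven from a Python script, assuming aria2c is on the PATH; the connection/split counts and output filename are arbitrary choices:

import subprocess

url = "https://v.redd.it/rfxd2e2zhet81/DASH_1080.mp4?source=fallback"
# -x sets the max connections per server, -s the number of segments to split the download into
subprocess.run(["aria2c", "-x", "16", "-s", "16", "-o", "DASH_1080.mp4", url], check=True)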