Why does OpenStreetMap return a 403 error through Python's urllib.request.urlretrieve, but work fine when using Requests?

Posted on 2025-01-25 19:07:52

I'm trying to grab tiles from OpenStreetMap. I learned the syntax of their API. I have an "old way" of doing things, using Requests, and a "new way", which is used by someone else's GitHub project. I'm trying to use their project, but it's failing for me in one spot. I then put together this short script as a minimal example:

from PIL import Image
import requests
import urllib.error
import urllib.request

# A tile we want to grab from OpenStreetMap
tile_url = "https://a.tile.openstreetmap.org/16/19299/24629.png"

# Old way: get the bytes via requests.get, then parse them as an image and save.
old_way = Image.open(requests.get(tile_url, stream=True).raw)
old_way.save("oldway.png")

# New way: use urlretrieve to copy the file directly to newway.png
destination = "newway.png"
try:
    path, response = urllib.request.urlretrieve(tile_url, destination)
except urllib.error.HTTPError as e:
    # HTTPError carries the status code; a plain URLError does not
    print("HTTP error!")
    print(e.code)
except urllib.error.URLError as e:
    print("URL error!")
    print(e.reason)

# Expected behavior: two identical images are created,
# named oldway.png and newway.png.
# Actual behavior: we get oldway.png, but newway.png fails with a 403 error.

What's going wrong here? What's the difference between these two HTTP GET requests which results in the first one working fine, and the second one giving a 403 error? I've tried digging into the source code of the respective Python libraries, but it turns out to be a pretty long mess of functions calling each other. And of course since it's HTTPS, I can't monitor the network connection to evaluate the raw bytes being transferred and figure things out that way.
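Because the traffic is encrypted on the wire, one way to see what a library actually sends is to point it at a throwaway local HTTP server instead. The following is a minimal sketch under that assumption, not a definitive tool: `HeaderEcho` and `capture_urllib_headers` are hypothetical names, and it only exercises urllib. Pointing `requests.get` at the same local URL would show the Requests side for comparison.

```python
import http.server
import threading
import urllib.request

captured = {}  # request headers seen by the local server (lower-cased names)


class HeaderEcho(http.server.BaseHTTPRequestHandler):
    """Hypothetical helper: records the request headers and replies 200."""

    def do_GET(self):
        captured.update({k.lower(): v for k, v in self.headers.items()})
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


def capture_urllib_headers():
    """Send one GET via urllib to a local server and return the headers it saw."""
    server = http.server.HTTPServer(("127.0.0.1", 0), HeaderEcho)
    server.timeout = 5  # do not wait forever if the request never arrives
    thread = threading.Thread(target=server.handle_request)  # serve one request
    thread.start()
    try:
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        url = f"http://127.0.0.1:{server.server_port}/"
        with opener.open(url) as resp:
            resp.read()
    finally:
        thread.join()
        server.server_close()
    return dict(captured)


headers = capture_urllib_headers()
print(headers["user-agent"])  # urllib identifies itself as Python-urllib/<version>
```

Comparing the headers captured from each library narrows down which field the tile server could be reacting to, with the User-Agent being an obvious candidate.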

Is there something wrong with the user-agent? Something with the headers? Please note that I'm not just trying to solve this problem ("What's the issue? You have a method that works. Ignore the broken one"), but I'm trying to learn about the details here and hoping that this odd edge case can benefit me in the future.
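One way to test the user-agent hypothesis is to repeat the urllib download with an explicit `User-Agent` header and see whether the 403 goes away. The sketch below assumes the default urllib identifier is what the server rejects, which is exactly the thing being tested. `build_request`, `fetch_with_user_agent`, and the agent string `MyTileFetcher/0.1` are hypothetical names chosen for illustration.

```python
import shutil
import urllib.request

tile_url = "https://a.tile.openstreetmap.org/16/19299/24629.png"


def build_request(url, user_agent):
    """Return a Request that carries a custom User-Agent instead of the default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_with_user_agent(url, destination, user_agent):
    """Download url to destination, sending the given User-Agent (needs network)."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req) as resp, open(destination, "wb") as out:
        shutil.copyfileobj(resp, out)
    return destination


# Hypothetical identifier; a real one should describe your own application.
req = build_request(tile_url, "MyTileFetcher/0.1")
print(req.get_header("User-agent"))  # Request stores header names capitalized
```

Calling `fetch_with_user_agent(tile_url, "newway.png", "MyTileFetcher/0.1")` performs the actual download. If it succeeds where `urlretrieve` returned 403, that points to the User-Agent as the difference between the two requests.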
