HTTPError: Not Found even when using a User-Agent

Posted 2025-02-01 10:07:33

I am trying to open a URL like the following:

import urllib.request

url = "https://www.chess.cornell.edu/index.php/users/calculato%20rs/calculator-absolute-flux-measurement-using-xpd100"
# The URL I am trying to access.
req = urllib.request.Request(
    url,
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)
# Setting a User-Agent header, as many answers suggested.
f = urllib.request.urlopen(req)

However, I always get the following error:

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: Not Found

Thanks a lot for any help!


Comments (2)

無處可尋 2025-02-08 10:07:33

I used the requests library and it worked fine:

import requests

r = requests.get("https://www.chess.cornell.edu/index.php/users/calculato%20rs/calculator-absolute-flux-measurement-using-xpd100")

Even though it returns

<Response [404]>

you can still use r.text to get the HTML of the site.

This probably happens because the site returns status 404 (Not Found) even though it actually serves a valid page. While urllib panics and throws an error, your browser and requests will still follow through and show you the page.
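This behaviour can be illustrated without any network access by constructing a Response object by hand (a contrived sketch, of course; in practice requests.get() builds this object for you):

```python
import requests

# Build a Response manually to show that an error status code and a
# readable body can coexist on the same object (hypothetical values).
resp = requests.models.Response()
resp.status_code = 404
resp.encoding = "utf-8"
resp._content = b"<html>still a real page</html>"

print(resp.ok)    # False: 404 counts as an error status
print(resp.text)  # but the body is still fully readable
```

This is exactly why requests leaves the decision to you: nothing is raised unless you explicitly call resp.raise_for_status().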

Glad if this helps :)

雨落□心尘 2025-02-08 10:07:33

If you need to fetch the response body even for a 404 error, this is how it's done using urllib:

import urllib.request
import urllib.error

try:
    f = urllib.request.urlopen(req)
except urllib.error.HTTPError as err:
    f = err

This is a very simplistic snippet, of course, assuming you want to do f.read() later on to process the content. In a robust program there should be all kinds of checks for HTTP response code, content types and so on.


There is nothing wrong with using requests (as suggested by @DeeraWijesundara), of course. In fact, I would personally use requests too in a similar case, but for completeness' sake I've decided to add a stdlib-only answer.
