How to get all unique errors when going through many websites

Posted on 2025-02-12 09:29:39

I am going through a lot of sites using the requests module, and I want to check whether each site is broken, whether it exists, and whether I can access it. I am using a try/except block, so I can see which errors I get.

My issue: I have lots of sites to go through and don't know which errors can happen. I may already have seen all of them, but I have no way of knowing.

Here are some examples of the errors that occurred:

Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))
<class 'requests.exceptions.SSLError'>

Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: TLSV1_UNRECOGNIZED_NAME] tlsv1 unrecognized name (_ssl.c:1129)')))
<class 'requests.exceptions.SSLError'>

Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1129)')))
<class 'requests.exceptions.SSLError'>

Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000001D58BAB4850>, 'Connection to the_site timed out. (connect timeout=10)'))
<class 'requests.exceptions.ConnectTimeout'>

Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001D58BAB48B0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
<class 'requests.exceptions.ConnectionError'>

Err: ('Connection aborted.'the_site', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
<class 'requests.exceptions.ConnectionError'>

Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001D58BB44C40>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
<class 'requests.exceptions.ConnectionError'>

296 nan: is Not reachable 
Err: Invalid URL 'nan': No schema supplied. Perhaps you meant http://nan?

354 : is Not reachable, status_code: 404

As you can see, they are all slightly different (even ignoring the object IDs and the host).

I have tried:

import requests

try:
    # Get the URL (url and count come from the surrounding loop, not shown)
    get = requests.get(url, allow_redirects=True, timeout=1, verify=True, headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"})
    # If the request succeeds
    if get.status_code == 200:
        print(f"{count} {url}: is reachable. status_code: {get.status_code}")
    else:
        print(f"{count} {url}: is Not reachable, status_code: {get.status_code}")
# Any exception raised by requests
except requests.exceptions.RequestException as e:
    print(e.errno)
    print(f"{url}: is Not reachable \nErr: {e}")

But e.errno just returns None. I am not sure how it works; I expected it to return a unique number associated with that specific error, but I guess I was wrong.

I also played around with all the other e.something attributes and other parts of the requests module, but I can't find a way to get all the unique types of errors I am getting now and will get later.

To clarify, I am not talking about exception classes like SSLError or ConnectionError.

TL;DR: How can I get a list of all the unique errors I am getting, so I can search online for how to prevent them?

Comments (2)

假装爱人 2025-02-19 09:29:39

If you only want to produce a list of the errors you are receiving without stopping your code, you can catch Exception, the base class of all regular (non-system-exiting) exceptions:

Your code will then become:

import requests

try:
    # Get the URL (url and count come from the surrounding loop, not shown)
    get = requests.get(url, allow_redirects=True, timeout=1, verify=True, headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"})
    # If the request succeeds
    if get.status_code == 200:
        print(f"{count} {url}: is reachable. status_code: {get.status_code}")
    else:
        print(f"{count} {url}: is Not reachable, status_code: {get.status_code}")
except Exception as e:
    # Catches every regular exception, not only those raised by requests
    print(f"{url}: is Not reachable \nErr: {e}")

Keep in mind that this catches any and all errors that can occur, so make sure you log them properly to aid in debugging if the need ever arises.
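Catching Exception only prints each error as it happens. To turn that into the list of unique errors the question asks for, one possible sketch of the "log them properly" step is to reduce each exception to a signature and count the distinct signatures. This is an assumption layered on top of the answer, not part of it: it assumes the innermost exception in the `__cause__`/`__context__` chain is the most informative part, and that hex object addresses (such as 0x000001D58BAB4850) and the host name are the only noise worth masking. `root_cause`, `error_signature` and `collect` are hypothetical helper names. Whether the chain of a given requests error actually reaches the socket-level error depends on how requests and urllib3 raise it.

```python
import re
from collections import Counter

# Hex object addresses like "0x000001D58BAB4850" differ on every run.
_ADDRESS = re.compile(r"0x[0-9A-Fa-f]+")


def root_cause(exc: BaseException) -> BaseException:
    """Follow explicit (__cause__) or implicit (__context__) chaining
    down to the innermost exception, guarding against cycles."""
    seen = set()
    while id(exc) not in seen:
        seen.add(id(exc))
        nxt = exc.__cause__ or exc.__context__
        if nxt is None:
            break
        exc = nxt
    return exc


def error_signature(exc: BaseException, host: str = "") -> tuple:
    """Hashable key: outer type name, innermost type name, cleaned message."""
    inner = root_cause(exc)
    message = _ADDRESS.sub("0x?", str(inner))
    if host:
        # Assumption: masking the host makes messages from different sites comparable.
        message = message.replace(host, "<host>")
    return (type(exc).__name__, type(inner).__name__, message)


def collect(errors: Counter, exc: BaseException, host: str = "") -> None:
    """Count one occurrence of this exception's signature."""
    errors[error_signature(exc, host)] += 1
```

In the question's loop you would create `errors = Counter()` before the loop, call `collect(errors, e, host=urlparse(url).hostname or "")` inside the `except` block (`urlparse` comes from `urllib.parse`), and print `errors.most_common()` at the end. Grouping by the innermost type and cleaned message, rather than by the outer requests class, is meant to keep apart cases such as the three different SSLError messages shown in the question.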

千紇 2025-02-19 09:29:39

Replying only to this part:

but e.errno just returns a None value. I am not sure how it works but I expected it to return the unique number associated with that specific error but I was wrong I guess.

No, that is not how it works.

"errno" is an old Unix error convention; see for example https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html, which lists the values with their symbolic names. They are, if you like, "OS" errors.
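Python exposes that same table through its standard `errno` module: `errno.errorcode` maps each number to its symbolic name, and `os.strerror` returns the OS's description. A small sketch (`describe_errno` is a hypothetical helper); the numbers and names are platform-dependent, so the Windows socket codes seen in the question (10061, 10054) may not map the same way as on Unix.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'SYMBOL: message' for an errno value on the current platform."""
    # errno.errorcode maps numbers to symbolic names such as "ECONNREFUSED".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the OS's human-readable description of the code.
    return f"{name}: {os.strerror(code)}"


print(describe_errno(errno.ECONNREFUSED))
```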

Now, if you look at Python's exception mechanism, it defines an OSError exception for exactly this purpose, and that exception carries errno; see https://docs.python.org/3.8/library/exceptions.html:

exception OSError(errno, strerror[, filename[, winerror[, filename2]]])
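A quick illustration of that signature using only the standard library: the `errno` and `strerror` attributes are filled in only when OSError receives them as constructor arguments. The message strings below are made up for the example.

```python
import errno

# With two or more arguments, the first is stored as .errno and the second
# as .strerror; OSError also returns the matching subclass (PEP 3151).
refused = OSError(errno.ECONNREFUSED, "Connection refused")
print(type(refused).__name__, refused.errno, refused.strerror)

# With a single argument the value is only a message, so .errno stays None.
wrapped = OSError("Max retries exceeded")
print(wrapped.errno)  # None
```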

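In other words, Python exceptions form a superset of the OS-level errors, and only the OS-level ones (OSError and its subclasses) carry an errno value. All other exceptions, defined by libraries or by your own code, do not have to rely on errno at all, and there is no reason for them to carry a meaningful errno attribute. (requests' RequestException does technically derive from IOError, an alias of OSError, which is why e.errno exists at all; but requests usually constructs it with a single argument, a message or the wrapped urllib3 error, so errno is left as None.)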

(And this is good: how could all libraries and code agree on a single shared sequence of numbers to encode their own exceptions? It wouldn't scale at all.)
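If what you actually want is an OS-level number, one option (an assumption, not something this answer prescribes) is to search everything an exception wraps for the first OSError that does carry an errno. `find_errno` is a hypothetical helper; it follows `__cause__`/`__context__`, exception objects stored in `args`, and a `reason` attribute, which is where urllib3's MaxRetryError keeps its underlying error. Whether a given requests failure exposes a socket-level errno at all depends on how the library raised it, so None remains a possible result. Also note that ssl.SSLError is itself an OSError subclass whose errno holds an SSL library code (the 1 and 8 in the question's SSLError examples), not a socket error number.

```python
from collections import deque
from typing import Optional


def find_errno(exc: BaseException) -> Optional[int]:
    """Search an exception and everything it wraps, breadth-first, for an
    OSError whose errno is set. Returns None if none carries one."""
    queue = deque([exc])
    seen = set()
    while queue:
        current = queue.popleft()
        if id(current) in seen:
            continue
        seen.add(id(current))
        if isinstance(current, OSError) and current.errno is not None:
            return current.errno
        # Possible links to wrapped exceptions: explicit (__cause__) and
        # implicit (__context__) chaining, exception objects passed as
        # constructor arguments, and a `reason` attribute (urllib3's
        # MaxRetryError keeps its underlying error there).
        links = [current.__cause__, current.__context__,
                 getattr(current, "reason", None), *current.args]
        queue.extend(link for link in links if isinstance(link, BaseException))
    return None
```

In the question's `except` block, `print(find_errno(e))` in place of `print(e.errno)` would then print the first errno found in the chain, if there is one.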
