How to get all unique errors when checking many websites

Posted 2025-02-12 09:29:39 · Word count 3120 · 0 views · 0 comments

I am going through a lot of sites using the requests module, and for each one I want to see whether the site is broken, whether it exists, and whether I can access it. I am using a try/except block and can see which errors I get.

My issue: I have lots of sites to go through and don't know which errors can happen. I may have seen all of them already, but I have no way of knowing that.

Here are some examples of the errors that occurred:

Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))
<class 'requests.exceptions.SSLError'>

Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: TLSV1_UNRECOGNIZED_NAME] tlsv1 unrecognized name (_ssl.c:1129)')))
<class 'requests.exceptions.SSLError'>

Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1129)')))
<class 'requests.exceptions.SSLError'>

Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000001D58BAB4850>, 'Connection to the_site timed out. (connect timeout=10)'))
<class 'requests.exceptions.ConnectTimeout'>

Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001D58BAB48B0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
<class 'requests.exceptions.ConnectionError'>

Err: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
<class 'requests.exceptions.ConnectionError'>

Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001D58BB44C40>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
<class 'requests.exceptions.ConnectionError'>

296 nan: is Not reachable 
Err: Invalid URL 'nan': No schema supplied. Perhaps you meant http://nan?

354 : is Not reachable, status_code: 404

As you can see, they are all slightly different (even ignoring the object ID and the host).

I have tried:

import requests

# url and count are assumed to be defined by the surrounding code
try:
    # Fetch the URL, following redirects and verifying the TLS certificate
    response = requests.get(
        url,
        allow_redirects=True,
        timeout=1,
        verify=True,
        headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"},
    )
    # Only a 200 status counts as reachable
    if response.status_code == 200:
        print(f"{count} {url}: is reachable. status_code: {response.status_code}")
    else:
        print(f"{count} {url}: is Not reachable, status_code: {response.status_code}")
except requests.exceptions.RequestException as e:
    print(e.errno)
    print(f"{url}: is Not reachable \nErr: {e}")

but e.errno just returns None. I am not sure how it works, but I expected it to return the unique number associated with that specific error. I guess I was wrong.

I also played around with all the other e.something attributes and other things from the requests module, but I can't seem to find a way to get all the unique types of errors I am getting and will get later.

For clarification, I am not talking about classes like SSLError or ConnectionError.

TL;DR: How can I get a list of all the unique errors I am getting, so that I can search online for how to prevent them?


假装爱人 2025-02-19 09:29:39

If you only want to produce a list of the errors you are receiving without stopping your code, you can catch Exception, the base class of nearly all built-in exceptions (everything except system-exiting ones such as KeyboardInterrupt and SystemExit).

Your code will then become:

import requests

# url and count are assumed to be defined by the surrounding code
try:
    # Fetch the URL, following redirects and verifying the TLS certificate
    response = requests.get(
        url,
        allow_redirects=True,
        timeout=1,
        verify=True,
        headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"},
    )
    # Only a 200 status counts as reachable
    if response.status_code == 200:
        print(f"{count} {url}: is reachable. status_code: {response.status_code}")
    else:
        print(f"{count} {url}: is Not reachable, status_code: {response.status_code}")
# Catch everything, not only requests' own exception classes
except Exception as e:
    print(f"{url}: is Not reachable \nErr: {e}")

Keep in mind that this catches any and all errors that can occur, including bugs in your own code, so make sure you log them properly to aid debugging if the need ever arises.
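To go from "print every error" to "a list of unique errors", one possible approach is to reduce each exception to a short signature and count the signatures. The sketch below is only an illustration: root_cause, error_signature, record_error and unique_errors are hypothetical names, not part of requests. It assumes that wrapper exceptions keep the error they wrap in their args, in a reason attribute (as urllib3's MaxRetryError does), or in __cause__ / __context__, and it strips object addresses and the host name so that otherwise identical errors collapse into one entry.

```python
import re
from collections import Counter


def root_cause(exc):
    """Follow wrapped exceptions down to the innermost one."""
    seen = set()
    while True:
        seen.add(id(exc))
        # Places where a wrapper may keep the error it wraps. Relying on a
        # `reason` attribute is an assumption about urllib3's MaxRetryError.
        candidates = [getattr(exc, "reason", None), *exc.args,
                      exc.__cause__, exc.__context__]
        inner = next((c for c in candidates
                      if isinstance(c, BaseException) and id(c) not in seen),
                     None)
        if inner is None:
            return exc
        exc = inner


def error_signature(exc, host=None):
    """Return 'ClassName: message' for the root cause, without volatile parts."""
    root = root_cause(exc)
    # Object addresses such as 0x000001D58BAB4850 differ on every run
    message = re.sub(r"0x[0-9A-Fa-f]+", "0x?", str(root))
    if host:
        message = message.replace(host, "<host>")
    return f"{type(root).__name__}: {message}"


# Maps each signature to the number of times it has been seen
unique_errors = Counter()


def record_error(exc, host=None):
    """Call this from the except block instead of (or next to) print()."""
    unique_errors[error_signature(exc, host)] += 1
```

Inside the except block you would call record_error(e, host), where host is whatever host name you extracted from url, and print unique_errors.most_common() once the loop has finished. Because __context__ is followed as well, an exception raised while another, unrelated one was being handled can end up grouped under that other error, so treat the result as a starting point for searching rather than an exact classification.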
