How to get all unique errors when going through many websites
I am going through a lot of sites using the requests module, and I want to see whether each site is broken, whether it exists, and whether I can access it. I am using a try/except block and can see what errors I get.
My issue: I have lots of sites to go through and don't know what errors can happen. I may already have seen all of them, but I can't be sure.
Here are some examples of the errors that occurred:
Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))
<class 'requests.exceptions.SSLError'>
Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: TLSV1_UNRECOGNIZED_NAME] tlsv1 unrecognized name (_ssl.c:1129)')))
<class 'requests.exceptions.SSLError'>
Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1129)')))
<class 'requests.exceptions.SSLError'>
Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000001D58BAB4850>, 'Connection to the_site timed out. (connect timeout=10)'))
<class 'requests.exceptions.ConnectTimeout'>
Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001D58BAB48B0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
<class 'requests.exceptions.ConnectionError'>
Err: ('Connection aborted.'the_site', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
<class 'requests.exceptions.ConnectionError'>
Err: HTTPSConnectionPool(host='the_site', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001D58BB44C40>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
<class 'requests.exceptions.ConnectionError'>
296 nan: is Not reachable
Err: Invalid URL 'nan': No schema supplied. Perhaps you meant http://nan?
354 : is Not reachable, status_code: 404
As you can see, they are all slightly different (even ignoring the object ID and the host).
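The parts of these messages that vary from site to site (object addresses such as `0x000001D58BAB4850` and the host name) can be masked before comparing, so that each remaining distinct string corresponds to one kind of error. This is a minimal sketch under a few assumptions: that you know the host you requested (the hypothetical `host` parameter) and that addresses always appear as `0x` followed by hex digits. `normalize_message` and the sample hosts `a.example` / `b.example` are made up for illustration.

```python
import re

# Object addresses such as 0x000001D58BAB4850 differ on every run.
ADDRESS = re.compile(r"0x[0-9A-Fa-f]+")


def normalize_message(message, host=None):
    """Mask object addresses and, optionally, the requested host name."""
    text = ADDRESS.sub("<addr>", message)
    if host:
        # Plain substring replacement: a very short host name could mask unrelated text.
        text = text.replace(host, "<host>")
    return text


# Two messages from the question's log, as they might look for two different hosts.
template = (
    "HTTPSConnectionPool(host='{host}', port=443): Max retries exceeded with url: / "
    "(Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at {addr}>: "
    "Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))"
)
samples = [
    ("a.example", template.format(host="a.example", addr="0x000001D58BAB48B0")),
    ("b.example", template.format(host="b.example", addr="0x000001D58BB44C40")),
]

unique_messages = {normalize_message(msg, host) for host, msg in samples}
print(len(unique_messages))  # 1: the same kind of error on two different hosts
```

Inside the loop, something like `unique_messages.add(normalize_message(str(e), host))` in the `except` block would collect one entry per distinct message. The host could be taken from the URL with `urllib.parse.urlsplit(url).hostname`.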
I have tried:
import requests

# url and count come from the surrounding loop over the sites (not shown).
try:
    # Get the URL
    get = requests.get(url, allow_redirects=True, timeout=1, verify=True, headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"})
    # If the request succeeds
    if get.status_code == 200:
        print(f"{count} {url}: is reachable. status_code: {get.status_code}")
    else:
        print(f"{count} {url}: is Not reachable, status_code: {get.status_code}")
# Exception
except requests.exceptions.RequestException as e:
    print(e.errno)
    print(f"{url}: is Not reachable \nErr: {e}")
but `e.errno` just returns `None`. I am not sure how it works; I expected it to return a unique number associated with that specific error, but I guess I was wrong.
I also played around with all the other `e.something` attributes and other things from the requests module, but I can't find a way to get all the unique types of errors I am getting now and will get later.
For clarification, I am not talking about classes like SSLError or ConnectionError.
TLDR: How can I get a list of all the unique errors I am getting, so I can search online for how to prevent them?
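If you only want to produce a list of the errors you are receiving without stopping your code, you can catch the common base class of regular exceptions, `Exception`, instead of `requests.exceptions.RequestException`: in your code, change the `except` line to `except Exception as e:`.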
Keep in mind that this catches any and all errors that can occur, so make sure you log them properly to aid debugging if the need ever arises.
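Building on the idea of catching `Exception` and logging what comes in, here is a minimal sketch (not the answerer's code) that reduces each exception to a host-independent "signature" and counts the signatures with `collections.Counter`. It follows Python's own chaining (`__cause__`, `__context__`), exception objects passed as constructor arguments, and a `.reason` attribute. It assumes that urllib3's `MaxRetryError` keeps the underlying error in `.reason`, and that `ssl.SSLError.reason` is a short string such as `TLSV1_UNRECOGNIZED_NAME`. If so, the two same-class SSL errors in the question would get different keys. `iter_chain`, `error_signature`, `record` and `simulated_connection_reset` are hypothetical names. The demo uses a hand-built chain instead of a real `requests` call so it runs offline.

```python
from collections import Counter, deque


def iter_chain(exc):
    """Yield exc and every exception it wraps (breadth-first, no repeats)."""
    queue, seen = deque([exc]), set()
    while queue:
        current = queue.popleft()
        if current is None or id(current) in seen:
            continue
        seen.add(id(current))
        yield current
        # "raise X from Y" sets __cause__; raising inside an except block sets __context__.
        queue.append(current.__cause__)
        queue.append(current.__context__)
        # Exceptions passed as constructor arguments, e.g. ConnectionError(MaxRetryError(...)).
        queue.extend(arg for arg in current.args if isinstance(arg, BaseException))
        # Assumption: some libraries (e.g. urllib3's MaxRetryError) keep the cause in .reason.
        reason = getattr(current, "reason", None)
        if isinstance(reason, BaseException):
            queue.append(reason)


def error_signature(exc):
    """Build a key like 'RuntimeError <- ConnectionResetError[errno=10054]'."""
    parts = []
    for err in iter_chain(exc):
        part = type(err).__name__
        code = getattr(err, "errno", None)  # set on OSError subclasses, otherwise None
        if code is not None:
            part += f"[errno={code}]"
        reason = getattr(err, "reason", None)  # ssl.SSLError: a short mnemonic string
        if isinstance(reason, str):
            part += f"[{reason}]"
        parts.append(part)
    return " <- ".join(parts)


unique_errors = Counter()  # signature -> how many URLs failed that way


def record(exc):
    # In the question's loop: `except Exception as e: record(e)`
    unique_errors[error_signature(exc)] += 1


def simulated_connection_reset():
    """Offline stand-in for a chained network error (hypothetical)."""
    try:
        try:
            raise ConnectionResetError(10054, "An existing connection was forcibly closed")
        except OSError as inner:
            raise RuntimeError("Connection aborted.", inner)
    except RuntimeError as outer:
        return outer


for _ in range(2):
    record(simulated_connection_reset())

for signature, count in unique_errors.most_common():
    print(count, signature)  # 2 RuntimeError <- ConnectionResetError[errno=10054]
```

Keying on class names, errno values and SSL reasons, rather than on the full message text, keeps object IDs and host names out of the key. The trade-off is that two errors of the same classes but with different message text are counted together.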
Replying just on that (expecting `e.errno` to hold a unique number for each error):
No, that is not how it works.
"errno" is an old Unix error convention; see for example https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html, which lists the values with their symbolic names. These are, if you like, "OS" errors.
Now, if you look at Python's exception mechanism, it defines exactly one exception, `OSError` (with its subclasses), that uses `errno`; see https://docs.python.org/3.8/library/exceptions.html. So Python's exceptions are a superset of the OS-level errors that carry an errno value. All other exceptions defined by libraries and by your own code do not have to rely on this, and there is no reason for them to have an `errno` attribute.
(And this is good: how could all libraries and code agree on a single shared sequence of numbers to encode their own exceptions? It wouldn't scale at all.)
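The sketch below shows this distinction using only the standard library. It assumes, based on the requests source, that `requests.exceptions.RequestException` subclasses `IOError` (an alias of `OSError` in Python 3). That would explain why `e.errno` exists in the question's code but is `None`: requests builds its exceptions from a message or a wrapped exception, and an `OSError` constructed that way leaves `errno` unset. `RequestExceptionStandIn` and `errno_info` are hypothetical names, and no network access is needed.

```python
import errno


class RequestExceptionStandIn(OSError):
    """Hypothetical stand-in for requests.exceptions.RequestException,
    which subclasses IOError (an alias of OSError in Python 3)."""


def errno_info(exc):
    """Return (errno value, symbolic name), or (None, None) if exc has no errno."""
    code = getattr(exc, "errno", None)  # default avoids AttributeError on non-OSError classes
    return code, errno.errorcode.get(code)


# requests-style: built from a message (or a wrapped exception), so no errno is set.
wrapped = RequestExceptionStandIn("Max retries exceeded with url: /")
print(errno_info(wrapped))  # (None, None)

# OS-level: the (errno, strerror) constructor form sets .errno.
os_level = ConnectionRefusedError(errno.ECONNREFUSED, "Connection refused")
print(errno_info(os_level))  # platform-dependent number and its symbolic name

# Exceptions that are not OSError subclasses have no errno attribute at all.
print(errno_info(ValueError("bad URL")))  # (None, None)
```

In a real requests error, any errno would typically live on a wrapped low-level exception (for example the socket error behind `NewConnectionError`), not on the `RequestException` itself.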