Using proxies
I have built a script (with the help of internet resources) that fetches a list of available proxies from a particular website and then checks them one by one to find a working proxy. Once one is found, it builds an opener from that proxy. Here is my code.
import urllib2
import urllib
import cookielib
import socket
import time
import re  # needed for re.findall() below; missing from the original snippet

def log(msg):
    # minimal stdout logger; log() is called below but was not defined in the snippet
    print msg

def getOpener(pip=None):
    # build an opener routed through the given proxy, or a plain cookie-aware one
    if pip:
        proxy_handler = urllib2.ProxyHandler({'http': pip})
        opener = urllib2.build_opener(proxy_handler)
    else:
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1')]
    urllib2.install_opener(opener)
    return opener

def getContent(opnr, url):
    req = urllib2.Request(url)
    sock = opnr.open(req)
    return sock.read()

def is_bad_proxy(pip):
    # a proxy counts as working only if it can fetch google.com without raising
    try:
        opnr = getOpener(pip)
        data = getContent(opnr, 'http://www.google.com')
    except urllib2.HTTPError as e:
        return e.code
    except Exception:
        return True
    return False

def getProxiesList():
    # scrape the proxy-list pages linked from the index page
    proxies = []
    opnr = getOpener()
    content = getContent(opnr, 'http://somesite.com/')
    urls = re.findall("<a href='([^']+)'[^>]*>.*?HTTP Proxies.*?</a>", content)
    for eachURL in urls:
        content = getContent(opnr, eachURL)
        proxies.extend(re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d+', content))
    return proxies

def getWorkingProxy(proxyList, i=-1):
    # scan the list starting after index i; return the first working proxy and its index
    for j in range(i + 1, len(proxyList)):
        currentProxy = proxyList[j]
        if not is_bad_proxy(currentProxy):
            log("%s is working" % currentProxy)
            return currentProxy, j
        else:
            log("Bad Proxy %s" % currentProxy)
    return None, -1

if __name__ == "__main__":
    socket.setdefaulttimeout(60)
    proxyList = getProxiesList()
    proxy, index = getWorkingProxy(proxyList)
    if proxy:
        _web = getOpener(proxy)
I have to repeat this process again and again once I have used one proxy to some extent. Does building an opener again and again cause issues? I am getting the following error: HTTPError: HTTP Error 503: Too many open connections. What could be the reason for this error? Thanks in advance.
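For reference, a minimal sketch of building the opener once and reusing it for several requests, under the same Python 2 setup as the code above (the helper name fetch_all is hypothetical, not part of the original script):

# Sketch: reuse a single opener for many requests instead of rebuilding it
# each time; fetch_all is a hypothetical helper name.
def fetch_all(proxy, urls):
    opener = getOpener(proxy)  # built once
    pages = []
    for url in urls:
        pages.append(getContent(opener, url))
    return pages

Reusing one opener also avoids the repeated urllib2.install_opener() calls, each of which replaces the process-wide default opener.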
Comments (1)
I checked, and proxyList contains duplicates. So many openers were trying to use the same proxy, which caused the error HTTPError: HTTP Error 503: Too many open connections.
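As a follow-up, a minimal sketch of removing the duplicates while keeping the original order, again assuming the Python 2 setup from the question (dedupeProxies is a hypothetical helper):

# Sketch: drop duplicate proxies, preserving first-seen order.
# dedupeProxies is a hypothetical helper, not part of the original script.
def dedupeProxies(proxies):
    seen = set()
    unique = []
    for p in proxies:
        if p not in seen:
            seen.add(p)
            unique.append(p)
    return unique

proxyList = dedupeProxies(getProxiesList())

With a deduplicated list, each proxy is probed at most once, so repeated connections to the same proxy no longer pile up.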