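Error fetching HTTPS resources on SAE: two days of debugging and still stuck, please help
I have been debugging this for two days without success, so I am posting it here in the hope that someone can spot what the problem might be.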
Description: on SAE I use tornado.simple_httpclient.SimpleAsyncHTTPClient to fetch an HTTPS page. The same code works fine when tested locally.
URL to reproduce the error: http://droprest.sinaapp.com/article?url=https%3A%2F%2Fpress.taobao.com%2Fdetail.html%3Fspm%3Da21bo.7724922.8439-0.1.K2HoLf%26postId%3D1723845&next=true
Environment: Python 2.7.9, tornado 2.1.1
Core code:
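Since the code works locally but fails on SAE, it may help to confirm that both environments really run the same versions. The following is a minimal sketch that could be run in each place and compared; `environment_report` is a hypothetical helper name, and it assumes nothing beyond the standard `ssl` and `sys` modules plus an optional tornado install.

```python
import ssl
import sys


def environment_report():
    """Collect the interpreter, OpenSSL and tornado versions of this runtime."""
    try:
        import tornado
        tornado_version = tornado.version
    except ImportError:
        tornado_version = None  # tornado is not installed in this environment
    return {
        'python': sys.version.split()[0],
        'openssl': ssl.OPENSSL_VERSION,
        'tornado': tornado_version,
    }


if __name__ == '__main__':
    # Print one "name: value" line per component so two runs are easy to diff.
    for name, value in sorted(environment_report().items()):
        print('%s: %s' % (name, value))
```

A difference in the OpenSSL line between the local machine and SAE would be one thing worth checking before looking at the application code.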
import logging
import os

from tornado import httpclient
from tornado import httpserver
from tornado.ioloop import IOLoop
from tornado import web


class Application(web.Application):
    def __init__(self, handlers=None, **kwargs):
        # Copy the list so a shared default is never mutated across instances.
        handlers = list(handlers or [])
        handlers.extend([
            (r"/article", Handler),
        ])
        settings = dict({
            'template_path': os.path.join(os.path.dirname(__file__), 'templates'),
            "debug": False,
        }, **kwargs)
        super(Application, self).__init__(handlers, **settings)


class Handler(web.RequestHandler):
    @web.asynchronous
    def get(self):
        self.url = self.get_argument('url', u'')
        headers = {
            'Accept-Encoding': 'gzip',
            'Accept-Language': 'zh-CN,zh;q=0.8',
            "Accept-Charset": "UTF-8,*;q=0.5",
            "User-Agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        }
        # asynchronous fetch web page
        httpclient.AsyncHTTPClient(max_clients=20).fetch(
            httpclient.HTTPRequest(
                method='GET',
                url=self.url,
                headers=headers,
                follow_redirects=True),
            self.on_fetch,
        )

    def on_fetch(self, response):
        # Re-raise any error that occurred during the fetch.
        response.rethrow()
        # Default to '' so a missing Content-Type header does not raise on "in".
        content_type = response.headers.get('Content-Type', '')
        if 'text/html' not in content_type and 'application/xhtml' not in content_type:
            raise TypeError('not html or xhtml file')
        html = response.body
        # get content
        content = {u'content': html, 'url': self.url}
        self.finish()


if __name__ == '__main__':
    from tornado.options import parse_command_line
    parse_command_line()
    application = Application(**{'debug': True})
    logging.info('Server running on http://localhost:8080')
    http_server = httpserver.HTTPServer(application)
    http_server.listen(8080)
    IOLoop.instance().start()
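One way to narrow the problem down might be to fetch the same URL with a plain blocking client on SAE. If that succeeds, the issue is more likely in the non-blocking handshake path than in outbound HTTPS as such. This is only a sketch: `blocking_fetch` is a hypothetical helper, the `opener` parameter exists so the function can be exercised without network access, and whether SAE permits this kind of outbound request is an assumption.

```python
try:
    from urllib2 import Request, urlopen  # Python 2
except ImportError:
    from urllib.request import Request, urlopen  # Python 3


def blocking_fetch(url, timeout=10, opener=urlopen):
    """Fetch url synchronously and return (status_code, body)."""
    request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    response = opener(request, timeout=timeout)
    try:
        return response.getcode(), response.read()
    finally:
        # Always release the connection, even if read() fails.
        response.close()
```

Calling `blocking_fetch(url)` with the same `url` argument the handler receives, and logging the returned status code, would show whether a blocking handshake completes on the platform.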
Details (error log):
- [2015/10/29 18:52:12] - ERROR:root:Exception in I/O handler for fd 10
Traceback (most recent call last):
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/ioloop.py", line 309, in start
    self._handlers[fd](fd, events)
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 270, in _handle_events
    self._handle_write()
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 614, in _handle_write
    self._do_ssl_handshake()
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 584, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/usr/local/sae/python/lib/python2.7/ssl.py", line 788, in do_handshake
    self._sslobj.do_handshake()
SSLError: socket write not completed (_ssl.c:562) yq34
- [2015/10/29 18:52:12] - ERROR:root:Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 270, in _handle_events
    self._handle_write()
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 614, in _handle_write
    self._do_ssl_handshake()
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 584, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/usr/local/sae/python/lib/python2.7/ssl.py", line 788, in do_handshake
    self._sslobj.do_handshake()
SSLError: socket write not completed (_ssl.c:562) yq34
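Both tracebacks end in `do_handshake()` called from tornado's `_do_ssl_handshake`, so the error surfaces while an SSL handshake is in progress on a non-blocking socket. For reference, the usual pattern in that situation (as described in the Python `ssl` documentation) is to wait and retry on `SSL_ERROR_WANT_READ` or `SSL_ERROR_WANT_WRITE` and to re-raise anything else. The sketch below is a standalone illustration, not tornado's code; `classify_handshake_error` and `handshake_nonblocking` are hypothetical names. Printing `err.args` for the failing case on SAE might show which error code the platform's `ssl` module actually reports.

```python
import select
import socket
import ssl


def classify_handshake_error(err):
    """Map an ssl.SSLError from do_handshake() to 'wait_read', 'wait_write' or 'fatal'."""
    code = err.args[0] if err.args else None
    if code == ssl.SSL_ERROR_WANT_READ:
        return 'wait_read'
    if code == ssl.SSL_ERROR_WANT_WRITE:
        return 'wait_write'
    return 'fatal'


def handshake_nonblocking(ssl_sock, timeout=10.0):
    """Drive the handshake on a non-blocking SSL socket until it completes."""
    while True:
        try:
            ssl_sock.do_handshake()
            return
        except ssl.SSLError as err:
            action = classify_handshake_error(err)
            if action == 'wait_read':
                ready = select.select([ssl_sock], [], [], timeout)[0]
            elif action == 'wait_write':
                ready = select.select([], [ssl_sock], [], timeout)[1]
            else:
                # Not a retryable condition: show the raw arguments and re-raise.
                print('handshake failed: %r' % (err.args,))
                raise
            if not ready:
                raise socket.timeout('SSL handshake timed out')
```

If the code reported on SAE is neither of the two retryable values, that would explain why the exception escapes the I/O handler instead of being retried.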
Comments (1)
It would be better to post the code = =