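Error fetching HTTPS resources on SAE; two days of debugging with no luck, could an expert please take a look?

Posted on 2022-09-01 19:53:54, 4446 characters, 31 views, 0 comments

I have been debugging this for two days without success, so I am posting it here in the hope that someone can tell me what the problem might be.

Description: on SAE I use tornado.simple_httpclient.SimpleAsyncHTTPClient to fetch HTTPS pages. The same code works fine when tested locally.

URL to reproduce the error: http://droprest.sinaapp.com/article?url=https%3A%2F%2Fpress.taobao.com%2Fdetail.html%3Fspm%3Da21bo.7724922.8439-0.1.K2HoLf%26postId%3D1723845&next=true

Environment: Python 2.7.9, Tornado 2.1.1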
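Since the same code works locally but fails on SAE, it may help to confirm that the two environments really match. The following is a minimal sketch, not part of the original code: the function name `describe_environment` is made up for illustration. It reports the Python, OpenSSL, and Tornado versions so the local output can be compared with what SAE logs.

```python
import platform
import ssl


def describe_environment():
    """Collect version details that can affect TLS behaviour."""
    info = {
        'python': platform.python_version(),
        'openssl': ssl.OPENSSL_VERSION,
    }
    try:
        import tornado
        info['tornado'] = tornado.version
    except ImportError:
        # Tornado is not installed in this interpreter.
        info['tornado'] = None
    return info


if __name__ == '__main__':
    for name, value in sorted(describe_environment().items()):
        print('%s: %s' % (name, value))
```

A difference in the OpenSSL build or the Tornado release between the two machines would be a reasonable first thing to rule out.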

Core code:

import logging
import os

from tornado import httpclient
from tornado import httpserver
from tornado.ioloop import IOLoop
from tornado import web

class Application(web.Application):
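    def __init__(self, handlers=None, **kwargs):
        # Copy into a fresh list; a mutable default would be shared across calls.
        handlers = list(handlers or [])
        handlers.extend([
            (r"/article", Handler),
        ])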

        settings = dict({
            'template_path': os.path.join(os.path.dirname(__file__), 'templates'),
            "debug": False,
        }, **kwargs)

        super(Application, self).__init__(handlers, **settings)


class Handler(web.RequestHandler):
    @web.asynchronous
    def get(self):
        self.url = self.get_argument('url', u'')

        headers = {
            'Accept-Encoding':'gzip',
            'Accept-Language': 'zh-CN,zh;q=0.8',
            "Accept-Charset": "UTF-8,*;q=0.5",
            "User-Agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17",
            "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        }
        # Fetch the page asynchronously; on_fetch receives the response.
        httpclient.AsyncHTTPClient(max_clients=20).fetch(
            httpclient.HTTPRequest(
                method='GET',
                url=self.url,
                headers=headers,
                follow_redirects=True),
            self.on_fetch,
        )


    def on_fetch(self, response):
        # Re-raise any error recorded during the fetch.
        response.rethrow()

        # Default to '' so that a missing header does not break the 'in' tests.
        content_type = response.headers.get('Content-Type', '')
        if 'text/html' not in content_type and 'application/xhtml' not in content_type:
            raise TypeError('not html or xhtml file')

        html = response.body

        # get content
        content = {u'content': html, 'url': self.url}

        self.finish()
        

if __name__ == '__main__':
    from tornado.options import parse_command_line
    parse_command_line()
    application = Application(debug=True)

    logging.info('Server running on http://localhost:8080')
    http_server = httpserver.HTTPServer(application)
    http_server.listen(8080)
    IOLoop.instance().start()
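The content-type check in on_fetch can be made a little more robust. The sketch below uses a hypothetical helper, `is_html_content_type`, which is not in the original code. It tolerates a missing header and ignores parameters such as the charset. Note one assumption: it matches the registered media type `application/xhtml+xml` exactly, whereas the original tests for the substring `application/xhtml`.

```python
def is_html_content_type(content_type):
    """Return True if a Content-Type header value denotes HTML or XHTML."""
    if not content_type:
        # Header missing or empty: treat as "not HTML".
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(';', 1)[0].strip().lower()
    return media_type in ('text/html', 'application/xhtml+xml')
```

Inside the handler this could be called as `is_html_content_type(response.headers.get('Content-Type'))`.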

Details:

- [2015/10/29 18:52:12] - ERROR:root:Exception in I/O handler for fd 10
Traceback (most recent call last):
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/ioloop.py", line 309, in start
    self._handlers[fd](fd, events)
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 270, in _handle_events
    self._handle_write()
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 614, in _handle_write
    self._do_ssl_handshake()
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 584, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/usr/local/sae/python/lib/python2.7/ssl.py", line 788, in do_handshake
    self._sslobj.do_handshake()
SSLError: socket write not completed (_ssl.c:562) yq34 

  - [2015/10/29 18:52:12] - ERROR:root:Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 270, in _handle_events
    self._handle_write()
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 614, in _handle_write
    self._do_ssl_handshake()
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 584, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/usr/local/sae/python/lib/python2.7/ssl.py", line 788, in do_handshake
    self._sslobj.do_handshake()
SSLError: socket write not completed (_ssl.c:562) yq34 
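The traceback shows the failure inside the TLS handshake itself, before any HTTP data is exchanged. One way to narrow this down is to attempt a plain blocking handshake with the standard `ssl` module, outside Tornado's IOLoop. This is only a diagnostic sketch, and it assumes outbound sockets are permitted in the environment. The helper names `target_from_url` and `try_blocking_handshake` are made up for illustration and are not part of the original code.

```python
import socket
import ssl

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2


def target_from_url(url):
    """Return the (host, port) pair a client would connect to for url."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == 'https' else 80
    return parts.hostname, parts.port or default_port


def try_blocking_handshake(url, timeout=10):
    """Do a blocking TLS handshake and return (protocol, cipher name)."""
    host, port = target_from_url(url)
    context = ssl.create_default_context()  # available since Python 2.7.9
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        # wrap_socket performs the handshake on a blocking socket.
        tls = context.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
    try:
        return tls.version(), tls.cipher()[0]
    finally:
        tls.close()
```

Calling `try_blocking_handshake('https://press.taobao.com/detail.html')` both locally and on SAE would show whether a handshake to that host can complete at all. If the blocking handshake succeeds where the Tornado fetch fails, the problem is more likely in how the non-blocking handshake is driven than in the remote server's certificate or cipher setup.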

Comments (1)

我三岁 2022-09-08 19:53:54

It would be better to post the code = =
