Import error on Google App Engine using lxml

Posted on 2024-12-28 13:35:21

I use lxml to parse pages. When I run my code with the App Engine SDK it works, but when I deploy the application to the cloud, I get this error:

    Traceback (most recent call last):
      File "/base/data/home/apps/s~testparsercyka/1.356245976008257055/handler_info.py", line 2, in <module>
        import lxml.html
      File "/base/data/home/apps/s~testparsercyka/1.356245976008257055/lxml/html/__init__.py", line 12, in <module>
        from lxml import etree
    ImportError: cannot import name etree

Code:

app.yaml



    application: testparsercyka
    version: 1
    runtime: python27
    api_version: 1
    threadsafe: false

    handlers:
    - url: /stylesheets
      static_dir: stylesheets

    - url: /.*
      script: handler_info.py

    libraries:
    - name: lxml
      version: "2.3"  # I thought this would allow me to use lxml.etree

handler_info.py



    import lxml
    import lxml.html
    import urllib
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app
    from google.appengine.ext.webapp import template
    import os
    import cgi
    class MainPage(webapp.RequestHandler):
        def get(self):
            template_values = {}
            path = os.path.join(os.path.dirname(__file__), 'index.html')
            self.response.out.write(template.render(path, template_values))
    class Handlers(webapp.RequestHandler):
        def post(self):
            #url = "http://habrahabr.ru/"
            url = str(self.request.get('url'))
            url_temp = url
            teg = str(self.request.get('teg'))
            attr = str(self.request.get('attr'))
            n0 = str(self.request.get('n0'))
            n = str(self.request.get('n'))
            a = attr.split(':')
            for i in range(int(n0),int(n)):
                url = url.format(str(i))
                self.response.out.write(url)
                html = urllib.urlopen(url).read()       
                doc = lxml.html.document_fromstring(html)
                url = url_temp
                self.getn(doc.getroottree().getroot(),teg,a)
        def getn(self,node,teg,a):
                if ((node.tag==teg) and (node.get(a[0])==a[1])):
                    #print node.tag,node.keys()
                    self.response.out.write(node.text)
                    self.response.out.write('<br>')
                for n in node:
                    self.getn(n,teg,a)
    application = webapp.WSGIApplication([('/', MainPage),('/sign',Handlers)],debug=True)
    def main():
        run_wsgi_app(application)
    if __name__ == "__main__":
        main()

Any ideas why this does not work?

Comments (1)

燃情 2025-01-04 13:35:21

I know this is an old question but here is an answer that I have confirmed to work when deployed to App Engine:

app.yaml

application: lxml-test
version: 1
runtime: python27
api_version: 1
threadsafe: false

handlers:
- url: /.*
  script: app.app

libraries:
- name: lxml
  version: "2.3"

- name: webapp2
  version: "latest"

app.py

import webapp2
import lxml.etree

class MainPage(webapp2.RequestHandler):
    def get(self):
        root = lxml.etree.XML('<top><content>Hello world!</content></top>')
        self.response.content_type = 'text/xml'
        self.response.write(lxml.etree.tostring(root, xml_declaration=True))

app = webapp2.WSGIApplication(routes=[('/', MainPage)], debug=True)

Comparing the above with your code, the following changes might help:

  1. Change script: handler_info.py to script: handler_info.application.
  2. Use webapp2, which is a bit nicer and newer than webapp.
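
To illustrate the first point: in the working app.yaml above, script: app.app names a module (app) and a module-level WSGI application object inside it (app), so handler_info.application would point at the application variable in handler_info.py. The sketch below uses only the standard library instead of webapp or webapp2 to show what such an object looks like and how it gets called. The call_app helper is hypothetical and exists only for local experimentation.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A module-level WSGI callable, the kind of object a
    `script: module.name` entry refers to."""
    body = b"Hello from a WSGI application object\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Hypothetical helper: call a WSGI app once with a synthetic
    request and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Calling call_app(application) locally returns the status line and response body without needing the App Engine SDK.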

It is also possible that the issue has simply resolved itself since 2012 when this question was asked.
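
One more thing that may be worth checking: the traceback in the question shows lxml/html being loaded from inside the deployed application directory (/base/data/home/apps/s~testparsercyka/...), which suggests a local copy of lxml was uploaded together with the app. Whether that copy is what causes the ImportError is an assumption, not something confirmed above. The sketch below shows one way to see where a module is actually imported from. The helper names module_location and is_inside are hypothetical, and the example uses the standard library's json module so it runs anywhere.

```python
import importlib
import os


def module_location(name):
    """Return the directory a module (or package) was imported from."""
    module = importlib.import_module(name)
    return os.path.realpath(os.path.dirname(module.__file__))


def is_inside(path, root):
    """True if `path` is `root` itself or lies somewhere below it."""
    path = os.path.realpath(path)
    root = os.path.realpath(root)
    prefix = os.path.join(root, "")  # root with a trailing separator
    return path == root or path.startswith(prefix)


if __name__ == "__main__":
    # In the question's app one would pass "lxml" and the app's own
    # directory; "json" is used here only so the sketch runs anywhere.
    location = module_location("json")
    print(location)
    print(is_inside(location, os.path.dirname(location)))
```

If is_inside reports that lxml lives under the application's own directory, removing that copy and relying on the libraries: entry in app.yaml would be a reasonable thing to try.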
