Finding the £ sign using lxml
I'm struggling with encodings and lxml. I'm reading in some HTML from a website and would like to use lxml to search for a tag whose text includes a £ sign. I can search for the tag (h3) and print its contents fine, but if I try to search for the £ sign within the text I get a UnicodeDecodeError. I need to do the latter because it's the more general case.
import lxml.html

tree = lxml.html.fromstring(html)
# prints £13,999
print tree.cssselect('h3')[0].text_content().encode("utf-8")
# should print £13,999, but instead generates
# "UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 0: ordinal not in range(128)"
print tree.cssselect('h3:contains(u"\xa3")')[0].text_content().encode('utf-8')
Any help you can provide would be much appreciated... I've tried several different things and this is driving me crazy!
Comments (1)
I'm not experienced with either Python or lxml, but the problem could be that the 'h3' string isn't a unicode string, and that the byte a3 isn't a Unicode code point by itself. You could try to replace: with:
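To make that reasoning concrete, here is a minimal sketch of the mechanism, written in Python 3 so that the decoding step Python 2 performs implicitly can be shown explicitly. It does not use lxml at all. The names `selector_bytes`, `selector_text` and `try_ascii` are hypothetical and exist only for illustration. The assumption is that a plain Python 2 literal such as 'h3:contains(u"\xa3")' is a byte string holding the raw byte 0xa3, and that the library then tries to decode it as ASCII.

```python
# Python 3 sketch (assumption: this mirrors what Python 2 does implicitly
# when a plain '...' byte string has to be turned into unicode).

# What a Python 2 plain string literal would hold: the raw byte 0xa3.
selector_bytes = b'h3:contains("\xa3")'


def try_ascii(data):
    """Return the decoded text, or the error message if ASCII decoding fails."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)


# 0xa3 is outside the ASCII range, so decoding fails with the familiar message.
error_message = try_ascii(selector_bytes)
print(error_message)

# A unicode selector holds the code point U+00A3 (the pound sign) instead.
# In Python 2 this would presumably be written as u'h3:contains("\xa3")',
# with the u prefix on the whole literal rather than inside it.
selector_text = 'h3:contains("\u00a3")'

# Encoding to UTF-8 turns the single code point into the two bytes c2 a3.
encoded = selector_text.encode("utf-8")
print(encoded)
```

If this assumption holds, the `u` prefix inside the quotes in the question is just two ordinary characters of the selector, which would explain why the string is still treated as bytes.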