Finding a £ sign using lxml

Posted on 2024-11-07 22:01:42

I'm struggling with encodings and lxml. I'm reading in some HTML from a website and would like to use lxml to search for a tag whose text includes a £. I can search for the tag (h3) and print its contents fine, but if I try to search for the £ sign within the text I get a UnicodeDecodeError. I need to do the latter because it's the more general case.

import lxml.html

tree = lxml.html.fromstring(html)

# prints #£13,999
print tree.cssselect('h3')[0].text_content().encode("utf-8")

# should print £13,999, but instead generates
# "UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 0: ordinal not in range(128)"
print tree.cssselect('h3:contains(u"\xa3")')[0].text_content().encode('utf-8')

Any help you can provide would be much appreciated... I've tried several different things and this is driving me crazy!


兮子 2024-11-14 22:01:42

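I'm not experienced with either Python or lxml, but the problem could be that the 'h3' string isn't a unicode string and that the byte a3 isn't a Unicode code point by itself. You could try replacing:

'h3:contains(u"\xa3")'

with:

u'h3:contains("\u00a3")'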
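
To illustrate the reasoning above, here is a minimal sketch using only the standard library. It assumes Python 3 (the question uses Python 2), where every str is already Unicode, so no u prefix is needed. The names pound and selector are made up for illustration, and lxml itself is not called here. It shows that the lone byte 0xa3 cannot be decoded as ASCII, which is likely what happens when a Python 2 byte-string selector is decoded implicitly.

```python
# Python 3 sketch: str is always Unicode text.
pound = "\u00a3"  # U+00A3 POUND SIGN

# The same character is one byte in Latin-1 but two bytes in UTF-8.
latin1_bytes = pound.encode("latin-1")  # the single byte 0xa3
utf8_bytes = pound.encode("utf-8")      # the two bytes 0xc2 0xa3

# Decoding the lone byte 0xa3 as ASCII fails, as in the question's traceback.
try:
    latin1_bytes.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Building the whole selector as Unicode text keeps the pound sign a real
# code point instead of a raw byte (hypothetical variable name).
selector = 'h3:contains("{}")'.format(pound)

print(ascii_error)
print(selector)
```

In Python 2 the equivalent of the last step is the u'...' literal suggested above: the u prefix has to be on the outer string, because a u inside the quotes is just an ordinary character of the selector text.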
