Using lxml.html's cssselect to select an element whose id attribute contains colons
I have an element in a page that looks like this:
<a id="cid-694094:Comment:188384" name="694094:Comment:188384"></a>
If you do document.cssselect("#cid-694094:Comment:188384")
you will get:
lxml.cssselect.ExpressionError: The psuedo-class Symbol(u'Comment', 12) is unknown
The solution for that is covered in this question (the person there was using Java): escape each colon with a backslash.
However, when I try that in Python like this:
document.cssselect(r"#cid-694094\:Comment\:188384")
I get:
lxml.cssselect.SelectorSyntaxError: Bad symbol 'cid-694094\': 'unicodeescape' codec can't decode byte 0x5c in position 10: \ at end of string at [Token(u'#', 0)] -> None
The reason for that and a proposed solution can be found in this question. If I understand it correctly I should be doing:
document.cssselect(r"#cid-694094\\:Comment\\:188384")
But this still doesn't work. Instead I once again get:
lxml.cssselect.ExpressionError: The psuedo-class Symbol(u'Comment\', 14) is unknown
Can anybody tell me what I'm doing wrong?
Try it yourself using:
import lxml.html
document = lxml.html.fromstring(
'<a id="cid-694094:Comment:188384" name="694094:Comment:188384"></a>'
)
document.cssselect(r"#cid-694094\:Comment\:188384")
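When comparing the attempts above, it can help to check which characters each Python literal actually hands to the selector parser. The sketch below uses only the selector strings from the question; the variable names are made up for illustration. It shows that a raw string with a doubled backslash contains two backslashes per colon, whereas a non-raw string with a doubled backslash is the same as the raw single-backslash form.

```python
# The selector literals from the question (variable names are illustrative).
plain = "#cid-694094:Comment:188384"
raw_single = r"#cid-694094\:Comment\:188384"
raw_double = r"#cid-694094\\:Comment\\:188384"

# A non-raw literal needs a doubled backslash to produce one backslash.
escaped = "#cid-694094\\:Comment\\:188384"

for label, selector in [
    ("plain", plain),
    ("raw_single", raw_single),
    ("raw_double", raw_double),
    ("escaped", escaped),
]:
    # Print the number of backslash characters and the string itself.
    print(label, selector.count("\\"), selector)

# The raw single-backslash form and the non-raw doubled form are identical.
print(escaped == raw_single)
```

So the raw-string form with a single backslash already contains the CSS escape; doubling the backslash inside a raw string sends two backslashes per colon to the parser.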
Comments (2)
Isn't : not allowed in CSS for id or class? Here is a work-around:
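A work-around along these lines (a sketch, not necessarily the code originally posted with this answer) is to avoid the ID selector altogether and match the id attribute through XPath or through the get_element_by_id helper of lxml.html, where the colon sits inside an ordinary string and needs no escaping. The wrapping div and the variable names are assumptions added for illustration.

```python
import lxml.html

# Markup from the question, wrapped in a <div> for illustration.
document = lxml.html.fromstring(
    '<div><a id="cid-694094:Comment:188384" name="694094:Comment:188384"></a></div>'
)

target_id = "cid-694094:Comment:188384"

# XPath attribute test: the colon is part of a quoted string literal,
# so no CSS escaping is involved.
by_xpath = document.xpath('//a[@id="%s"]' % target_id)

# lxml.html helper that looks an element up by its id attribute.
by_helper = document.get_element_by_id(target_id)

print(len(by_xpath), by_xpath[0].get("name"))
print(by_helper.get("id"))
```

Both lookups bypass the CSS selector parser, so they do not depend on how a given lxml version handles backslash escapes.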
A colon is normally not allowed in ID selectors, and escaping it with a backslash, as in the question, is indeed the correct approach. However, the selector parser in lxml was really broken until recently. (It did not really implement backslash-escapes.) I fixed this in cssselect 0.7, which is now an independent project, extracted from lxml.
http://packages.python.org/cssselect/
The "new" way to use it, calling cssselect directly, is a bit more verbose. lxml 2.4 (not released yet) will use the new cssselect, so the simpler syntax will work too.