How can I select these elements from the following horrible HTML using XPath and lxml?
I want to select the following strings from this HTML using just lxml and some clever XPath. The strings will change, but the surrounding HTML will not.
I need...
19/11/2010
AAAAAA/01
Normal
United Kingdom
This description may contains <bold>html</bold> but i still need all of it!
from...
...
<p>
<strong>Date:</strong> 19/11/2010<br>
<strong>Ref:</strong> AAAAAA/01<br>
<b>Type:</b> Normal<br>
<b>Country:</b> United Kingdom<br>
</p>
<hr>
<p>
<br>
<b>1. Title:</b> The Title<br>
<b>2. Description: </b> This description may contains <bold>html</bold> but i still need all of it!<br>
<b>3. Date:</b> 25th October<br>
...
</p>
...
So far I've only come up with using regular expressions and re:match to try and drag it out, but even that won't work without something that lets me get the innerHTML of the <p> nodes, for example.
Is there any way to do this without post-processing the string through regex?
Thanks :)
Comments (2)
Very ugly! With this properly well-formed input:
Simplest case:
Evaluate to:
All of those in one:
The complex one:
This isn't so difficult.
Given this XML document:
this XPath expression selects all of the above text nodes:
Use this:
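Since the document and expression are not shown above, here is one possible way to do it, as a sketch only. It assumes `lxml.html` is used to parse the original, non-well-formed markup directly, and it uses a single expression that selects every non-blank text node inside a `<p>` whose parent is not a label (`<b>` or `<strong>`). The `<html><body>` wrapper and the names `PAGE` and `values` are illustrative assumptions.

```python
from lxml import html

# The question's markup as posted (unclosed <br> and <hr>), wrapped in
# <html><body> so it parses as a full document. The wrapper is an assumption.
PAGE = """<html><body>
<p>
<strong>Date:</strong> 19/11/2010<br>
<strong>Ref:</strong> AAAAAA/01<br>
<b>Type:</b> Normal<br>
<b>Country:</b> United Kingdom<br>
</p>
<hr>
<p>
<br>
<b>1. Title:</b> The Title<br>
<b>2. Description: </b> This description may contains <bold>html</bold> but i still need all of it!<br>
<b>3. Date:</b> 25th October<br>
</p>
</body></html>"""

tree = html.fromstring(PAGE)

# One expression for all value text nodes: any non-blank text node under a <p>
# that does not sit directly inside a <b> or <strong> label.
values = [
    t.strip()
    for t in tree.xpath(
        "//p//text()[not(parent::b or parent::strong)][normalize-space()]"
    )
]

for value in values:
    print(value)
```

The first four values are the fields from the first paragraph. The description comes back as several text nodes, because the text inside `<bold>` is a node of its own, so the pieces need to be joined if the whole sentence is wanted as one string.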