I have an xhtml node that I need to clean, with the following innerText:
<img style="width: 402px; height: 312px;" src="http://www.mydomain.com/test.jpg" align="left" border="0" height="312" hspace="5" vspace="5" width="402"> <br><font size="1" face="Arial"><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><font face="Verdana">Image text goes here</font> </font>
I can't figure out by myself the XPath expression that returns / finds multiple occurrences of the <br> element. Do I need to recurse through the nodes and check each one against the last match?
UPDATE: I'm using HtmlAgilityPack to navigate through the doc.
Thanks in advance!
Regards,
byte_slave
Not really sure what you want to do with this. I have asked, in a comment on the question, what you want it transformed into…
Guessing what you might want to do though…
To find out the total number of <br/> elements, you just use the XPath count(//descendant-or-self::br).
Or, if you want to do something with all the <br/> elements that sit alongside another <br/>, you could use the XPath //descendant-or-self::br[following-sibling::br or preceding-sibling::br] to return just that long list of <br/>s. Strictly, this matches every <br/> that shares a parent with at least one other <br/>, not only directly adjacent ones.
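For readers outside .NET, the same "find the runs of <br>" idea can be sketched in Python. This is not XPath and not HtmlAgilityPack. It swaps in the standard-library html.parser, which tolerates the unclosed <br> and <img> tags. The names BrRunFinder and find_br_runs are hypothetical. Treating two <br> tags as consecutive when only whitespace separates them is an assumption of this sketch.

```python
from html.parser import HTMLParser


class BrRunFinder(HTMLParser):
    """Hypothetical helper that collects runs of consecutive <br> tags.

    Assumption: <br> tags separated only by whitespace count as consecutive;
    any other tag or any non-whitespace text ends the current run.
    """

    def __init__(self):
        super().__init__()
        self.total = 0      # total number of <br> tags seen
        self.runs = []      # length of each finished run of consecutive <br>
        self._current = 0   # length of the run currently being built

    def _end_run(self):
        # Record the current run (if any) and start counting from zero again.
        if self._current:
            self.runs.append(self._current)
            self._current = 0

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.total += 1
            self._current += 1
        else:
            self._end_run()

    def handle_startendtag(self, tag, attrs):
        # XHTML-style <br/> arrives here; treat it like a start tag only,
        # so the implied end tag does not break the run.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self._end_run()

    def handle_data(self, data):
        # Whitespace between tags is ignored; real text ends the run.
        if data.strip():
            self._end_run()

    def close(self):
        super().close()
        self._end_run()


def find_br_runs(fragment):
    """Return (total <br> count, list of run lengths) for an HTML fragment."""
    finder = BrRunFinder()
    finder.feed(fragment)
    finder.close()
    return finder.total, finder.runs
```

For example, find_br_runs('<img src="a.jpg"> <br><font><br><br><br><font>text</font> </font>') returns (4, [1, 3]). That is four <br> tags in total: a lone one, followed by a run of three.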
XPath is not going to work because this is NOT XHTML. All the br tags are unclosed. Heck, even the img tag itself is incomplete...
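One quick way to check the well-formedness point is to hand the fragment to a strict XML parser. The minimal Python sketch below uses the standard library's xml.etree.ElementTree. Two details are assumptions made for illustration: the helper name is_well_formed_xml, and wrapping the fragment in a <root> element so that several top-level nodes are allowed.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment):
    """Return True if the fragment parses as XML, False otherwise.

    The fragment is wrapped in an illustrative <root> element so that
    sibling top-level nodes (e.g. <img> followed by <font>) are accepted.
    """
    try:
        ET.fromstring("<root>" + fragment + "</root>")
    except ET.ParseError:
        # Unclosed tags such as <br> or <img ...> end up as mismatched tags.
        return False
    return True
```

With the question's markup, the unclosed <img> and <br> tags make the parse fail, so the helper returns False. The same content written with <img .../> and <br/> parses.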
You need to clean this with plain text handling (regular expressions, likely) or HTML sanitizers. Look at xmllint and HTML Tidy.
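Along the lines of the regular-expression route, here is a minimal Python sketch that collapses every run of two or more <br> tags into a single one. Three choices are assumptions: the pattern itself, using <br/> as the replacement, and the name collapse_br_runs. Like any regex over HTML it is brittle. For instance, it would also rewrite <br><br> text sitting inside a comment. An HTML sanitizer is the sturdier option for untrusted input.

```python
import re

# Two or more <br> tags in a row, in spellings such as <br>, <br/>, <br />
# or <BR>, optionally separated by whitespace. Whitespace after the last
# tag of a run is consumed as well.
_BR_RUN = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)


def collapse_br_runs(fragment, replacement="<br/>"):
    """Replace every run of two or more <br> tags with a single replacement."""
    return _BR_RUN.sub(replacement, fragment)
```

For instance, collapse_br_runs('<font><br><br/><BR /> <br><font>x</font></font>') gives '<font><br/><font>x</font></font>', while a lone <br> is left untouched.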