Filter an XML file to remove lines with certain text in them?
For example, suppose I have:
<div class="info"><p><b>Orange</b>, <b>One</b>, ...
<div class="info"><p><b>Blue</b>, <b>Two</b>, ...
<div class="info"><p><b>Red</b>, <b>Three</b>, ...
<div class="info"><p><b>Yellow</b>, <b>Four</b>, ...
And I'd like to remove all lines that contain words from a list, so that I only run XPath on the lines that fit my criteria. For example, I could use the list ['Orange', 'Red'] to mark the unwanted lines, so in the example above I'd only want to use lines 2 and 4 for further processing.
How can I do this?
Use:
This selects any `div` element in the XML document that has no `p` child whose `b` child's string value is one of the strings in the pipe-separated list of filter values. The approach is easy to extend: to add a filter, you only append a new value to the pipe-separated list, without changing anything else in the XPath expression.
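The answer's actual XPath expression is not preserved in this copy. The sketch below is a minimal Python version of the same filter, built on the standard-library `xml.etree.ElementTree` (that choice is an assumption). ElementTree supports only a subset of XPath and has no `not()`, `contains()` or `concat()`, so the predicate is spelled out in Python. In XPath 1.0 the pipe-delimiter membership test is commonly written along the lines of `contains('|Orange|Red|', concat('|', ., '|'))`, and `is_filtered` mimics that. The sample document, `FILTERS`, and the helper names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed version of the sample: the original lines are
# truncated ("..."), so each <div> is closed here and wrapped in a <root>.
SAMPLE = """<root>
<div class="info"><p><b>Orange</b>, <b>One</b></p></div>
<div class="info"><p><b>Blue</b>, <b>Two</b></p></div>
<div class="info"><p><b>Red</b>, <b>Three</b></p></div>
<div class="info"><p><b>Yellow</b>, <b>Four</b></p></div>
</root>"""

# Filter values kept as one pipe-separated string: adding a filter
# only means extending this string.
FILTERS = "Orange|Red"


def string_value(elem):
    """XPath-style string value: all text inside the element, concatenated."""
    return "".join(elem.itertext())


def is_filtered(value, filters=FILTERS):
    # Wrapping both sides in '|' turns a substring test into a whole-token
    # test: "Red" matches "|Red|", but fragments like "Re" or "range" do not.
    return f"|{value}|" in f"|{filters}|"


def keep_divs(root, filters=FILTERS):
    # Keep each direct <div> child that has no p/b whose string value is a
    # filter value. The explicit child path 'div' is used, not './/div'.
    return [
        div
        for div in root.findall("div")
        if not any(is_filtered(string_value(b), filters) for b in div.findall("p/b"))
    ]


root = ET.fromstring(SAMPLE)
kept = keep_divs(root)
print([string_value(d.find("p/b")) for d in kept])  # ['Blue', 'Yellow']
```

The `kept` elements (rows 2 and 4 of the sample) can then be handed to whatever further processing was planned. As in the answer, adding another value to `FILTERS` changes nothing else in the code.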
Note: When the structure of the XML document is statically known, always avoid using the `//` XPath pseudo-operator. It abbreviates `/descendant-or-self::node()/`, so the processor has to scan every descendant node, which leads to significant inefficiency (slowdown).
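To make the cost difference concrete, here is a small sketch using Python's standard-library `xml.etree.ElementTree`. That choice is an assumption, since the answer itself is pure XPath, and ElementTree writes the descendant step relative to an element as `.//`. The sketch builds a hypothetical four-row document and compares how much of the tree each path form has to look at.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: 4 <div> rows, each containing one <p> with two <b>.
rows = "".join(
    f'<div class="info"><p><b>Colour{i}</b>, <b>N{i}</b></p></div>' for i in range(4)
)
root = ET.fromstring(f"<root>{rows}</root>")

# './/div' (ElementTree's relative form of '//div') walks every descendant
# looking for matches; 'div' only looks at root's direct children.
descendant_hits = root.findall(".//div")
child_hits = root.findall("div")

# For this statically known structure both return the same elements...
assert descendant_hits == child_hits

# ...but the amount of tree examined differs (a rough proxy for the work).
nodes_scanned_descendant = sum(1 for _ in root.iter()) - 1  # all elements below root
nodes_scanned_child = len(root)  # direct children only
print(nodes_scanned_descendant, nodes_scanned_child)  # 16 4
```

On a four-row toy document the difference is negligible. It grows with the size and depth of the document, because the descendant step still visits every nested `p` and `b` even though a matching `div` can only appear directly under the root.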