I have a small problem with XPath contains() in dom4j.
Let's say my XML is
<Home>
  <Addr>
    <Street>ABC</Street>
    <Number>5</Number>
    <Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment>
  </Addr>
</Home>
Let's say I want to find all the nodes that have ABC in their text, starting from the root element.
So the XPath that I would need to write would be
//*[contains(text(),'ABC')]
However, this is not what dom4j returns. Is this a dom4j problem, or a gap in my understanding of how XPath works? That query returns only the Street element and not the Comment element.
The DOM makes the Comment element a composite element with four child nodes, two of which are text nodes:
[Text = 'BLAH BLAH BLAH '][BR][BR][Text = 'ABC']
I would assume that the query should still return the element, since it should find the element and run contains() on it, but it doesn't.
The following query returns the Comment element, but it returns far more than just that element: it returns the parent elements as well, which is undesirable here.
//*[contains(.,'ABC')]
Does anyone know an XPath query that would return just the <Street/> and <Comment/> elements?
Comments (6)
Here is an alternate way to match nodes which contain a given text string. First query for the text node itself, then get the parent:
For me this is easy to read and understand.
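A minimal sketch of the "text node first, then parent" idea. The expression //text()[contains(., 'ABC')]/.. is an assumption (the answer's own expression is not preserved above), and the sketch uses the JDK's built-in javax.xml.xpath engine (XPath 1.0) rather than dom4j. The class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextNodeParentQuery {
    // Sample document from the question.
    static final String XML =
        "<Home><Addr><Street>ABC</Street><Number>5</Number>"
        + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></Addr></Home>";

    // Assumed expression: select text nodes containing ABC, then step up to the parent.
    static final String QUERY = "//text()[contains(., 'ABC')]/..";

    // Returns the names of the elements selected by QUERY.
    static List<String> matchingElementNames() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(QUERY, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchingElementNames());
    }
}
```

Because each text node is tested on its own, the second text node of Comment is found even though it comes after the two br elements, and the ancestors Addr and Home are not selected.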
The accepted answer will return all the parent nodes too. To get only the actual nodes with ABC, even if the string comes after a <br/>:
The XML document is the one from the question.
The XPath expression:
//*[contains(text(), 'ABC')]
//* matches any descendant element of the root node, that is, every element in the document. (In XPath the root node is the document node, not the <Home> element.)
[...] is a predicate; it filters the node-set, keeping only the nodes for which the expression inside the brackets evaluates to true.
contains('haystack', 'needle') returns true if haystack contains needle.
But contains() takes a string as its first parameter, and here it is passed nodes. To deal with that, every node or node-set passed as the first parameter is converted to a string as if by the string() function.
For a node-set, string() returns the string-value of the node that is first in document order (or an empty string if the node-set is empty).
The string-value of an element node is the concatenation of the string-values of all its descendant text nodes, in document order. The string-value of a text node is its character data.
So, basically, the string-value is all the text contained in a node.
text() is a node test that matches any text node; used as a step on its own, it selects the text node children of the context node.
That said,
//*[contains(text(), 'ABC')]
matches any element whose first child text node contains ABC. text() returns a node-set containing all child text nodes of the context node (the node relative to which the expression is evaluated), but contains() uses only the first one. So for the document above, the path matches only the Street element.
The following expression
//*[text()[contains(., 'ABC')]]
matches any element that has at least one child text node containing ABC. Here the dot (.) represents the context node, which in this case is a child text node of the element being tested. So for the document above, the path matches the Street and Comment elements.
Finally,
//*[contains(., 'ABC')]
matches any element whose string-value (the concatenation of its descendant text nodes) contains ABC. For the document above, it matches the Home, Addr, Street, and Comment elements. Likewise,
//*[contains(., 'BLAH ABC')]
matches the Home, Addr, and Comment elements.
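The expressions discussed in this answer can be compared side by side. This is a sketch under stated assumptions: it runs them with the JDK's built-in javax.xml.xpath engine (XPath 1.0) instead of dom4j, and the class and method names are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContainsVariants {
    // Sample document from the question.
    static final String XML =
        "<Home><Addr><Street>ABC</Street><Number>5</Number>"
        + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></Addr></Home>";

    // Evaluates an XPath 1.0 expression and returns the names of the selected elements.
    static List<String> select(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] queries = {
            "//*[contains(text(), 'ABC')]",     // only the first child text node is tested
            "//*[text()[contains(., 'ABC')]]",  // every child text node is tested on its own
            "//*[contains(., 'ABC')]",          // the whole string-value is tested
            "//*[contains(., 'BLAH ABC')]"      // substring spanning two text nodes
        };
        for (String query : queries) {
            System.out.println(query + " -> " + select(query));
        }
    }
}
```

The first query selects only Street, the second Street and Comment, the third Home, Addr, Street, and Comment, and the fourth Home, Addr, and Comment, which is what the explanation above predicts.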
A modern answer that covers XPath 1.0 vs. XPath 2.0+ behavior.
This XPath,
//*[contains(text(),'ABC')]
behaves differently under XPath 1.0 and later versions of XPath (2.0+).
Common behavior
//* selects all elements within a document.
[] filters those elements according to the predicate expressed therein.
contains(string, substring) within the predicate keeps those elements for which substring is a substring of string.
XPath 1.0 behavior
contains(arg1, substring): If the first argument evaluates to a node-set, contains() converts the node-set to a string by taking the string-value of the first node in the node-set. (If arg1 is text(), only the first of all matching text nodes is considered by contains().) If you find this odd, you are not alone.
For //*[contains(text(),'ABC')], that node-set is all the child text nodes of each element in the document.
Because only the first of those text nodes is considered, the expectation that every child text node is tested for 'ABC' substring containment is violated.
The XPath 1.0 online example shows that only one 'ABC' is selected.
XPath 2.0+ behavior
It is an error to call contains(arg1, substring) with a sequence of more than one item as the first argument.
The XPath 2.0 online example shows a typical error message due to the conversion error particular to XPath 2.0+.
Common solutions
If you wish to include descendant elements (beyond children), test against the string-value of the element as a single string, rather than the individual string-values of its child text nodes. This XPath,
//*[contains(.,'ABC')]
selects your targeted Street and Comment elements and also their Addr and Home ancestor elements, because those too have 'ABC' as a substring of their string-values. The online example shows the ancestors being selected too.
If you wish to exclude descendant elements (beyond children), this XPath,
//*[text()[contains(.,'ABC')]]
selects only your targeted Street and Comment elements, because only those elements have text node children whose string-values contain the 'ABC' substring. This is true for all versions of XPath. The online example shows only Street and Comment being selected.
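The XPath 1.0 node-set-to-string conversion described above can be observed directly. This sketch assumes the JDK's built-in javax.xml.xpath engine, which implements XPath 1.0, so it illustrates only the 1.0 side; the XPath 2.0+ error would need a 2.0-capable processor and is not shown. The class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FirstNodeConversion {
    // Sample document from the question.
    static final String XML =
        "<Home><Addr><Street>ABC</Street><Number>5</Number>"
        + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></Addr></Home>";

    // Evaluates an XPath 1.0 expression and returns its result converted to a string.
    static String eval(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Comment has two child text nodes...
        System.out.println(eval("count(//Comment/text())"));
        // ...but converting that node-set to a string keeps only the first one.
        System.out.println("[" + eval("string(//Comment/text())") + "]");
        // The string-value of the element itself concatenates all descendant text.
        System.out.println("[" + eval("string(//Comment)") + "]");
        // Hence contains() sees 'ABC' only in the second form.
        System.out.println(eval("contains(//Comment/text(), 'ABC')"));
        System.out.println(eval("contains(//Comment, 'ABC')"));
    }
}
```

The first contains() call is false because only "BLAH BLAH BLAH " is tested, while the second is true because the element's full string-value "BLAH BLAH BLAH ABC" is tested.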
The <Comment> tag contains two text nodes and two <br> nodes as children.
Your XPath expression was
//*[contains(text(),'ABC')]
To break this down:
The * is a selector that matches any element (i.e. tag); it returns a node-set.
The [] is a conditional that operates on each individual node in that node-set. A node is kept if it satisfies the condition inside the brackets.
text() is a selector that matches all of the text nodes that are children of the context node; it returns a node-set.
contains is a function that operates on a string. If it is passed a node-set, the node-set is converted into a string by returning the string-value of the node in the node-set that is first in document order. Hence, it can match only the first text node in your <Comment> element, namely BLAH BLAH BLAH. Since that doesn't match, you don't get a <Comment> in your results.
You need to change this to
//*[text()[contains(.,'ABC')]]
The * is a selector that matches any element (i.e. tag); it returns a node-set.
The outer [] is a conditional that operates on each individual node in that node-set; here it operates on each element in the document.
text() is a selector that matches all of the text nodes that are children of the context node; it returns a node-set.
The inner [] is a conditional that operates on each node in that node-set; here, each individual text node. Each individual text node is the starting point for any path in the brackets, and can also be referred to explicitly as . within the brackets. The element is kept if at least one of its text nodes satisfies the condition inside the brackets.
contains is a function that operates on a string. Here it is passed an individual text node (.). Since it is passed the second text node in the <Comment> tag individually, it will see the 'ABC' string and be able to match it.
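To make the child structure concrete, here is a small sketch that walks the children of <Comment> with the plain W3C DOM API from the JDK (not dom4j) and mimics the two predicates by hand. The helper names firstTextChildContains and anyTextChildContains are hypothetical and only mirror what the two XPath expressions test.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CommentChildren {
    // Sample document from the question.
    static final String XML =
        "<Home><Addr><Street>ABC</Street><Number>5</Number>"
        + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></Addr></Home>";

    // Parses the sample document and returns its Comment element.
    static Element comment() {
        try {
            return (Element) DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getElementsByTagName("Comment").item(0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Lists the direct children as "text:<data>" or "element:<name>".
    static List<String> describeChildren(Element element) {
        List<String> result = new ArrayList<>();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            result.add(child.getNodeType() == Node.TEXT_NODE
                ? "text:" + child.getNodeValue()
                : "element:" + child.getNodeName());
        }
        return result;
    }

    // Mirrors contains(text(), needle): only the first child text node is tested.
    static boolean firstTextChildContains(Element element, String needle) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.TEXT_NODE) {
                return children.item(i).getNodeValue().contains(needle);
            }
        }
        return false; // no text children: empty node-set converts to ""
    }

    // Mirrors text()[contains(., needle)]: each child text node is tested individually.
    static boolean anyTextChildContains(Element element, String needle) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Element comment = comment();
        System.out.println(describeChildren(comment));
        System.out.println(firstTextChildContains(comment, "ABC"));
        System.out.println(anyTextChildContains(comment, "ABC"));
    }
}
```

For the Comment element the first check is false and the second is true, which is the difference between the original expression and the corrected one.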