XPath node to string
How can I select the string contents of the following nodes:
<span class="url">
word
<b class=" ">test</b>
</span>
<span class="url">
word
<b class=" ">test2</b>
more words
</span>
I have tried a few things:
//span/text()
doesn't get the text inside the bold tag
//span/string(.)
is invalid
string(//span)
only selects 1 node
I am using SimpleXML in PHP, and the only other option I can think of is to use //span, which returns:
Array
(
[0] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => url
)
[b] => test
)
[1] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => url
)
[b] => test2
)
)
*Note that it also drops the "more words" text from the second span.
So I guess I could then flatten each item in the array using PHP somehow? XPath is preferred, but any other ideas would help too.
Comments (7)
You don't even need XPath for this:
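A minimal sketch of a DOM-only approach, assuming the two spans from the question are wrapped in a hypothetical `<div>` root so they parse as one well-formed document. It reads each span's `textContent`, which includes the text inside `<b>`, and collapses the layout whitespace:

```php
<?php
// Sample markup from the question, wrapped in a hypothetical <div> root
// (an assumption) so that it parses as a single XML document.
$xml = <<<XML
<div>
<span class="url">
word
<b class=" ">test</b>
</span>
<span class="url">
word
<b class=" ">test2</b>
more words
</span>
</div>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$strings = [];
foreach ($dom->getElementsByTagName('span') as $span) {
    // textContent concatenates every descendant text node of the span;
    // the regex collapses the newlines left over from the markup layout.
    $strings[] = trim(preg_replace('/\s+/', ' ', $span->textContent));
}

print_r($strings); // "word test" and "word test2 more words"
```

No XPath expression is involved here; `getElementsByTagName()` finds the spans and the DOM does the flattening.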
EDIT after comment below
If you just want to fetch the string, you can do
echo $span->textContent;
instead of replacing the nodeValue. I understood that you wanted one string for the span instead of the nested structure. In that case, you should also consider whether simply running strip_tags on the span snippet wouldn't be the faster and easier alternative.
With PHP 5.3 you can also register arbitrary PHP functions for use as callbacks in XPath queries. The following would fetch the content of all span elements and their child nodes and return it as a single string.
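A sketch of the callback approach, under some assumptions: the function name `nodeTextJoin`, the ` | ` separator, and the compact sample markup with a `<div>` root are all illustrative choices. `DOMXPath::registerPhpFunctions()` together with the `php` namespace makes the function callable from the query through `php:function(...)`:

```php
<?php
// Hypothetical callback: receives the matched node-set as an array of
// DOMNode objects and joins their text content into one string.
function nodeTextJoin(array $nodes): string
{
    $parts = [];
    foreach ($nodes as $node) {
        $parts[] = trim(preg_replace('/\s+/', ' ', $node->textContent));
    }
    return implode(' | ', $parts);
}

// Compact version of the question's markup; the <div> root is an assumption.
$dom = new DOMDocument();
$dom->loadXML(
    '<div><span class="url">word <b class=" ">test</b></span>'
    . '<span class="url">word <b class=" ">test2</b> more words</span></div>'
);

$xpath = new DOMXPath($dom);
// The php: prefix must be bound to this namespace URI for callbacks to work.
$xpath->registerNamespace('php', 'http://php.net/xpath');
// Allow only this one function to be called from XPath expressions.
$xpath->registerPhpFunctions('nodeTextJoin');

// evaluate() returns a typed result, here the string from the callback.
$joined = $xpath->evaluate('php:function("nodeTextJoin", //span)');
echo $joined, PHP_EOL; // word test | word test2 more words
```

Passing the function name to `registerPhpFunctions()` restricts which PHP functions the query may call, which is safer than registering everything.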
Using XMLReader:
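One way this might look, as a sketch rather than the answer's own code: the reader walks the document and, on every `span` start tag, `readString()` returns the concatenated text of that element. The compact sample markup and its `<div>` root are assumptions.

```php
<?php
// Compact version of the question's markup; the <div> root is an assumption.
$xml = '<div><span class="url">word <b class=" ">test</b></span>'
     . '<span class="url">word <b class=" ">test2</b> more words</span></div>';

$reader = new XMLReader();
$reader->XML($xml);

$strings = [];
while ($reader->read()) {
    // Only react to opening <span> tags; other node types are skipped.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'span') {
        // readString() gives the text of the element and its descendants.
        $strings[] = trim(preg_replace('/\s+/', ' ', $reader->readString()));
    }
}
$reader->close();

print_r($strings); // "word test" and "word test2 more words"
```

Because XMLReader streams the input, this variant may suit large documents better than loading a full DOM tree.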
Output:
SimpleXML doesn't like mixing text nodes with other elements; that's why you're losing some content there. The DOM extension, however, handles that just fine. Luckily, DOM and SimpleXML are two sides of the same coin (libxml), so it's very easy to switch between them. For instance:
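A minimal sketch of that interop, assuming the spans sit under a hypothetical `<div>` root: the XPath query still runs through SimpleXML, and `dom_import_simplexml()` hands each result to DOM so that `textContent` can flatten the mixed content.

```php
<?php
// Compact version of the question's markup; the <div> root is an assumption.
$xml = '<div><span class="url">word <b class=" ">test</b></span>'
     . '<span class="url">word <b class=" ">test2</b> more words</span></div>';

$root = simplexml_load_string($xml);

$strings = [];
foreach ($root->xpath('//span') as $span) {
    // dom_import_simplexml() returns the DOMElement for the same node,
    // so no re-parsing happens; textContent includes the trailing text.
    $element = dom_import_simplexml($span);
    $strings[] = trim(preg_replace('/\s+/', ' ', $element->textContent));
}

print_r($strings); // "word test" and "word test2 more words"
```

Note that the second string keeps "more words", the text that the SimpleXML dump in the question did not show.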
This may be the best you can do. You'll get multiple text nodes because the text is stored in separate nodes in the DOM. If you want a single string, you'll have to concatenate the text nodes yourself, since I can't think of a way to get the built-in XPath functions to do it.
Using string() or concat() won't work because these functions expect string arguments. When you pass a node-set to a function expecting a string, the node-set is converted to a string by taking the string value of the first node in the node-set (in document order). The rest of the nodes are discarded.
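The difference can be sketched with PHP's DOMXPath, which implements XPath 1.0. The sample markup with its `<div>` root and the relative expression `.//text()` are assumptions chosen for illustration. `string(//span)` yields only the first span, while selecting the text nodes per span and concatenating them by hand yields one string for each:

```php
<?php
// Compact version of the question's markup; the <div> root is an assumption.
$dom = new DOMDocument();
$dom->loadXML(
    '<div><span class="url">word <b class=" ">test</b></span>'
    . '<span class="url">word <b class=" ">test2</b> more words</span></div>'
);
$xpath = new DOMXPath($dom);

// A node-set converted to a string keeps only its first node.
$first = $xpath->evaluate('string(//span)');

// Concatenate the descendant text nodes of each span manually.
$perSpan = [];
foreach ($xpath->query('//span') as $span) {
    $text = '';
    foreach ($xpath->query('.//text()', $span) as $textNode) {
        $text .= $textNode->nodeValue;
    }
    $perSpan[] = trim(preg_replace('/\s+/', ' ', $text));
}

echo $first, PHP_EOL; // word test
print_r($perSpan);    // "word test" and "word test2 more words"
```

Running the inner query relative to each span keeps track of which parent every text node belongs to.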
First, I think your question is not clear.
You could select the descendant text nodes, as John Kugelman's answer does.
I recommend using an absolute path (one not starting with //).
But with that approach you would need to process the text nodes to find out which parent span each of them is a child of. So it would be better to select just the span elements (for example, //span) and then process each one's string value.
With XPath 2.0 you could use:
Result:
With XSLT 1.0, this input:
With this stylesheet:
Output:
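For readers working in PHP, an XSLT 1.0 transformation along these lines can be run with `XSLTProcessor`. This is a sketch under stated assumptions, not the stylesheet from the answer: it requires the `xsl` extension, the stylesheet is a hypothetical one that prints `normalize-space(.)` for every span, and the input uses an assumed `<div>` root.

```php
<?php
// Compact version of the question's markup; the <div> root is an assumption.
$xml = '<div><span class="url">word <b class=" ">test</b></span>'
     . '<span class="url">word <b class=" ">test2</b> more words</span></div>';

// Hypothetical stylesheet: one line of normalized text per span element.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//span">
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

$source = new DOMDocument();
$source->loadXML($xml);

$style = new DOMDocument();
$style->loadXML($xsl);

$processor = new XSLTProcessor(); // provided by the xsl extension
$processor->importStylesheet($style);

$result = $processor->transformToXml($source);
echo $result; // two lines: "word test" and "word test2 more words"
```

Inside `xsl:for-each` the context node is a single span, so `normalize-space(.)` sees that span's full string value rather than only the first node of a set.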
Along the lines of Alejandro's XSLT 1.0 "but any other ideas would help too" answer...
XML:
XSL:
OUTPUT: