PHP DOM: parsing all text nodes
Is there a way I can retrieve an array of all plain text nodes from an HTML string? I would like it to retrieve 'nested' elements independently, so a string like this:
<p>This is a <b>nested <i>HTML</i> tag</b>...</p>
would be retrieved as "This is a", "nested", "HTML", "tag", and "..." as separate elements.
Googling and searching SO has led me to piece together this mess of code:
$doc = new DOMDocument();
$doc->loadHTML($contents);
$doc->loadHTML("<p>not in the brackets..</p>");
$xpath = new DOMXPath($doc);
$textnodes = $xpath->evaluate('//text()');
echo '<pre>'.print_r($textnodes,1).'</pre>';die;
This is giving me:
DOMNodeList Object
(
)
I've never used any DOM objects before, and my XPath isn't great either, so I feel like a fish out of water here! Any help would be appreciated.
Comments (1)
XPath returns a DOMNodeList, which needs to be iterated node by node rather than dumped with print_r. Here is an example based on tags:
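A minimal sketch of that idea, assuming the goal is a plain PHP array of strings. The helper name extractTextNodes is hypothetical and not from the original post, and the sample markup uses a properly closed </b> tag. It runs the same //text() query as the question, but loops over the resulting DOMNodeList and collects each node's nodeValue:

```php
<?php
// Hypothetical helper (name chosen for illustration): returns the value of
// every text node found in an HTML string, in document order.
function extractTextNodes(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them;
    // real-world HTML is often not perfectly well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    $texts = [];
    // DOMNodeList is traversable, so foreach visits each matched text node.
    foreach ($xpath->query('//text()') as $node) {
        $texts[] = $node->nodeValue;
    }

    return $texts;
}

$texts = extractTextNodes('<p>This is a <b>nested <i>HTML</i> tag</b>...</p>');
print_r($texts);
```

Whitespace next to the tags stays part of each string, so apply trim() to the values if you need them without it. To restrict the result to one tag, a narrower expression such as //b//text() should work the same way.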