PHP DOM: parse all text nodes

Posted on 2024-12-15 02:30:34

Is there a way I can retrieve an array of all plain text nodes from an HTML string? I would like it to retrieve 'nested' elements independently, so a string like this:

<p>This is a <b>nested <i>HTML</i> tag</b>...</p>

would be retrieved as "This is a", "nested", "HTML", "tag", and "..." as separate elements.
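
In DOM terms, each of those pieces is its own text node (a DOMText), sitting at whatever depth its enclosing tag puts it. The sketch below makes that structure visible by walking the tree recursively without XPath. It assumes PHP 7.1 or later (for the void return type), uses the sample string with the second <b> written as </b>, and collectTextNodes() is a hypothetical helper name, not part of PHP's DOM API.

```php
<?php
// Hypothetical helper: recursively collect the content of every text node under $node
function collectTextNodes(DOMNode $node, array &$out): void
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            // Each run of text between two tags is a separate DOMText node
            $out[] = $child->nodeValue;
        } elseif ($child->hasChildNodes()) {
            // Descend into elements such as <p>, <b> and <i>
            collectTextNodes($child, $out);
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<p>This is a <b>nested <i>HTML</i> tag</b>...</p>');

$texts = [];
collectTextNodes($doc, $texts);
print_r($texts);
```

The collected strings keep their surrounding spaces (for example "This is a " with a trailing space), because the whitespace belongs to the text node itself.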

Googling and searching SO has led me to piece together this mess of code:

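$doc = new DOMDocument();
$doc->loadHTML($contents);
$doc->loadHTML("<p>not in the brackets..</p>"); // this second call replaces the document loaded from $contents
$xpath = new DOMXPath($doc);
$textnodes = $xpath->evaluate('//text()');        // returns a DOMNodeList of the matching text nodes
echo '<pre>'.print_r($textnodes,1).'</pre>';die;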

This is giving me:

DOMNodeList Object
(
)
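
The empty-looking dump does not by itself mean the query matched nothing: print_r() does not list the nodes held in a DOMNodeList. A minimal sketch, assuming the same <p>not in the brackets..</p> input that the second loadHTML() call leaves in the document, that inspects the list through its length property and item() method instead:

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<p>not in the brackets..</p>');
$xpath = new DOMXPath($doc);

// For an expression that selects nodes, evaluate() returns a DOMNodeList
$textnodes = $xpath->evaluate('//text()');

// print_r() does not show the contained nodes, so read the list directly
echo 'Matched text nodes: ', $textnodes->length, "\n";
for ($i = 0; $i < $textnodes->length; $i++) {
    // item($i) returns the DOMText node at position $i
    echo $i, ': ', $textnodes->item($i)->nodeValue, "\n";
}
```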

I've never used DOM objects before, and my XPath isn't great either, so I feel like a fish out of water here! Any help would be appreciated.

Comments (1)

下雨或天晴 2024-12-22 02:30:34

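DOMXPath::query() returns a DOMNodeList (as does evaluate() for an expression that selects nodes), and you need to iterate over it, for example with foreach, to get at the individual nodes; dumping it with print_r does not show them. Here is an example that selects <img> tags and reads their attributes: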

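// $templateDOM is assumed to be a DOMDocument that has already been loaded
$xpath = new DOMXPath( $templateDOM );
// Registers the "fcm" prefix for queries such as "//fcm:img"; the unprefixed "//img" below does not use it
$xpath->registerNamespace( "fcm", "http://www.w3.org/1999/xhtml" );
$entries = $xpath->query( "//img" );

// query() returns a DOMNodeList; foreach visits each matching DOMElement
foreach( $entries as $entry ) {

    // getAttribute() returns '' for a missing attribute, so no @ suppression is needed
    $newVar = array(
          'src'   => $entry->getAttribute( 'src' ),
          'title' => $entry->getAttribute( 'title' ),
    );

    // ...

}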
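
Applying the same "iterate the DOMNodeList" idea to the question's //text() query gives the array of text pieces that was asked for. This is a minimal sketch, not the answerer's code: htmlTextNodes() is a hypothetical helper name, and trimming plus skipping whitespace-only nodes are assumptions about what counts as a useful "element" here.

```php
<?php
// Hypothetical helper: return the trimmed, non-empty text nodes of an HTML string in document order
function htmlTextNodes(string $html): array
{
    $doc = new DOMDocument();
    // Keep libxml parse warnings (e.g. for tags it does not recognise) out of the output
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    // query() returns a DOMNodeList; foreach visits each text node in document order
    foreach ($xpath->query('//text()') as $textNode) {
        $value = trim($textNode->nodeValue);
        if ($value !== '') { // skip whitespace-only nodes between tags
            $texts[] = $value;
        }
    }
    return $texts;
}

print_r(htmlTextNodes('<p>This is a <b>nested <i>HTML</i> tag</b>...</p>'));
```

Trimming is a readability choice; drop the trim() call if you need the exact text of each node, including the spaces around each piece.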