I want to extract the contents of a node as a string using XPath and PHP
I have a function that accepts a general HTML file and a general XPath expression. I want to extract a string of the matched node containing the entire text including HTML tags.
Here's a simplified example...
<?php
$inDocStg = "
<html><body>
<div>The best-laid<br> schemes o' <span>mice</span> an' men
<img src='./mouse.gif'><br>
</div>
</body></html>
";
$xPathDom = new DOMDocument();
@$xPathDom->loadHTML( $inDocStg );
$xPath = new DOMXPath( $xPathDom );
$matches = $xPath->query( "//div" );
echo $matches->item(0)->nodeValue;
?>
This produces (I'm looking at the generated HTML source - not the browser output)...
The best-laid schemes o' mice an' men
(the HTML tags have been stripped out).
But what I want is...
The best-laid<br> schemes o' <span>mice</span> an' men<img src='./mouse.gif'><br>
Thanks.
Comments (3)
How about wrapping your output in <pre> tags?
echo "<pre>" . $matches->item(0)->nodeValue . "</pre>";
Try giving these two a go!
The first one returns the text content of this node and its descendants, and the second one tries to access the magic method __toString(). Depending on how DOMDocument is built, it could be the value that you're already getting.
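To make the "text content of this node and its descendants" point concrete, here is a minimal sketch contrasting textContent with serializing the node's children back to markup. It assumes PHP 5.3.6 or later, where DOMDocument::saveHTML() accepts a node argument (the type declarations also assume PHP 7). The innerHtml() helper name is hypothetical and not part of the DOM extension or of this thread, and the input is a shortened version of the question's sample.

```php
<?php
// Hypothetical helper (not from the thread): serialize each child of a node
// so that the markup inside the matched element is preserved.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument dumps only that node's subtree.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Shortened version of the sample document from the question.
$inDocStg = "<html><body><div>The best-laid<br> schemes o' <span>mice</span> an' men</div></body></html>";

$dom = new DOMDocument();
$dom->loadHTML($inDocStg);
$xPath = new DOMXPath($dom);
$div = $xPath->query('//div')->item(0);

echo $div->textContent, "\n"; // text only, tags stripped
echo innerHtml($div), "\n";   // children serialized with their tags
```

If the enclosing element itself is wanted as well, passing the matched node directly to saveHTML() should return it with its own opening and closing tags.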
This will work, but without XPath;
or