Parsing with DOM in PHP (no results)
I am trying to retrieve the body text inside this span element:
<span id="" style="color:#525B64;">The quick brown fox jumped over the lazy dog.</span>
I tested it on my web server and got no errors, but the page is blank. I'm very new to this, so I don't know where to go from here.
Here is my code.
<?php
// Load remote file, suppress parse errors
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument;
$dom->loadHTMLFile('http://somewebpage.com');
libxml_clear_errors();
// use XPath to find all span nodes whose class attribute is msgBody
$xp = new DOMXPath($dom);
$nodes = $xp->query('//span[@class="msgBody"]');
// output first item's content
echo $nodes->item(0)->nodeValue;
?>
Answers (4)
Everything seems fine in this code.
What I'd try is to load the remote file with file_get_contents to see whether it loads properly, then query the document with an XPath expression such as //* and loop over the resulting DOMNodeList (with foreach) to see whether the tree is built correctly.
By the way, to suppress the parse errors reported by the ->loadHTMLFile() method, I use the @ operator.
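A minimal sketch of those debugging steps. It assumes the HTML has already been fetched into a string, so the tree-building step can be checked without a network request; the helper name listElementNames and the sample markup are hypothetical. It uses the @ operator on loadHTML as the answer describes.

```php
<?php
// Sketch of the debugging steps above. The HTML is passed in as a string so
// the tree-building step can be checked without a network request. In practice
// you would fetch it first (http:// URLs require allow_url_fopen):
//   $html = file_get_contents('http://somewebpage.com'); // false on failure
//   if ($html === false) { exit("Could not load the remote file\n"); }

// Hypothetical helper: returns the name of every element matched by //*,
// in document order, so you can see whether the tree was built as expected.
function listElementNames(string $html): array
{
    $dom = new DOMDocument();
    // The @ operator hides the warnings libxml emits for malformed HTML.
    @$dom->loadHTML($html);

    $xp = new DOMXPath($dom);
    $names = [];
    foreach ($xp->query('//*') as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

// Sample markup (hypothetical), built around the span quoted in the question.
$html = '<div><span id="" style="color:#525B64;">The quick brown fox jumped over the lazy dog.</span></div>';
print_r(listElementNames($html));
```

If the printed list lacks the span you expect, the problem lies in loading or parsing the page rather than in the XPath query.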
The DOM creates nodes for everything: attributes, text, comments, elements, you name it. So you're not after the value of the span node, even though it might seem that way; you actually want to get the text node (a DOMText object) inside the span and read its value instead. Try something like:
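A minimal sketch of reading the text node through the span's firstChild, assuming the class name msgBody from the question's XPath query. The sample markup adds class="msgBody" to the span; that is an assumption, since the snippet quoted in the question shows no class attribute. The function name spanTextViaChild is hypothetical.

```php
<?php
// Sketch: read the text node inside the span instead of the span itself.
// Assumption: the span carries class="msgBody", as the question's query expects.
function spanTextViaChild(string $html): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    $span = $xp->query('//span[@class="msgBody"]')->item(0);

    // item(0) is null when nothing matched; firstChild is null for an empty span.
    if ($span === null || $span->firstChild === null) {
        return null;
    }
    // For the markup in the question, the first child is a DOMText node.
    return $span->firstChild->nodeValue;
}

$html = '<span class="msgBody">The quick brown fox jumped over the lazy dog.</span>';
var_dump(spanTextViaChild($html));
```

The null checks matter: calling ->nodeValue on the result of item(0) when the query matched nothing is an error rather than an empty string.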
You can also get this directly from the xpath query:
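A minimal sketch of selecting the text node in the query itself with the XPath text() node test. As before, the class name msgBody comes from the question, the class attribute in the sample markup is an assumption, and spanTextViaXPath is a hypothetical name.

```php
<?php
// Sketch: select the span's text node directly with text() in the XPath
// expression. Assumption: the span carries class="msgBody".
function spanTextViaXPath(string $html): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    // "/text()" selects the DOMText children of each matching span.
    $textNodes = $xp->query('//span[@class="msgBody"]/text()');

    // An empty DOMNodeList (length 0) means nothing matched.
    return $textNodes->length > 0 ? $textNodes->item(0)->nodeValue : null;
}

$html = '<span class="msgBody">The quick brown fox jumped over the lazy dog.</span>';
var_dump(spanTextViaXPath($html));
```

A related option is $xp->evaluate('string(//span[@class="msgBody"])'), which returns the string value of the first matching span directly, or an empty string when nothing matches.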
(Though I've never had much luck with xpath, personally.)
Are you sure there is only one span element with this class in the document you are parsing? Maybe ->item(0) returns an empty element and the element you want is next in the list?
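One way to check this, sketched under the assumption that the class name is msgBody as in the question: collect the text of every match instead of trusting item(0). The helper name allMatchTexts and the sample markup (an empty span followed by the one holding the text) are hypothetical.

```php
<?php
// Sketch: instead of relying on item(0), collect the text of every match so an
// empty first match (or no match at all) becomes visible.
// allMatchTexts is a hypothetical helper name.
function allMatchTexts(string $html, string $query): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    $texts = [];
    foreach ($xp->query($query) as $node) {
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

// Hypothetical markup: the first matching span is empty, the second has the text.
$html = '<span class="msgBody"></span>'
      . '<span class="msgBody">The quick brown fox jumped over the lazy dog.</span>';

$texts = allMatchTexts($html, '//span[@class="msgBody"]');
if (count($texts) === 0) {
    echo "No span matched the query\n";
}
foreach ($texts as $i => $text) {
    echo "item($i): '$text'\n";
}
```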
Very often such behavior is due to a default namespace (check whether the document contains something like xmlns="http://www.w3.org/1999/xhtml"). Using element names that are in the default namespace in XPath expressions is the most frequently asked question in the xpath tag; just search for "xpath default namespace" to find many good answers.
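A minimal sketch of the usual fix with PHP's DOMXPath::registerNamespace. It parses the sample with loadXML, because the default-namespace problem arises when the document is parsed as XML; the prefix x is arbitrary (only the namespace URI must match), and the function name and sample markup are hypothetical.

```php
<?php
// Sketch: querying XML whose elements live in a default namespace.
// The prefix "x" is arbitrary; only the namespace URI has to match.
function spanTextInNamespace(string $xml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $xp = new DOMXPath($dom);
    // Without a prefix, XPath 1.0 treats "span" as "span in no namespace",
    // so this finds nothing when the elements are in the XHTML namespace.
    $unprefixed = $xp->query('//span[@class="msgBody"]')->length;

    // Bind a prefix to the default namespace URI and use it in the query.
    // Unprefixed attributes such as class are in no namespace, so @class stays as-is.
    $xp->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
    $prefixed = $xp->query('//x:span[@class="msgBody"]');

    return [
        'unprefixed_matches' => $unprefixed,
        'text' => $prefixed->length > 0 ? $prefixed->item(0)->nodeValue : null,
    ];
}

$xml = '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
     . '<span class="msgBody">The quick brown fox jumped over the lazy dog.</span>'
     . '</body></html>';
print_r(spanTextInNamespace($xml));
```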