PHP parsing with DOM (no results)

Posted on 2024-10-15 03:39:52

I am trying to retrieve the body text located in this span class attribute.

<span id="" style="color:#525B64;">The quick brown fox jumped over the lazy dog.</span>

I tested it on my web server and I get no errors but the page is blank. I'm very new to this so I do not know where to go from here.

Here is my code.

<?php
// Load remote file, suppress parse errors
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument;
$dom->loadHTMLFile('http://somewebpage.com');
libxml_clear_errors();

// use XPath to find all span nodes with a class attribute of msgBody
$xp = new DOMXpath($dom);
$nodes = $xp->query('//span[@class="msgBody"]');

// output first item's content
echo $nodes->item(0)->nodeValue;
?>

Comments (4)

情魔剑神 2024-10-22 03:39:52

Everything seems fine in this code.

What I'd try to do is:

  • remove the line which suppresses parse errors.
  • load the remote file with file_get_contents to see if it loads properly.
  • query the document with an XPath like //* and loop over the resulting DOMNodeList (with foreach) to see if the tree is built correctly.

By the way, to suppress the parse errors reported by the ->loadHTMLFile() method I use the @ operator.
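
The checks listed above can be sketched roughly as follows. This is only a minimal sketch: the inline $html string is a made-up stand-in for the remote page (in practice it would come from file_get_contents('http://somewebpage.com'), which returns false on failure), and instead of deleting the error-suppression line it keeps libxml_use_internal_errors() and prints whatever libxml_get_errors() collected.

```php
<?php
// Hypothetical stand-in for the downloaded page; replace with
// file_get_contents('http://somewebpage.com') when testing for real.
$html = '<html><body><div><span class="msgBody">The quick brown fox jumped over the lazy dog.</span></div></body></html>';

// Step 1: make sure something was actually downloaded.
if ($html === false || $html === '') {
    echo "Download failed or returned an empty body\n";
}

// Step 2: collect parse errors instead of discarding them, then print them.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach (libxml_get_errors() as $error) {
    echo 'libxml: ', trim($error->message), ' (line ', $error->line, ")\n";
}
libxml_clear_errors();

// Step 3: walk every element to confirm the tree was built as expected.
$xp = new DOMXPath($dom);
$names = [];
foreach ($xp->query('//*') as $node) {
    $names[] = $node->nodeName;
}
echo implode(' > ', $names), "\n";
```

If the element list is empty or does not contain the span you expect, the problem is in loading the page rather than in the XPath query.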

茶色山野 2024-10-22 03:39:52

The DOM creates nodes for everything: attributes, text, comments, elements, you name it. So you're not after the value of the span node, even though it might seem that way; you actually want to get the text node inside the span and read its value instead. Try something like:

echo $nodes->item(0)->childNodes->item(0)->nodeValue;

You can also get this directly from the xpath query:

$nodes = $xp->query('//span[@class="msgBody"]/text()');

(Though I've never had much luck with xpath, personally.)
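
To compare the variants above side by side, here is a small sketch that runs them against an inline copy of the span. The class="msgBody" attribute on the test markup is an assumption taken from the question's query, not from the span as posted. Note that in PHP's DOM extension, nodeValue on an element also returns its text content, so for a simple span like this one all three reads should give the same string.

```php
<?php
// Assumed test markup: the span from the question with class="msgBody" added.
$html = '<html><body><span class="msgBody">The quick brown fox jumped over the lazy dog.</span></body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

$span = $xp->query('//span[@class="msgBody"]')->item(0);

// 1. Value of the element itself (what the question's code reads).
$viaElement = $span->nodeValue;

// 2. Value of the first child, which is the text node.
$viaChild = $span->childNodes->item(0)->nodeValue;

// 3. Text node selected directly by the XPath expression.
$viaTextQuery = $xp->query('//span[@class="msgBody"]/text()')->item(0)->nodeValue;

var_dump($viaElement === $viaChild, $viaChild === $viaTextQuery);
```

If all three agree on a test string but the real page still prints nothing, it is worth checking whether the query matches any node at all before reading a value from it.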

王权女流氓 2024-10-22 03:39:52

Are you sure there is only one span element with this class in the document you are parsing?

Maybe ->item(0) returns an empty element and the desired element is the next one in the list?
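
One way to check this is to print how many nodes matched and what each one contains. The sketch below uses made-up markup with an empty first span and a second span carrying an extra, hypothetical class name (highlight). It also matches msgBody as one token of the class list, because @class="msgBody" only matches when that is the entire attribute value.

```php
<?php
// Made-up markup: an empty first match and a second span with two classes.
$html = '<html><body>'
      . '<span class="msgBody"></span>'
      . '<span class="msgBody highlight">The quick brown fox jumped over the lazy dog.</span>'
      . '</body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Match "msgBody" as a single token within the class attribute.
$nodes = $xp->query('//span[contains(concat(" ", normalize-space(@class), " "), " msgBody ")]');

echo 'matches: ', $nodes->length, "\n";

// Collect the text of every match instead of assuming item(0) is the right one.
$texts = [];
foreach ($nodes as $node) {
    $texts[] = trim($node->nodeValue);
}
print_r($texts);
```

When the length is 0, item(0) returns null, and reading nodeValue from it produces no output, so printing the length first separates "no match" from "empty match".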

薔薇婲 2024-10-22 03:39:52

Very often such behavior is due to a default namespace (check to see if there is something similar to this: xmlns="http://www.w3.org/1999/xhtml").

Using element names that are in a default namespace in XPath expressions is the most frequently asked question in the xpath tag. Just search for "xpath default namespace" to find many good answers.
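
A minimal sketch of the default-namespace effect, assuming the document is XHTML and is loaded with the XML parser (loadXML). Whether this applies to a page loaded with loadHTMLFile, as in the question, is not established here. The prefix x is an arbitrary name chosen for this example.

```php
<?php
// Assumed XHTML input that declares a default namespace.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
       . '<span class="msgBody">The quick brown fox jumped over the lazy dog.</span>'
       . '</body></html>';

$dom = new DOMDocument;
$dom->loadXML($xhtml); // XML parser: elements keep the default namespace
$xp = new DOMXPath($dom);

// An unprefixed name in XPath 1.0 means "no namespace", so this finds nothing.
$plain = $xp->query('//span[@class="msgBody"]');
echo 'unprefixed matches: ', $plain->length, "\n";

// Bind a prefix to the namespace URI and use it in the expression.
$xp->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$prefixed = $xp->query('//x:span[@class="msgBody"]');
echo 'prefixed matches: ', $prefixed->length, "\n";
```

The attribute test stays unprefixed because unprefixed attributes are not placed in the default namespace.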
