How to prevent DOMXPath from expanding HTML entities?
I'm using DOMDocument and DOMXPath in PHP to find elements in an HTML document.
This document contains HTML entities such as &nbsp; and I would like these entities to be preserved in the XPath output.
$doc = new DOMDocument();
$doc->loadHTML('<html><head></head><body>&nbsp;Test</body></html>');
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//body');
foreach ($nodes as $node) {
    // textContent returns the parsed text, with entities already expanded
    echo $node->textContent;
}
This code produces the following output (UTF-8):
[space]Test
But I would like to have this:
&nbsp;Test
Maybe it has something to do with LibXML that PHP uses internally, but I couldn't find any function that preserves the HTML entities.
Do you have an idea?
Comments (2)
XPath always sees a representation of the XML document in which entity references have already been expanded. The only way to prevent this is to preprocess the document, replacing the entity references with something that won't be expanded, for example changing &nbsp; to §nbsp;.

An XPath processor isn't aware whether a non-breaking space character was specified as &nbsp; or as a numeric character reference such as &#160;: the parser always hands it the character itself (U+00A0), never the entity reference.
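The preprocessing idea can be sketched roughly as follows. This is only an illustration, not a definitive implementation: the function names protectEntities, restoreEntities and textWithEntities are hypothetical, and the sketch assumes that only named entities (such as &nbsp;) need protecting and that the text never contains a literal "§name;" sequence of its own. The placeholder is written as the numeric reference &#167; so the input stays pure ASCII and does not depend on how the parser guesses the encoding.

```php
<?php
// Hypothetical helpers illustrating the "§nbsp;" placeholder trick.

/** Swap the "&" of each named entity for "&#167;" (the "§" sign). */
function protectEntities(string $html): string
{
    // "&nbsp;" becomes "&#167;nbsp;", which the parser reads as "§nbsp;".
    return preg_replace('/&([A-Za-z][A-Za-z0-9]*;)/', '&#167;$1', $html);
}

/** Turn "§name;" placeholders back into "&name;" entity references. */
function restoreEntities(string $text): string
{
    // \u{00A7} is "§"; DOM strings in PHP are always UTF-8.
    return preg_replace("/\u{00A7}([A-Za-z][A-Za-z0-9]*;)/u", '&$1', $text);
}

/** Run an XPath query and return each node's text with entities kept. */
function textWithEntities(string $html, string $query): array
{
    libxml_use_internal_errors(true); // collect parser warnings quietly
    $doc = new DOMDocument();
    $doc->loadHTML(protectEntities($html));

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query($query) as $node) {
        $result[] = restoreEntities($node->textContent);
    }
    return $result;
}

$html = '<html><head></head><body>&nbsp;Test</body></html>';
foreach (textWithEntities($html, '//body') as $text) {
    echo $text, "\n";
}
```

Because the placeholder is restored only after the query has run, any XPath expression that tests the text itself (for example with contains()) would see "§nbsp;" rather than a non-breaking space.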