PHP DOMDocument->getElementById->nodeValue: setting an element's HTML
I am using PHP's DOMDocument->getElementById->nodeValue to set a particular DOM element's HTML. The problem is that the string is converted to HTML entities, e.g.:
nodeValue = html_entity_decode('<b>test</b>');
should output 'test' (in bold), but instead it outputs '&lt;b&gt;test&lt;/b&gt;'.
Any ideas why? This happens even if I don't use the html_entity_decode function.
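The behavior described above can be reproduced in isolation. This is a minimal sketch using a hypothetical one-element document and the question's '<b>test</b>' string; it shows that after the assignment the element holds a single text node, whose angle brackets are escaped when the element is serialized.

```php
<?php
// Hypothetical minimal document with one target element.
$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadHTML('<html><body><div id="target"></div></body></html>');

$div = $doc->getElementById('target');

// Assigning to nodeValue replaces the element's children with a single
// text node, so the string is stored as literal text, not as markup.
$div->nodeValue = '<b>test</b>';

// On serialization, '<' and '>' inside a text node are escaped as entities.
$serialized = $doc->saveHTML($div);
echo $serialized, PHP_EOL;
```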
Here is my updated script, which is now working:
// Construct a DOM object for updating the affected node
$html = new DOMDocument("1.0", "utf-8");
// Load the HTML file in question
$loaded = $html->loadHTMLFile($data['page_path']);
if (!$loaded)
{
    print 'Failed to load file';
    return FALSE;
}
// Establish the node being updated within the file
foreach ($data['divids'] as $divid)
{
    $element = $html->getElementById($divid);
    if (is_null($element))
    {
        print 'Failed to get existing element';
        return FALSE;
    }
    // createElement() returns FALSE (not NULL) on failure
    $newelement = $html->createElement('div');
    if ($newelement === FALSE)
    {
        print 'Failed to create new element';
        return FALSE;
    }
    $newelement->setAttribute('id', $divid);
    $newelement->setAttribute('class', 'reusable-block');
    // Perform the replacement; nodeValue stores $replacement as text,
    // so its < and > are written out as &lt; and &gt;
    $newelement->nodeValue = $replacement;
    $parent = $element->parentNode;
    $parent->replaceChild($newelement, $element);
}
// Save the file back to its location once all nodes are replaced
$saved = $html->saveHTMLFile($data['page_path']);
if (!$saved)
{
    print 'Failed to save file';
    return FALSE;
}
// Replace HTML entities left over (this also unescapes any &lt; / &gt;
// elsewhere in the file that were meant to stay escaped)
$content = file_get_contents($data['page_path']);
$content = str_replace('&lt;', '<', $content);
$content = str_replace('&gt;', '>', $content);
// Write the result back to the same file
if (file_put_contents($data['page_path'], $content) === FALSE)
{
    print 'Failed to replace entities';
    return FALSE;
}
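The entity post-processing at the end of the script can be avoided by inserting real element nodes instead of text. This is a sketch, not the author's code: replaceChildrenWithHtml() is a hypothetical helper that parses the snippet with loadHTML() into a temporary document and copies the resulting nodes across with importNode(). The wrapper id "snippet-root" is also an assumption made only for this illustration.

```php
<?php
// Hypothetical helper: replace $target's children with the nodes parsed
// from an HTML snippet. Uses the HTML parser, so the snippet does not
// need to be well-formed XML (e.g. <br> without a closing tag is fine).
function replaceChildrenWithHtml(DOMElement $target, string $html): void
{
    $tmp = new DOMDocument('1.0', 'utf-8');
    // Wrap the snippet so its nodes can be located after parsing; the
    // meta tag tells libxml to treat the input as UTF-8.
    $tmp->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div id="snippet-root">' . $html . '</div></body></html>'
    );
    $root = $tmp->getElementById('snippet-root');

    // Remove whatever the target currently contains.
    while ($target->firstChild !== null) {
        $target->removeChild($target->firstChild);
    }

    // importNode(..., true) deep-copies each parsed node into the
    // target's own document so it can be appended there.
    foreach ($root->childNodes as $child) {
        $target->appendChild($target->ownerDocument->importNode($child, true));
    }
}
```

Inside the loop, $newelement->nodeValue = $replacement; could then become replaceChildrenWithHtml($newelement, $replacement);, and the str_replace() pass over the saved file would no longer be needed.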
Answer:
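This is the expected behavior: assigning to nodeValue stores your markup as a text node rather than as elements. Text content in XML/HTML can't contain literal angle brackets (only tags can), so the serializer escapes them as &lt; and &gt;. Try converting the HTML into a DOMNode and appending it instead.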
Update with working example:
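The following is not the answerer's example; it is a minimal sketch of the approach described above, using DOMDocumentFragment::appendXML() to turn the string into nodes. It assumes the snippet is well-formed XML (appendXML() returns false otherwise), and the one-element document is hypothetical.

```php
<?php
// Hypothetical document with one target element.
$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadHTML('<html><body><div id="target"></div></body></html>');

$div = $doc->getElementById('target');

// Parse the markup into real nodes held by a document fragment.
$fragment = $doc->createDocumentFragment();
if (!$fragment->appendXML('<b>test</b>')) {
    throw new RuntimeException('Snippet is not well-formed XML');
}

// Clear the element's existing children, then append the parsed nodes.
while ($div->firstChild !== null) {
    $div->removeChild($div->firstChild);
}
$div->appendChild($fragment);

// The <b> element is now serialized as markup, not as entities.
$serialized = $doc->saveHTML($div);
echo $serialized, PHP_EOL;
```

For snippets that may not be well-formed XML (for example a bare <br>), parsing them with loadHTML() into a separate DOMDocument and copying the nodes over with importNode($node, true) is the more tolerant option.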