php DOMDocument->getElementById->nodeValue escapes HTML

Posted 2024-10-16 14:29:24

I am using PHP's DOMDocument->getElementById->nodeValue to set a particular DOM element's HTML. The problem is that the string is converted to HTML entities. For example:
nodeValue = html_entity_decode('&lt;b&gt;test&lt;/b&gt;'); should output '<b>test</b>' but instead it outputs '&lt;b&gt;test&lt;/b&gt;'

Any ideas why? This happens even if I don't use the html_entity_decode function.

Here is my updated script, which is now working:

// Construct a DOM object for updating the affected node
$html = new DOMDocument("1.0", "utf-8"); 
if (!$html) return FALSE;

// Load the HTML file in question
$loaded = $html->loadHTMLFile($data['page_path']);
if (!$loaded)
{
    print 'Failed to load file';
    return FALSE;
}

// Establish the node being updated within the file
foreach ($data['divids'] as $divid)
{
    $element = $html->getElementById($divid);
    if (is_null($element))
    {
        print 'Failed to get existing element';
        return FALSE;
    }

    $newelement = $html->createElement('div');
    if (!$newelement)
    {
        print 'Failed to create new element';
        return FALSE;
    }
    $newelement->setAttribute('id', $divid);
    $newelement->setAttribute('class', 'reusable-block');

    // Perform the replacement
    $newelement->nodeValue = $replacement;
    $parent = $element->parentNode;
    $parent->replaceChild($newelement, $element);

    // Save the file back to its location
    $saved = $html->saveHTMLFile($data['page_path']);
    if (!$saved)
    {
        print 'Failed to save file';
        return FALSE;
    }
}

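// Replace HTML entities left over
// (files::readFile is a helper from the surrounding code base; $handle is
// assumed to be a writable handle for the same file, opened elsewhere)
$content = files::readFile($data['page_path']);
$content = str_replace('&lt;', '<', $content);
$content = str_replace('&gt;', '>', $content);
if (!@fwrite($handle, $content))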
{
    print 'Failed to replace entities';
    return FALSE;
}

Comments (1)

自在安然 2024-10-23 14:29:24

This is the expected behavior: nodeValue treats whatever you assign as plain text, and text in XML can't contain raw angle brackets (only markup can), so they are escaped as &lt; and &gt; when the document is saved. Try building the HTML as a DOMNode and appending it instead:

$node = $mydoc->createElement("b");
$node->nodeValue = "test";
$mydoc->getElementById("whatever")->appendChild($node);
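If the replacement arrives as a string of markup rather than as separate pieces, one option is to parse it into a DOMDocumentFragment and append that. The following is a minimal sketch, not the asker's script: the names $doc, $root, $asText, $asNodes and $markup are made up for illustration, and it assumes the markup is well-formed XML.

```php
<?php
// Sketch: the same markup string inserted as text and as real nodes.
// All variable names here are illustrative, not taken from the question.
$markup = '<b>test</b>';

$doc = new DOMDocument('1.0', 'utf-8');
$root = $doc->appendChild($doc->createElement('body'));

// 1) Assigning to nodeValue creates a text node, so the angle brackets
//    are escaped when the document is serialized.
$asText = $root->appendChild($doc->createElement('div'));
$asText->nodeValue = $markup;
$textVersion = $doc->saveXML($asText);

// 2) Parsing the string into a fragment yields real element nodes.
$asNodes = $root->appendChild($doc->createElement('div'));
$fragment = $doc->createDocumentFragment();
if (!$fragment->appendXML($markup)) {
    // appendXML() returns false when the string is not well-formed XML.
    throw new RuntimeException('Markup is not well-formed XML');
}
$asNodes->appendChild($fragment);
$nodeVersion = $doc->saveXML($asNodes);

echo $textVersion, "\n"; // <div>&lt;b&gt;test&lt;/b&gt;</div>
echo $nodeVersion, "\n"; // <div><b>test</b></div>
```

Note that appendXML() expects XML, so HTML that is not well-formed XML (an unclosed <br>, for instance) would be rejected and needs a different route.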

Update with working example:

$html = '<html>
    <body id="myBody">
        <b id="myBTag">my old element</b>
    </body>
</html>';

$mydoc = new DOMDocument("1.0", "utf-8");
$mydoc->loadXML($html);

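// need to do this to get getElementById() to work on a document loaded
// with loadXML(): register each id attribute as an XML ID
$all_tags = $mydoc->documentElement->getElementsByTagName("*");
foreach ($all_tags as $element) {
    // skip elements that have no id attribute
    if ($element->hasAttribute("id")) {
        $element->setIdAttribute("id", true);
    }
}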

$current_b_tag = $mydoc->getElementById("myBTag");
$new_b_tag = $mydoc->createElement("b");
$new_b_tag->nodeValue = "my new element";
$result = $mydoc->getElementById("myBody");
$result->replaceChild($new_b_tag, $current_b_tag);

echo $mydoc->saveXML($mydoc->documentElement);
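
As an alternative to registering every id attribute, the element can also be looked up with an XPath query on the id attribute. This is a hedged sketch on the same sample document; the XPath lookup and the use of parentNode are substitutions for illustration, not part of the example above.

```php
<?php
// Sketch: find the element by its id attribute with DOMXPath instead of
// calling setIdAttribute() on every element first.
$html = '<html>
    <body id="myBody">
        <b id="myBTag">my old element</b>
    </body>
</html>';

$mydoc = new DOMDocument('1.0', 'utf-8');
$mydoc->loadXML($html);

$xpath = new DOMXPath($mydoc);
$current_b_tag = $xpath->query('//*[@id="myBTag"]')->item(0);
if ($current_b_tag === null) {
    throw new RuntimeException('Element not found');
}

// Build the replacement; createTextNode() stores the string as plain text.
$new_b_tag = $mydoc->createElement('b');
$new_b_tag->appendChild($mydoc->createTextNode('my new element'));

// Replace through the old node's parent, so the parent's id is not needed.
$current_b_tag->parentNode->replaceChild($new_b_tag, $current_b_tag);

$output = $mydoc->saveXML($mydoc->documentElement);
echo $output, "\n";
```

Going through parentNode avoids a second lookup for the containing element, which helps when the element being replaced is not a direct child of a known node.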