Strip HTML tags and their contents

Posted 2024-08-06 09:00:36 · 707 characters · 6 views · 0 comments

I'm using DOM to parse a string. I need a function that strips span tags and their contents. For example, if I have:

This is some text that contains photo.
<span class='title'> photobyile</span>

I would like the function to return

This is some text that contains photo.

This is what I tried:

    $dom = new domDocument;
    $dom->loadHTML($string);
    $dom->preserveWhiteSpace = false;
    $spans = $dom->getElementsByTagName('span');

    foreach($spans as $span)
    {
        $naslov = $span->nodeValue; 
        echo $naslov;

        $string = preg_replace("/$naslov/", " ", $string);
    }

I'm aware that $span->nodeValue returns the text content of the span tag and not the whole tag, but I don't know how to get the whole tag, together with the class name.

Thanks,
Ile

Comments (2)

唐婉 2024-08-13 09:00:36

Try removing the spans directly from the DOM tree.

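    $dom = new DOMDocument();
    // preserveWhiteSpace is a parse-time option, so set it before loading.
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($string);

    // The node list is live: after each removal the next span becomes item(0).
    $elements = $dom->getElementsByTagName('span');
    while ($span = $elements->item(0)) {
        $span->parentNode->removeChild($span);
    }

    echo $dom->saveHTML();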
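
Note that `saveHTML()` with no argument serializes the whole document, including the doctype and the `<html>`/`<body>` wrappers that `loadHTML()` adds. A minimal sketch of one way to get back only the stripped fragment is shown here; `strip_spans` and the wrapping `<div>` are illustrative assumptions, not part of the answer above, and encoding handling for non-ASCII input is left out.

```php
<?php
// Hypothetical helper: removes every <span> (with its contents) and returns
// only the remaining markup, without the doctype/html/body wrappers.
function strip_spans(string $html): string
{
    $dom = new DOMDocument();

    // Silence parser warnings for fragment input, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so there is one known container element.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The list is live, so keep removing item(0) until none are left.
    $spans = $dom->getElementsByTagName('span');
    while ($span = $spans->item(0)) {
        $span->parentNode->removeChild($span);
    }

    // The wrapper is the first <div> in document order; serialize its children.
    $container = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($container->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return trim($out);
}

$input = "This is some text that contains photo.\n<span class='title'> photobyile</span>";
echo strip_spans($input), PHP_EOL;
```

If the whole tag is what is needed rather than its removal, passing a node to `saveHTML()`, as in `$dom->saveHTML($span)`, serializes that element together with its attributes, and `$span->getAttribute('class')` reads the class name.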

暖风昔人 2024-08-13 09:00:36

@ile - I've had that problem. It happens because the index of the foreach iterator keeps incrementing, while calling removeChild() on the DOM also seems to remove the node from the DOMNodeList ($spans). So for every span you remove, the node list shrinks by one element and the foreach counter still advances by one. Net result: a span gets skipped after each removal.

I'm sure there is a more elegant way, but this is how I did it: I copied the references from the DOMNodeList into a second array, where they are not removed by the removeChild() operation.

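    // Copy the live DOMNodeList into a plain array first.
    $nodes = [];
    foreach ($spans as $span) {
        $nodes[] = $span;
    }
    // Removing nodes no longer affects the list being iterated.
    foreach ($nodes as $span) {
        $span->parentNode->removeChild($span);
    }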
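
The copy step above can also be written with PHP's built-in `iterator_to_array()`, since `DOMNodeList` is traversable. The following is only a sketch with made-up sample markup, showing that no span is left behind once the list has been snapshotted into an array.

```php
<?php
$dom = new DOMDocument();
// Sample markup invented for this illustration.
$dom->loadHTML('<p>text <span>a</span><span>b</span><span>c</span><span>d</span></p>');

// Snapshot the live node list into a plain array before modifying the tree.
$spans = iterator_to_array($dom->getElementsByTagName('span'));

foreach ($spans as $span) {
    $span->parentNode->removeChild($span);
}

// Query again to count the spans still attached to the document.
$remaining = $dom->getElementsByTagName('span')->length;
echo $remaining, PHP_EOL;
```

Compared with the `while ($elements->item(0))` loop, this form keeps an ordinary foreach and makes the snapshot explicit, at the cost of one extra array.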