Removing nested tags with simple_html_dom

Posted on 2024-08-20 02:29:13

I'm trying to use simple_html_dom to remove all the spans from a snippet of HTML, and I'm using the following:

$body = "<span class='outer' style='background:red'>x<span class='mid' style='background:purple'>y<span class='inner' style='background:orange'>z</span></span></span>";
$HTML = new simple_html_dom;
$HTML->load($body);   
$spans = $HTML->find('span');
foreach($spans as $span_tag) {
    echo "working on ". $span_tag->class . " ... ";
    echo "setting " . $span_tag->outertext . " equal to " . $span_tag->innertext . "<br/>\n";
    $span_tag->outertext = (string)$span_tag->innertext;
}
$text =  $HTML->save();
$HTML->clear();
unset($HTML);
echo "<br/>The Cleaned TEXT is: $text<br/>";

And here's the result in my browser:

http://www.pixeloution.com/RAC/clean.gif

So why do I end up with only the outermost span removed?

Edit

Actually, if there's an easier way to do this, I'm game. The goal is to remove the span tags but keep everything inside them, including other tags; otherwise I'd just use $obj->plaintext

Edit #2

Okay ... apparently I got it working, although oddly enough I'd still like to actually understand the problem if anyone ran into this before. Knowing it was only removing the outermost span, I did this:

function cleanSpansRecursive(&$body) {

    $HTML = new simple_html_dom;
    $HTML->load($body); 
    $spans = $HTML->find('span');
    foreach($spans as $span_tag) {
        $span_tag->outertext = (string)$span_tag->innertext;
    }

    $body =  (string)$HTML;
    if($HTML->find('span')) {
        $HTML->clear();
        unset($HTML);
        cleanSpansRecursive($body);
    } else {
        $HTML->clear();
        unset($HTML);
    }  
}

And it seems to work.

Comments (1)

无语# 2024-08-27 02:29:13

I don't have simple_html_dom installed on my machine or dev server so I can't test, but from the looks of it, setting $span_tag->outertext will create new span objects inside the outer span, so the old references will no longer exist in $HTML. Going from innermost to outer should fix it since the references would be kept intact.

EDIT: In your second edit, you are finding the newly-created spans every time you do a replacement, which is why it works.
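
For what it's worth, the innermost-to-outermost idea could be sketched roughly like this. It is an untested sketch, not a confirmed fix: it assumes simple_html_dom.php sits next to the script, that find() returns matches in document order (so reversing the array visits nested spans before their parents), and the function name stripSpansInnermostFirst is made up for illustration.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical helper: unwrap every <span>, keeping whatever is inside it.
function stripSpansInnermostFirst($body) {
    $html = new simple_html_dom;
    $html->load($body);

    // Reverse the match list so that nested spans are handled before
    // the spans that contain them. By the time an outer span's
    // innertext is read, its inner spans have already been unwrapped.
    foreach (array_reverse($html->find('span')) as $span) {
        $span->outertext = $span->innertext;
    }

    $clean = $html->save();
    $html->clear();
    return $clean;
}

$body = "<span class='outer'>x<span class='mid'>y<span class='inner'>z</span></span></span>";
echo stripSpansInnermostFirst($body);
```

Unlike the recursive version in the question, this parses the HTML only once, but it leans entirely on the ordering assumption above, so check it against your own markup before relying on it.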
