Remove everything in a string outside of specified tags (PHP)

Posted on 2024-10-05 02:50:34

Question has been updated to exclude regex as a possible solution.

I'm trying to build a PHP function that will let me strip everything outside of the specified tags while preserving those tags and their content, and I'm not sure how to do this...

For example:

$string = "lorem ipsum <div><p>Some video content</p><object></object></div><p>dolor sit</p> amet <img>";

some_function($string, "<div><img>");
returns: "<div><p>Some video content</p><object></object></div><img>"

Thanks for any help!

Comments (3)

羞稚 2024-10-12 02:50:34

Ok, so I think I figured out a way to do this based on a modified version of the explode_tags function I posted a link to above:

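// Splits $str into chunks: each tag (through its closing tag, if one exists)
// becomes its own chunk, and the text between tags forms the other chunks.
// Note: nested tags with the same name are closed at the first "</name".
function explode_tags($chr, $str) {
    $res = array();
    $len = strlen($str);
    for ($i = 0, $j = 0; $i < $len; $i++) {
        // Make sure the current chunk exists before reading or appending to it.
        if (!isset($res[$j])) $res[$j] = "";
        if ($chr !== "" && $str[$i] == $chr) {
            // Collapse a run of delimiters into a single chunk boundary.
            while ($i + 1 < $len && $str[$i+1] == $chr) $i++;
            $j++;
            continue;
        }
        if ($str[$i] == "<") {
            if (strlen($res[$j]) > 0) { $j++; $res[$j] = ""; }
            $b = strpos($str, ">", $i);
            // An unterminated "<" is kept as-is together with the rest of the input.
            if ($b === false) { $res[$j] .= substr($str, $i); break; }
            // The tag name ends at the first space or ">", whichever comes first.
            $s = strpos($str, " ", $i);
            $end = ($s !== false && $s < $b) ? $s : $b;
            $t = substr($str, $i+1, $end-$i-1);
            // Extend the chunk through the closing tag when there is one.
            $tclose = strpos($str, "</".$t, $b);
            $pos = ($tclose !== false) ? strpos($str, ">", $tclose) : $b;
            if ($pos === false) $pos = $b;
            $res[$j] .= substr($str, $i, $pos - $i + 1);
            $i = $pos;
            $j++;
            continue;
        }
        // Skip line breaks at the start of a chunk.
        if (($str[$i] == "\n" || $str[$i] == "\r") && strlen($res[$j]) == 0) continue;
        $res[$j] .= $str[$i];
    }
    // Drop the empty chunks created by the bookkeeping above.
    return array_values(array_filter($res, 'strlen'));
}
function filter_tags($content, $tags) {
    // Remove every tag that is not on the allow-list (its inner text is kept).
    $content = strip_tags($content, $tags);
    // Turn "<img><div>" into array("img", "div").
    $tags = explode("><", substr($tags, 1, -1));
    $result = "";
    foreach (explode_tags("", $content) as $c) {
        // Plain-text chunks sit outside the allowed tags, so skip them.
        if ($c === "" || $c[0] != "<") continue;
        $b = strpos($c, ">");
        if ($b === false) continue;
        $s = strpos($c, " ");
        $end = ($s !== false && $s < $b) ? $s : $b;
        $tag = substr($c, 1, $end-1);
        if (in_array($tag, $tags)) $result .= $c;
    }
    return $result;
}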

filter_tags($content, "<img><div><object><embed><iframe><param><script>");

This seems to work perfectly so far, although I have only tried it on a few different pieces of content. I'm not great at this, so if anybody has suggestions please share freely...

Thanks for all of your answers!
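
For reference, the first step of filter_tags relies on PHP's built-in strip_tags(), which removes the tags that are not on the allow-list but leaves their text in place. That leftover text is why a second pass over the chunks is needed. A minimal sketch, using the sample string from the question:

```php
<?php
// The sample input from the question.
$string = "lorem ipsum <div><p>Some video content</p><object></object></div><p>dolor sit</p> amet <img>";

// strip_tags() drops the <p> and <object> tags but keeps the text they wrapped,
// and it also keeps the text that sits outside the allowed tags.
$stripped = strip_tags($string, "<div><img>");

echo $stripped, "\n";
// lorem ipsum <div>Some video content</div>dolor sit amet <img>
```

Note that the nested <p> and <object> tags are gone as well, so any tag that should survive inside a kept element has to be on the allow-list too, as in the filter_tags call above.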

提笔落墨 2024-10-12 02:50:34

Jeff Atwood has a really great blog post arguing against using regex for parsing HTML. http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html

However, in this situation, it might not be a bad idea to use regex to first remove the extraneous ends and then use a DOM parser to pick out the structures you want from the inside.
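
The DOM-parser half of this suggestion could look roughly like the sketch below, which uses PHP's bundled DOM extension and skips the regex step entirely. The helper name keep_top_level_tags and the wrapper div with id "wrap" are made up for illustration; they are not from the thread, and the sketch assumes ASCII or otherwise encoding-safe input.

```php
<?php
// Hypothetical helper (not from the thread): keeps only the top-level elements
// whose tag name is on the allow-list, together with everything inside them.
function keep_top_level_tags(string $html, array $allowed): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy markup
    // The wrapper div gives the fragment a single, easy-to-find parent.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first match in document order.
    $wrap = (new DOMXPath($doc))->query('//div[@id="wrap"]')->item(0);

    $result = '';
    foreach ($wrap->childNodes as $node) {
        // Text nodes and disallowed elements at the top level are dropped.
        if ($node instanceof DOMElement && in_array($node->nodeName, $allowed, true)) {
            $result .= $doc->saveHTML($node); // serializes the element and its children
        }
    }
    return $result;
}

$string = "lorem ipsum <div><p>Some video content</p><object></object></div><p>dolor sit</p> amet <img>";
echo keep_top_level_tags($string, array("div", "img")), "\n";
```

Unlike the strip_tags() approach, this keeps the nested <p> and <object> inside the kept <div>, which is what the question asked for. The parser may normalize the markup slightly when serializing it.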

小伙你站住 2024-10-12 02:50:34

Update based on the comment

You could use CSS selectors to grab the divs you are looking for, then crawl up the tree to get the outermost element of your selection.

See the Zend_Dom_Query component of Zend Framework.
http://framework.zend.com/manual/en/zend.dom.query.html

Basically query for "div img" to get the img tags inside div tags (use "div > img" if you only want the ones immediately inside).
Then crawl up the tree until you reach your target position, and extract and save that node's outerHTML...

This would work in JavaScript, but I don't know about PHP.

The caveat here is that you lose the specificity of your example above, i.e. a div containing four images would produce a match for every child image, so you'd have to do some extra processing to make sure you're really doing what you think you are doing. However, it's a bit safer than blind string replacement.
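
In PHP, the same idea can be sketched with the bundled DOMXPath class instead of a CSS selector engine. The XPath expression .//div//img plays the role of the "div img" selector, and keying the results by node path deals with the several-images-in-one-div caveat. The helper name outermost_divs_with_images and the wrapper div are hypothetical, chosen only for this illustration.

```php
<?php
// Hypothetical helper (not from the thread): returns the HTML of every
// outermost <div> in the fragment that contains at least one <img>.
function outermost_divs_with_images(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy markup
    // The wrapper div marks where the fragment starts inside the parsed document.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);

    $found = array();
    // XPath counterpart of the CSS selector "div img", scoped to the fragment.
    foreach ($xpath->query('.//div//img', $wrap) as $img) {
        $outer = null;
        // Crawl up the tree, remembering the highest <div> below the wrapper.
        for ($node = $img->parentNode; $node !== null && !$node->isSameNode($wrap); $node = $node->parentNode) {
            if ($node->nodeName === 'div') {
                $outer = $node;
            }
        }
        // Several images in one div lead to the same ancestor, so key by node path.
        $found[$outer->getNodePath()] = $doc->saveHTML($outer);
    }
    return array_values($found);
}

$sample = '<div><p>text</p><img><img></div><p>dolor</p><img>';
print_r(outermost_divs_with_images($sample));
```

The top-level <img> in the sample is not returned, because it has no enclosing div inside the fragment; that is the loss of specificity mentioned above, and it would need separate handling.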
