Checking for and removing empty tags with PHP

Posted on 2024-10-08 12:58:44

What is the fastest way to remove empty HTML tags from a string?

I have programmed something like this to detect empty anchor tags:

    // Keep only <blockquote> and <a>; strip_tags() removes every other tag.
    $temp = strip_tags($string, "<blockquote><a>");

    // Remove empty anchors. [^>]* keeps the match inside one opening tag
    // (the earlier .* could swallow a preceding non-empty link), and \b
    // stops "<a" from also matching tags such as <abbr>.
    $temp = trim(preg_replace('~<a\b[^>]*></a>~i', '', $temp));

    // Do not output the wrapper if nothing is left
    // (an empty div would still add its margin).
    if ($temp !== '')
    {
        echo '<div class="c" style="margin-top:20px;">';
        echo $temp;
        echo '</div>';
    }

It must be possible to do this more efficiently. Would it be faster to convert the string to an HTML DOM and do the checking there?

Comments (1)

如此安好 2024-10-15 12:58:44

Regular expressions are not the right tool for parsing HTML.

As a non-specific answer, I highly recommend using a DOM parsing library to accomplish this. To name a few gotchas that will make regular expressions a nightmare:

  1. You may catch <a></a> tags, but will you catch <a /> tags?
  2. Is the following p tag empty: <p><a></a></p>? If so, will your code catch it? If not, how many passes will you need to run over the string before you're confident you have caught them all?
  3. Will you catch tags which aren't properly closed?
  4. Will you catch tags which overlap?
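
To make the DOM suggestion concrete, here is a minimal sketch of how it might look with PHP's bundled DOM extension. The function name `removeEmptyTags`, the wrapper `<div>`, and the short list of void elements to keep are all assumptions made for illustration, not part of the original question or answer, and nothing here has been benchmarked against the regex version. It treats an element as empty when it has no child elements and no non-whitespace text, and repeats the pass until nothing more is removed, which covers the nested `<p><a></a></p>` case.

```php
<?php
// Hypothetical helper (an assumption, not from the original post).
// Requires the DOM extension (ext-dom), which is enabled in most PHP builds.
function removeEmptyTags(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup is often sloppy; collect parser warnings
    // internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    // The meta tag hints that the input is UTF-8. The wrapper <div> gives
    // us a known container to read the cleaned fragment back from.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $xpath = new DOMXPath($doc);

    // Elements with no child elements and no non-whitespace text.
    // Void elements such as <br> are empty by design, so they are kept.
    $query = './/*[not(*) and not(normalize-space())'
           . ' and not(self::br or self::hr or self::img or self::input)]';

    // Removing an empty child can leave its parent empty, so repeat
    // until a pass finds nothing to remove.
    do {
        $empties = $xpath->query($query, $wrapper);
        $removed = $empties->length;
        foreach ($empties as $node) {
            $node->parentNode->removeChild($node);
        }
    } while ($removed > 0);

    // Serialize the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return trim($out);
}
```

With this sketch, an input consisting only of `<p><a></a></p>` comes back as an empty string, so the surrounding div from the question can be skipped, while a link that contains text is left alone. Unclosed or overlapping tags are repaired by the parser before the check runs, so the output may differ from the input in more than just the removed elements.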