PHP: Remove only the first few empty <p> tags

Posted 2024-10-06 22:32:53

I have a custom developed CMS where users can enter some content into a rich text field (ckeditor).

Users simply copy-paste data from another document. Sometimes the data has empty <p> tags at the beginning. Here's a sample of the data:

<p></p>
<p></p>
<p></p>
<p>Data data data data</p>
<p>Data data data data</p>
<p>Data data data data</p>
<p>Data data data data</p>
<p></p>
<p></p>
<p>Data data data data</p>
<p>Data data data data</p>
<p></p>

I don't want to remove all the empty <p> tags, only the ones before the actual data, i.e. the top 3 <p> tags in this case.

How can I do that?

Edit: To clarify, I need a PHP solution. JavaScript won't do.

Is there a way I can gather all <p> tags in an array, then iterate and delete until I encounter one with data?

笨死的猪 2024-10-13 22:32:53

Please, don't use regular expressions for irregular strings: it stirs the sleeping god. Instead, use XPath:

function strip_opening_lines($html) {
  $dom = new DOMDocument();
  // The property is spelled preserveWhiteSpace (capital "S");
  // PHP property names are case-sensitive.
  $dom->preserveWhiteSpace = FALSE;
  $dom->loadHTML($html);

  $xpath = new DOMXPath($dom);
  $nodes = $xpath->query("//p");

  foreach ($nodes as $node) {
    // Remove non-significant whitespace.
    $trimmed_value = trim($node->nodeValue);

    // Check to see if the node is empty (i.e. <p></p>).
    // Compare against '' instead of using empty(), which would also
    // treat a paragraph containing only "0" as empty.
    if ($trimmed_value === '') {
      $node->parentNode->removeChild($node);
    }
    // If we found a non-empty node, we're done. Break out.
    else {
      break;
    }
  }
  $parsed_html = $dom->saveHTML();

  // DOMDocument::saveHTML adds a DOCTYPE, <html>, and <body>
  // tags to the parsed HTML. Since this is regular data,
  // we can use regular expressions.
  if (!preg_match('#<body>(.*?)</body>#is', $parsed_html, $matches)) {
    // No <body> found: fall back to the untouched input.
    return $html;
  }

  return $matches[1];
}
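
The function above pulls the result back out of the serialized document with a final preg_match. As a variation, the remaining child nodes of <body> can be serialized one at a time, which avoids that step. The following is only a sketch: the function name is made up for illustration, it counts a paragraph as empty only if it has no text and no child elements (so something like <p><img></p> is kept), and it assumes ASCII or entity-encoded input, since DOMDocument::loadHTML does not treat its input as UTF-8 unless the markup declares a charset.

```php
<?php
// Hypothetical helper name; a sketch under the assumptions stated above.
function strip_leading_empty_paragraphs_dom(string $html): string
{
    if (trim($html) === '') {
        return $html; // nothing to parse
    }

    $dom = new DOMDocument();
    // Pasted markup is often invalid; collect parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }

    // Remove leading children while they are blank text or empty <p> elements.
    while ($body->firstChild !== null) {
        $child = $body->firstChild;
        $isBlankText = $child instanceof DOMText && trim($child->nodeValue) === '';
        $isEmptyParagraph = $child instanceof DOMElement
            && strtolower($child->nodeName) === 'p'
            && trim($child->textContent) === ''
            && $child->getElementsByTagName('*')->length === 0;
        if (!$isBlankText && !$isEmptyParagraph) {
            break; // first node with real content: stop here
        }
        $body->removeChild($child);
    }

    // Serialize the remaining children one by one, so no <body> wrapper is emitted.
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}
```

Empty paragraphs that appear after the first paragraph with content are left alone, because the loop stops at the first node that is neither blank text nor an empty <p>.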

Reasons why all the regex solutions presented are bad:

Won't match empty paragraph elements with attributes (e.g. <p class="foo"></p>)
Won't match empty paragraph elements that are not literally empty (e.g. <p> </p>)
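
The two points above can be checked directly. The snippet below runs the literal <p></p> pattern from the regex-based answers against three sample strings; the samples are made up for illustration and are not taken from the question.

```php
<?php
// The literal pattern used by the regex-based answers.
$pattern = '!^(<p></p>\s*)+!';

// Made-up sample inputs, one per case.
$samples = [
    'plain'      => "<p></p>\n<p>Data</p>",
    'attribute'  => "<p class=\"intro\"></p>\n<p>Data</p>",
    'whitespace' => "<p> </p>\n<p>Data</p>",
];

$results = [];
foreach ($samples as $label => $html) {
    $results[$label] = preg_replace($pattern, '', $html);
}

// Only the 'plain' sample loses its leading paragraph; the other two
// come back unchanged because the pattern never matches them.
foreach ($results as $label => $output) {
    echo $label, ': ', json_encode($output), "\n";
}
```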

指尖上得阳光 2024-10-13 22:32:53

Normally I would advise against using a regular expression to parse HTML, but this one seems harmless:

$html = preg_replace('!^(<p></p>\s*)+!', '', $html);
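
If a regular expression is still preferred, the pattern can be made somewhat more tolerant. The sketch below is an assumption-laden variant, not the code from this answer: the function name is hypothetical, and it guesses that "empty" should also cover paragraphs with attributes and paragraphs holding only whitespace, &nbsp; or <br> (which some rich text editors emit for blank lines). It will still misjudge unusual markup, for example an attribute value that contains ">".

```php
<?php
// Hypothetical helper name; a sketch, not a complete HTML-aware solution.
function strip_leading_empty_paragraphs(string $html): string
{
    // \A                anchor at the very start of the string
    // <p\b[^>]*>        an opening <p>, with or without attributes (not <pre>)
    // (?:\s|&nbsp;|<br\s*/?>)*   only whitespace, &nbsp; or <br> inside
    $pattern = '~\A(?:\s*<p\b[^>]*>(?:\s|&nbsp;|<br\s*/?>)*</p>)+\s*~i';

    $result = preg_replace($pattern, '', $html);

    // preg_replace returns null on failure; fall back to the input then.
    return $result === null ? $html : $result;
}
```

Because the pattern is anchored with \A and has no "m" modifier, empty paragraphs further down the document are left in place.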

暗喜 2024-10-13 22:32:53

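Use

// \A anchors the match at the very start of the string, so only the
// leading run of empty paragraphs is removed. The "m" modifier is left
// out on purpose: with it, ^ would also match at the start of every line.
$html = preg_replace('~\A(?:<p></p>\s*)+~i', '', $html);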

﹏半生如梦愿梦如真 2024-10-13 22:32:53

You can do it in JavaScript: as soon as the user performs a paste operation, strip off the unwanted tags using a regular expression.

Your code will look something like this:

document.getElementById("id of rich text field").onkeyup = stripData;
document.getElementById("id of rich text field").onmouseup = stripData;

function stripData() {
    var field = document.getElementById("id of rich text field");
    // Removes every empty <p></p> pair, wherever it occurs.
    field.value = field.value.replace(/<p><\/p>/g, "");
}

Edit: To remove only the initial empty <p> tags:

function stripData() {
    var field = document.getElementById("id of rich text field");
    // Removes only the leading run of empty <p></p> pairs,
    // including any whitespace between them.
    field.value = field.value.replace(/^(?:<p><\/p>\s*)+/, "");
}
