Using PHP's DOM implementation to return the first n characters of an HTML string

Posted 2025-01-07 13:28:12

Given an HTML string, I would like to return a modified string with the following properties:

  1. The first n characters of the text content (not counting HTML tags) should remain.
  2. Elements that begin after the n-th character should be removed entirely.
  3. If the n-th character does not fall at the end of an element, the text after it in the same element should not remain.
  4. The tags of elements at and before the n-th character should remain.

Basically, I just want to return a shortened version of the HTML, without the DOM structure being interrupted, and based on the length of the text contents only.

Using PHP's DOM implementation, it seems this will be overly complex. Using a pattern match isn't ideal as the conditions of the modified string might change over time, and it would require rewriting each time.

Am I missing an easier way of doing this? Thanks in advance.

Comments (1)

内心旳酸楚 2025-01-14 13:28:12

"Using PHP's DOM implementation, it seems this will be overly complex."

Really?

Here's a very simple DOM implementation if you want the first 100 characters from inside the <body> tag and its child nodes. You could further massage this to remove newline characters and superfluous space/tab characters or check the length of the $content string inside the foreach to break the loop and stop concatenation once you've reached a certain number of characters.

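$str = '...';
$dom = new DOMDocument;
$dom->loadHTML($str);
$elements = $dom->getElementsByTagName('body');

// Concatenate the text of every top-level child of <body>.
$content = '';
foreach ($elements as $node) {
  foreach ($node->childNodes as $child) {
    $content .= $child->nodeValue;
  }
}

// mb_substr() counts characters, whereas substr() counts bytes
// (requires the mbstring extension).
echo mb_substr($content, 0, 100);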
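
The whitespace cleanup mentioned above could look something like the following sketch. The helper name `firstChars` is hypothetical, and the code assumes the mbstring extension is available and that the input is ASCII or declares its own encoding (`loadHTML` otherwise falls back to a non-UTF-8 default).

```php
<?php
// Hypothetical helper: plain-text excerpt of <body> with whitespace collapsed.
function firstChars(string $html, int $max): string
{
    $dom = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // textContent concatenates the text of all descendant nodes.
    // Collapse every run of whitespace (newlines, tabs, spaces) to one space.
    $text = preg_replace('/\s+/u', ' ', $body->textContent);

    return mb_substr(trim($text), 0, $max);
}

echo firstChars("<p>Hello,\n   <b>world</b>!</p><p>More text</p>", 12), "\n";
```

Using `textContent` on the `<body>` element avoids the nested loops, at the cost of losing any information about which element each character came from.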

UPDATE

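As per your comment, here's a simple way to count the characters inside HTML nodes and delete every node that comes after the specified character limit is reached. Note that it works at the level of the direct children of <body>: the child in which the limit is reached is kept whole, and only the siblings after it are removed. You can't perform the delete operation inside the original foreach because childNodes is a live list: removing a node reindexes it mid-iteration, nodes get skipped, and you won't get the results you expect. Instead, we store the nodes we want to delete in an array and delete them after the initial iteration.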

$str = '...';
$dom = new DOMDocument;
// Property names are case-sensitive: it is preserveWhiteSpace, not preserveWhitespace.
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($str);

$elements = $dom->getElementsByTagName('body');

$remove   = FALSE;
$maxChars = 100;
$content  = '';
$delete   = [];

foreach ($elements as $node) {
  foreach ($node->childNodes as $child) {
    if ($remove) {
      // Limit already reached: queue this node for deletion.
      $delete[] = $child;
    } else {
      $content .= $child->nodeValue;
      // mb_strlen() counts characters; strlen() would count bytes.
      if (mb_strlen($content) >= $maxChars) {
        $remove = TRUE;
      }
    }
  }
}

// Remove the queued nodes only after the iteration has finished.
foreach ($delete as $child) {
  $child->parentNode->removeChild($child);
}

$dom->formatOutput = TRUE;
echo $dom->saveHTML();
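
The code above stops at the granularity of `<body>`'s direct children. To also cut the text inside the element where the limit falls (requirement 3 in the question), one option is a recursive walk over text nodes. The following is only a sketch under stated assumptions: `truncateNode` and `truncateHtml` are hypothetical names, whitespace-only text nodes count toward the limit, mbstring is available, and the input is ASCII or declares its own encoding.

```php
<?php
// Hypothetical helper: trims $node's subtree so that at most $remaining
// characters of text survive. Returns how many characters are still allowed.
function truncateNode(DOMNode $node, int $remaining): int
{
    if ($node instanceof DOMText) {
        $len = mb_strlen($node->data);
        if ($len > $remaining) {
            // The limit falls inside this text node: cut it here.
            $node->data = mb_substr($node->data, 0, $remaining);
            return 0;
        }
        return $remaining - $len;
    }

    if (!$node->hasChildNodes()) {
        return $remaining;
    }

    // Snapshot the children first: childNodes is live and shifts on removal.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            $node->removeChild($child);   // everything after the limit goes
        } else {
            $remaining = truncateNode($child, $remaining);
        }
    }
    return $remaining;
}

// Hypothetical wrapper: returns the truncated inner HTML of <body>.
function truncateHtml(string $html, int $maxChars): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    truncateNode($body, $maxChars);

    // Serialize only the children of <body>, not the doctype/html wrapper.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo truncateHtml('<p>Hello <b>big</b> world</p><p>Second paragraph</p>', 9), "\n";
```

Because ancestors of the last surviving text node are never removed, their opening and closing tags stay in place and the output remains well-formed.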