当前位置：文江博客话题详情

使用 PHP substr() 和 strip_tags()，同时保留格式且不破坏 HTML

发布于 2024-08-24 14:36:15 字数 2199 浏览 13 评论 0 原文

我有各种 HTML 字符串可以剪切到 100 个字符（被剥离的内容，而不是原始内容），而无需剥离标签，也不会破坏 HTML。

原始 HTML 字符串（288 个字符）：

$content = "<div>With a <span class='spanClass'>span over here</span> and a
<div class='divClass'>nested div over <div class='nestedDivClass'>there</div>
</div> and a lot of other nested <strong><em>texts</em> and tags in the air
<span>everywhere</span>, it's a HTML taggy kind of day.</strong></div>";

标准修剪： 修剪为 100 个字符和 HTML 中断，剥离的内容约为 40 个字符：

$content = substr($content, 0, 100)."..."; /* output:
<div>With a <span class='spanClass'>span over here</span> and a
<div class='divClass'>nested div ove... */

剥离的 HTML： > 输出正确的字符数，但明显丢失格式：

$content = substr(strip_tags($content)), 0, 100)."..."; /* output:
With a span over here and a nested div over there and a lot of other nested
texts and tags in the ai... */

部分解决方案：使用 HTML Tidy 或净化器关闭标签输出干净的 HTML，但 100 个 HTML 字符未显示内容。

$content = substr($content, 0, 100)."...";
$tidy = new tidy; $tidy->parseString($content); $tidy->cleanRepair(); /* output:
<div>With a <span class='spanClass'>span over here</span> and a
<div class='divClass'>nested div ove</div></div>... */

挑战：输出干净的 HTML 和 n 个字符（不包括 HTML 元素的字符数）：

$content = cutHTML($content, 100); /* output:
<div>With a <span class='spanClass'>span over here</span> and a
<div class='divClass'>nested div over <div class='nestedDivClass'>there</div>
</div> and a lot of other nested <strong><em>texts</em> and tags in the
ai</strong></div>...";

类似问题

原文

I have various HTML strings to cut to 100 characters (of the stripped content, not the original) without stripping tags and without breaking HTML.

Original HTML string (288 characters):

$content = "<div>With a <span class='spanClass'>span over here</span> and a
<div class='divClass'>nested div over <div class='nestedDivClass'>there</div>
</div> and a lot of other nested <strong><em>texts</em> and tags in the air
<span>everywhere</span>, it's a HTML taggy kind of day.</strong></div>";

Standard trim: Trim to 100 characters and HTML breaks, stripped content comes to ~40 characters:

$content = substr($content, 0, 100)."..."; /* output:
<div>With a <span class='spanClass'>span over here</span> and a
<div class='divClass'>nested div ove... */

Stripped HTML: Outputs correct character count but obviously looses formatting:

$content = substr(strip_tags($content)), 0, 100)."..."; /* output:
With a span over here and a nested div over there and a lot of other nested
texts and tags in the ai... */

Partial solution: using HTML Tidy or purifier to close off tags outputs clean HTML but 100 characters of HTML not displayed content.

$content = substr($content, 0, 100)."...";
$tidy = new tidy; $tidy->parseString($content); $tidy->cleanRepair(); /* output:
<div>With a <span class='spanClass'>span over here</span> and a
<div class='divClass'>nested div ove</div></div>... */

Challenge: To output clean HTML and n characters (excluding character count of HTML elements):

$content = cutHTML($content, 100); /* output:
<div>With a <span class='spanClass'>span over here</span> and a
<div class='divClass'>nested div over <div class='nestedDivClass'>there</div>
</div> and a lot of other nested <strong><em>texts</em> and tags in the
ai</strong></div>...";

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

牵你手 2024-08-31 14:36:15

不神奇，但有效。

function html_cut($text, $max_length)
{
    $tags   = array();
    $result = "";

    $is_open   = false;
    $grab_open = false;
    $is_close  = false;
    $in_double_quotes = false;
    $in_single_quotes = false;
    $tag = "";

    $i = 0;
    $stripped = 0;

    $stripped_text = strip_tags($text);

    while ($i < strlen($text) && $stripped < strlen($stripped_text) && $stripped < $max_length)
    {
        $symbol  = $text{$i};
        $result .= $symbol;

        switch ($symbol)
        {
           case '<':
                $is_open   = true;
                $grab_open = true;
                break;

           case '"':
               if ($in_double_quotes)
                   $in_double_quotes = false;
               else
                   $in_double_quotes = true;

            break;

            case "'":
              if ($in_single_quotes)
                  $in_single_quotes = false;
              else
                  $in_single_quotes = true;

            break;

            case '/':
                if ($is_open && !$in_double_quotes && !$in_single_quotes)
                {
                    $is_close  = true;
                    $is_open   = false;
                    $grab_open = false;
                }

                break;

            case ' ':
                if ($is_open)
                    $grab_open = false;
                else
                    $stripped++;

                break;

            case '>':
                if ($is_open)
                {
                    $is_open   = false;
                    $grab_open = false;
                    array_push($tags, $tag);
                    $tag = "";
                }
                else if ($is_close)
                {
                    $is_close = false;
                    array_pop($tags);
                    $tag = "";
                }

                break;

            default:
                if ($grab_open || $is_close)
                    $tag .= $symbol;

                if (!$is_open && !$is_close)
                    $stripped++;
        }

        $i++;
    }

    while ($tags)
        $result .= "</".array_pop($tags).">";

    return $result;
}

使用示例：

$content = html_cut($content, 100);

Not amazing, but works.

function html_cut($text, $max_length)
{
    $tags   = array();
    $result = "";

    $is_open   = false;
    $grab_open = false;
    $is_close  = false;
    $in_double_quotes = false;
    $in_single_quotes = false;
    $tag = "";

    $i = 0;
    $stripped = 0;

    $stripped_text = strip_tags($text);

    while ($i < strlen($text) && $stripped < strlen($stripped_text) && $stripped < $max_length)
    {
        $symbol  = $text{$i};
        $result .= $symbol;

        switch ($symbol)
        {
           case '<':
                $is_open   = true;
                $grab_open = true;
                break;

           case '"':
               if ($in_double_quotes)
                   $in_double_quotes = false;
               else
                   $in_double_quotes = true;

            break;

            case "'":
              if ($in_single_quotes)
                  $in_single_quotes = false;
              else
                  $in_single_quotes = true;

            break;

            case '/':
                if ($is_open && !$in_double_quotes && !$in_single_quotes)
                {
                    $is_close  = true;
                    $is_open   = false;
                    $grab_open = false;
                }

                break;

            case ' ':
                if ($is_open)
                    $grab_open = false;
                else
                    $stripped++;

                break;

            case '>':
                if ($is_open)
                {
                    $is_open   = false;
                    $grab_open = false;
                    array_push($tags, $tag);
                    $tag = "";
                }
                else if ($is_close)
                {
                    $is_close = false;
                    array_pop($tags);
                    $tag = "";
                }

                break;

            default:
                if ($grab_open || $is_close)
                    $tag .= $symbol;

                if (!$is_open && !$is_close)
                    $stripped++;
        }

        $i++;
    }

    while ($tags)
        $result .= "</".array_pop($tags).">";

    return $result;
}

Usage example:

$content = html_cut($content, 100);

回复收藏 0 原文

感情洁癖 2024-08-31 14:36:15

我并没有声称发明了这个，但是有一个非常完整的 Text::truncate() 方法可以完成您想要的操作：

function truncate($text, $length = 100, $ending = '...', $exact = true, $considerHtml = false) {
    if (is_array($ending)) {
        extract($ending);
    }
    if ($considerHtml) {
        if (mb_strlen(preg_replace('/<.*?>/', '', $text)) <= $length) {
            return $text;
        }
        $totalLength = mb_strlen($ending);
        $openTags = array();
        $truncate = '';
        preg_match_all('/(<\/?([\w+]+)[^>]*>)?([^<>]*)/', $text, $tags, PREG_SET_ORDER);
        foreach ($tags as $tag) {
            if (!preg_match('/img|br|input|hr|area|base|basefont|col|frame|isindex|link|meta|param/s', $tag[2])) {
                if (preg_match('/<[\w]+[^>]*>/s', $tag[0])) {
                    array_unshift($openTags, $tag[2]);
                } else if (preg_match('/<\/([\w]+)[^>]*>/s', $tag[0], $closeTag)) {
                    $pos = array_search($closeTag[1], $openTags);
                    if ($pos !== false) {
                        array_splice($openTags, $pos, 1);
                    }
                }
            }
            $truncate .= $tag[1];

            $contentLength = mb_strlen(preg_replace('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', ' ', $tag[3]));
            if ($contentLength + $totalLength > $length) {
                $left = $length - $totalLength;
                $entitiesLength = 0;
                if (preg_match_all('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', $tag[3], $entities, PREG_OFFSET_CAPTURE)) {
                    foreach ($entities[0] as $entity) {
                        if ($entity[1] + 1 - $entitiesLength <= $left) {
                            $left--;
                            $entitiesLength += mb_strlen($entity[0]);
                        } else {
                            break;
                        }
                    }
                }

                $truncate .= mb_substr($tag[3], 0 , $left + $entitiesLength);
                break;
            } else {
                $truncate .= $tag[3];
                $totalLength += $contentLength;
            }
            if ($totalLength >= $length) {
                break;
            }
        }

    } else {
        if (mb_strlen($text) <= $length) {
            return $text;
        } else {
            $truncate = mb_substr($text, 0, $length - strlen($ending));
        }
    }
    if (!$exact) {
        $spacepos = mb_strrpos($truncate, ' ');
        if (isset($spacepos)) {
            if ($considerHtml) {
                $bits = mb_substr($truncate, $spacepos);
                preg_match_all('/<\/([a-z]+)>/', $bits, $droppedTags, PREG_SET_ORDER);
                if (!empty($droppedTags)) {
                    foreach ($droppedTags as $closingTag) {
                        if (!in_array($closingTag[1], $openTags)) {
                            array_unshift($openTags, $closingTag[1]);
                        }
                    }
                }
            }
            $truncate = mb_substr($truncate, 0, $spacepos);
        }
    }

    $truncate .= $ending;

    if ($considerHtml) {
        foreach ($openTags as $tag) {
            $truncate .= '</'.$tag.'>';
        }
    }

    return $truncate;
}

I'm not claiming to have invented this, but there is a very complete Text::truncate() method in CakePHP which does what you want:

function truncate($text, $length = 100, $ending = '...', $exact = true, $considerHtml = false) {
    if (is_array($ending)) {
        extract($ending);
    }
    if ($considerHtml) {
        if (mb_strlen(preg_replace('/<.*?>/', '', $text)) <= $length) {
            return $text;
        }
        $totalLength = mb_strlen($ending);
        $openTags = array();
        $truncate = '';
        preg_match_all('/(<\/?([\w+]+)[^>]*>)?([^<>]*)/', $text, $tags, PREG_SET_ORDER);
        foreach ($tags as $tag) {
            if (!preg_match('/img|br|input|hr|area|base|basefont|col|frame|isindex|link|meta|param/s', $tag[2])) {
                if (preg_match('/<[\w]+[^>]*>/s', $tag[0])) {
                    array_unshift($openTags, $tag[2]);
                } else if (preg_match('/<\/([\w]+)[^>]*>/s', $tag[0], $closeTag)) {
                    $pos = array_search($closeTag[1], $openTags);
                    if ($pos !== false) {
                        array_splice($openTags, $pos, 1);
                    }
                }
            }
            $truncate .= $tag[1];

            $contentLength = mb_strlen(preg_replace('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', ' ', $tag[3]));
            if ($contentLength + $totalLength > $length) {
                $left = $length - $totalLength;
                $entitiesLength = 0;
                if (preg_match_all('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', $tag[3], $entities, PREG_OFFSET_CAPTURE)) {
                    foreach ($entities[0] as $entity) {
                        if ($entity[1] + 1 - $entitiesLength <= $left) {
                            $left--;
                            $entitiesLength += mb_strlen($entity[0]);
                        } else {
                            break;
                        }
                    }
                }

                $truncate .= mb_substr($tag[3], 0 , $left + $entitiesLength);
                break;
            } else {
                $truncate .= $tag[3];
                $totalLength += $contentLength;
            }
            if ($totalLength >= $length) {
                break;
            }
        }

    } else {
        if (mb_strlen($text) <= $length) {
            return $text;
        } else {
            $truncate = mb_substr($text, 0, $length - strlen($ending));
        }
    }
    if (!$exact) {
        $spacepos = mb_strrpos($truncate, ' ');
        if (isset($spacepos)) {
            if ($considerHtml) {
                $bits = mb_substr($truncate, $spacepos);
                preg_match_all('/<\/([a-z]+)>/', $bits, $droppedTags, PREG_SET_ORDER);
                if (!empty($droppedTags)) {
                    foreach ($droppedTags as $closingTag) {
                        if (!in_array($closingTag[1], $openTags)) {
                            array_unshift($openTags, $closingTag[1]);
                        }
                    }
                }
            }
            $truncate = mb_substr($truncate, 0, $spacepos);
        }
    }

    $truncate .= $ending;

    if ($considerHtml) {
        foreach ($openTags as $tag) {
            $truncate .= '</'.$tag.'>';
        }
    }

    return $truncate;
}

回复收藏 0 原文

不必你懂 2024-08-31 14:36:15

使用 PHP 的 DOMDocument 类规范化 HTML 片段：

$dom= new DOMDocument();
$dom->loadHTML('<div><p>Hello World');      
$xpath = new DOMXPath($dom);
$body = $xpath->query('/html/body');
echo($dom->saveXml($body->item(0)));

这个问题类似于之前的问题，我在这里复制并粘贴了一个解决方案。如果 HTML 是由用户提交的，您还需要过滤掉潜在的 Javascript 攻击媒介，例如 onmouseover="do_something_evil()" 或 HTML Purifier 这样的工具旨在捕获和解决这些问题，并且比我所使用的任何代码都要全面得多。可以发帖。

Use PHP's DOMDocument class to normalize an HTML fragment:

$dom= new DOMDocument();
$dom->loadHTML('<div><p>Hello World');      
$xpath = new DOMXPath($dom);
$body = $xpath->query('/html/body');
echo($dom->saveXml($body->item(0)));

This question is similar to an earlier question and I've copied and pasted one solution here. If the HTML is submitted by users you'll also need to filter out potential Javascript attack vectors like onmouseover="do_something_evil()" or <a href="javascript:more_evil();">...</a>. Tools like HTML Purifier were designed to catch and solve these problems and are far more comprehensive than any code that I could post.

回复收藏 0 原文

唐婉 2024-08-31 14:36:15

使用 HTML 解析器并在文本的 100 个字符后停止。

回复收藏 0 原文

好菇凉咱不稀罕他 2024-08-31 14:36:15

我做了另一个函数来做到这一点，它支持 UTF-8：

/**
 * Limit string without break html tags.
 * Supports UTF8
 * 
 * @param string $value
 * @param int $limit Default 100
 */
function str_limit_html($value, $limit = 100)
{

    if (mb_strwidth($value, 'UTF-8') <= $limit) {
        return $value;
    }

    // Strip text with HTML tags, sum html len tags too.
    // Is there another way to do it?
    do {
        $len          = mb_strwidth($value, 'UTF-8');
        $len_stripped = mb_strwidth(strip_tags($value), 'UTF-8');
        $len_tags     = $len - $len_stripped;

        $value = mb_strimwidth($value, 0, $limit + $len_tags, '', 'UTF-8');
    } while ($len_stripped > $limit);

    // Load as HTML ignoring errors
    $dom = new DOMDocument();
    @$dom->loadHTML('<?xml encoding="utf-8" ?>'.$value, LIBXML_HTML_NODEFDTD);

    // Fix the html errors
    $value = $dom->saveHtml($dom->getElementsByTagName('body')->item(0));

    // Remove body tag
    $value = mb_strimwidth($value, 6, mb_strwidth($value, 'UTF-8') - 13, '', 'UTF-8'); // <body> and </body>
    // Remove empty tags
    return preg_replace('/<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|"[^"]*"|[\w\-.:]+))?)*\s*\/?>\s*<\/\1\s*>/', '', $value);
}

查看演示。

我建议在函数开始处使用 html_entity_decode ，这样它会保留 UTF-8 字符：

 $value = html_entity_decode($value);

I made another function to do it, it supports UTF-8:

/**
 * Limit string without break html tags.
 * Supports UTF8
 * 
 * @param string $value
 * @param int $limit Default 100
 */
function str_limit_html($value, $limit = 100)
{

    if (mb_strwidth($value, 'UTF-8') <= $limit) {
        return $value;
    }

    // Strip text with HTML tags, sum html len tags too.
    // Is there another way to do it?
    do {
        $len          = mb_strwidth($value, 'UTF-8');
        $len_stripped = mb_strwidth(strip_tags($value), 'UTF-8');
        $len_tags     = $len - $len_stripped;

        $value = mb_strimwidth($value, 0, $limit + $len_tags, '', 'UTF-8');
    } while ($len_stripped > $limit);

    // Load as HTML ignoring errors
    $dom = new DOMDocument();
    @$dom->loadHTML('<?xml encoding="utf-8" ?>'.$value, LIBXML_HTML_NODEFDTD);

    // Fix the html errors
    $value = $dom->saveHtml($dom->getElementsByTagName('body')->item(0));

    // Remove body tag
    $value = mb_strimwidth($value, 6, mb_strwidth($value, 'UTF-8') - 13, '', 'UTF-8'); // <body> and </body>
    // Remove empty tags
    return preg_replace('/<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|"[^"]*"|[\w\-.:]+))?)*\s*\/?>\s*<\/\1\s*>/', '', $value);
}

SEE DEMO.

I recommend use html_entity_decode at the start of function, so it preserves the UTF-8 characters:

 $value = html_entity_decode($value);

回复收藏 0 原文

§普罗旺斯的薰衣草 2024-08-31 14:36:15

您应该使用 Tidy HTML。您剪断字符串，然后运行 Tidy 来关闭标签。

（应得积分的积分）

回复收藏 0 原文

情魔剑神 2024-08-31 14:36:15

无论您在开始时陈述的 100 个计数问题如何，您都在挑战中指出以下内容：

输出的字符数
strip_tags（字符数
在实际显示的文本中
HTML）
保留 HTML 格式关闭
任何未完成的 HTML 标签

这是我的建议：
基本上，我一边分析一边计算每个字符。我确保不计算任何 HTML 标记中的任何字符。我还会在最后检查一下，以确保当我停下来时我没有在说一个词的中间。一旦我停下来，我就会回到第一个可用的空间或>作为停止点。

$position = 0;
$length = strlen($content)-1;

// process the content putting each 100 character section into an array
while($position < $length)
{
    $next_position = get_position($content, $position, 100);
    $data[] = substr($content, $position, $next_position);
    $position = $next_position;
}

// show the array
print_r($data);

function get_position($content, $position, $chars = 100)
{
    $count = 0;
    // count to 100 characters skipping over all of the HTML
    while($count <> $chars){
        $char = substr($content, $position, 1); 
        if($char == '<'){
            do{
                $position++;
                $char = substr($content, $position, 1);
            } while($char !== '>');
            $position++;
            $char = substr($content, $position, 1);
        }
        $count++;
        $position++;
    }
echo $count."\n";
    // find out where there is a logical break before 100 characters
    $data = substr($content, 0, $position);

    $space = strrpos($data, " ");
    $tag = strrpos($data, ">");

    // return the position of the logical break
    if($space > $tag)
    {
        return $space;
    } else {
        return $tag;
    }  
}

这也会计算返回码等。考虑到它们会占用空间，我没有删除它们。

Regardless of the 100 count issues you state at the beginning, you indicate in the challenge the following:

output the character count of
strip_tags (the number of characters
in the actual displayed text of the
HTML)
retain HTML formatting close
any unfinished HTML tag

Here is my proposal:
Bascially, I parse through each character counting as I go. I make sure NOT to count any characters in any HTML tag. I also check at the end to make sure I am not in the middle of a word when I stop. Once I stop, I back track to the first available SPACE or > as a stopping point.

$position = 0;
$length = strlen($content)-1;

// process the content putting each 100 character section into an array
while($position < $length)
{
    $next_position = get_position($content, $position, 100);
    $data[] = substr($content, $position, $next_position);
    $position = $next_position;
}

// show the array
print_r($data);

function get_position($content, $position, $chars = 100)
{
    $count = 0;
    // count to 100 characters skipping over all of the HTML
    while($count <> $chars){
        $char = substr($content, $position, 1); 
        if($char == '<'){
            do{
                $position++;
                $char = substr($content, $position, 1);
            } while($char !== '>');
            $position++;
            $char = substr($content, $position, 1);
        }
        $count++;
        $position++;
    }
echo $count."\n";
    // find out where there is a logical break before 100 characters
    $data = substr($content, 0, $position);

    $space = strrpos($data, " ");
    $tag = strrpos($data, ">");

    // return the position of the logical break
    if($space > $tag)
    {
        return $space;
    } else {
        return $tag;
    }  
}

This will also count the return codes etc. Considering they will take space, I have not removed them.

回复收藏 0 原文

御弟哥哥 2024-08-31 14:36:15

这是我在我的一个项目中使用的一个函数。它基于 DOMDocument，可与 HTML5 配合使用，比我尝试过的其他解决方案快大约 2 倍（至少在我的机器上，使用 html_cut($text, $max_length) 的时间为 0.22 毫秒 vs 0.43 毫秒） 500 个文本节点字符字符串的最佳答案，限制为 400）。

function cut_html ($html, $limit) {
    $dom = new DOMDocument();
    $dom->loadHTML(mb_convert_encoding("<div>{$html}</div>", "HTML-ENTITIES", "UTF-8"), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    cut_html_recursive($dom->documentElement, $limit);
    return substr($dom->saveHTML($dom->documentElement), 5, -6);
}

function cut_html_recursive ($element, $limit) {
    if($limit > 0) {
        if($element->nodeType == 3) {
            $limit -= strlen($element->nodeValue);
            if($limit < 0) {
                $element->nodeValue = substr($element->nodeValue, 0, strlen($element->nodeValue) + $limit);
            }
        }
        else {
            for($i = 0; $i < $element->childNodes->length; $i++) {
                if($limit > 0) {
                    $limit = cut_html_recursive($element->childNodes->item($i), $limit);
                }
                else {
                    $element->removeChild($element->childNodes->item($i));
                    $i--;
                }
            }
        }
    }
    return $limit;
}

Here is a function I'm using in one of my projects. It's based on DOMDocument, works with HTML5 and is about 2x faster than other solutions I've tried (at least on my machine, 0.22 ms vs 0.43 ms using html_cut($text, $max_length) from the top answer on a 500 text-node-characters string with a limit of 400).

function cut_html ($html, $limit) {
    $dom = new DOMDocument();
    $dom->loadHTML(mb_convert_encoding("<div>{$html}</div>", "HTML-ENTITIES", "UTF-8"), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    cut_html_recursive($dom->documentElement, $limit);
    return substr($dom->saveHTML($dom->documentElement), 5, -6);
}

function cut_html_recursive ($element, $limit) {
    if($limit > 0) {
        if($element->nodeType == 3) {
            $limit -= strlen($element->nodeValue);
            if($limit < 0) {
                $element->nodeValue = substr($element->nodeValue, 0, strlen($element->nodeValue) + $limit);
            }
        }
        else {
            for($i = 0; $i < $element->childNodes->length; $i++) {
                if($limit > 0) {
                    $limit = cut_html_recursive($element->childNodes->item($i), $limit);
                }
                else {
                    $element->removeChild($element->childNodes->item($i));
                    $i--;
                }
            }
        }
    }
    return $limit;
}

回复收藏 0 原文

她说她爱他 2024-08-31 14:36:15

这是我对刀具的尝试。也许你们能发现一些错误。我发现其他解析器的问题是它们没有正确关闭标签并且它们在单词中间切入（等等）

function cutHTML($string, $length, $patternsReplace = false) {
    $i = 0;
    $count = 0;
    $isParagraphCut = false;
    $htmlOpen = false;
    $openTag = false;
    $tagsStack = array();

    while ($i < strlen($string)) {
        $char = substr($string, $i, 1);
        if ($count >= $length) {
            $isParagraphCut = true;
            break;
        }

        if ($htmlOpen) {
            if ($char === ">") {
                $htmlOpen = false;
            }
        } else {
            if ($char === "<") {
                $j = $i;
                $char = substr($string, $j, 1);

                while ($j < strlen($string)) {
                    if($char === '/'){
                        $i++;
                        break;
                    }
                    elseif ($char === ' ') {
                        $tagsStack[] = substr($string, $i, $j);
                    }
                    $j++;
                }
                $htmlOpen = true;
            }
        }

        if (!$htmlOpen && $char != ">") {
            $count++;
        }

        $i++;
    }

    if ($isParagraphCut) {
        $j = $i;
        while ($j > 0) {
            $char = substr($string, $j, 1);
            if ($char === " " || $char === ";" || $char === "." || $char === "," || $char === "<" || $char === "(" || $char === "[") {
                break;
            } else if ($char === ">") {
                $j++;
                break;
            }
            $j--;
        }
        $string = substr($string, 0, $j);
        foreach($tagsStack as $tag){
            $tag = strtolower($tag);
            if($tag !== "img" && $tag !== "br"){
                $string .= "</$tag>";
            }
        }
        $string .= "...";
    }

    if ($patternsReplace) {
        foreach ($patternsReplace as $value) {
            if (isset($value['pattern']) && isset($value["replace"])) {
                $string = preg_replace($value["pattern"], $value["replace"], $string);
            }
        }
    }
    return $string;
}

Here is my try at the cutter. Maybe you guys can catch some bugs. The problem, i found with the other parsers, is that they don't close tags properly and they cut in the middle of a word (blah)

function cutHTML($string, $length, $patternsReplace = false) {
    $i = 0;
    $count = 0;
    $isParagraphCut = false;
    $htmlOpen = false;
    $openTag = false;
    $tagsStack = array();

    while ($i < strlen($string)) {
        $char = substr($string, $i, 1);
        if ($count >= $length) {
            $isParagraphCut = true;
            break;
        }

        if ($htmlOpen) {
            if ($char === ">") {
                $htmlOpen = false;
            }
        } else {
            if ($char === "<") {
                $j = $i;
                $char = substr($string, $j, 1);

                while ($j < strlen($string)) {
                    if($char === '/'){
                        $i++;
                        break;
                    }
                    elseif ($char === ' ') {
                        $tagsStack[] = substr($string, $i, $j);
                    }
                    $j++;
                }
                $htmlOpen = true;
            }
        }

        if (!$htmlOpen && $char != ">") {
            $count++;
        }

        $i++;
    }

    if ($isParagraphCut) {
        $j = $i;
        while ($j > 0) {
            $char = substr($string, $j, 1);
            if ($char === " " || $char === ";" || $char === "." || $char === "," || $char === "<" || $char === "(" || $char === "[") {
                break;
            } else if ($char === ">") {
                $j++;
                break;
            }
            $j--;
        }
        $string = substr($string, 0, $j);
        foreach($tagsStack as $tag){
            $tag = strtolower($tag);
            if($tag !== "img" && $tag !== "br"){
                $string .= "</$tag>";
            }
        }
        $string .= "...";
    }

    if ($patternsReplace) {
        foreach ($patternsReplace as $value) {
            if (isset($value['pattern']) && isset($value["replace"])) {
                $string = preg_replace($value["pattern"], $value["replace"], $string);
            }
        }
    }
    return $string;
}

回复收藏 0 原文

捶死心动 2024-08-31 14:36:15

尝试以下操作：

<?php echo strip_tags(mb_strimwidth($VARIABLE_HERE, 0, 160, "...")); ?>

这会将 HTML (strip_tags) 和限制字符 (mb_strimwidth) 限制为 160 个字符

Try the following:

<?php echo strip_tags(mb_strimwidth($VARIABLE_HERE, 0, 160, "...")); ?>

This will strip the HTML (strip_tags) amd limit characters (mb_strimwidth) to 160 characters

回复收藏 0 原文

岁吢 2024-08-31 14:36:15

尝试这个功能

// trim the string function
function trim_word($text, $length, $startPoint=0, $allowedTags=""){
    $text = html_entity_decode(htmlspecialchars_decode($text));
    $text = strip_tags($text, $allowedTags);
    return $text = substr($text, $startPoint, $length);
}

并且

echo trim_word("<h2 class='zzzz'>abcasdsdasasdas</h2>","6");

try this function

// trim the string function
function trim_word($text, $length, $startPoint=0, $allowedTags=""){
    $text = html_entity_decode(htmlspecialchars_decode($text));
    $text = strip_tags($text, $allowedTags);
    return $text = substr($text, $startPoint, $length);
}

and

echo trim_word("<h2 class='zzzz'>abcasdsdasasdas</h2>","6");

回复收藏 0 原文

酷炫老祖宗 2024-08-31 14:36:15

我知道这已经很老了，但我最近制作了一个用于剪切 HTML 进行预览的小类： https:// /github.com/Simbiat/HTMLCut/
您为什么要使用它而不是其他建议？以下是我想到的一些事情（摘自自述文件）：

保留 HTML 标签，除非它们为空。
保留言语。
删除剪切字符串末尾的一些孤立标点符号。
删除您在预览中不需要的 HTML 标签（可选）。
限制段落数量（可选）。
如果文本被剪切，则添加省略号（可选）。

Class使用DOM进行操作，但在某些地方也使用了Regex（主要用于剪切和修剪）。也许它对某些人有用。

回复收藏 0 原文

↘人皮目录ツ 2024-08-31 14:36:15

function html_cut($text, $max_length){
$tags   = array();
$result = "";

$is_open   = false;
$grab_open = false;
$is_close  = false;
$in_double_quotes = false;
$in_single_quotes = false;
$tag = "";

$i = 0;
$stripped = 0;

$stripped_text = strip_tags($text);

while ($i < mb_strlen($text) && $stripped < mb_strlen($stripped_text) && $stripped < $max_length)
{
    $symbol  = mb_substr($text, $i, 1);
    $result .= $symbol;

    switch ($symbol)
    {
       case '<':
            $is_open   = true;
            $grab_open = true;
            break;

       case '"':
           if ($in_double_quotes)
               $in_double_quotes = false;
           else
               $in_double_quotes = true;

        break;

        case "'":
          if ($in_single_quotes)
              $in_single_quotes = false;
          else
              $in_single_quotes = true;

        break;

        case '/':
            if ($is_open && !$in_double_quotes && !$in_single_quotes)
            {
                $is_close  = true;
                $is_open   = false;
                $grab_open = false;
            }

            break;

        case ' ':
            if ($is_open)
                $grab_open = false;
            else
                $stripped++;

            break;

        case '>':
            if ($is_open)
            {
                $is_open   = false;
                $grab_open = false;
                array_push($tags, $tag);
                $tag = "";
            }
            else if ($is_close)
            {
                $is_close = false;
                array_pop($tags);
                $tag = "";
            }

            break;

        default:
            if ($grab_open || $is_close)
                $tag .= $symbol;

            if (!$is_open && !$is_close)
                $stripped++;
    }

    $i++;
}

while ($tags)
    $result .= "</".array_pop($tags).">";

return $result;
}

html_cut 的多字节版本，工作正常。

function html_cut($text, $max_length){
$tags   = array();
$result = "";

$is_open   = false;
$grab_open = false;
$is_close  = false;
$in_double_quotes = false;
$in_single_quotes = false;
$tag = "";

$i = 0;
$stripped = 0;

$stripped_text = strip_tags($text);

while ($i < mb_strlen($text) && $stripped < mb_strlen($stripped_text) && $stripped < $max_length)
{
    $symbol  = mb_substr($text, $i, 1);
    $result .= $symbol;

    switch ($symbol)
    {
       case '<':
            $is_open   = true;
            $grab_open = true;
            break;

       case '"':
           if ($in_double_quotes)
               $in_double_quotes = false;
           else
               $in_double_quotes = true;

        break;

        case "'":
          if ($in_single_quotes)
              $in_single_quotes = false;
          else
              $in_single_quotes = true;

        break;

        case '/':
            if ($is_open && !$in_double_quotes && !$in_single_quotes)
            {
                $is_close  = true;
                $is_open   = false;
                $grab_open = false;
            }

            break;

        case ' ':
            if ($is_open)
                $grab_open = false;
            else
                $stripped++;

            break;

        case '>':
            if ($is_open)
            {
                $is_open   = false;
                $grab_open = false;
                array_push($tags, $tag);
                $tag = "";
            }
            else if ($is_close)
            {
                $is_close = false;
                array_pop($tags);
                $tag = "";
            }

            break;

        default:
            if ($grab_open || $is_close)
                $tag .= $symbol;

            if (!$is_open && !$is_close)
                $stripped++;
    }

    $i++;
}

while ($tags)
    $result .= "</".array_pop($tags).">";

return $result;
}

multibyte version of html_cut,it`s working fine.

回复收藏 0 原文

~没有更多了~