将字符串修剪为忽略 HTML 的长度

发布于 2024-07-17 09:36:26 字数 692 浏览 9 评论 0原文

这个问题是一个具有挑战性的问题。 我们的应用程序允许用户在主页上发布新闻。 该新闻是通过允许 HTML 的富文本编辑器输入的。 在主页上,我们只想显示新闻项目的截断摘要。

例如,这是我们显示的全文,包括 HTML


为了在办公室和厨房腾出更多空间,我随机拿出了所有杯子并将它们放在午餐室的桌子上。 除非您对 1992 年的 Cheyenne Courier 马克杯或 1997 年的 BC Tel Advanced Communications 马克杯的所有权有强烈的感觉,否则它们将被放入盒子中并捐赠给比我们更需要马克杯的办公室。

我们希望将新闻项目修剪为 250 个字符,但排除 HTML。

目前我们使用的修剪方法包括 HTML,这会导致一些 HTML 较多的新闻帖子被大幅截断。

例如,如果上面的示例包含大量 HTML,它可能如下所示:

为了在办公室、厨房腾出更多空间,我拉了......

这不是我们想要的。

有没有人有办法标记 HTML 标签,以便保持字符串中的位置,对字符串执行长度检查和/或修剪,并将字符串内的 HTML 恢复到其旧位置?

This problem is a challenging one. Our application allows users to post news on the homepage. That news is input via a rich text editor which allows HTML. On the homepage we want to only display a truncated summary of the news item.

For example, here is the full text we are displaying, including HTML

In an attempt to make a bit more space in the office, kitchen, I've pulled out all of the random mugs and put them onto the lunch room table. Unless you feel strongly about the ownership of that Cheyenne Courier mug from 1992 or perhaps that BC Tel Advanced Communications mug from 1997, they will be put in a box and donated to an office in more need of mugs than us.

We want to trim the news item to 250 characters, but exclude HTML.

The method we are using for trimming currently includes the HTML, and this results in some news posts that are HTML heavy getting truncated considerably.

For instance, if the above example included tons of HTML, it could potentially look like this:

In an attempt to make a bit more space in the office, kitchen, I've pulled...

This is not what we want.

Does anyone have a way of tokenizing HTML tags in order to maintain position in the string, perform a length check and/or trim on the string, and restore the HTML inside the string at its old location?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

绮筵 2024-07-24 09:36:26

从帖子的第一个字符开始,跨过每个字符。 每次你跨过一个角色时,都会增加一个计数器。 当你找到“<”时 字符,停止增加计数器,直到您点击“>” 特点。 当计数器达到 250 时,你的位置就是你真正想要切断的位置。

请注意,当 HTML 标记打开但在截止之前未关闭时,您将不得不处理另一个问题。

Start at the first character of the post, stepping over each character. Every time you step over a character, increment a counter. When you find a '<' character, stop incrementing the counter until you hit a '>' character. Your position when the counter gets to 250 is where you actually want to cut off.

Take note that this will have another problem that you'll have to deal with when an HTML tag is opened but not closed before the cutoff.

盗心人 2024-07-24 09:36:26

遵循 2 状态有限机建议,我刚刚为此目的用 Java 开发了一个简单的 HTML 解析器:

http:// Pastebin.com/jCRqiwNH

这里有一个测试用例:

http://pastebin.com/37gCS4tV

并且这里是 Java 代码:

import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class HtmlShortener {

    private static final String TAGS_TO_SKIP = "br,hr,img,link";
    private static final String[] tagsToSkip = TAGS_TO_SKIP.split(",");
    private static final int STATUS_READY = 0;

        private int cutPoint = -1;
    private String htmlString = "";

    final List<String> tags = new LinkedList<String>();

    StringBuilder sb = new StringBuilder("");
    StringBuilder tagSb = new StringBuilder("");

    int charCount = 0;
    int status = STATUS_READY;

    public HtmlShortener(String htmlString, int cutPoint){
        this.cutPoint = cutPoint;
        this.htmlString = htmlString;
    }

    public String cut(){

        // reset 
        tags.clear();
        sb = new StringBuilder("");
        tagSb = new StringBuilder("");
        charCount = 0;
        status = STATUS_READY;

        String tag = "";

        if (cutPoint < 0){
            return htmlString;
        }

        if (null != htmlString){

            if (cutPoint == 0){
                return "";
            }

            for (int i = 0; i < htmlString.length(); i++){

                String strC = htmlString.substring(i, i+1);


                if (strC.equals("<")){

                    // new tag or tag closure

                    // previous tag reset
                    tagSb = new StringBuilder("");
                    tag = "";

                    // find tag type and name
                    for (int k = i; k < htmlString.length(); k++){

                        String tagC = htmlString.substring(k, k+1);
                        tagSb.append(tagC);

                        if (tagC.equals(">")){
                            tag = getTag(tagSb.toString());
                            if (tag.startsWith("/")){

                                // closure
                                if (!isToSkip(tag)){
                                    sb.append("</").append(tags.get(tags.size() - 1)).append(">");
                                    tags.remove((tags.size() - 1));
                                }

                            } else {

                                // new tag
                                sb.append(tagSb.toString());

                                if (!isToSkip(tag)){
                                    tags.add(tag);  
                                }

                            }

                            i = k;
                            break;
                        }

                    }

                } else {

                    sb.append(strC);
                    charCount++;

                }

                // cut check
                if (charCount >= cutPoint){

                    // close previously open tags
                    Collections.reverse(tags);
                    for (String t : tags){
                        sb.append("</").append(t).append(">");
                    }
                    break;
                } 

            }

            return sb.toString();

        } else {
            return null;
        }

    }

    private boolean isToSkip(String tag) {

        if (tag.startsWith("/")){
            tag = tag.substring(1, tag.length());
        }

        for (String tagToSkip : tagsToSkip){
            if (tagToSkip.equals(tag)){
                return true;
            }
        }

        return false;
    }

    private String getTag(String tagString) {

        if (tagString.contains(" ")){
            // tag with attributes
            return tagString.substring(tagString.indexOf("<") + 1, tagString.indexOf(" "));
        } else {
            // simple tag
            return tagString.substring(tagString.indexOf("<") + 1, tagString.indexOf(">"));
        }


    }

}

Following the 2-state finite machine suggestion, I've just developed a simple HTML parser for this purpose, in Java:

http://pastebin.com/jCRqiwNH

and here a test case:

http://pastebin.com/37gCS4tV

And here the Java code:

import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class HtmlShortener {

    private static final String TAGS_TO_SKIP = "br,hr,img,link";
    private static final String[] tagsToSkip = TAGS_TO_SKIP.split(",");
    private static final int STATUS_READY = 0;

        private int cutPoint = -1;
    private String htmlString = "";

    final List<String> tags = new LinkedList<String>();

    StringBuilder sb = new StringBuilder("");
    StringBuilder tagSb = new StringBuilder("");

    int charCount = 0;
    int status = STATUS_READY;

    public HtmlShortener(String htmlString, int cutPoint){
        this.cutPoint = cutPoint;
        this.htmlString = htmlString;
    }

    public String cut(){

        // reset 
        tags.clear();
        sb = new StringBuilder("");
        tagSb = new StringBuilder("");
        charCount = 0;
        status = STATUS_READY;

        String tag = "";

        if (cutPoint < 0){
            return htmlString;
        }

        if (null != htmlString){

            if (cutPoint == 0){
                return "";
            }

            for (int i = 0; i < htmlString.length(); i++){

                String strC = htmlString.substring(i, i+1);


                if (strC.equals("<")){

                    // new tag or tag closure

                    // previous tag reset
                    tagSb = new StringBuilder("");
                    tag = "";

                    // find tag type and name
                    for (int k = i; k < htmlString.length(); k++){

                        String tagC = htmlString.substring(k, k+1);
                        tagSb.append(tagC);

                        if (tagC.equals(">")){
                            tag = getTag(tagSb.toString());
                            if (tag.startsWith("/")){

                                // closure
                                if (!isToSkip(tag)){
                                    sb.append("</").append(tags.get(tags.size() - 1)).append(">");
                                    tags.remove((tags.size() - 1));
                                }

                            } else {

                                // new tag
                                sb.append(tagSb.toString());

                                if (!isToSkip(tag)){
                                    tags.add(tag);  
                                }

                            }

                            i = k;
                            break;
                        }

                    }

                } else {

                    sb.append(strC);
                    charCount++;

                }

                // cut check
                if (charCount >= cutPoint){

                    // close previously open tags
                    Collections.reverse(tags);
                    for (String t : tags){
                        sb.append("</").append(t).append(">");
                    }
                    break;
                } 

            }

            return sb.toString();

        } else {
            return null;
        }

    }

    private boolean isToSkip(String tag) {

        if (tag.startsWith("/")){
            tag = tag.substring(1, tag.length());
        }

        for (String tagToSkip : tagsToSkip){
            if (tagToSkip.equals(tag)){
                return true;
            }
        }

        return false;
    }

    private String getTag(String tagString) {

        if (tagString.contains(" ")){
            // tag with attributes
            return tagString.substring(tagString.indexOf("<") + 1, tagString.indexOf(" "));
        } else {
            // simple tag
            return tagString.substring(tagString.indexOf("<") + 1, tagString.indexOf(">"));
        }


    }

}

一个人的旅程 2024-07-24 09:36:26

您可以尝试以下 npm 包

trim-html

它会切断 html 标签内的足够文本,保存原始 html 限制,达到限制后删除 html 标签并关闭打开的标签。

You can try the following npm package

trim-html

It cutting off sufficient text inside html tags, save original html stricture, remove html tags after limit is reached and closing opened tags.

水波映月 2024-07-24 09:36:26

如果我正确理解问题,您希望保留 HTML 格式,但不希望将其计入您保留的字符串长度的一部分。

您可以使用实现简单有限状态机的代码来完成此任务。

2 种状态:InTag、OutOfTag
标签内:
- 如果遇到 > 字符,则转到 OutOfTag
- 遇到任何其他字符则返回自身
标签外:
- 如果遇到 < 字符,则转到 InTag
- 遇到任何其他字符则自行返回

您的起始状态将为 OutOfTag。

您可以通过一次处理 1 个字符来实现有限状态机。 每个角色的处理都会带你进入一个新的状态。

当您通过有限状态机运行文本时,您还希望保留输出缓冲区和到目前为止遇到的变量的长度(以便您知道何时停止)。

  1. 每次处于 OutOfTag 状态并处理另一个字符时,递增 Length 变量。 如果有空格字符,您可以选择不增加此变量。
  2. 当没有更多字符或达到 #1 中提到的所需长度时,算法结束。
  3. 在输出缓冲区中,包含您遇到的字符,直到 #1 中提到的长度为止。
  4. 保留一堆未关闭的标签。 当达到长度时,为堆栈中的每个元素添加一个结束标记。 当您运行算法时,您可以通过保留 current_tag 变量知道何时遇到标签。 current_tag 变量在进入 InTag 状态时开始,在进入 OutOfTag 状态时结束(或者在 InTag 状态中遇到空格字符时)。 如果您有开始标记,请将其放入堆栈中。 如果有结束标记,则将其从堆栈中弹出。

If I understand the problem correctly, you want to keep the HTML formatting, but you want to not count it as part of the length of the string you are keeping.

You can accomplish this with code that implements a simple finite state machine.

2 states: InTag, OutOfTag
InTag:
- Goes to OutOfTag if > character is encountered
- Goes to itself any other character is encountered
OutOfTag:
- Goes to InTag if < character is encountered
- Goes to itself any other character is encountered

Your starting state will be OutOfTag.

You implement a finite state machine by procesing 1 character at a time. The processing of each character brings you to a new state.

As you run your text through the finite state machine, you want to also keep an output buffer and a length so far encountered varaible (so you know when to stop).

  1. Increment your Length variable each time you are in the state OutOfTag and you process another character. You can optionally not increment this variable if you have a whitespace character.
  2. You end the algorithm when you have no more characters or you have the desired length mentioned in #1.
  3. In your output buffer, include characters you encounter up until the length mentioned in #1.
  4. Keep a stack of unclosed tags. When you reach the length, for each element in the stack, add an end tag. As you run through your algorithm you can know when you encounter a tag by keeping a current_tag variable. This current_tag variable is started when you enter the InTag state, and it is ended when you enter the OutOfTag state (or when a whitepsace character is encountered while in the InTag state). If you have a start tag you put it in the stack. If you have an end tag, you pop it from the stack.
燃情 2024-07-24 09:36:26

这是我在 C# 中提出的实现:

public static string TrimToLength(string input, int length)
{
  if (string.IsNullOrEmpty(input))
    return string.Empty;

  if (input.Length <= length)
    return input;

  bool inTag = false;
  int targetLength = 0;

  for (int i = 0; i < input.Length; i++)
  {
    char c = input[i];

    if (c == '>')
    {
      inTag = false;
      continue;
    }

    if (c == '<')
    {
      inTag = true;
      continue;
    }

    if (inTag || char.IsWhiteSpace(c))
    {
      continue;
    }

    targetLength++;

    if (targetLength == length)
    {
      return ConvertToXhtml(input.Substring(0, i + 1));
    }
  }

  return input;
}

以及我通过 TDD 使用的一些单元测试:

[Test]
public void Html_TrimReturnsEmptyStringWhenNullPassed()
{
  Assert.That(Html.TrimToLength(null, 1000), Is.Empty);
}

[Test]
public void Html_TrimReturnsEmptyStringWhenEmptyPassed()
{
  Assert.That(Html.TrimToLength(string.Empty, 1000), Is.Empty);
}

[Test]
public void Html_TrimReturnsUnmodifiedStringWhenSameAsLength()
{
  string source = "<div lang=\"en\" class=\"textBody localizable\" id=\"pageBody_en\">" +
                  "<img photoid=\"4041\" src=\"http://xxxxxxxx/imagethumb/562103830000/4041/300x300/False/mugs.jpg\" style=\"float: right;\" class=\"photoRight\" alt=\"\"/>" +
                  "<br/>" +
                  "In an attempt to make a bit more space in the office, kitchen, I";

  Assert.That(Html.TrimToLength(source, 250), Is.EqualTo(source));
}

[Test]
public void Html_TrimWellFormedHtml()
{
  string source = "<div lang=\"en\" class=\"textBody localizable\" id=\"pageBody_en\">" +
             "<img photoid=\"4041\" src=\"http://xxxxxxxx/imagethumb/562103830000/4041/300x300/False/mugs.jpg\" style=\"float: right;\" class=\"photoRight\" alt=\"\"/>" +
             "<br/>" +
             "In an attempt to make a bit more space in the office, kitchen, I've pulled out all of the random mugs and put them onto the lunch room table. Unless you feel strongly about the ownership of that Cheyenne Courier mug from 1992 or perhaps that BC Tel Advanced Communications mug from 1997, they will be put in a box and donated to an office in more need of mugs than us. <br/><br/>" +
             "In the meantime we have a nice selection of white Ikea mugs, some random Starbucks mugs, and others that have made their way into the office over the years. Hopefully that will suffice. <br/><br/>" +
             "</div>";

  string expected = "<div lang=\"en\" class=\"textBody localizable\" id=\"pageBody_en\">" +
                    "<img photoid=\"4041\" src=\"http://xxxxxxxx/imagethumb/562103830000/4041/300x300/False/mugs.jpg\" style=\"float: right;\" class=\"photoRight\" alt=\"\"/>" +
                    "<br/>" +
                    "In an attempt to make a bit more space in the office, kitchen, I've pulled out all of the random mugs and put them onto the lunch room table. Unless you feel strongly about the ownership of that Cheyenne Courier mug from 1992 or perhaps that BC Tel Advanced Communications mug from 1997, they will be put in";

  Assert.That(Html.TrimToLength(source, 250), Is.EqualTo(expected));
}

[Test]
public void Html_TrimMalformedHtml()
{
  string malformedHtml = "<div lang=\"en\" class=\"textBody localizable\" id=\"pageBody_en\">" +
                         "<img photoid=\"4041\" src=\"http://xxxxxxxx/imagethumb/562103830000/4041/300x300/False/mugs.jpg\" style=\"float: right;\" class=\"photoRight\" alt=\"\"/>" +
                         "<br/>" +
                         "In an attempt to make a bit more space in the office, kitchen, I've pulled out all of the random mugs and put them onto the lunch room table. Unless you feel strongly about the ownership of that Cheyenne Courier mug from 1992 or perhaps that BC Tel Advanced Communications mug from 1997, they will be put in a box and donated to an office in more need of mugs than us. <br/><br/>" +
                         "In the meantime we have a nice selection of white Ikea mugs, some random Starbucks mugs, and others that have made their way into the office over the years. Hopefully that will suffice. <br/><br/>";

  string expected = "<div lang=\"en\" class=\"textBody localizable\" id=\"pageBody_en\">" +
              "<img photoid=\"4041\" src=\"http://xxxxxxxx/imagethumb/562103830000/4041/300x300/False/mugs.jpg\" style=\"float: right;\" class=\"photoRight\" alt=\"\"/>" +
              "<br/>" +
              "In an attempt to make a bit more space in the office, kitchen, I've pulled out all of the random mugs and put them onto the lunch room table. Unless you feel strongly about the ownership of that Cheyenne Courier mug from 1992 or perhaps that BC Tel Advanced Communications mug from 1997, they will be put in";

  Assert.That(Html.TrimToLength(malformedHtml, 250), Is.EqualTo(expected));
}

Here's the implementation that I came up with, in C#:

public static string TrimToLength(string input, int length)
{
  if (string.IsNullOrEmpty(input))
    return string.Empty;

  if (input.Length <= length)
    return input;

  bool inTag = false;
  int targetLength = 0;

  for (int i = 0; i < input.Length; i++)
  {
    char c = input[i];

    if (c == '>')
    {
      inTag = false;
      continue;
    }

    if (c == '<')
    {
      inTag = true;
      continue;
    }

    if (inTag || char.IsWhiteSpace(c))
    {
      continue;
    }

    targetLength++;

    if (targetLength == length)
    {
      return ConvertToXhtml(input.Substring(0, i + 1));
    }
  }

  return input;
}

And a few unit tests I used via TDD:

[Test]
public void Html_TrimReturnsEmptyStringWhenNullPassed()
{
  Assert.That(Html.TrimToLength(null, 1000), Is.Empty);
}

[Test]
public void Html_TrimReturnsEmptyStringWhenEmptyPassed()
{
  Assert.That(Html.TrimToLength(string.Empty, 1000), Is.Empty);
}

[Test]
public void Html_TrimReturnsUnmodifiedStringWhenSameAsLength()
{
  string source = "<div lang=\"en\" class=\"textBody localizable\" id=\"pageBody_en\">" +
                  "<img photoid=\"4041\" src=\"http://xxxxxxxx/imagethumb/562103830000/4041/300x300/False/mugs.jpg\" style=\"float: right;\" class=\"photoRight\" alt=\"\"/>" +
                  "<br/>" +
                  "In an attempt to make a bit more space in the office, kitchen, I";

  Assert.That(Html.TrimToLength(source, 250), Is.EqualTo(source));
}

[Test]
public void Html_TrimWellFormedHtml()
{
  string source = "<div lang=\"en\" class=\"textBody localizable\" id=\"pageBody_en\">" +
             "<img photoid=\"4041\" src=\"http://xxxxxxxx/imagethumb/562103830000/4041/300x300/False/mugs.jpg\" style=\"float: right;\" class=\"photoRight\" alt=\"\"/>" +
             "<br/>" +
             "In an attempt to make a bit more space in the office, kitchen, I've pulled out all of the random mugs and put them onto the lunch room table. Unless you feel strongly about the ownership of that Cheyenne Courier mug from 1992 or perhaps that BC Tel Advanced Communications mug from 1997, they will be put in a box and donated to an office in more need of mugs than us. <br/><br/>" +
             "In the meantime we have a nice selection of white Ikea mugs, some random Starbucks mugs, and others that have made their way into the office over the years. Hopefully that will suffice. <br/><br/>" +
             "</div>";

  string expected = "<div lang=\"en\" class=\"textBody localizable\" id=\"pageBody_en\">" +
                    "<img photoid=\"4041\" src=\"http://xxxxxxxx/imagethumb/562103830000/4041/300x300/False/mugs.jpg\" style=\"float: right;\" class=\"photoRight\" alt=\"\"/>" +
                    "<br/>" +
                    "In an attempt to make a bit more space in the office, kitchen, I've pulled out all of the random mugs and put them onto the lunch room table. Unless you feel strongly about the ownership of that Cheyenne Courier mug from 1992 or perhaps that BC Tel Advanced Communications mug from 1997, they will be put in";

  Assert.That(Html.TrimToLength(source, 250), Is.EqualTo(expected));
}

[Test]
public void Html_TrimMalformedHtml()
{
  string malformedHtml = "<div lang=\"en\" class=\"textBody localizable\" id=\"pageBody_en\">" +
                         "<img photoid=\"4041\" src=\"http://xxxxxxxx/imagethumb/562103830000/4041/300x300/False/mugs.jpg\" style=\"float: right;\" class=\"photoRight\" alt=\"\"/>" +
                         "<br/>" +
                         "In an attempt to make a bit more space in the office, kitchen, I've pulled out all of the random mugs and put them onto the lunch room table. Unless you feel strongly about the ownership of that Cheyenne Courier mug from 1992 or perhaps that BC Tel Advanced Communications mug from 1997, they will be put in a box and donated to an office in more need of mugs than us. <br/><br/>" +
                         "In the meantime we have a nice selection of white Ikea mugs, some random Starbucks mugs, and others that have made their way into the office over the years. Hopefully that will suffice. <br/><br/>";

  string expected = "<div lang=\"en\" class=\"textBody localizable\" id=\"pageBody_en\">" +
              "<img photoid=\"4041\" src=\"http://xxxxxxxx/imagethumb/562103830000/4041/300x300/False/mugs.jpg\" style=\"float: right;\" class=\"photoRight\" alt=\"\"/>" +
              "<br/>" +
              "In an attempt to make a bit more space in the office, kitchen, I've pulled out all of the random mugs and put them onto the lunch room table. Unless you feel strongly about the ownership of that Cheyenne Courier mug from 1992 or perhaps that BC Tel Advanced Communications mug from 1997, they will be put in";

  Assert.That(Html.TrimToLength(malformedHtml, 250), Is.EqualTo(expected));
}
所谓喜欢 2024-07-24 09:36:26

我知道这已经是发布日期之后很久了,但我遇到了类似的问题,这就是我最终解决它的方法。 我担心的是正则表达式与通过数组进行交互的速度。

另外,如果 html 标签之前有一个空格,而在这之后并不能解决这个问题

private string HtmlTrimmer(string input, int len)
{
    if (string.IsNullOrEmpty(input))
        return string.Empty;
    if (input.Length <= len)
        return input;

    // this is necissary because regex "^"  applies to the start of the string, not where you tell it to start from
    string inputCopy;
    string tag;

    string result = "";
    int strLen = 0;
    int strMarker = 0;
    int inputLength = input.Length;     

    Stack stack = new Stack(10);
    Regex text = new Regex("^[^<&]+");                
    Regex singleUseTag = new Regex("^<[^>]*?/>");            
    Regex specChar = new Regex("^&[^;]*?;");
    Regex htmlTag = new Regex("^<.*?>");

    while (strLen < len)
    {
        inputCopy = input.Substring(strMarker);
        //If the marker is at the end of the string OR 
        //the sum of the remaining characters and those analyzed is less then the maxlength
        if (strMarker >= inputLength || (inputLength - strMarker) + strLen < len)
            break;

        //Match regular text
        result += text.Match(inputCopy,0,len-strLen);
        strLen += result.Length - strMarker;
        strMarker = result.Length;

        inputCopy = input.Substring(strMarker);
        if (singleUseTag.IsMatch(inputCopy))
            result += singleUseTag.Match(inputCopy);
        else if (specChar.IsMatch(inputCopy))
        {
            //think of   as 1 character instead of 5
            result += specChar.Match(inputCopy);
            ++strLen;
        }
        else if (htmlTag.IsMatch(inputCopy))
        {
            tag = htmlTag.Match(inputCopy).ToString();
            //This only works if this is valid Markup...
            if(tag[1]=='/')         //Closing tag
                stack.Pop();
            else                    //not a closing tag
                stack.Push(tag);
            result += tag;
        }
        else    //Bad syntax
            result += input[strMarker];

        strMarker = result.Length;
    }

    while (stack.Count > 0)
    {
        tag = stack.Pop().ToString();
        result += tag.Insert(1, "/");
    }
    if (strLen == len)
        result += "...";
    return result;
}

I'm aware this is quite a bit after the posted date, but i had a similiar issue and this is how i ended up solving it. My concern would be the speed of regex versus interating through an array.

Also if you have a space before an html tag, and after this doesn't fix that

private string HtmlTrimmer(string input, int len)
{
    if (string.IsNullOrEmpty(input))
        return string.Empty;
    if (input.Length <= len)
        return input;

    // this is necissary because regex "^"  applies to the start of the string, not where you tell it to start from
    string inputCopy;
    string tag;

    string result = "";
    int strLen = 0;
    int strMarker = 0;
    int inputLength = input.Length;     

    Stack stack = new Stack(10);
    Regex text = new Regex("^[^<&]+");                
    Regex singleUseTag = new Regex("^<[^>]*?/>");            
    Regex specChar = new Regex("^&[^;]*?;");
    Regex htmlTag = new Regex("^<.*?>");

    while (strLen < len)
    {
        inputCopy = input.Substring(strMarker);
        //If the marker is at the end of the string OR 
        //the sum of the remaining characters and those analyzed is less then the maxlength
        if (strMarker >= inputLength || (inputLength - strMarker) + strLen < len)
            break;

        //Match regular text
        result += text.Match(inputCopy,0,len-strLen);
        strLen += result.Length - strMarker;
        strMarker = result.Length;

        inputCopy = input.Substring(strMarker);
        if (singleUseTag.IsMatch(inputCopy))
            result += singleUseTag.Match(inputCopy);
        else if (specChar.IsMatch(inputCopy))
        {
            //think of   as 1 character instead of 5
            result += specChar.Match(inputCopy);
            ++strLen;
        }
        else if (htmlTag.IsMatch(inputCopy))
        {
            tag = htmlTag.Match(inputCopy).ToString();
            //This only works if this is valid Markup...
            if(tag[1]=='/')         //Closing tag
                stack.Pop();
            else                    //not a closing tag
                stack.Push(tag);
            result += tag;
        }
        else    //Bad syntax
            result += input[strMarker];

        strMarker = result.Length;
    }

    while (stack.Count > 0)
    {
        tag = stack.Pop().ToString();
        result += tag.Insert(1, "/");
    }
    if (strLen == len)
        result += "...";
    return result;
}
岁吢 2024-07-24 09:36:26

最快的方法不是使用 jQuery 的 text() 方法吗?

例如:

<ul>
  <li>One</li>
  <li>Two</li>
  <li>Three</li>
</ul>

var text = $('ul').text();

将在 text 变量中给出值 OneTwoThree。 这将允许您获取不包含 HTML 的文本的实际长度。

Wouldn't the fastest way be to use jQuery's text() method?

For example:

<ul>
  <li>One</li>
  <li>Two</li>
  <li>Three</li>
</ul>

var text = $('ul').text();

Would give the value OneTwoThree in the text variable. This would allow you to get the actual length of the text without the HTML included.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文