Character count minus HTML characters in C#

Posted on 2024-09-26 20:21:44

I'm trying to figure out a way to count the number of characters in a string, truncate the string, and then return it. However, I need this function NOT to count HTML tags. If it counts them and the truncation point falls in the middle of a tag, the page will appear broken.
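To make the requirement concrete, here is a minimal sketch of counting only the characters that sit outside tags. `VisibleText` and `CountVisibleChars` are hypothetical names, and the sketch assumes that every `<` opens a tag and the next `>` closes it, so it will miscount when a `>` appears inside an attribute value, a comment, or a script block.

```csharp
public static class VisibleText
{
    // Hypothetical helper: counts characters that are not inside <...>.
    // Naive assumption: '<' always opens a tag and the next '>' closes it.
    // Entities such as &amp; are counted as several characters, not one.
    public static int CountVisibleChars(string html)
    {
        int count = 0;
        bool insideTag = false;

        foreach (char c in html)
        {
            if (c == '<')
                insideTag = true;          // entering a tag
            else if (c == '>' && insideTag)
                insideTag = false;         // leaving a tag
            else if (!insideTag)
                count++;                   // ordinary text character
        }
        return count;
    }
}
```

For example, `VisibleText.CountVisibleChars("<p>Hello <b>world</b></p>")` returns 11, the length of "Hello world".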

This is what I have so far...

```csharp
public string Truncate(string input, int characterLimit, string currID) {
    string output = input;

    // Check if the string is longer than the allowed amount;
    // otherwise do nothing
    if (output.Length > characterLimit && characterLimit > 0) {

        // Cut the string down to the maximum number of characters
        output = output.Substring(0, characterLimit);

        // Check if the character right after the truncation point is a space;
        // if not, we are in the middle of a word and need to remove the rest of it
        if (input[characterLimit] != ' ') {
            int lastSpace = output.LastIndexOf(' ');

            // If we found a space, cut back to that space
            if (lastSpace != -1) {
                output = output.Substring(0, lastSpace);
            }
        }

        // Close an anchor only if the cut left one open,
        // i.e. the last "<a href" comes after the last "</a>"
        if (output.LastIndexOf("<a href") > output.LastIndexOf("</a>")) {
            output += "</a>";
        }

        // Finally, add the "..." and end the paragraph
        output += "<br /><br />...<a href='Announcements.aspx?ID=" + currID + "'>see more</a></p>";
    }
    return output;
}
```

But I'm not happy with this. Is there a better way to do this? If you could provide a new solution to this, or perhaps suggestions on what to add to what I have so far, that would be great.

Disclaimer: I've never worked with C#, so I'm not familiar with the concepts related to the language... I'm doing this because I have to, not by choice.

Thanks,
Hristo

Comments (1)

安静被遗忘 2024-10-03 20:21:44

Use the right tool for the problem.

HTML is not a simple format to parse. I would advise using a proven, existing parser rather than rolling your own. If you know that you will only ever parse XHTML, you could use an XML parser instead.

These are the only reliable ways to perform operations on HTML that will preserve the semantic representation.
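For the XHTML case, here is a minimal sketch using the XML parser in `System.Xml.Linq`. It parses the fragment, walks the text nodes in document order, cuts the node that reaches the limit, and removes everything after it, so the serializer writes every closing tag and no tag can be split. `XhtmlTruncator.Truncate` is a hypothetical name. The sketch assumes the input is well-formed XML, so `XElement.Parse` throws an `XmlException` on malformed markup or on HTML entities such as `&nbsp;` that XML does not predefine. It does not back up to a word boundary or append the "see more" link; the caller would still do that.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XhtmlTruncator
{
    // Hypothetical helper: keeps at most `limit` characters of text content
    // from a well-formed XHTML fragment; tags are never counted or split.
    // Assumes limit >= 0.
    public static string Truncate(string xhtml, int limit)
    {
        // Wrap in a dummy root so fragments with several top-level elements parse,
        // and keep whitespace-only text nodes so spacing survives the round trip.
        XElement root = XElement.Parse("<root>" + xhtml + "</root>", LoadOptions.PreserveWhitespace);
        int remaining = limit;

        // Snapshot text nodes in document order before the tree is modified.
        foreach (XText text in root.DescendantNodes().OfType<XText>().ToList())
        {
            if (text.Value.Length < remaining)
            {
                remaining -= text.Value.Length;   // whole node fits; keep going
                continue;
            }

            // This node reaches the limit: keep only its first `remaining` characters...
            text.Value = text.Value.Substring(0, remaining);

            // ...then drop every later sibling of this node and of each ancestor,
            // which removes everything after the cut in document order.
            for (XNode node = text; node.Parent != null; node = node.Parent)
            {
                foreach (XNode sibling in node.NodesAfterSelf().ToList())
                    sibling.Remove();
            }
            break;
        }

        // Serialize the children of the dummy root; closing tags are written automatically.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

For example, `XhtmlTruncator.Truncate("<p>Hello <b>world</b> again</p>", 8)` returns `<p>Hello <b>wo</b></p>`: the cut lands inside the `<b>` element, and both `</b>` and `</p>` are still emitted.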

Don't try to use regular expressions. HTML is not a regular language and you can only cause yourself grief and misery going in that direction.
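As a concrete illustration of that warning, here is the kind of tag-stripping regex that is often suggested for this problem. `RegexPitfall.StripTags` is a hypothetical name. The pattern `<[^>]*>` assumes a tag ends at the first `>`, but `>` may legally appear unescaped inside a quoted attribute value, and comments can contain it too.

```csharp
using System.Text.RegularExpressions;

public static class RegexPitfall
{
    // Hypothetical helper: the common "remove everything between < and >" regex.
    // It treats the first '>' as the end of a tag, which HTML does not guarantee.
    public static string StripTags(string html) =>
        Regex.Replace(html, "<[^>]*>", "");
}
```

`StripTags("<a title='a > b'>x</a>")` returns `" b'>x"` instead of `"x"`, and `StripTags("<!-- a > b -->text")` returns `" b -->text"` instead of `"text"`: markup leaks into the visible text, and any character count built on it is wrong.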
