Trim a String in Java while preserving whole words

Posted on 2024-12-09 13:11:06

I need to trim a String in java so that:

The quick brown fox jumps over the lazy dog.

becomes

The quick brown...

In the example above, I'm trimming to 12 characters. If I just use substring I would get:

The quick br...

I already have a method for doing this using substring, but I wanted to know what is the fastest (most efficient) way to do this because a page may have many trim operations.

The only way I can think of is to split the string on spaces and put it back together until its length passes the given length. Is there another way? Perhaps a more efficient one, where the same method can do both a "soft" trim, which preserves the last word (as shown in the example above), and a hard trim, which is pretty much a substring.

Thanks,

Comments (7)

五里雾 2024-12-16 13:11:06

Below is a method I use to trim long strings in my webapps.
The "soft" boolean, as you put it, preserves the last word when set to true.
This is the most concise way of doing it that I could come up with. It uses a StringBuffer, which avoids repeatedly creating new immutable String objects.

public static String trimString(String string, int length, boolean soft) {
    if(string == null || string.trim().isEmpty()){
        return string;
    }

    StringBuffer sb = new StringBuffer(string);
    // -3 because we add 3 dots at the end. Returned string length has to be length including the dots.
    // Clamped at 0 so that a length below 3 cannot produce a negative index.
    int actualLength = Math.max(length - 3, 0);
    if(sb.length() > actualLength){
        if(!soft)
            return escapeHtml(sb.insert(actualLength, "...").substring(0, actualLength+3));
        else {
            int endIndex = sb.indexOf(" ",actualLength);
            // No space after the cut point: the last word runs to the end, so there is nothing to trim.
            if(endIndex < 0)
                return string;
            return escapeHtml(sb.insert(endIndex,"...").substring(0, endIndex+3));
        }
    }
    return string;
}

Update

I've changed the code so that the ... is inserted into the StringBuffer itself. This avoids implicitly creating extra String objects, which is slow and wasteful.

Note: escapeHtml is a static import from apache commons:

import static org.apache.commons.lang.StringEscapeUtils.escapeHtml;

You can remove it and the code should work the same, apart from the HTML escaping.

迟到的我 2024-12-16 13:11:06

Here is a simple, regex-based, 1-line solution:

str.replaceAll("(?<=.{12})\\b.*", "..."); // How easy was that!? :)

Explanation:

  • (?<=.{12}) is a positive lookbehind, which asserts that there are at least 12 characters to the left of the match; it is zero-width, so it consumes no characters itself
  • \b.* matches the first word boundary (after at least 12 characters - above) to the end

This is replaced with "..."

Here's a test:

public static void main(String[] args) {
    String input = "The quick brown fox jumps over the lazy dog.";
    String trimmed = input.replaceAll("(?<=.{12})\\b.*", "...");
    System.out.println(trimmed);
}

Output:

The quick brown...

If performance is an issue, pre-compile the regex for an approximately 5x speed up (YMMV) by compiling it once:

static Pattern pattern = Pattern.compile("(?<=.{12})\\b.*");

and reusing it:

String trimmed = pattern.matcher(input).replaceAll("...");
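
One caveat with replaceAll here: when the input ends in a word character (for example "The quick brown fox"), the matcher finds a second, empty match at the very end of the string and the result ends in "......". The sketch below is one possible way around that, using replaceFirst on the precompiled pattern. The class name RegexTrim and the helper trim are made up for this illustration and are not part of the answer above.

```java
import java.util.regex.Pattern;

final class RegexTrim {
    // Compiled once and reused; 12 is the minimum number of characters to keep.
    private static final Pattern TAIL = Pattern.compile("(?<=.{12})\\b.*");

    // Replaces only the first match, so an empty second match at the end
    // of the input cannot add another "...".
    static String trim(String input) {
        return TAIL.matcher(input).replaceFirst("...");
    }

    public static void main(String[] args) {
        System.out.println(trim("The quick brown fox jumps over the lazy dog."));
        System.out.println(trim("The quick brown fox"));
        System.out.println(trim("Short text"));
    }
}
```

Inputs shorter than 12 characters never satisfy the lookbehind and come back unchanged.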

独守阴晴ぅ圆缺 2024-12-16 13:11:06

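Please try the following code:

private String trim(String src, int size) {
    // Short enough already: nothing to trim.
    if (src.length() <= size) return src;
    // Last space that still leaves room for the three dots.
    int pos = src.lastIndexOf(" ", size - 3);
    // No space to cut at: fall back to a plain substring.
    if (pos < 0) return src.substring(0, size);
    return src.substring(0, pos) + "...";
}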

雪化雨蝶 2024-12-16 13:11:06

Try searching for the space nearest to index 11, either the last one before it or the first one after it, depending on whether the word that crosses the limit should be dropped or kept, and trim the string there, adding "...".
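
A minimal sketch of that idea, assuming the limit is given as a character count and that a single space is the only word separator. The class and method names (SpaceTrim, trimBefore, trimAfter) are invented for this example.

```java
final class SpaceTrim {
    // Cuts at the last space at or before index limit,
    // dropping the word that crosses the limit.
    static String trimBefore(String s, int limit) {
        if (s.length() <= limit) return s;
        int cut = s.lastIndexOf(' ', limit);
        if (cut < 0) cut = limit;          // no space found: fall back to a hard cut
        return s.substring(0, cut) + "...";
    }

    // Cuts at the first space at or after index limit,
    // keeping the word that crosses the limit.
    static String trimAfter(String s, int limit) {
        if (s.length() <= limit) return s;
        int cut = s.indexOf(' ', limit);
        if (cut < 0) return s;             // the crossing word is the last one
        return s.substring(0, cut) + "...";
    }

    public static void main(String[] args) {
        String input = "The quick brown fox jumps over the lazy dog.";
        System.out.println(trimBefore(input, 12)); // The quick...
        System.out.println(trimAfter(input, 12));  // The quick brown...
    }
}
```

Both variants do a single indexOf or lastIndexOf scan plus one substring, so there is no splitting and rejoining of the whole string.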

抹茶夏天i‖ 2024-12-16 13:11:06

Your requirements aren't clear. If you have trouble articulating them in a natural language, it's no surprise that they'll be difficult to translate into a computer language like Java.

"preserve the last word" implies that the algorithm will know what a "word" is, so you'll have to tell it that first. The split is a way to do it. A scanner/parser with a grammar is another.

I'd worry about making it work before I concerned myself with efficiency. Make it work, measure it, then see what you can do about performance. Everything else is speculation without data.
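
As one illustration of telling the algorithm what a "word" is, the JDK's java.text.BreakIterator can locate word boundaries without a hand-written split or grammar. The sketch below is only one possible reading of the requirement: it assumes a non-negative limit and English text, and it keeps the word that crosses the limit. WordBoundaryTrim and trim are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Locale;

final class WordBoundaryTrim {
    static String trim(String s, int limit) {
        if (s.length() <= limit) return s;
        BreakIterator words = BreakIterator.getWordInstance(Locale.ENGLISH);
        words.setText(s);
        // Cut at limit if it already is a word boundary,
        // otherwise at the first boundary after it.
        int cut = words.isBoundary(limit) ? limit : words.following(limit);
        if (cut == BreakIterator.DONE || cut >= s.length()) return s;
        return s.substring(0, cut) + "...";
    }

    public static void main(String[] args) {
        System.out.println(trim("The quick brown fox jumps over the lazy dog.", 12));
    }
}
```

Whether this is faster or slower than the substring or regex approaches is something to measure rather than assume.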

初吻给了烟 2024-12-16 13:11:06

How about:

mystring = mystring.replaceAll("^(.{12}.*?)\\b.*$", "$1...");

梦中的蝴蝶 2024-12-16 13:11:06

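I use this hack. Suppose the trimmed string should be about 120 characters long:

String textToDisplay = textToTrim.substring(0, Math.min(120, textToTrim.length()));

// Extend only if the text was actually cut and the cut did not land right after a space.
if (textToDisplay.length() != textToTrim.length()
        && textToDisplay.lastIndexOf(' ') != textToDisplay.length() - 1) {
    // Copy the rest of the interrupted word: up to the next space, or to the end if there is none.
    int nextSpace = textToTrim.indexOf(' ', textToDisplay.length());
    int end = (nextSpace < 0) ? textToTrim.length() : nextSpace;
    textToDisplay = textToDisplay + textToTrim.substring(textToDisplay.length(), end) + " ...";
}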