Trim a string in Java while preserving whole words
I need to trim a String in Java so that:
The quick brown fox jumps over the lazy dog.
becomes
The quick brown...
In the example above, I'm trimming to 12 characters. If I just use substring I would get:
The quick br...
I already have a method for doing this using substring, but I want to know the fastest (most efficient) way to do it, because a page may have many trim operations.
The only way I can think of is to split the string on spaces and put it back together until its length passes the given length. Is there another way? Perhaps a more efficient one, in which I can use the same method to do a "soft" trim, where I preserve the last word (as shown in the example above), and a hard trim, which is pretty much a substring.
Thanks,
Comments (7)
Below is a method I use to trim long strings in my webapps.
The "soft" boolean, as you put it, will preserve the last word if set to true.
This is the most concise way of doing it that I could come up with. It uses a StringBuffer, which is a lot more efficient than repeatedly recreating an immutable String.
Update
I've changed the code so that the ... is appended in the StringBuffer; this prevents needless implicit creation of String objects, which is slow and wasteful.
Note: escapeHtml is a static import from Apache Commons: import static org.apache.commons.lang.StringEscapeUtils.escapeHtml;
You can remove it and the code should work the same.
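The idea described in this answer can be sketched roughly as follows. This is not the answerer's code, only a minimal illustration under some assumptions: the class name `TrimSketch` and the `trim(text, length, soft)` signature are hypothetical, `StringBuilder` (the unsynchronized sibling of `StringBuffer`) is swapped in, and the `escapeHtml` call is left out so the example has no external dependencies.

```java
final class TrimSketch {

    // Hypothetical helper: the name and signature are assumptions for illustration.
    static String trim(String text, int length, boolean soft) {
        // Nothing to do for null input or text that already fits.
        if (text == null || text.length() <= length) {
            return text;
        }
        int end = length;
        if (soft) {
            // Soft trim: extend the cut to the next space so the current word stays whole.
            int nextSpace = text.indexOf(' ', length);
            end = (nextSpace == -1) ? text.length() : nextSpace;
        }
        if (end == text.length()) {
            return text; // the last word ran to the end, so nothing was cut
        }
        // Copy the kept range and the ellipsis into one builder,
        // avoiding an intermediate substring.
        StringBuilder sb = new StringBuilder(end + 3);
        sb.append(text, 0, end).append("...");
        return sb.toString();
    }
}
```

With the sentence from the question and a length of 12, the soft variant cuts at the space after "brown", while the hard variant cuts at exactly 12 characters.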
Here is a simple, regex-based, one-line solution:
Explanation:
(?<=.{12}) is a positive look-behind, which asserts that there are at least 12 characters to the left of the match; it is zero-width, so it consumes no input.
\b.* matches from the first word boundary (after those 12 or more characters) to the end of the input.
The matched text is replaced with "...".
Here's a test:
Output:
If performance is an issue, pre-compile the regex once and reuse it, for an approximately 5x speed-up (YMMV):
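A minimal sketch of the precompiled variant, built from the two regex pieces explained above. The class name `RegexTrimSketch` and the method `softTrim` are hypothetical, and `replaceFirst` is an assumption chosen here because a pattern ending in `.*` can match a second time, as an empty match, at the end of the input.

```java
import java.util.regex.Pattern;

final class RegexTrimSketch {

    // Compiled once and reused; the limit of 12 comes from the question's example.
    private static final Pattern TAIL = Pattern.compile("(?<=.{12})\\b.*");

    // Hypothetical helper name, not taken from the answer.
    static String softTrim(String text) {
        // Replace everything from the first word boundary that has at least
        // 12 characters to its left with an ellipsis. Input shorter than
        // 12 characters never satisfies the look-behind and is returned unchanged.
        return TAIL.matcher(text).replaceFirst("...");
    }
}
```

One caveat of this pattern: the end of the input is also a word boundary when the text ends in a word character, so a string of 12 or more characters that ends exactly on such a boundary still gets "..." appended.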
Please try the following code:
Try searching for the last occurrence of a space at a position before or after index 11, and trim the string there, then append "...".
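Both readings of this suggestion can be sketched with `String.lastIndexOf` and `String.indexOf`. The class and method names below are hypothetical, and the fallback to a hard cut when no space is found is an assumption, since the answer does not say what should happen in that case.

```java
final class SpaceSearchSketch {

    // Cut at the last space at or before 'limit', so the kept text never exceeds the limit.
    static String trimBefore(String text, int limit) {
        if (text.length() <= limit) {
            return text;
        }
        int cut = text.lastIndexOf(' ', limit);
        if (cut <= 0) {
            cut = limit; // assumption: no usable space, fall back to a hard cut
        }
        return text.substring(0, cut) + "...";
    }

    // Cut at the first space at or after 'limit', so the word under the limit stays whole.
    static String trimAfter(String text, int limit) {
        if (text.length() <= limit) {
            return text;
        }
        int cut = text.indexOf(' ', limit);
        // No later space means the last word runs to the end: nothing to trim.
        return (cut == -1) ? text : text.substring(0, cut) + "...";
    }
}
```

Neither method splits or rebuilds the string word by word; each does one index search and one `substring` call.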
Your requirements aren't clear. If you have trouble articulating them in a natural language, it's no surprise that they'll be difficult to translate into a computer language like Java.
"Preserve the last word" implies that the algorithm knows what a "word" is, so you'll have to tell it that first. Splitting is one way to do it. A scanner/parser with a grammar is another.
I'd worry about making it work before I concerned myself with efficiency. Make it work, measure it, then see what you can do about performance. Everything else is speculation without data.
How about:
I use this hack: suppose the trimmed string must be 120 characters long: