Java - Split a string into character-limited sentences
I want to split a text into sentences (split by . or BreakIterator).
But: Each sentence mustn't have more than 100 characters.
Example:
Lorem ipsum dolor sit. Amet consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et dolore
magna aliquyam erat, sed diam voluptua. At vero eos et accusam
et justo duo dolores.
Into (3 elements; a sentence may be broken, but a word may not):
" Lorem ipsum dolor sit. ",
" Amet consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt
ut labore et dolore magna",
" aliquyam erat, sed diam voluptua. At vero eos et accusam
et justo duo dolores. "
How can I do this properly?
Comments (4)
There's probably a better way to do it, but here it goes:
The "split string in size" regex is from this SO question. You could probably merge the two regexes into one, but I'm not sure that would be a wise idea (:
If the regex doesn't run on Android (the \G operator is not recognized everywhere), try the other solutions linked there to split a string based on its size.
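For illustration, the two-regex idea could look roughly like the following. This is a minimal sketch, not the answerer's original code. The class name RegexSplitSketch, the method name split, and the whitespace normalization step are all assumptions. It first splits after each period, then cuts any piece longer than the limit with the \G size-splitting idiom. Note that the size split cuts at a fixed character count, so it can break a word in the middle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class RegexSplitSketch {

    static List<String> split(String text, int limit) {
        List<String> out = new ArrayList<>();
        // Collapse line breaks and runs of spaces so '.' in the regex sees one line.
        String normalized = text.trim().replaceAll("\\s+", " ");
        if (normalized.isEmpty()) {
            return out;
        }
        // First regex: split on the whitespace that follows a period.
        for (String sentence : normalized.split("(?<=\\.)\\s+")) {
            // Second regex: split after every `limit` characters, counted
            // from the end of the previous match (\G).
            for (String piece : sentence.split("(?<=\\G.{" + limit + "})")) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum dolor sit. Amet consetetur sadipscing elitr, "
                + "sed diam nonumy eirmod tempor invidunt ut labore et dolore "
                + "magna aliquyam erat, sed diam voluptua. At vero eos et accusam "
                + "et justo duo dolores.";
        for (String piece : split(text, 100)) {
            System.out.println("\"" + piece + "\"");
        }
    }
}
```

Because the second split ignores word boundaries, this only satisfies the "at most 100 characters" part of the question; keeping words intact needs a word-based approach.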
Regex will not help you much in this kind of situation.
I would split the text on spaces or on . and then start concatenating. Something like this (pseudocode):
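The split-then-concatenate idea can be sketched in Java as below. This is one possible reading of it, not the answerer's pseudocode: the class name GreedyWrapSketch and the method name wrap are made up for the example. Words are appended to the current chunk until the next word would push it past the limit. The sketch does not prefer sentence boundaries, and a single word longer than the limit ends up in a chunk of its own that exceeds the limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class GreedyWrapSketch {

    static List<String> wrap(String text, int limit) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for empty or all-whitespace input
            }
            // Would "current + space + word" exceed the limit? Then close the chunk.
            if (current.length() > 0
                    && current.length() + 1 + word.length() > limit) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum dolor sit. Amet consetetur sadipscing elitr, "
                + "sed diam nonumy eirmod tempor invidunt ut labore et dolore "
                + "magna aliquyam erat, sed diam voluptua. At vero eos et accusam "
                + "et justo duo dolores.";
        for (String chunk : wrap(text, 100)) {
            System.out.println("\"" + chunk + "\"");
        }
    }
}
```

Each word is consumed exactly once by the for loop, so the loop always terminates, even when a word is longer than the limit.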
Solved (thank you Macarse for the inspiration):
Following the previous solutions, I quickly ran into an infinite loop in the case where a word may exceed the limit (very unlikely, but unfortunately my environment is very constrained). So I added a fix, of sorts, for this edge case (I think).
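One way to handle that edge case, sketched under assumptions (this is not the poster's actual fix; HardSplitWrapSketch and wrap are hypothetical names): cut any word longer than the limit into limit-sized pieces before concatenating. Every chunk then stays within the limit, and each pass through the inner loop shortens the word, so it cannot spin forever.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class HardSplitWrapSketch {

    static List<String> wrap(String text, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty or all-whitespace input
            }
            // Edge case: a word longer than the limit is cut into full-size
            // pieces; the remaining tail is treated like a normal word.
            while (word.length() > limit) {
                if (current.length() > 0) {
                    chunks.add(current.toString());
                    current.setLength(0);
                }
                chunks.add(word.substring(0, limit));
                word = word.substring(limit);
            }
            // Regular greedy step: close the chunk if the word does not fit.
            if (current.length() > 0
                    && current.length() + 1 + word.length() > limit) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (String chunk : wrap("Lorem ipsumdolorsitametconsetetur sadipscing elitr", 10)) {
            System.out.println("\"" + chunk + "\"");
        }
    }
}
```

Hard-cutting a word is a trade-off: it breaks the "never split a word" rule, but only for words that could not fit in any chunk anyway.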